  1. Nov 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1:

      (1) Which allele is alr1, the one upstream of mazEF or the one in the lysine biosynthetic operon?

      Alr1 is encoded by SAUSA300_2027 and is the gene upstream of mazEF. We have now incorporated this information in the manuscript (Line# 127).

      (2) Figure 3B. Where does the C3N2 species come from in the WT and why is it absent in the mutants? It is about 25% of the total dipeptide pool.

      In Figure 3B, the C3N2 species results from the combination of C3N1 (from Alr1) and C0N1 (from Dat). This species is completely absent in both mutants because generating C3N2 D-Ala-D-Ala requires one D-Ala from Alr1 and one from Dat.
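      The pairing argument can be checked with a small calculation. This is an illustrative sketch only: it assumes that CxNy denotes a species carrying x 13C and y 15N atoms, that Ddl draws both residues independently from a single well-mixed D-Ala pool, and the pool fractions are hypothetical, not measured values.

```python
from itertools import product

def dipeptide_distribution(pool):
    """Isotopomer distribution of D-Ala-D-Ala, assuming Ddl pairs two D-Ala
    drawn independently from one well-mixed pool.

    pool maps (n13C, n15N) -> fraction of the D-Ala pool (fractions sum to 1).
    """
    dist = {}
    for first, second in product(pool, repeat=2):
        # The labels of the two residues add up in the dipeptide.
        key = (first[0] + second[0], first[1] + second[1])
        dist[key] = dist.get(key, 0.0) + pool[first] * pool[second]
    return dist

# Hypothetical pools: C3N1 from Alr1, C0N1 from Dat, C0N0 unlabeled.
wild_type = {(3, 1): 0.5, (0, 1): 0.3, (0, 0): 0.2}
delta_alr1 = {(0, 1): 0.7, (0, 0): 0.3}   # no Alr1-derived C3N1
delta_dat = {(3, 1): 0.6, (0, 0): 0.4}    # no Dat-derived C0N1

for name, pool in [("WT", wild_type), ("alr1", delta_alr1), ("dat", delta_dat)]:
    c3n2 = dipeptide_distribution(pool).get((3, 2), 0.0)
    print(f"{name}: C3N2 fraction = {c3n2:.2f}")
```

      With these made-up fractions, C3N2 arises only from mixed C3N1 + C0N1 pairs (2 × 0.5 × 0.3 = 0.30 in the wild-type pool) and is zero whenever either source is missing.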

      (3) Figure 3D could perhaps be omitted. I understand that the authors attained statistical significance in the fitness defect, but biologically this difference is very minor. One would have to look at the isotopomer distribution in the Dat overexpressing strain to make sure that increased flux actually occurred since there are other means of affecting activity (e.g. allosteric modulators).

      Thank you for the suggestion. We agree with the reviewer that the fitness defect observed after increased dat expression is relatively minor and have moved this figure to the supplementary section as Figure 3-figure supplement 1.

      Although we attempted to amplify the fitness defect of dat expression by cloning dat onto a multicopy vector, we could not maintain its stable expression in S. aureus. This instability may be due to the depletion of D-Ala when dat is overexpressed. As a result, we switched to expressing dat from a single additional copy integrated into the SaPI locus, which was sufficient to cause the expected fitness defect, albeit a minor one.

      (4) In Figure 4A, why is the complete subunit UDP-NAM-AEKAA increasing in each strain upon acetate challenge if there was such a stark reduction in D-Ala-D-Ala, particularly in the ∆alr1 mutant? For that matter, why are the levels of UDP-NAM-AEKAA in the ∆alr1 mutant identical to that of WT with/out acetate?

      Thank you for raising this important point. We have addressed this in line# 299-302 and 451-455 of the revised manuscript. In short, we believe that the inhibition of Ddl by acetate significantly increases the intracellular pool of the tripeptide UDP-NAM-AEK, which then outcompetes the substrate (pentapeptide; UDP-NAM-AEKAA) of MraY. As a result, the intracellular concentration of the pentapeptide increases since it is no longer efficiently consumed by MraY. This explanation is also supported by a kinetic study conducted in Ref (1), where the competition between UDP-NAM-AEKAA and UDP-NAM-AEK as substrates for MraY is demonstrated.
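      The competition argument follows from the standard rate law for an enzyme acting on two alternative substrates that share one active site. The sketch below uses hypothetical Km and Vmax values (not the constants reported in Ref (1)), and the helper name is ours. It only shows that raising the tripeptide pool lowers the rate at which the pentapeptide is consumed, so the pentapeptide can accumulate even when its synthesis slows.

```python
def competing_substrate_rate(s, km_s, vmax_s, competitor, km_c):
    """Michaelis-Menten rate for substrate s when an alternative substrate
    (competitor) binds the same active site:
        v_s = (Vmax_s * [S] / Km_s) / (1 + [S]/Km_s + [C]/Km_c)
    """
    return (vmax_s * s / km_s) / (1.0 + s / km_s + competitor / km_c)

# Hypothetical values (arbitrary units): pentapeptide fixed, tripeptide rising.
penta, km_penta, vmax_penta = 0.1, 0.1, 1.0
km_tri = 0.1
for tri in [0.0, 0.1, 0.5, 1.0]:
    v = competing_substrate_rate(penta, km_penta, vmax_penta, tri, km_tri)
    print(f"tripeptide = {tri:.1f}: pentapeptide consumption rate = {v:.3f}")
```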

      (5) Figure 4B. Is there no significant difference between ddl and murF transcripts between WT and ∆alr1 under acetate stress? This comparison was not labeled if the tests were done.

      Thank you for suggesting this comparison. The ddl and murF transcript levels in WT and ∆alr1 under acetate stress were significantly different. We have added this comparison to Figure 4B.

      (6) Although tricky, it is possible to measure intracellular acetate. It might be of interest to know where in the Ddl inhibition curve the cells actually are.

      Thank you for the suggestion. We agree this would have been an excellent addition to the manuscript. However, accurately measuring intracellular acetate would require the use of radiolabeled acetate (2), and we currently lack the expertise to perform this experiment. Nevertheless, because our study clearly shows that acetate-mediated growth impairment is due to Ddl inhibition, and the IC50 of acetate for Ddl is around 400 mM, we predict that the intracellular concentration must be close to or above this IC50 to produce the growth phenotypes we report.

      Reviewer #2:

      Although the authors have conclusively shown that Ddl is the target of acetic acid, it appears that the acetic acid concentration used in the experiments may not truly reflect the concentration range S. aureus would experience in its environment. Moreover, Ddl is only significantly inhibited at a very high acetate concentration (>400 mM). Thus, additional experiments showing growth phenotypes at lower organic acid concentrations may be beneficial.

      Thank you for the suggestion. In response to the reviewer, we have measured growth at various acetate concentrations and demonstrate a concentration-dependent effect (Figure 1C).

      We use 20 mM acetic acid in our study. In the gut, where S. aureus colonizes, acetate levels can reach up to 100 mM, so we believe our concentrations are physiologically relevant. When S. aureus encounters 20 mM acetate, the intracellular concentration can rise to 600 mM if the transmembrane pH gradient is 1.5 units, which is well above the ~400 mM IC50 we report for Ddl.
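      The ~600 mM figure is consistent with the standard weak-acid partitioning relation, which we assume underlies this estimate: only undissociated acetic acid crosses the membrane, so [HA]in = [HA]out, and the anion on each side follows Henderson-Hasselbalch. The sketch below assumes a pKa of 4.76, an external pH of 6.0 and an internal pH of 7.5 (pH gradient of 1.5). The Ddl term assumes a simple inhibition curve with a Hill slope of 1 around the ~400 mM IC50, which is our simplification rather than a fitted model.

```python
ACETIC_ACID_PKA = 4.76  # approximate pKa of acetic acid at 25 °C

def intracellular_acetate_mM(external_mM, ph_out, ph_in, pka=ACETIC_ACID_PKA):
    """Total intracellular acetate (acid + anion) at equilibrium, assuming only
    the undissociated acid permeates, so [HA]_in == [HA]_out."""
    ratio = (1 + 10 ** (ph_in - pka)) / (1 + 10 ** (ph_out - pka))
    return external_mM * ratio

def fractional_ddl_activity(acetate_mM, ic50_mM=400.0):
    """Remaining activity for a one-site inhibition curve with Hill slope 1
    (a simplifying assumption)."""
    return 1.0 / (1.0 + acetate_mM / ic50_mM)

for external in (20.0, 100.0):
    inside = intracellular_acetate_mM(external, ph_out=6.0, ph_in=7.5)
    print(f"{external:.0f} mM outside -> ~{inside:.0f} mM inside, "
          f"Ddl activity ~{fractional_ddl_activity(inside):.2f}")
```

      Under these assumptions, 20 mM external acetate gives roughly 600 mM inside, and because the accumulation ratio does not depend on concentration, 100 mM outside gives roughly 3 M inside.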

      Another aspect not adequately discussed is the presence of D-ala in the gut environment, which may be protective against acetate toxicity based on the model provided.

      Thank you for pointing this out. We agree that D-Ala from the gut microbiota could protect against acetate toxicity, and we’ve included this in the discussion. However, our study clearly indicates that S. aureus itself maintains high intracellular D-Ala levels through Alr1 activity, which is sufficient to counter acetate anion intoxication.

      Recommendation for the authors:

      Reviewer #2:

      Major Comments:

      (1) In Line 85, authors indicate S. aureus may encounter a high concentration of ~100 mM acetic acid (extracellular?). Could the authors cite more (and recent) references indicating S. aureus encounters >100 mM acetic acid in the environment?

      To the best of our knowledge, no studies have specifically examined whether S. aureus encounters high millimolar concentrations of acetate in the gut. Line 85 was inferred from multiple studies: recent findings that S. aureus colonizes the gut (3, 4) and that the gut environment has high acetate levels (~100 mM) (5). In response to the reviewer's request, more recent references supporting high acetate concentrations in the gut (6, 7) have been added in Line# 86.

      (2) In Line 117, it is mentioned that S. aureus when grown in vitro at 20 mM acetic acid can accumulate ~600 mM acetic acid in the cytoplasm.

      a. Does the intracellular concentration go up proportionally if grown in 100 mM acetic acid? Given the IC50 of acetic acid-mediated inhibition of Ddl is ~400 mM, I wonder how physiologically relevant this finding presented here is.

      Thank you for the opportunity to explain this further. If S. aureus encounters a concentration of 100 mM acetate and its transmembrane pH gradient (pHin-pHout) is held at 1.5, the intracellular concentration of acetate could theoretically increase up to 3 M based on Ref (8). However, previous studies have shown that bacteria can lower the magnitude of transmembrane pH gradient by decreasing their intracellular pH to limit accumulation of anions within cells (9, 10).
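      The effect of lowering the transmembrane pH gradient can be quantified with the same weak-acid partitioning assumption ([HA]in = [HA]out, pKa 4.76). The hypothetical helper below solves for the internal pH at which total intracellular acetate would just reach a chosen target, here the ~400 mM IC50 of Ddl, at an external pH of 6.0. It illustrates the cited principle and is not a measurement.

```python
import math

def internal_ph_for_target(external_mM, target_mM, ph_out, pka=4.76):
    """Internal pH at which total intracellular acetate equals target_mM,
    assuming [HA]_in == [HA]_out (weak-acid partitioning)."""
    needed_ratio = target_mM / external_mM
    # Rearranged from target/external = (1 + 10**(pH_in - pKa)) / (1 + 10**(pH_out - pKa))
    anion_term = needed_ratio * (1 + 10 ** (ph_out - pka)) - 1
    if anion_term <= 0:
        raise ValueError("target is below the undissociated-acid floor")
    return pka + math.log10(anion_term)

for external in (20.0, 100.0):
    ph_in = internal_ph_for_target(external, target_mM=400.0, ph_out=6.0)
    print(f"{external:.0f} mM outside: pH_in must stay below ~{ph_in:.2f} "
          f"(pH gradient ~{ph_in - 6.0:.2f}) to keep acetate under 400 mM")
```

      Under these assumptions, a cell facing 100 mM acetate would have to reduce its pH gradient to roughly 0.6 units to keep acetate below the Ddl IC50, which illustrates why lowering intracellular pH limits anion accumulation.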

      Although our study shows that the IC50 of Ddl inhibition by acetate is relatively high (~400 mM), we believe it’s still relevant because just 20 mM of environmental acetate at a pH of 6.0 can raise the intracellular concentration of acetate to over 600 mM, which is well above the IC50 we report for Ddl. Moreover, since S. aureus may encounter high concentrations of acetate during gut colonization, we believe our findings are physiologically relevant.

      b. Could the authors show concentration-dependent growth inhibition in alr::tn by titrating a range of acetic acid concentrations (for example 0, 0.5, 1, 5, 10, 20 mM)? Measuring intracellular acetate concentration may be beneficial as well.

      Thank you for this question. We now provide data to support that acetate-mediated inhibition of the alr1 mutant is concentration-dependent (see Figure 1C).

      c. It appears that there may be excess D-ala in the gut environment (PMIDs: 30559391; 35816159), which could counter the high acetate based on the model presented here. Could the authors clarify and/or include this information in the manuscript?

      This is an excellent point, and we have now included it in the discussion (Line# 470-475). It is indeed possible that D-Ala produced by the gut microbiome may further enhance S. aureus resistance to organic acid anions, in addition to the inherent contribution of Alr1 activity.

      (3) The following is not needed; however, it would be interesting if the authors could show that S. aureus cells grown in the presence of acetate are highly sensitive to cycloserine (which targets Alr and Ddl) compared to cells grown in the absence of acetate.

      Thank you for the suggestion. We are currently studying D-cycloserine (DCS) resistance in S. aureus. Although we provide the data below for clarification, it is not included in the current manuscript as it is part of a separate study.

      As the reviewer speculated, S. aureus is more susceptible to DCS when grown in the presence of acetate (see figure below). Normally, complete growth inhibition occurs at 32 µg/ml of DCS. However, with 20 mM acetic acid present, complete inhibition is achieved at just 8 µg/ml of DCS. Furthermore, the growth inhibition is completely rescued when externally supplemented with 5 mM D-Ala. We believe that DCS works synergistically with acetate to inhibit Ddl activity, and we are conducting additional studies to explore this further.

      Minor Comments:

      (1) Many commas are missing.

      Missing commas are now incorporated.

      (2) Line 77: disassociate --> dissociate

      Corrected.

      (3) Line 103: that --> which

      Corrected.

      (4) Lines 199-203: authors could have used gfp/luciferase reporter to test their hypotheses.

      Thank you for the suggestion. Initially, we created GFP translational fusions for all the mutants mentioned in Line# 199-203. However, the fluorescence intensity was too low to test the hypothesis, as these were single-copy fusions inserted at the SaPI site of the S. aureus genome. Because of this limitation, we took advantage of the essentiality of D-Ala-D-Ala in S. aureus to report on various mutants instead of a fluorescent reporter. In hindsight, a LacZ reporter assay might have been equally effective.

      (5) Line 339: It would be beneficial to introduce that Ddl has two independent ATP and D-ala binding sites.

      We have now added that information (Line# 338-339).

      (6) Is ddl an essential gene? If so, explicitly mention that.

      Yes, ddl is an essential gene, and we have now incorporated this information in Line 103.

      (7) Line 354: shows a difference in density?

      The use of the term “difference density” is a technical crystallographic term commonly used to connote density observed for ligands in X-ray crystal structures. In this case, it simply refers to the observed density that corresponds to the two acetate ions bound within the Ddl active site.

      (8) Line 498: "Thus." Typo, change period to comma.

      We have corrected as suggested in Line 496.

      (9) Figure 1 legend says "was screen" instead of screened.

      This is now corrected.

      (10) Figure 1- Figure Supplement 1B: including data for alr2::tn dat::tn may ensure no redundancy (Lines 171-172). It is currently missing.

      Thank you for the suggestion. We now include both the alr2 dat double mutant and the alr1 alr2 dat triple mutant in Figure 1 - Figure Supplement 1B. In addition, we show that the alr1 alr2 dat mutant is rescued by the addition of D-Ala in Figure 1 - Figure Supplement 1C. The mutant information is also added to Table S5.

      (11) Figure 7: pentaglycine coming off of NAM is misleading. Remove untethered pentaglycine bridges.

      We thank you for pointing this out. We have modified the figure in the manuscript as suggested by the reviewer.

      (12) Are alr1/ddl cells (with limited 4-3 PG crosslink) less sensitive to vancomycin?

      On the contrary, the alr1 mutant is slightly more sensitive to vancomycin compared to the wild-type strain (see Figure below). We believe this happens because the alr1 mutant incorporates less D-Ala-D-Ala into the peptidoglycan, reducing the number of targets for vancomycin. As a result, vancomycin may be able to saturate the available D-Ala-D-Ala targets on the cell wall at a lower concentration in the alr1 mutant than in the wild-type strain, leading to increased sensitivity. We have not included these data in the manuscript as they are part of a separate study.

      (13) Based on the structural studies, could the authors mutate the residues of Ddl involved in acetic acid binding, thereby making it resistant to acetic acid stress?

      The residues that the acetate anion interacts with are located within the ATP-binding and D-Ala-binding sites of Ddl. Since these residues are essential for Ddl function, we are unable to mutate them.

      (14) Microscopy to show the cell morphologies of wild-type and mutants exposed to acetic acid (and with D-ala supplementation) could be potentially interesting.

      Thank you for the suggestion. We did perform microscopy, expecting changes in cell shape or size, but the results were unremarkable and not included in the manuscript.

      References:

      (1) Hammes WP & Neuhaus FC (1974) On the specificity of phospho-N-acetylmuramyl-pentapeptide translocase. The peptide subunit of uridine diphosphate-N-actylmuramyl-pentapeptide. J Biol Chem 249(10):3140-3150.

      (2) Roe AJ, McLaggan D, Davidson I, O'Byrne C, & Booth IR (1998) Perturbation of anion balance during inhibition of growth of Escherichia coli by weak acids. J Bacteriol 180(4):767-772.

      (3) Acton DS, Plat-Sinnige MJ, van Wamel W, de Groot N, & van Belkum A (2009) Intestinal carriage of Staphylococcus aureus: how does its frequency compare with that of nasal carriage and what is its clinical impact? Eur J Clin Microbiol Infect Dis 28(2):115-127.

      (4) Piewngam P, et al. (2023) Probiotic for pathogen-specific Staphylococcus aureus decolonisation in Thailand: a phase 2, double-blind, randomised, placebo-controlled trial. Lancet Microbe 4(2):e75-e83.

      (5) Cummings JH, Pomare EW, Branch WJ, Naylor CP, & Macfarlane GT (1987) Short chain fatty acids in human large intestine, portal, hepatic and venous blood. Gut 28(10):1221-1227.

      (6) Correa-Oliveira R, Fachi JL, Vieira A, Sato FT, & Vinolo MA (2016) Regulation of immune cell function by short-chain fatty acids. Clin Transl Immunology 5(4):e73.

      (7) Hosmer J, McEwan AG, & Kappler U (2024) Bacterial acetate metabolism and its influence on human epithelia. Emerg Top Life Sci 8(1):1-13.

      (8) Carpenter CE & Broadbent JR (2009) External concentration of organic acid anions and pH: key independent variables for studying how organic acids inhibit growth of bacteria in mildly acidic foods. J Food Sci 74(1):R12-15.

      (9) Russell JB (1992) Another explanation for the toxicity of fermentation acids at low pH: anion accumulation versus uncoupling. Journal of Applied Bacteriology 73(5):363-370.

      (10) Russell JB & Diez-Gonzalez F (1998) The effects of fermentation acids on bacterial growth. Adv Microb Physiol 39:205-234.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Limitations are that only the cytosolic fragments of the channel were studied, and the current manuscript does not do a good job of placing the results in the context of what is already known about CNBDs from other methods that yield similar information.

      In the revision, we have now added a paragraph in the discussion that addresses why the cytosolic fragment was used and a paragraph putting our results into the context of previous work on CNBD channels where possible. 

      (1) Why do the authors not apply their approach to the full-length channel? A discussion of any limitations that make this difficult would be worthwhile.

      Full-length ion channel protein expression is more challenging, and it was important to start with a simpler system. This is now stated in the discussion.

      (2) …nonetheless a comparison of the conformational heterogeneity and energetics obtained from these different approaches would help to place this work in a larger context.

      We have now added a paragraph in the discussion putting our work in a larger context and addressing the challenges of comparing our results to previous studies. 

      (3) Page 5 - 3:1 unlabeled:labeled subunits in mix => 42% of molecules have 3:1 stoichiometry as desired and 21% of molecules have 2:2 stoichiometry!!! (binomial distribution p=0.25, n=4). So 1/3 of molecules with labels have two labeled subunits. This does not seem like it is at all avoiding the problem of intersubunit FRET…
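      The reviewer's percentages follow from a binomial model in which each of the four subunits of a tetramer is independently labeled with probability p (p = 0.25 for a 3:1 unlabeled:labeled mix). The sketch below reproduces them and, for comparison, evaluates a smaller hypothetical p, since a larger excess of unlabeled subunit lowers the share of multiply labeled tetramers. The second value is illustrative, not a measured ratio.

```python
from math import comb

def labeled_subunit_distribution(p_labeled, n_subunits=4):
    """P(k labeled subunits per tetramer) for k = 0..n, assuming random assembly."""
    return [comb(n_subunits, k) * p_labeled**k * (1 - p_labeled)**(n_subunits - k)
            for k in range(n_subunits + 1)]

for p in (0.25, 0.125):
    dist = labeled_subunit_distribution(p)
    labeled = 1 - dist[0]        # tetramers carrying at least one label
    multi = labeled - dist[1]    # tetramers carrying two or more labels
    print(f"p={p}: one label {dist[1]:.1%}, two labels {dist[2]:.1%}, "
          f"exactly two among labeled {dist[2] / labeled:.1%}, "
          f"two or more among labeled {multi / labeled:.1%}")
```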

      From the experimental perspective, the 3:1 molar ratio stated is certainly a low estimate of the actual subunit ratios given our FSEC data in Figure 2D and the higher expression of the WT protein compared to labeled protein. Furthermore, even without the addition of any WT protein, the calculated contribution of intersubunit FRET is negligible given that the FRET efficiency is heavily dominated by the closest donor-acceptor distances (Figure 4). 
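      Why the closest pair dominates follows from the r^-6 distance dependence of FRET. For one donor and several acceptors, the transfer rates add, giving E = Σk_i / (1 + Σk_i) with k_i = (R0/r_i)^6. The sketch below uses a hypothetical R0 of 12 Å and hypothetical intra- and intersubunit distances; it is not a reanalysis of the data in Figure 4.

```python
def fret_efficiency(distances_A, r0_A):
    """FRET efficiency for one donor and several acceptors (Förster model):
    transfer rates add, E = sum(k_i) / (1 + sum(k_i)), k_i = (R0 / r_i)**6."""
    k_total = sum((r0_A / r) ** 6 for r in distances_A)
    return k_total / (1.0 + k_total)

R0 = 12.0     # hypothetical Förster distance, Å
intra = 12.0  # hypothetical intrasubunit donor-acceptor distance, Å
inter = 30.0  # hypothetical distance to an acceptor on another subunit, Å

e_intra = fret_efficiency([intra], R0)
e_both = fret_efficiency([intra, inter], R0)
print(f"intrasubunit only: E = {e_intra:.3f}; with extra acceptor: E = {e_both:.3f}")
```

      With these assumed distances, the extra, more distant acceptor raises E by only about 0.001, because its rate constant is (12/30)^6 ≈ 0.004 of the intrasubunit one.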

      (4) Figure 2E - Some monomers appear to still be present in the collected fraction. The authors should discuss any effect this might have on their results.

      We now describe in the text that, at the low concentrations (~10 nM) used for mass photometry, a second, small peak of ~30 kDa was observed, which is below the analytical range for this method. This would not affect our results, since all tmFRET experiments used higher protein concentrations to ensure tetramerization.

      (5) page 4 - "Time-resolved tmFRET, therefore, resolves the structure and relative abundance of multiple conformational states in a protein sample." - structure is not resolved, only a single distance.

      We have reworded this sentence.  

      Reviewer #2:

      Regarding cyclic nucleotide-binding domain (CNBD)-containing ion channels, I disagree with the authors when they state that "the precise allosteric mechanism governing channel activation upon ligand binding, particularly the energetic changes within domains, remains poorly understood". On the contrary, I would say that the literature on this subject is rather vast and based on a significantly large variety of methodologies…

      Despite this vast literature on the energetics of CNBD channels, there is no consensus about the energetics and coupling of domains that underlie the allosteric mechanism in any CNBD channel. We have added a separate paragraph in the discussion to clarify our meaning.

      In light of the above, I suggest the authors better clarify the contribution/novelty that the present work provides to the state-of-the-art methodology employed (steady-state and time-resolved tmFRET) and of CNBD-containing ion channels…

      …In light of the above, what is the contribution/novelty that the present work provides to the SthK biophysics?

      This work is the first use of the time-resolved tmFRET method to obtain the intrinsic ΔG (of an apo conformation) and ΔG values for different ligands. It is also the first application of this approach to SthK or, indeed, to any protein other than MBP. This is mentioned in the introduction.
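      The link between the relative abundance of conformational states and free energy is the Boltzmann relation. The sketch below assumes a simple two-state (resting/active) equilibrium at 25 °C, uses hypothetical active-state fractions rather than values from the manuscript, and defines ΔG for resting → active so that negative values favor the active state.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def delta_g_kcal(active_fraction, temperature_K=298.15):
    """ΔG (resting -> active) from the fraction of molecules in the active state,
    assuming a two-state equilibrium: ΔG = -RT ln([active]/[resting])."""
    k_eq = active_fraction / (1.0 - active_fraction)
    return -R_KCAL * temperature_K * math.log(k_eq)

# Hypothetical active-state fractions for an apo and a ligand-bound sample.
dg_apo = delta_g_kcal(0.10)
dg_ligand = delta_g_kcal(0.90)
print(f"ΔG apo = {dg_apo:+.2f} kcal/mol, ΔG ligand = {dg_ligand:+.2f} kcal/mol, "
      f"ΔΔG = {dg_ligand - dg_apo:+.2f} kcal/mol")
```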

      …On the basis of the above-cited work (Evans et al., PNAS, 2020) the authors should clarify why they have decided to work on the isolated Clinker/CNBD fragment and not on the full-length protein…

      We chose to start on the C-terminal fragment to provide a technically more tractable system for validating our approach using time-resolved tmFRET before moving to the more challenging full-length membrane protein. This is now addressed in a new paragraph in the discussion. 

      What is the advantage of using the Clinker/CNBD fragment of a bacterial protein and not one of HCN channels, as already successfully employed by the authors (see above citations)?

      We have chosen to perform these studies in SthK rather than a mammalian CNBD channel as SthK presents a useful model system that allows us to later express full-length channels in bacteria. In addition, the efficiency of noncanonical amino acid incorporation is much higher in bacteria than in mammalian cells.

      Reviewer #3: 

      While the use of a truncated construct of SthK is justified, it also comes with certain limitations…

      We agree that the truncated channel comes with limitations, but we still think that there is relevant energetic information from studies of the isolated CNBD. This is now addressed in the discussion. 

      I recommend the authors carefully assess their statements on allostery. …The authors also should consider discussing the discrepancies between their truncated construct and full-length channels in more detail.

      We added a paragraph in the introduction that now puts the conformational change of the CNBD in the context of the allosteric mechanism of the full-length channel. We also added a paragraph discussing in more detail the relationship between the energetics of the C-terminal fragment and the full-length channel.  

      Regarding the in silico predictions, it is unclear to me why the authors chose the closed state of SthK Y26F and the 'open' state of the isolated C-linker CNBD construct…

      The active, cAMP-bound structure (4d7t) is a high-resolution X-ray crystal structure and was chosen as the only model with a fully resolved C-helix. The resting-state structure (7rsh) was selected as the only resting-state structure that resolves the acceptor residue studied here (V417).

      Previously it has been shown that SthK (and CNG) goes through multiple states during gating. This may be discussed in more detail, especially when it comes to the simplified four-state model…

      As stated above, we added paragraphs to the introduction and discussion placing the conformational change of the CNBD in the context of the full-length channel.  

      It would be interesting to see how the conformational distribution of the C-helix position integrates with available structural data on SthK. In general, putting the results more into the context of what is known for SthK and CNG channels, could increase the impact.

      We now discuss the relationship between existing structures and energetics in the introduction.  

      This may be semantics, but when working with a truncated construct that is missing the transmembrane domains using 'open' and 'closed' state is questionable. I recommend the authors consider a different nomenclature.

      We refer to the conformational states of the CNBD as ‘resting’ and ‘active’ and use ‘closed’ and ‘open’ only for the conformational states of the pore.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The sample size of the in-house dataset used for training the model was relatively small (34 patients), which might limit the generalizability of the findings.

      (2) The authors did not perform functional experiments to directly validate the roles of the identified key genes in radiotherapy sensitivity, relying instead on associations with immune features and signaling pathways.

      (3) The study did not discuss the potential limitations of using machine learning algorithms, such as the risk of overfitting and the need for larger, diverse datasets for more robust model development and validation.

      (1) Currently, we are actively expanding the dataset by incorporating additional patient samples to enhance the model's robustness and generalizability. Furthermore, we implemented advanced statistical techniques, including cross-validation, during model development to mitigate the impact of the small sample size on our results. This limitation has been comprehensively addressed in the discussion section of our manuscript.

      (2) Given the current resource limitations, our study predominantly employed bioinformatics analyses. We acknowledge the critical importance of experimental validation and are actively pursuing additional funding and collaborative opportunities to facilitate future experimental studies. Concurrently, we have enhanced the discussion section to comprehensively address the limitations of our approach and emphasize the necessity for future experimental validation.

      (3) We appreciate the reviewers' insightful comments regarding the potential limitations of machine learning algorithms, particularly the risk of overfitting. In response, we have incorporated a comprehensive discussion of these concerns, detailing the measures implemented to mitigate such risks, including the application of regularization techniques and the adoption of more rigorous cross-validation methodologies. We further acknowledge the necessity for larger and more diverse datasets to enhance model validity and generalizability, a concern we intend to address in our future research endeavors. The revised manuscript includes an expanded discussion on these critical points.
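      The two safeguards named in this response, L2 (ridge) regularization and k-fold cross-validation, can be illustrated generically. The sketch below is not the authors' pipeline: the model (ridge-penalized logistic regression fitted by gradient descent), the data (34 synthetic samples with one informative feature) and all function names are hypothetical, chosen only to show how held-out folds and a penalty term work.

```python
import math
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once, then deal them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def predict_score(w, b, x):
    """Linear score b + w.x (log-odds of the positive class)."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_l2_logistic(X, y, lam, lr=0.1, epochs=500):
    """Logistic regression with an L2 (ridge) penalty lam * ||w||^2 / 2,
    fitted by batch gradient descent; the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-predict_score(w, b, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Mean log-loss gradient plus the ridge term lam * w.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def cv_accuracy(X, y, lam, k=5):
    """k-fold cross-validated accuracy: each sample is scored exactly once,
    by a model that never saw it during fitting."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        w, b = fit_l2_logistic([X[i] for i in train], [y[i] for i in train], lam)
        for i in test_idx:
            correct += int((predict_score(w, b, X[i]) > 0) == (y[i] == 1))
    return correct / len(X)

# Synthetic data: 34 samples (same size as the in-house cohort), 5 features,
# of which only the first carries signal.
rng = random.Random(1)
X, y = [], []
for _ in range(34):
    features = [rng.gauss(0, 1) for _ in range(5)]
    X.append(features)
    y.append(1 if features[0] + rng.gauss(0, 0.5) > 0 else 0)

for lam in (0.0, 0.1, 1.0):
    print(f"lambda = {lam}: 5-fold CV accuracy = {cv_accuracy(X, y, lam):.2f}")
```

      Larger penalties shrink the weights toward zero, which limits how closely a model can fit noise features in a small cohort; cross-validated accuracy, rather than training accuracy, is the quantity that reflects performance on unseen samples.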

      Here is the limitation section in the revised Manuscript:

      “This study primarily focuses on specific subtypes of nasopharyngeal carcinoma (NPC), potentially limiting its direct generalizability to other NPC subtypes or related head and neck malignancies. Furthermore, the limited sample size of our dataset may impact the model's generalizability and extrapolation capabilities. To mitigate the potential limitations associated with the small sample size, we employed advanced statistical methodologies, including cross-validation, to enhance the robustness and reliability of our findings. Nevertheless, we acknowledge the necessity for larger datasets and are actively collaborating with other research institutions to expand our sample size, thereby enhancing the robustness and broader applicability of our findings. Additionally, while our study utilizes bioinformatics approaches to identify and analyze key genes, we recognize that the absence of direct experimental functional validation represents a significant limitation. To address this limitation, we are actively pursuing additional funding and establishing collaborations with specialized laboratories to conduct crucial functional validation experiments, which will further elucidate the specific roles of these genes in radiotherapy response. Moreover, we acknowledge the potential risk of overfitting inherent in the application of machine learning algorithms to biomedical data analysis. To mitigate this risk, we implemented regularization techniques during model development and adopted a rigorous cross-validation strategy for model validation. These methodological approaches aim to ensure that our models maintain robust predictive performance on unseen data. Notwithstanding these limitations, our study offers novel insights into the molecular mechanisms underlying radiotherapy sensitivity in NPC and indicates promising avenues for future investigation. 
      Future research endeavors will prioritize expanding the dataset, conducting comprehensive experimental validation, and refining our predictive model to enhance its accuracy and clinical applicability.”

      Reviewer #2 (Public Review):

      (1) The study focuses on a specific type of nasopharyngeal carcinoma (NPC) and may not be generalizable to other subtypes or related head and neck cancers. The applicability of NPC-RSS to a broader range of patients and tumor types remains to be determined.

      (2) The study does not account for potential differences in radiotherapy protocols, doses, and techniques between the training and validation cohorts, which could influence the performance of the predictive model. Standardization of treatment parameters would be important for future validation studies.

      (3) The binary classification of patients into radiotherapy-sensitive and resistant groups may oversimplify the complex spectrum of treatment responses. A more granular stratification system that captures intermediate responses could provide more nuanced predictions and better guide personalized treatment decisions.

      (4) The study does not address the potential impact of other relevant factors, such as tumor stage, histological subtype, and concurrent chemotherapy, on the predictive performance of NPC-RSS. Incorporating these clinical variables into the model could enhance its accuracy and clinical utility.

      (1) We appreciate the reviewers' interest in the applicability of our study. This study specifically focuses on a particular subtype of nasopharyngeal carcinoma (NPC), which may limit its direct generalizability to other NPC subtypes or related head and neck malignancies. We have incorporated a detailed discussion of this limitation in the Discussion section and intend to investigate the applicability of NPC-RSS across a broader spectrum of tumor types and subtypes in subsequent studies.

      (2) We acknowledge the reviewers' emphasis on the significance of potential variations in radiotherapy regimens, doses, and techniques. In the current study, we did not sufficiently account for these factors, potentially impacting the model's generalizability and accuracy. We aim to improve data consistency and strengthen model validation by standardizing treatment parameters in future investigations.

      (3) We concur with the reviewers' assessment that binary categorization may oversimplify the intricate nature of treatment responses. Indeed, radiotherapy responses likely exist on a continuous spectrum. Consequently, we intend to develop more refined stratification systems to capture intermediate responses, thereby enhancing the accuracy of treatment outcome predictions and facilitating personalized treatment decisions.

      (4) We appreciate the reviewers' recommendation to incorporate clinical variables, including tumor stage, histological subtype, and concurrent chemotherapy, into the model. We acknowledge that these factors are crucial for enhancing the accuracy and clinical applicability of predictive models. We are presently compiling these additional data and intend to integrate these variables into subsequent model iterations.
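      As a generic illustration of the finer stratification mentioned in response (3), a continuous score can be split by rank into more than two groups. The helper below is hypothetical, uses made-up scores, and is not part of NPC-RSS; in practice, cut points would need to be defined and validated against clinical outcomes.

```python
def rank_strata(scores, n_groups=3):
    """Assign each score to one of n_groups strata of near-equal size by rank
    (0 = lowest scores). Ties keep their input order (stable sort)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * n_groups // len(scores)
    return labels

scores = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]  # hypothetical per-patient scores
print(rank_strata(scores))  # -> [0, 2, 1, 1, 2, 0] (low / intermediate / high)
```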

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more comprehensive comparison of the NPC-RSS with existing prognostic models or biomarkers for nasopharyngeal carcinoma. This would help highlight the unique value and potential superiority of the NPC-RSS in predicting radiotherapy sensitivity.

      (2) The authors should consider expanding their discussion on the potential molecular mechanisms underlying the association between the key NPC-RSS genes and radiotherapy response. They could explore whether these genes have been previously implicated in radiotherapy resistance in other cancer types and discuss the potential functional roles of these genes in the context of nasopharyngeal carcinoma.

      (1) We appreciate your thorough review and valuable suggestions concerning our study. In response to the suggestion of comparing the Nasopharyngeal Carcinoma Radiotherapy Sensitivity Score (NPC-RSS) with existing prognostic models or biomarkers, we have carefully considered this proposal and determined that such a comparison is beyond the scope of our current study. The primary focus of our research is on the development and internal validation of the NPC-RSS model's accuracy and reliability. At present, we do not have access to the necessary external data to conduct a valid comparison, and the integration of such data extends beyond the parameters of this study. We intend to incorporate this comparative analysis in future studies to further validate the efficacy and explore the clinical application potential of the NPC-RSS model. We appreciate your understanding and continued support for our research endeavors.

      (2) In the revised manuscript, we have incorporated a comprehensive review of the functions of these key genes in various cancer types and explored their potential mechanisms of action in nasopharyngeal carcinoma (NPC). Through the citation of pertinent studies, we have elucidated the impact of these genes on radiotherapy sensitivity and resistance. Furthermore, we have proposed future research directions to elucidate the specific roles of these genes in the radiotherapy response of NPC.

      The following are new additions to the revised draft:

      “Previous studies have demonstrated that SMARCA2 significantly influences the radiotherapy response in non-small cell lung cancer (NSCLC). Depletion of SMARCA2 has been shown to enhance radiosensitivity, suggesting its potential as a therapeutic target for radiosensitization [30478150]. Additionally, the DMC1 gene has been incorporated into the radiosensitivity index (RSI) to evaluate radiotherapy sensitivity and prognosis, particularly in endometrial cancers. This inclusion provides valuable insights into the DNA damage repair process [38628740]. Studies on CD9 in glioblastoma multiforme (GBM) have revealed that post-radiotherapy increases in CD9 and CD81 levels in extracellular vesicles (EVs) are strongly correlated with the cytotoxic response to treatment. This finding suggests the potential of CD9 as a novel biomarker for monitoring radiotherapy efficacy [36203458]. In contrast, the association of PSG4 and KNG1 with radiotherapy resistance remains unexplored in the current literature.

      Future research should focus on analyzing the expression patterns of SMARCA2 in NPC patients and its correlation with radiotherapy efficacy using clinical samples. This analysis could elucidate its potential as a target for radiosensitization therapy. Investigating the correlation between DMC1 expression levels and radiotherapy sensitivity in NPC could potentially aid in predicting treatment efficacy and optimizing therapeutic regimens. Furthermore, analysis of extracellular vesicles, particularly those containing CD9, in post-radiotherapy NPC patients could assess their feasibility as biomarkers for monitoring treatment response. These proposed studies would not only contribute to a deeper understanding of the mechanisms underlying the role of these genes in NPC radiotherapy but could also potentially lead to the development of novel strategies for enhancing radiotherapy efficacy.”

      Minor Recommendations:

      (1) It is recommended that the author share the code for the article on Github or a similar open source platform.

      (2) The manuscript would benefit from a thorough review of the punctuation and sentence structure to improve readability and clarity.

      (1) You suggest sharing the code used in this study on GitHub or a comparable open-source platform to enhance the transparency and reproducibility of the research. I fully recognize the significance of this suggestion. However, owing to the sensitivity of the data involved and the existing intellectual property agreement with my research team, we are unable to make the code publicly available at this time. We are actively seeking a way to safeguard the project's intellectual property so that we can share our tools and methodologies in the future. In the meantime, we are open to collaborating with other researchers under appropriate frameworks and conditions to validate and replicate our findings, for example by providing essential code snippets or assisting with data analysis.

      (2) Your suggestions are vital for enhancing the quality of the manuscript. I will perform a comprehensive linguistic and structural review of the manuscript to ensure that statements flow coherently and punctuation is employed correctly. We also intend to engage a professional scientific and technical writing editor to ensure that the manuscript adheres to the high standards required for academic publishing.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more in-depth discussion of the potential clinical implications of the NPC-RSS. The authors should elaborate on how this score could be integrated into clinical decision-making and patient management.

      (2) The authors should consider including a section discussing the limitations of their study and potential areas for future research. This could include the need for prospective validation of the NPC-RSS in larger patient cohorts and the exploration of additional biological mechanisms.

      (1) We concur that a more comprehensive discussion regarding the application of the NPC-RSS in clinical decision-making would significantly enhance the practical value of this study. In the revised draft, we will include a section that elaborates on the integration of the NPC-RSS scoring system into daily clinical practice, detailing how it can assist physicians in developing individualized treatment plans and optimize patient management by predicting treatment responses.

      The following are new additions to the revised draft:

      “The incorporation of the NPC-RSS scoring system into clinical decision-making and patient management involves several key steps. First, genetic testing should be established as a standard component of nasopharyngeal cancer diagnosis, and physicians should have prompt access to scoring results to guide treatment planning. Second, physicians should use the scoring results to tailor individualized treatment plans and engage in multidisciplinary discussions to optimize decision-making. Concurrently, physicians should explain the clinical significance of the scores and communicate effectively with patients to facilitate shared decision-making. Finally, continuous monitoring of the relationship between scores and treatment outcomes, refinement of the scoring model based on empirical data, integration of technological platforms, and regulatory compliance are essential to ensure that the scoring system operates effectively and that patient information is protected.”

      (2) In light of the reviewers' valuable suggestions, we acknowledge the importance of prospective validation of the NPC-RSS scoring system in a broader patient population and the need for thorough exploration of the underlying biological mechanisms. Accordingly, we are adding a new section to the revised manuscript that describes the limitations of the current study and outlines potential directions for future research. This includes plans to increase the sample size for validation and further investigation of the biological basis of the scoring system to enhance its predictive validity and clinical applicability. We believe that these additions will significantly enrich the depth and breadth of the study and thereby serve the scientific community and clinical practice more effectively.

      Minor Recommendations:

      (1) The authors should ensure that all abbreviations are defined at their first mention in the text.

      (2) The figure legends should be more descriptive and self-explanatory, allowing readers to understand the main findings without referring back to the main text.

      (1) You pointed out the need to define all acronyms at the first mention in the text and suggested that a comprehensive list of acronyms be included in the revised draft. We fully concur and have included a comprehensive list of acronyms in the revised text. Additionally, to enhance clarity, we have included the full name and definition of each acronym alongside its first occurrence in the text. This will assist readers in comprehending the study without the need to repeatedly refer to the glossary.

      (2) You recommended enhancing the descriptive quality of the figure legends to enable readers to discern the key findings from the figures without consulting the text. We have redesigned and refined all charts and legends to ensure they provide adequate information and are more descriptive. Each legend now outlines the experimental conditions, the variables employed, and the primary conclusions, ensuring that the charts themselves sufficiently convey the key findings of the study.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their paper, Kang et al. investigate rigidity sensing in amoeboid cells, showing that, despite their lack of proper focal adhesions, amoeboid migration of single cells is impacted by substrate rigidity. In fact, many different amoeboid cell types can durotax, meaning that they preferentially move towards the stiffer side of a rigidity gradient. 

      The authors observed that NMIIA is required for durotaxis and, building on this observation, they generated a model to explain how durotaxis could be achieved in the absence of strong adhesions. According to the model, substrate stiffness alters the diffusion rate of NMIIA, with softer substrates allowing for faster diffusion. This allows NMIIA to accumulate at the back of the cell, which, in turn, results in durotaxis.

      The authors responded to all my comments and I have nothing to add. The evidence provided for durotaxis of non-adherent (or low-adhering) cells is strong. I am particularly impressed by the fact that amoeboid cells can durotax even when not confined. I wish to congratulate the authors for the excellent work, which will fuel discussion in the field of cell adhesion and migration.

      We thank the reviewer for critically evaluating our work and giving kind suggestions. We are glad that the reviewer found our work to be of potential interest to the broad scientific community.

      Reviewer #2 (Public Review):

      Summary:

      The authors developed an imaging-based device that provides both spatial confinement and a stiffness gradient to investigate if and how amoeboid cells, including T cells, neutrophils, and Dictyostelium, can durotax. Furthermore, the authors showed that the mechanism for the directional migration of T cells and neutrophils depends on non-muscle myosin IIA (NMIIA) polarized towards the soft-matrix side. Finally, they developed a mathematical model of an active gel that captures the behavior of the cells described in vitro.

      Strengths:

      The topic is intriguing, as durotaxis is generally thought to be a direct consequence of mechanosensing at focal adhesions (FAs). To the best of my knowledge, this is the first report of amoeboid cells that durotax without depending on FAs. The authors developed an imaging-based durotaxis device that provides both spatial confinement and a stiffness gradient, and they also used several techniques such as quantitative fluorescent speckle microscopy and expansion microscopy. The study includes well-designed control experiments, and the results are therefore convincing.

      Weaknesses:

      Overall this study is well performed but there are still some minor issues I recommend the authors address:

      (1) When using NMIIA/NMIIB knockdown cell lines to distinguish the role of NMIIA and NMIIB in amoeboid durotaxis, it would be better if the authors took compensatory effects into account.

      We thank the reviewer for this suggestion. We have investigated the compensation of myosin in NMIIA and NMIIB KD HL-60 cells using Western blot and added this result in our updated manuscript (Fig. S4B, C). The results showed that the level of NMIIB protein in NMIIA KD cells doubled while there was no compensatory upregulation of NMIIA in NMIIB KD cells. This is consistent with our conclusion that NMIIA rather than NMIIB is responsible for amoeboid durotaxis since in NMIIA KD cells, compensatory upregulation of NMIIB did not rescue the durotaxis-deficient phenotype. 

      (2) The expansion microscopy assay is not clearly described and some details are missed such as how the assay is performed on cells under confinement.

      We thank the reviewer for this comment. We have updated the details of the expansion microscopy assay in our revised manuscript (lines 481-485), including how the assay is performed on cells under confinement:

      Briefly, CD4+ naïve T cells were seeded on a gradient PA gel, with a second, upper gel providing confinement. Cells were fixed with 4% PFA for 15 min at room temperature. After fixation, the upper gradient PA gel was carefully removed, and the bottom gradient PA gel with the seeded cells was immersed in an anchoring solution containing 1% acrylamide and 0.7% formaldehyde (Sigma, F8775) for 5 h at 37 °C.

      (3) In this study, an active gel model was employed to capture experimental observations. Previously, some active nematic models were also considered to describe cell migration, which is controlled by filament contraction. I suggest the authors provide a short discussion on the comparison between the present theory and those prior models.

      We thank the reviewer for this suggestion. Active nematic models have been employed to recapitulate many phenomena during cell migration (Nat Commun., 2018, doi: 10.1038/s41467-018-05666-8). The active nematic model describes the motion of cells using the orientation field, Q, and the velocity field, u. The nematic state is represented by the director field n, which is equivalent to −n and therefore has head-tail symmetry. However, in our experiments, actin filaments are clearly polarized: they polymerize and flow towards the direction of cell migration. We therefore chose an active gel model, which describes the polarized actin field during cell migration. In the Discussion, we have compared the active gel model with the motor-clutch model, and we have now added a short comparison between the present model and the active nematic model in the main text (lines 345-347):

      The active nematic model employs active extensile or contractile agents that push or pull the fluid along their elongation axis to simulate cell flow (61).

      (4) In the present model, actin flow contributes to cell migration while myosin distribution determines cell polarity. How does this model couple actin and myosin together?

      We thank the reviewer for this question. In our model, the polarization field couples actin and myosin. Actin accumulates at the front, while myosin diffuses in the opposite direction. We therefore propose that actin and myosin flow in opposite directions, which is captured in the convection terms of the actin and myosin density fields.

    1. Author response:

      We want to thank the reviewers for their positive and constructive comments on the manuscript. We already addressed some of their concerns and are planning the following revisions to both BEHAV3D-TP and the corresponding manuscript to address the reviewers’ comments. Below, we provide a response to the most significant comments, followed by a detailed, point-by-point response:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, which would make the tool accessible to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. This topic has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811; https://doi.org/10.7554/eLife.60547; https://doi.org/10.1016/j.media.2022.102358; https://doi.org/10.1038/s41592-024-02295-6), which we will cite in our revised manuscript as indicated in our responses to the reviewers’ comments. Instead, the goal of our manuscript is to provide an analytical framework for processing data generated by existing segmentation and tracking pipelines. In our analyses, we used data processed with Imaris, a commercial software package that, despite its limitations, is widely used by the intravital microscopy community because it offers a user-friendly platform for 3D image visualization and analysis. Nevertheless, to enhance compatibility with tracking data from other pipelines, we have modified our tool to accept additional data formats, such as those generated by open-source Fiji plugins like TrackMate (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). These updates are available in our GitHub repository, and we will describe this feature in the revised manuscript to emphasize compatibility with segmented and tracked data from diverse open-source platforms.

      (2) We appreciate the reviewer’s suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection). In the revised manuscript, we will highlight this new functionality and provide examples using alternative datasets to demonstrate the application of these features.

      (3) We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we are committed to addressing the concerns raised in the revised version of the manuscript.

      (4) We appreciate Reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the upcoming revised manuscript, we will also make sure to reference this addition. Additionally, we will provide both original and processed IVM movie samples to support users in navigating the complete pipeline effectively.

      (5) Finally, in response to the reviewers' feedback, we will make some minor changes to the manuscript.

      Below we provide a point-by-point response to the reviewers’ comments, along with proposed revisions.

      Reviewer #1:

      Comment: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field.

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      As mentioned previously, we agree with the reviewer that image processing steps, such as segmentation, tracking, and motion correction, present significant challenges in intravital microscopy (IVM) data processing. While these aspects are being addressed by other researchers, our publication centers on the analysis of acquired data rather than on the image processing itself. Our motivation, as outlined in the manuscript, arises from our own experience: despite the substantial effort invested in image processing, researchers often rely on simplistic analytical approaches, such as averaging single parameters and comparing them across conditions. These approaches tend to overlook potential tumor heterogeneity.

      Our work aimed to develop an analytical tool that provides a comprehensive framework for extracting more insights from processed IVM data, with a focus on two key aspects: capturing the heterogeneity of tumor behavior and examining the spatial distribution of these behaviors within the tumor microenvironment. In the revised manuscript, we will clarify the scope of our study, emphasizing its limitations as an analytical tool rather than an image-processing solution. Additionally, we will provide references to relevant literature on available (open-source) software options for image processing (e.g. Pizzagalli et al., J Immunol (2022); Joseph et al., eLife (2020); Molina-Moreno et al., Medical Image Analysis (2022); Hidalgo-Cenalmor et al., Nat Methods (2024); Ershov et al., Nat Methods (2022)).

      Regarding the reviewer’s comment on our use of Imaris, we acknowledge that Imaris is a costly commercial software. However, based on our experience, it is widely used by the intravital microscopy community due to its user-friendly interface for 3D image visualization and analysis. Despite its limitations in accuracy and the fact that it is not open-source, we believe that including data processed with Imaris will be valuable to the IVM community.

      However, to improve compatibility with data from other segmentation and tracking pipelines, we have already updated our tool to support formats generated by open-source Fiji plugins like TrackMate. These updates are available in our GitHub repository, and we will describe this functionality in detail in the revised manuscript to ensure compatibility with segmented and tracked data from various open-source platforms.
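
      To illustrate what "processed data" means at the input of an analytical tool like this, the sketch below derives three of the kinetic parameters named by Reviewer #1 (mean speed, net displacement, and straightness) from time-stamped 3D positions of a single track. It assumes the positions have already been exported from a tracker (TrackMate spot tables, for example, contain position and time columns) and regrouped by track; the helper `track_features` is hypothetical and is not part of BEHAV3D-TP.

```python
import math
from typing import Dict, Sequence, Tuple

# One tracked position: (time, x, y, z). Units are whatever the tracker exported.
Point = Tuple[float, float, float, float]


def track_features(points: Sequence[Point]) -> Dict[str, float]:
    """Compute simple kinetic parameters for one cell track (hypothetical helper)."""
    pts = sorted(points)  # order positions by time stamp
    if len(pts) < 2:
        raise ValueError("a track needs at least two time points")
    # Total path length: sum of step lengths between consecutive positions.
    path_length = sum(math.dist(a[1:], b[1:]) for a, b in zip(pts, pts[1:]))
    duration = pts[-1][0] - pts[0][0]
    if duration <= 0:
        raise ValueError("track duration must be positive")
    # Net displacement: straight-line distance from first to last position.
    displacement = math.dist(pts[0][1:], pts[-1][1:])
    return {
        "mean_speed": path_length / duration,
        "displacement": displacement,
        # Straightness: 1.0 for a perfectly straight path, near 0 for a tortuous one.
        "straightness": displacement / path_length if path_length > 0 else 0.0,
    }


# Illustrative three-point track (not study data).
print(track_features([(0, 0, 0, 0), (1, 3, 4, 0), (2, 6, 8, 0)]))
# → {'mean_speed': 5.0, 'displacement': 10.0, 'straightness': 1.0}
```

      Features of this kind, computed per track, are what downstream clustering or spatial analyses would consume, regardless of which segmentation and tracking software produced the positions.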

      Comment: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we will include detailed information in the figure legends regarding the number of cells (e.g., per behavioral cluster) and the number of independent mice represented in each result figure to ensure transparency.

      Comment: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above we acknowledge that this was an oversight on our part and thank the reviewer for pointing this out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the upcoming revised manuscript, we will also make sure to reference this addition. Additionally, we intend to provide both original and processed IVM movie samples to support users in navigating the complete pipeline effectively.

      Comment: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and will provide additional details in the text and methods of the revised manuscript to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies.

      Comment: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls.

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. To be able to directly analyze the impact of the brain tumor microenvironment on cancer cell behavior, we will include a new set of analyses in the revised manuscript. Specifically, we will utilize BEHAV3D-TP to analyze existing IVM data from adult gliomas with and without macrophage depletion (Alieva et al, Scientific Reports, 2017; https://doi.org/10.1038/s41598-017-07660-4 ) to evaluate the differences in heterogeneous cell populations under these conditions. Since this analysis pertains to a different tumor type, we will revise our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      Reviewer #2:

      Comment: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy (IVM) community due to its user-friendly interface. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have already updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input). We will describe these updates in the revised manuscript to clarify our study's scope and the available image processing options.

      Comment: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. To clarify, each mouse serves as an independent unit in this analysis. We believe that ANOVA is the appropriate test for comparing the proportions of different behavioral signatures across the tumor microenvironment (TME) regions identified by large-scale phenotyping. However, we acknowledge that using a stacked bar plot may have been misleading. While a Chi-squared test could show differences in the distribution of behavioral signatures, it would not indicate which specific signatures are responsible for those differences. Therefore, in the revised manuscript, we will retain the ANOVA analysis but will represent the proportions using a bar chart that clearly illustrates multiple conditions for each behavioral cluster. We also appreciate the reviewer’s concern regarding the transparency of our data. In the revised manuscript, we will include the number of replicates for all figures to enhance clarity and understanding.
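
      The disagreement here is mainly about which unit enters the test. A minimal sketch, assuming one proportion per mouse per TME region for a single behavioral cluster (the numbers are illustrative, not data from the study): the one-way ANOVA F statistic is computed over mouse-level values, so the group sizes are numbers of mice, not numbers of tracks. A chi-squared test on a regions × clusters table of pooled track counts would, by contrast, treat every track as an independent observation. The p-value follows from the F distribution with (k − 1, N − k) degrees of freedom; `scipy.stats.f_oneway` returns the same statistic together with that p-value.

```python
from statistics import mean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic; each value is one mouse's proportion."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variation: how far each region's mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: mouse-to-mouse spread inside each region.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical proportions of one behavioral cluster, one value per mouse,
# in three TME regions (illustrative numbers only).
region_a = [0.20, 0.25, 0.22]
region_b = [0.30, 0.35, 0.33]
region_c = [0.21, 0.24, 0.26]
f_stat = one_way_anova_f([region_a, region_b, region_c])
print(f"F = {f_stat:.2f} with df = (2, 6)")
```

      Running one such test per behavioral cluster identifies which clusters differ between regions, which is the property the response cites as the reason for preferring ANOVA over a single chi-squared test.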

      Comment: Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent, which would be inappropriate. The best practice for this type of data would be the use of SuperPlots as outlined in Lord et al. (2020) J Cell Biol - SuperPlots: Communicating reproducibility and variability in cell biology.

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In the revised manuscript, we will clarify the number of replicates used and our approach to treating cell tracks as independent units. We will implement SuperPlots where appropriate to better communicate the reproducibility and variability of our data.
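
      The core of a SuperPlot is that per-cell values are shown in full but summarized, and tested, at the level of biological replicates. A minimal sketch of that aggregation step, assuming each per-cell measurement is tagged with the mouse it came from (the function name and the speed values are hypothetical): the replicate means are the points that would carry the statistics, so n equals the number of mice.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def replicate_means(cells: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average a per-cell measurement within each biological replicate (mouse)."""
    by_replicate = defaultdict(list)
    for replicate_id, value in cells:
        by_replicate[replicate_id].append(value)
    return {rep: mean(values) for rep, values in by_replicate.items()}


# Hypothetical per-cell speeds tagged with the mouse they came from.
cells = [("mouse1", 2.0), ("mouse1", 4.0),
         ("mouse2", 5.0), ("mouse2", 7.0), ("mouse2", 9.0)]
per_mouse = replicate_means(cells)
print(per_mouse)                  # → {'mouse1': 3.0, 'mouse2': 7.0}
print(mean(per_mouse.values()))   # → 5.0 (each mouse weighted equally)
print(mean(v for _, v in cells))  # → 5.4 (pooled cells; the mouse with more cells dominates)
```

      The gap between 5.0 and 5.4 shows why pooling tracks differs from averaging replicates: pooling silently weights mice by how many cells were tracked in each.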

      Comment: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, though we find ourselves unsure about the specific concern being raised. To clarify, each mouse is treated as an independent unit in our analyses. For each large-scale phenotyping region, we measure the proportion of tumor cells displaying a specific behavioral phenotype independently for each mouse. These proportions are then used for statistical analysis. We hope this explanation provides clarity, and we will adjust the manuscript to better convey this methodology.
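
      The per-mouse procedure described above can be sketched as follows. It assumes a flat table in which each track carries a mouse identifier, the large-scale phenotyping region it lies in, and its behavioral cluster label; the region names and the function name are hypothetical, while the cluster labels are taken from the motility clusters discussed in this response. Each (mouse, region) pair yields one proportion, and those proportions, not individual tracks, are what enter the statistical test.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def per_mouse_proportions(
    tracks: Iterable[Tuple[str, str, str]], cluster: str
) -> Dict[Tuple[str, str], float]:
    """Fraction of tracks assigned to `cluster` for each (mouse, region) pair."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for mouse, region, label in tracks:
        totals[(mouse, region)] += 1
        if label == cluster:
            hits[(mouse, region)] += 1
    # Counter returns 0 for missing keys, so pairs with no hits give 0.0.
    return {key: hits[key] / n for key, n in totals.items()}


# Hypothetical tracks: (mouse, region, behavioral cluster).
tracks = [
    ("mouse1", "region_A", "invading"), ("mouse1", "region_A", "static"),
    ("mouse1", "region_B", "invading"),
    ("mouse2", "region_A", "static"),
    ("mouse2", "region_B", "invading"), ("mouse2", "region_B", "slow"),
]
print(per_mouse_proportions(tracks, "invading"))
# → {('mouse1', 'region_A'): 0.5, ('mouse1', 'region_B'): 1.0,
#    ('mouse2', 'region_A'): 0.0, ('mouse2', 'region_B'): 0.5}
```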

      Reviewer #3:

      Comment: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source tools (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously, our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. To enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input).

      While we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community because of its user-friendly interface for 3D visualization and its semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we used Imaris because it allows manual correction of faulty tracks, ensuring the reliability of our results. This approach was the best available option when we began our analysis and allowed us to obtain accurate results efficiently.

      In the revised manuscript, we will clarify our methodology and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings.

      Comment: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities. This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA analysis effectively captures this variability, as illustrated in Figure 2 d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly migratory and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. We will clarify these aspects further in the revised manuscript to enhance the reader's understanding of our findings.

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states, tumor metastatic potential, and drug resistance states. In the revised manuscript, we will reference relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research.

      Comment: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology and interactions with other cell types, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with various features present in the data (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0?tab=readme-ov-file#feature-selection ). We will emphasize this in the revised manuscript. However, we would like to point out that not all features provide informative insights, and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy the z-axis resolution is typically lower, which can lead to artifacts such as elongation in that direction. Adding morphological features that capture this artifact may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We will clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively. We will also reference the use of unbiased feature selection techniques, such as bootstrapping methods, to identify biologically relevant features based on the conditions provided (D.G. Aragones et al., Computers in Biology and Medicine (2024)).

      Comment: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We will ensure that these points are clearly articulated in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1: IPA analysis was performed after scRNA-seq. Although it is knowledge-based software with convenient graphic utilities, it is questionable whether an unbiased genome-level analysis was performed. Therefore, it is not convincing if WNT is the only and best signal for the branching-off marker. Perhaps independent approaches, such as GO, pathway, or module analyses, should be performed to validate the finding.

      Thanks for your comment. We agree with the reviewer that IPA is a knowledge-based, hypothesis-driven method. Our hypothesis was that WNT/BMP pathways, among others, are heavily involved in the development of mesenchymal tissues in general and in tendon differentiation specifically. We therefore examined genes differentially expressed between clusters across a broad array of pathways featured in IPA that could point to molecular functions distinguishing these clusters. We further corroborated this hypothesis by using WNT inhibitors in subsequent experiments. To address this point, we have supplemented the discussion section with the following remark:

      “This study is not without limitations. IPA network analysis is a knowledge-based and hypothesis-driven platform, and we specifically targeted pathways known to be involved in syndetome differentiation. However, WNT signaling stood out as being specifically associated with the off-target populations, and we verified this finding with experiments supporting this hypothesis.”

      Per the reviewer’s suggestion, we also performed an unbiased GO analysis (Supp. Fig. 6). Multiple pathways were detected in the three clusters of interest (Supp. Fig. 6A-C), including integrin-related and TGFβ-related pathways. However, WNT signaling was also detected as a prominent pathway in these three clusters, suggesting that it plays a pivotal role in the differentiation process. This hypothesis was subsequently corroborated by the WNT inhibitor experiments.

      Comment 2: According to the method section, two iPSC lines were used for the study. However, throughout the manuscript, it is not clearly described which line was used for which experiment. Did they show similar efficiency in differentiation and in responses to WNTi? It is also worrisome if using only two lines is the norm in the stem cell field. Please provide a rationale for using only two lines, which will restrict the observation of individual-specific differential responses throughout the study.

      Thanks for your comment. This proof-of-concept study is the first investigation to compare data from an in vitro tenogenic induction protocol tested in more than one human iPSC line. We agree that line-specific phenomena are difficult to interpret and reproduce; it is therefore critical to provide data showing that the findings can be reproduced in more than one line. Some early studies used a single line as proof of concept; however, we now recognize the need to show that a protocol works in at least one additional line.

      Here we used the GMP-ready iPSC line CS0007iCTR-n5 for all optimization experiments. This newer, low-passage, feeder-free line was generated from PBMCs and is designated as GMP-ready in the manuscript because it was derived and cultured using cGMP xeno-free components (mTESR plus medium and rhLaminin-521 matrix substrate instead of Matrigel). We then sought to confirm the applicability of the optimized protocol using the reference control line CS83iCTR-22n1, which has been more widely used by our group [1-5] and others [6]. This line was derived from fibroblasts and was grown and expanded using Matrigel™ and mTESR1, followed by mTESR plus medium.

      The question of how many lines are needed is stage-dependent. In our opinion, at the proof-of-concept level, two lines, one of which was generated under GMP-like conditions, are sufficient. Confirmation with multiple lines becomes more pertinent as we move towards scale-up and manufacturing, where robustness and consistency become key considerations. At this stage, however, it is crucial to understand the developmental processes involved in cell differentiation so that a more robust protocol can be modified and adapted later. In future studies, as we move towards clinical translation, the approach presented in this work will need to be further optimized and subsequently evaluated using at least three different cell lines generated from various sources.

      Comment 3: How similar are syndetome cells with or without WNTi? It would be interesting to check if there are major DEGs that differentiate these two groups of cells.

      Thanks for your comment. Single-cell RNA-seq analysis revealed that WNTi treatment upregulated tenogenic markers. In SYNWNTi, the expression levels of the stage-specific markers COL1A1, COL3A1, SCX, MKX, DCN, BGN, FN1, and TNMD were higher than in the untreated SYN group, as shown in Figure 5C. Density plots showed an increase in the number of cells expressing COL1A1, COL3A1, SCX and TNMD in SYNWNTi compared to the SYN group, as illustrated in Figure 5D. Trajectory analysis of the WNTi-treated group revealed the absence of the bifurcations observed in the untreated group (Fig. 5E). Therefore, it can be inferred that syndetome cells with and without WNTi are different.

      Comment 4: Please discuss the improvement of the current study compared to previous ones (e.g., PMID 36203346 my study, 35083031- Tsutsumi, 35372337- Yoshimoto).

      Thanks for your comment. In Papalamprou et al. (2023) [3], we differentiated iPSCs into mesenchymal stromal-like cells (iMSCs), which were then cultured in a 2D dynamic bioreactor for 7 days. In that study, we examined the impact of simultaneous lentiviral overexpression of the tendon transcription factor Scleraxis (SCX) and mechanical stimulation on tenogenic differentiation. Following 7 days of uniaxial cyclic loading, we observed notable changes in the morphology and cytoskeletal organization of iMSCs overexpressing SCX. Additionally, there was an increase in extracellular matrix (ECM) deposition and alignment, along with upregulation of early and late tendon markers. This proof-of-concept study showed that iMSCs could be a viable cell candidate for cell therapy applications and that mechanical stimulation contributes to the differentiation of iMSCs towards the tenogenic lineage.

      Similarly, Tsutsumi et al. [7] stably overexpressed the tendon transcription factor Mohawk (MKX) in iPSC-derived MSCs using lentiviral vectors. These cells were then used to seed collagen hydrogels, which were mechanically stimulated in a cyclic-stretch 3D culture bioreactor for 15 days to create artificial tendon-like tissues that the authors termed “bio-tendons”. The bio-tendons were then decellularized to remove cellular remnants of the xenogeneic human iPSC-derived cells and subsequently transplanted into an in vivo Achilles tendon rupture mouse model. The authors reported improved histological and biomechanical properties in the Mkx-bio-tendon mice vs. the GFP-bio-tendon controls, providing another proof-of-concept study in favor of using iPSC-derived MSCs for tendon cell therapies, while also addressing the immunogenicity of cells of allogeneic/xenogeneic origin. Thus, both of the above studies used tendon transcription factor overexpression and mechanical loading, in 2D or 3D, to differentiate MSCs towards the tendon/ligament lineage.

      Yoshimoto et al. [8] optimized a stepwise iPSC-to-tenocyte induction protocol using an SCX-GFP transgenic mouse iPSC line, monitoring GFP expression over time. The group performed scRNA-seq to characterize the induction of mesodermal progenitors towards the tenogenic lineage and to shed light on their developmental trajectory. That study revealed that Retinoic Acid (RA) signaling activation enhanced chondrogenic differentiation, in contrast to the study of Kaji et al. (2021), which also used an SCX-GFP mouse iPSC line. Kaji et al. inhibited TGF and BMP signaling during mesodermal induction and reported that RA signaling eliminated SCX induction entirely and promoted a switch to a neural fate. Yoshimoto et al. suggested that variations in mesodermal cell identity could be due to the different methods used for mesodermal differentiation: in contrast to Kaji et al., Yoshimoto et al. opted to stimulate WNT and block the Hedgehog pathway during mesoderm induction. Loh et al. (2016) identified the branchpoint from the primitive streak to either the paraxial mesoderm (PSM) or the lateral plate mesoderm (LPM) as the result of two mutually exclusive signaling conditions. Specifically, they reported that PSM induction was achieved through BMP suppression and WNT stimulation, whereas lateral mesoderm specification was achieved by BMP stimulation and WNT suppression, both with concurrent TGFβ suppression and FGF stimulation. Lastly, a similar approach to PSM induction from the primitive streak (TGF off/BMP off/WNT on/FGF on) has been used by many subsequent studies, including Matsuda et al. (2020) [9], Wu et al. (2021) [10] and Nakajima et al. (2021) [11]. The diversity of these approaches points to the plasticity of mesodermal progenitors and the need for additional studies to better understand mesodermal specification and subsequent induction towards sclerotome and syndetome.

      In the current study we optimized a stepwise differentiation protocol using xeno-free cGMP-ready media and two different cell lines, one of which was cGMP-ready. We used scRNA-seq to characterize the differentiation, which led us to identify off-target cells closer to a neural phenotype. We performed pathway analyses and hypothesized that WNT signaling activity might have contributed to the emergence of these off-target cells. To test this, we used a WNT inhibitor targeting PORCN (WNT-C59) to block WNT activity at the SCL stage and at the SYN stage. We found that blocking WNT signaling at the end of the SM stage and during SCL and SYN induction resulted in a more homogeneous population and eliminated the neural-like cell cluster. This is the first study to use scRNA-seq to shed light on the developmental trajectory of stepwise tendon differentiation of human iPSCs, and it provides a proof of concept for generating a more homogeneous syndetome population. Further studies are needed to fine-tune both the process and the final product, and to elucidate the functionality of iPSC-derived syndetome cells in vitro and in vivo.

      Reviewer 2:

      General concerns: The authors demonstrated the efficiency of syndetome induction solely by scRNA-seq data analysis before and after pathway inhibition, without using e.g. FACS analysis or immunofluorescence (IF)-staining based assessment. A functional assessment and validation of the induced cells is also completely missing.

      We appreciate and agree with the reviewer’s critique regarding further analyses of differentiated iPSC-derived syndetome-like cells, including their functional assessment. Immunofluorescence was used at all timepoints of induction for phenotype confirmation (Fig. 2, 4). Flow cytometry for DLL1 was used to benchmark efficient differentiation to PSM (Loh et al. [12]; Nakajima et al. [11]). Specifically, DLL1 expression was assessed by flow cytometry after 4 days of induction and was used to optimize the initial iPSC aggregate seeding density, a parameter previously found to be crucial for in vitro differentiation protocols (Loh et al. [12]). Unfortunately, this parameter is usually not reported, although it could be critical for replicating the protocol across different lines.

      The function of tendon progenitors is usually reported as their response to mechanical cues and their ability to regenerate tendon injuries. In future studies we intend to assess the functionality of the generated syndetome and tendon progenitors, both in vitro through their response to biomechanical stimulation, as previously reported for SCX-overexpressing iMSCs [3, 13], and in vivo in a critical tendon defect model, similar to what has been previously reported [2].

      Comment 1: Notably, in Figure 1D, certain PSM markers (TBXT, MSGN1, WNT3A) show higher expression on day 3. If the authors initiate SM induction on day 3 instead of day 4, could this potentially enhance the efficiency of syndetome-like cell induction?

      Thanks for your comment. In the current work, we initially optimized differentiation to PSM via expression of DLL1, whose gene expression peaked at d4. We found that this was influenced by the initial iPSC aggregate seeding density. We wanted to generate a homogeneous DLL1+ population which we assessed via gene expression, flow cytometry, IF and scRNA-seq (Fig. 1D, 2C, 3C and Suppl. Fig.1). Given the fact that different lines might display a diverse developmental timeline, we also confirmed reproducibility of the protocol with a second cell line. We appreciate the reviewer’s suggestion to investigate additional protocol iterations, such as the proposed one at the PSM stage, as we move towards a better understanding of key developmental events during in vitro induction.

      Comment 2:  In the third paragraph of the result section the authors note, "Interestingly, SCX, a prominent tenogenic transcription factor, was significantly downregulated at the SCL stage compared to iPSC, but upregulated during the differentiation from SCL to SYN." Despite this increase, the expression level of SCX in SYN remains lower than that in iPSCs in Fig.1G and Fig.3C. Can the authors provide an explanation for this? Can the authors provide IF data using iPSCs and compare it with in vitro-induced SYN cells? Can the authors provide e.g. additional scRNA-seq data which could support this statement?

      Thank you for your comment. In Fig. 1G, SCX expression in SYN was upregulated compared to SCL but was similar to that in iPSCs. This suggests a baseline stochastic expression of SCX, possibly stemming from spontaneous differentiation of iPSCs in culture (Fig. 3C). Previous research has shown that tenogenic marker gene expression tends to decrease during postnatal tendon maturation (Yin et al., 2016b [14]; Grinstein et al., 2019 [15]). Yoshimoto et al. (2022) used a transgenic mouse iPSC-SCX-GFP line to track SCX expression and showed that it peaked after 7 days of tenogenic induction and then decreased by day 14, which marked the end of tenogenic induction. The authors postulated that this pattern could indicate either further maturation of tenocytes at later time points or an increase in the number of non-tenogenic cells from T7 to T14.

      In the present work, we showed upregulation of SCX in SYN compared to SCL, as well as significant upregulation of TNMD, EGR1, COL1A1 and COL3A1 (Fig. 1G). Supp. Fig. 8 has been added to show feature plots of SCX and TNMD expression in SCL, SYN and SYNWNTi. The significant upregulation of later markers of tenogenic differentiation suggests that the 21 days of tenogenic induction might have matured the cells. Since gene expression analysis only conveys a snapshot of the transcriptional profile of a cell population, we may have missed the peak of SCX upregulation (Supp. Fig. 5). Following treatment with the WNT inhibitor, the SYNWNTi group displayed increased SCX expression (% of cells expressing SCX) compared to SYN, which might also reflect a more homogeneous population of syndetome-like cells. In the SYNWNTi group, TNMD was expressed in the SYN cluster, whereas SCX was mostly found in the cluster labelled as fibrocartilage (FC) based on expression of the COL2A1/SOX9/FN1/BGN/COL1A1 markers. Because SCX+/SOX9+ progenitor cells can give rise to both tendon and cartilage (Sugimoto et al., 2013 [16]), this cluster may contain tendon progenitors. Interestingly, the FC cluster was not observed in the second iPSC line we tested, which showed a more homogeneous induction to syndetome (78.5% vs. 66.9% SYN cells; Supp. Table 1 & Supp. Fig. 3). This discrepancy between the two lines, and more specifically the presence of the FC cluster only in the 007i line, warrants further investigation. Taken together, these data indicate that the tenogenic induction could likely be shortened. Assessing the time course of SCX expression over the entire tenogenic induction could help further optimize the in vitro induction. For instance, a gene-edited human SCX-GFP reporter iPSC line could be generated and used to track SCX expression throughout the induction.

      Comment 3: In the fourth paragraph of the result section the authors state, "SM markers (MEOX1, PAX3) and SCL markers (PAX1, PAX9, NKX3.2, SOX9) were upregulated in a stepwise manner." However, the data for MEOX1 and NKX3.2 seems to be missing from Figure 3B-C. The authors should provide this data and/or additional support for their claim.

      Thanks for your comment. Feature plots for MEOX1 and NKX3.2 have been added to the Supplemental information (Supp. Fig. 9).

      Comment 4: In Figures 2B and 2E, the background of the red channel seems extremely high. Are there better images available, particularly for MEOX1? Given the expected high expression of MEOX1 in SM cells, the authors should observe a strong signal in the nucleus of the stained somitic mesoderm-like cells, but that is not the case in the shown figure. The authors should provide separate channel images instead of merged ones for clarity. The antibody which the authors used might not be specific. Can the authors provide images using an antibody which has been shown to work previously e.g. antibody by ATLAS (Cat#: HPA045214)?

      As requested by the reviewer, we have provided separate channels for those images in the Supplement (Supp. Fig. 7). The images show relatively high expression of these markers in SM cells.

      Comment 5: In Fig. 2C and Supplementary Fig. 1, the authors present data from immunofluorescence (IF) staining and FACS analysis using a DLL1 antibody. While FACS analysis indicates an efficiency of 96.2% for DLL1+ cells, this was not clearly observed in their IF data. How can the authors explain this discrepancy? Could the authors quantify their IF data and compare it with the corresponding FACS data?

      Thanks for your comment. We performed flow cytometric analysis of DLL1 expression to optimize cell seeding density using the 007i line. In the present study, we used IF only in a qualitative manner, that is to confirm protein expression of selected markers. It could be noted that the use of poly-lysine coated coverslips, which are needed for IF, might have slightly altered the density of the cells on the coverslip vs. the plate. Lastly, it cannot be ruled out that the different substrate could have influenced their phenotype differentially through matrix interactions and signaling. On the other hand, flow cytometry by nature is a quantitative and single cell approach, whereas IF staining is qualitative. Therefore, for the purpose of this proof-of-concept work, we tend to trust the quantitative data from the flow cytometry results more than semi-quantitative confirmation achieved through IF staining using coverslips. 

      Comment 6: In Fig. 2G, PAX9 is expected to be expressed in the nucleus, but the shown IF staining does not appear to be localized to the nucleus. Could the authors provide improved or alternative images to clarify this? The authors should use antibodies shown to work with high specificity as already reported by other groups.

      Thanks for your comment. Indeed, the staining appears mostly cytoplasmic. We used antibodies that had been previously reported [3] and repeated the staining; however, the same results were obtained. We speculate that this transcription factor may have an additional role in iPSC-derived cells and might translocate to the cytoplasm, although we currently have no evidence for this.

      Comment 7: Why did the authors choose to display day 10 data for SYN induction in Fig. 4A? Could they provide information about the endpoint of their culture at day 21?

      Thank you for your comment. In Fig. 1G we provided gene expression results for several selected early and later tendon markers at the endpoint of our culture, that is, day 21 of syndetome induction. Following scRNA-seq at each stage of the differentiation (iPSC at d0, PSM at d4, SM at d8, SCL at d11 and SYN at the endpoint, d32), we performed DEG analysis using the IPA platform. We identified activation of genes associated with the WNT signaling pathway in the off-target clusters and hypothesized that WNT pathway inhibition might block the formation of unwanted fates and yield a more homogeneous differentiation outcome. We therefore tested a WNT inhibitor, compared the inhibitor-treated group with an untreated group, and assessed selected neural markers during inhibitor application. In Fig. 4A we presented qPCR gene expression of key selected markers at day 21 of the overall protocol, which corresponds approximately to the middle of syndetome induction (day 10 of SYN induction). Since the inhibitor downregulated the selected neural markers, we then applied it until the endpoint of the initial induction and analyzed the results using scRNA-seq (Fig. 5). Lastly, we acknowledge that this was a proof-of-concept study and that additional optimization of inhibitor application (timing, duration, concentration, etc.) is needed.

      Comment 8: In Supplementary Fig. 5, the authors depicted the expression level of SCX, a SYN marker, which peaked at day 14 and then decreased. By day 21, it reached a level comparable to that of iPSCs. Given this observation, could the authors provide a characterization of the cells at day 21 during SYN induction using IF? What was the rationale behind selecting 21 days for SYN induction? The authors also need to show 'n numbers'; how many times were the experiments repeated independently (independent experiments)?

      Thanks for your comment. During the optimization process, we initially used RT-qPCR to track expression of selected tenogenic markers in the 007i line. We found that 21 days of tenogenic induction upregulated a few established tendon markers, namely COL1A1, COL3A1 and EGR1, and, importantly, the later and more definitive tendon marker TNMD. We therefore proceeded with this protocol before testing other compounds, including the WNT inhibitor WNT-C59. However, as discussed in the manuscript, this extended tenogenic induction resulted in cell attrition in the absence of the WNT inhibitor, a phenomenon that was ameliorated by WNT inhibition. The protocol could therefore likely be further optimized by shortening tenogenic induction to less than 21 days.

      The experiments conducted to optimize the differentiation process were repeated independently at least three times (n=3) using qPCR and IF in two lines, the 007i and the 83i line, as described in the manuscript. The scRNA-seq analysis represents a population of cells from an in vitro differentiation originating from a single donor line; it was therefore performed on n=1 sample at each stage. However, the effects of inhibitor application (sample SYNWNTi) were also confirmed using a second cell line (83i), giving a total of n=2 independent samples.

      Comment 9: Overall the shown immunofluorescence (IF) data does not appear convincing. Could the authors please provide clearer images, including separate channel images, a bright field image, and magnified views of each staining?

      Thanks for your comment. The separate-channel images have been added to the supplemental data (Supp. Fig. 7). We agree with the reviewer regarding the limitations of IF staining, especially given the added confounding factor of using poly-lysine coated coverslips. We would like to point out that in the current work IF staining is neither the main finding nor the primary outcome measure; it is used only to support the differentiation results by providing a qualitative assessment of protein presence and localization. In this paper we discuss the limitations of IF and the need for higher-throughput, unbiased approaches to quantification when using IF staining. For instance, spatial transcriptomics combined with mass cytometry or flow cytometry could provide a more unbiased approach. In the present manuscript we therefore based our conclusions on quantitative gene expression, single-cell sequencing and flow cytometry.

      Comment 10: As stated by the authors in the manuscript, another research group performed FACS analysis to assess the efficiency of syndetome induction using SCX antibody, and/or quantification of immunofluorescence (IF) with SCX, MKX, COL1A1, or COL2A1 antibodies. Could the authors conduct a comparative analysis of syndetome induction efficiency both before and after protocol optimization, utilizing FACS analysis in conjunction with an SCX reporter line or antibody staining, e.g. quantifying induction efficiency via immunofluorescence (IF) staining with syndetome-specific marker genes?

      Thank you for your comment. As discussed in response to a previous comment, we agree with the reviewer that generating a human iPSC SCX-GFP reporter line would shed light on SCX expression over the entire course of induction. In the current work we used IF as a qualitative confirmation of specific marker expression, and we showed the presence of SCX, MKX, COL1 and COL3 in SYNWNTi as well as the absence of neuronal markers. As we also pointed out in the present manuscript, IF can only be considered a semi-quantitative assessment; unless performed in a more unbiased manner, it is burdened with several technical limitations, operator bias, and lower sensitivity and accuracy than flow cytometry or scRNA-seq. To clarify this point further: first, using poly-lysine-coated coverslips for IF staining results in a different substrate environment from the Geltrex-coated plates used for the induction. Second, we noticed that cells grew overconfluent at the edges of the coverslips. This is an important point since, as we observed in this work, seeding density is critical for the reproducibility of the protocol. A different substrate stiffness might also affect this process. In our opinion, in this context IF should be used qualitatively, and a combination of flow cytometry and scRNA-seq should be used to draw quantitative conclusions such as the induction efficiency of a given cell type. Since we also observed inconsistencies with the SCX antibodies we tested, generating edited human iPSC reporter lines (such as SCX-GFP, MKX-GFP and TNMD-GFP) would be the preferred approach to further explore differentiation efficiency.

      Comment 11: To enhance the paper's significance, the authors should conduct functional validation experiments and proper assessment of their induced syndetome-like cells. They could perform e.g. xeno-transplantation experiments with syndetome cells into SCID-mice or injury models. They could also assess whether the in vitro induced cells could be applied for in vitro tendon/ligament formation.

      Thanks for your comment. In this proof-of-concept in vitro study, our primary goal was first to evaluate a stepwise tenogenic induction protocol using GMP-ready cell lines and chemically defined media. We then used the analytical power of scRNA-seq to characterize and optimize the protocol, focusing on one developmental stage that is not well understood, syndetome specification from sclerotome; we hypothesized that fine-tuning the WNT pathway would generate a more homogeneous syndetome cell population. We fully agree with the reviewer that the next steps should be functional validation experiments, such as in vitro 2D/3D tendon/ligament formation and in vivo transplantation in allogeneic or xenogeneic injury models.

      Comment 12: The authors should also compare their scRNA-seq data with actual human embryo data sets, something which could be done given the recent increase in available human embryo scRNA-seq data sets.

      This is a great idea for an intriguing study. Unfortunately, not all data sets are currently available, and embryonic and musculoskeletal (MSK) scRNA-seq data in particular remain scarce, although growing. We do not have access to data sets from human tendon development and will therefore have to leave this comparison for future studies.

      Reviewer 3:

      Comment 1: The data outlining the differences between the differentiation outcome of the two tested iPSCs is intriguing, but the authors fail to comment on potential differences between the two iPSC lines that could result in drastically different cell outputs from the same differentiation protocol. This is a critically important point, as the majority of the SCX+ cells generated from the 007i cells using their WNTi protocol were found in the FC subpopulation that failed to form from the 83i line under the same protocol. From the analysis of only these 2 cell lines in vitro, it is difficult to assess whether this WNTi protocol can be broadly used to generate tenogenic cells.

      Thanks for your comment. This proof-of-concept study is the first investigation to compare data from an in vitro tenogenic induction protocol tested in more than one cell line. Using unsupervised clustering, we identified 11 clusters, which were classified into 6 cell subpopulations. The only difference observed between the two lines was a small subset labeled fibrocartilage (FC), which expressed both tenogenic and chondrogenic markers. This subpopulation was observed in the 007i line but not in the 83i line at the end of SYN induction. Importantly, DEG analysis also showed that it was enriched for SCX. SCX+/SOX9+ progenitors have been shown to be a distinct multipotent cell group responsible for the development of SCX−/SOX9+ chondrocytes and SCX+/SOX9− tenocytes/ligamentocytes (Sugimoto et al., 2013; ref. 16). As noted in a previous response (Comment 2 from Reviewer 1), we might have missed SCX upregulation during the 21-day syndetome induction. This is further supported by the trajectory analysis in Fig. 5E, which shows that the FC subpopulation precedes the SYN subpopulation. The presence of this subpopulation in one line but not the other might indicate that the 83i line produced a more mature tendon population. We would therefore posit that in the 83i line the FC subpopulation did not fail to form, but was instead missed by our scRNA-seq endpoint analysis, which showed that a more homogeneous SYN population had formed (8.7% in 007i vs. 0.26% in 83i, Supp. Table 1 & Supp. Fig. 3B). Future studies are warranted to characterize the timeline of SYN induction with respect to SCX expression and the subsequent maturation from tenogenic progenitors to tenocytes.

      Comment 2: The authors make claims about changes in protein expression but fail to quantify either fluorescence intensity or percent cell expression from their immunofluorescence analyses to substantiate these claims. These claims are not fully supported by the data as presented, as it is unclear whether there is increased expression of tendon markers at the protein level or more cells surviving the protocol. Additionally, in images where 3 channels are merged, it would be helpful to show individual channels where genes are shown in similar spectra (i.e., Fig. 2I, SCX/MKX). Furthermore, the current layout and labelling scheme of Figure 4 makes it very difficult to compare conditions between SYN and SYNWNTi protocols.

      Thanks for your comment. Protein expression at each stage was verified by immunofluorescence cytochemistry, in which cells were cultured on poly-lysine-coated coverslips and then fixed, stained and imaged (Fig. 2). However, before WNT inhibitor application, we noticed gradual cell attrition in the cultures at the end of differentiation (Fig. 1B, 2I). The images show qualitative differences with and without the WNT inhibitor. This could be attributed to the heterogeneity of the cell population at the SCL stage, which was confirmed by scRNA-seq (Fig. 3A). As discussed previously (Reviewer 2, Comments 5 & 9), we did not provide quantitative IF analysis in the current paper because of the qualitative nature of the staining technique. In future work, we will consider more quantitative single-cell modalities, such as single-cell proteomics, flow cytometry or mass cytometry, to perform a more unbiased single-cell analysis across stages and samples. Furthermore, we have added single-channel images to the supplemental information.

      Comment 3: Individual data points should also be presented for all qPCR experiments (e.g., Fig. 4A). Biological replicate information is missing from several experiments, particularly the immunofluorescence data, and it is unclear whether the qPCR data was generated from technical or biological replicates.

      Thanks for your comment. We have added additional information regarding replicates in each figure legend. We have also changed Fig. 4A.

      (1) Glaeser JD, Bao X, Kaneda G, et al. iPSC-neural crest derived cells embedded in 3D printable bio-ink promote cranial bone defect repair. Sci Rep. Nov 4 2022;12(1):18701. https://www.ncbi.nlm.nih.gov/pubmed/36333414

      (2) Kaneda G, Chan JL, Castaneda CM, et al. iPSC-derived tenocytes seeded on microgrooved 3D printed scaffolds for Achilles tendon regeneration. J Orthop Res. Oct 2023;41(10):2205-2220. https://www.ncbi.nlm.nih.gov/pubmed/36961351

      (3) Papalamprou A, Yu V, Chen A, et al. Directing iPSC differentiation into iTenocytes using combined scleraxis overexpression and cyclic loading. J Orthop Res. Jun 2023;41(6):1148-1161. https://www.ncbi.nlm.nih.gov/pubmed/36203346

      (4) Sheyn D, Ben-David S, Tawackoli W, et al. Human iPSCs can be differentiated into notochordal cells that reduce intervertebral disc degeneration in a porcine model. Theranostics. 2019;9(25):7506-7524. https://www.ncbi.nlm.nih.gov/pubmed/31695783

      (5) Später T, Kaneda G, Chavez M, et al. Retention of Human iPSC-Derived or Primary Cells Following Xenotransplantation into Rat Immune-Privileged Sites. Bioengineering. 2023;10(9):1049. https://www.mdpi.com/2306-5354/10/9/1049

      (6) Sareen D, O'Rourke JG, Meera P, et al. Targeting RNA foci in iPSC-derived motor neurons from ALS patients with a C9ORF72 repeat expansion. Sci Transl Med. Oct 23 2013;5(208):208ra149. https://www.ncbi.nlm.nih.gov/pubmed/24154603

      (7) Tsutsumi H, Kurimoto R, Nakamichi R, et al. Generation of a tendon-like tissue from human iPS cells. J Tissue Eng. Jan-Dec 2022;13:20417314221074018. https://www.ncbi.nlm.nih.gov/pubmed/35083031

      (8) Yoshimoto Y, Uezumi A, Ikemoto-Uezumi M, et al. Tenogenic Induction From Induced Pluripotent Stem Cells Unveils the Trajectory Towards Tenocyte Differentiation. Front Cell Dev Biol. 2022;10:780038. https://www.ncbi.nlm.nih.gov/pubmed/35372337

      (9) Matsuda M, Yamanaka Y, Uemura M, et al. Recapitulating the human segmentation clock with pluripotent stem cells. Nature. Apr 2020;580(7801):124-129. https://www.ncbi.nlm.nih.gov/pubmed/32238941

      (10) Wu CL, Dicks A, Steward N, et al. Single cell transcriptomic analysis of human pluripotent stem cell chondrogenesis. Nat Commun. Jan 13 2021;12(1):362. https://www.ncbi.nlm.nih.gov/pubmed/33441552

      (11) Nakajima T, Nakahata A, Yamada N, et al. Grafting of iPS cell-derived tenocytes promotes motor function recovery after Achilles tendon rupture. Nat Commun. Aug 18 2021;12(1):5012. https://www.ncbi.nlm.nih.gov/pubmed/34408142

      (12) Loh KM, Chen A, Koh PW, et al. Mapping the Pairwise Choices Leading from Pluripotency to Human Bone, Heart, and Other Mesoderm Cell Types. Cell. Jul 14 2016;166(2):451-467. https://www.ncbi.nlm.nih.gov/pubmed/27419872

      (13) Yu V, Papalamprou A, Sheyn D. Generation of Induced Pluripotent Stem Cell-Derived iTenocytes via Combined Scleraxis Overexpression and 2D Uniaxial Tension. JoVE. Mar 1 2024;(205):e65837. https://app.jove.com/65837

      (14) Yin Z, Hu JJ, Yang L, et al. Single-cell analysis reveals a nestin(+) tendon stem/progenitor cell population with strong tenogenic potentiality. Sci Adv. Nov 2016;2(11):e1600874. https://www.ncbi.nlm.nih.gov/pubmed/28138519

      (15) Grinstein M, Dingwall HL, O'Connor LD, Zou K, Capellini TD, Galloway JL. A distinct transition from cell growth to physiological homeostasis in the tendon. Elife. Sep 19 2019;8. https://www.ncbi.nlm.nih.gov/pubmed/31535975

      (16) Sugimoto Y, Takimoto A, Akiyama H, et al. Scx+/Sox9+ progenitors contribute to the establishment of the junction between cartilage and tendon/ligament. Development. Jun 2013;140(11):2280-2288. https://www.ncbi.nlm.nih.gov/pubmed/23615282

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors want to elucidate the mechanisms that regulate the immune response under physiological conditions during cortical development. To achieve this goal, the authors used a wide range of mutant mice to analyse the consequences of immune activation for the formation of cortical ectopia in mice.

      Strengths:

      The authors demonstrated that Abeta monomers are anti-inflammatory and inhibit microglial activation. This is a novel result that demonstrates the physiological role of APP in cortical development.

      Weaknesses:

      - On the other hand, cortical ectopia has already been described in mouse models in which amyloid signalling has been disrupted (Herms et al., 2004; Guenette et al., 2006), making the current study less novel.

      We agree that these previous studies have implicated amyloid precursor protein in cortical ectopia. However, because these studies used whole-body knockouts, they did not address the functional roles of specific cell types, nor did they identify the specific mechanisms underlying the formation of this unique class of cortical ectopia. In contrast, our studies show that disruption of a novel Abeta-regulated signaling pathway in microglia is the primary cause of ectopia formation in this class of ectopia mutants. This is the first time that microglia have been specifically implicated in the development of cortical ectopia. We further show that elevated MMP activity and the resulting degradation of the cortical basement membrane is the underlying mechanism leading to ectopia formation. This is also the first time that MMP activity and basement membrane degradation (rather than maintenance) have been implicated in cortical ectopia development. As such, our results provide novel insights into the diverse mechanisms underlying cortical ectopia formation in developmental brain disorders.

      One of the molecules analysed is Ric8a, a GTPase activator involved in neuronal development. Authors used the conditional mutant mice Emx1-Ric8a to delete Ric8a from early progenitors and glutamatergic neurons in the pallium. Emx1-Ric8a mutant mice present cortical ectopias and authors attributed this malformation to the increase in inflammatory response due to Ric8a deletion in microglia. Several discordances do not fit this interpretation:

      - The role of Ric8a in cortical development and function has been already described in several papers, but none of them has been cited in the current manuscript (Kask et al., 2015, 2018; Ruisu et al., 2013; Tonissoo et al., 2006).

      In the revision, we have included references to the published work on Ric8a in cortical development.

      - Ectopia formation in the cortex has been already described in Nestin-Ric8a cKO mice (Kask et al., 2015). In the current manuscript, authors analyzed the same mutant mice (Nestin-Ric8a), but they did not detect any ectopia. Authors should discuss this discordance.

      The expression pattern of nestin-cre is known to vary depending on factors including transgene insertion site, genetic background, and sex. Early studies showed, for example, that in another transgenic line in the FVB/N genetic background the nestin gene promoter drives cre expression in many non-neural tissues (Dubois et al., Genesis. 2006 Aug;44(8):355-60. doi: 10.1002/dvg.20226). The specific nestin-cre line used in Kask et al. 2015 has also been shown to be active in brain microglia and to increase microglial pro-inflammatory activity when bred to a conditional allele of a cholesterol transporter gene (Karasinska et al., Neurobiol Dis. 2013 Jun;54:445-55; Karasinska et al., J Neurosci. 2009 Mar 18;29(11):3579–3589; Takampri et al., Brain Res. 2009 May 13;1270:10-8). These factors may in part underlie the apparent discrepancy. We have now incorporated this discussion into the revision.

      - Authors claim that microglia express Emx1, and therefore, Ric8a is deleted in microglia cells. However, the arguments for this assumption are very weak and the evidence suggests that this is not the case. This is an important point considering that authors want to emphasise the role of Ric8a in microglia activation, and therefore, additional experiments should demonstrate that Ric8a is deleted in microglia in Emx1-Ric8a mutant mice.

      We have observed altered mRNA expression of several genes in purified microglia cultured from the emx1-cre mutants (Supplemental Fig. 8), which indicates that ric8a is deleted from microglia and suggests a role of microglial ric8a deficiency in ectopia formation. This interpretation is further strengthened by the observation that deleting ric8a from microglia using the microglia-specific cx3cr1-cre results in similar ectopia (Fig. 2). We also have other supporting data, including induction of a cre reporter in brain microglia by emx1-cre and loss of ric8a expression in microglia isolated from emx1-cre mutants. These data have now been incorporated into the text and into revised Supplemental Fig. 8 (new panels c-c” & d).

      Reviewer #2 (Public Review):

      Kwon et al. used several conditional KO mice for the deletion of ric8a or app in different cell types. Some of them exhibited pial basement membrane breaches leading to neuronal ectopia in the neocortex.

      They first investigated ric8a, a Guanine Nucleotide Exchange Factor for Heterotrimeric G Proteins. They observed the above-mentioned phenotype when ric8a is deleted from both microglia and neural cells (ric8a-emx1-cre, or dual deletion with the cre combination cx3cr1 (in microglia) and nestin (in neural cells)), but not from microglia alone or neural cells alone (whether in CR cells (ric8a-Wnt3a-cre), post-mitotic neurons (nex-cre or dlx5/6-cre), or progenitors and their progeny (nestin-cre or foxg1-cre)). They also show that ric8a KO microglia stimulated in vitro with LPS exhibit increased TNFa, IL6 and IL1b secretion compared to controls (Fig 2). They therefore injected LPS in vivo and observed the neuronal ectopia phenotype in ric8a-cx3cr1-cre (microglial deletion) cortices at P0 (Fig 2). They suggest that ric8a KO in neuronal cells mimics immune stimulation (but we have no clue how ric8a KO in neural cells would induce immune stimulation).

      We agree that we do not currently know the precise mechanisms by which mutant microglia are activated in the mutant brain. However, this does not affect the conclusion that deficiency in the Abeta monomer-regulated APP/Ric8a pathway in microglia is the primary cause of cortical ectopia in these mutants, since we have shown that genetic disruption of this pathway in microglia alone, by targeting different pathway components with cell type-specific cre in several different approaches, consistently results in similar cortical ectopia phenotypes. Regarding the source of the immunogens, there are several possibilities that we plan to investigate in future studies. For example, the clearance of apoptotic cells and associated cellular debris is an important physiological process, and deficits in this process have been linked to inflammatory diseases throughout life (Doran et al., Nat Rev Immunol. 2020 Apr;20(4):254-267; Boada-Romero et al., Nat Rev Mol Cell Biol. 2020 Jul;21(7):398-414). In the embryonic cortex, studies have shown that extensive cell death takes place starting as early as E12 (Blaschke et al., Development. 1996 Apr;122(4):1165-74; Blaschke et al., J Comp Neurol. 1998 Jun 22;396(1):39-50). Studies have also shown that radial glia and neuronal progenitors play critical roles in the clearance of apoptotic cells and associated cellular debris in the brain (Lu et al., Nat Cell Biol. 2011 Jul 31;13(9):1076-83; Ginisty et al., Stem Cells. 2015 Feb;33(2):515-25; Amaya et al., J Comp Neurol. 2015 Feb 1;523(2):183-96). Moreover, Ric8a-dependent heterotrimeric G proteins have been found to specifically promote the phagocytic activity of both professional and non-professional phagocytes (Billings et al., Sci Signal. 2016 Feb 2;9(413):ra14; Preissler et al., Glia. 2015 Feb;63(2):206-15; Pan et al., Dev Cell. 2016 Feb 22;36(4):428-39; Flak et al., J Clin Invest. 2020 Jan 2;130(1):359-373; Zhang et al., Nat Commun. 2023 Sep 14;14(1):5706). Thus, it is plausible that a failure of mutant radial glia to promptly clear apoptotic cells and debris contributes to triggering mutant microglial activation in ric8a-emx1-cre mutants. We have now included these possibilities in the text of the revised manuscript. The precise mechanisms remain to be determined in future studies; however, they do not affect the conclusion of the current study.

      The authors then turned their attention to APP. They observed neuronal ectopia into the marginal zone when APP is deleted in microglia (app-cx3cr1-cre) + intraperitoneal LPS injection (they did not show it, but we have to assume there would be no phenotype without the injection of LPS) (Fig 3). (The phenotype is similar but not identical to ric8a-cx3cr1-cre + LPS. They suggest that this is because they had to inject 3 times less LPS due to enhanced immune sensitivity in this genetic background, but this is only a hypothesis.) After in vitro stimulation with LPS, app mutant microglia show reduced secretion of TNFa and IL6 but not IL1b (the opposite of ric8a-cx3cr1-cre microglia), while peritoneal macrophages in culture show increased secretion of TNFa, IL1, IL6 and IL23 (Fig 3 and Suppl. Fig 9).

      We have data showing that app-cx3cr1-cre mutants without LPS injection do not show ectopia; these data have now been included in revised Supplemental Fig. 9 (new panels c-d). Indeed, we employed LPS injection precisely because we do not see a phenotype without it. We agree, and have also stated in the text, that the phenotype of the app mutants is not as severe as that of the ric8a mutants. Besides the low LPS dosage used, we also suggest that other app family members may compensate, since the ectopia previously reported in app family gene mutants were observed only in app/aplp1/2 triple knockouts, not in any of the double knockouts (Herms et al., 2004). We have further clarified this point in the text. These possibilities are not mutually exclusive. Nonetheless, the results clearly show that microglia-specific app mutation causes cortical ectopia upon embryonic immune stimulation. They thus implicate a specific role of microglial APP in cortical ectopia formation.

      The different responses of ric8a and app mutant microglia to LPS likely result from in vitro culturing of the microglia. We have shown that, when acutely isolated macrophages are used, these mutants show changes in the same direction (both with increased cytokine secretion) (Fig. 4). This demonstrates that, without culturing, app mutant microglial lineage cells indeed behave in the same way as ric8a mutant cells.

      The microglia used in the in vitro assays in this study were all cultured for two weeks before the assay. They were thus under chronic stimulation, exposed to dead cells and debris in the culture dish throughout this period. Previous studies have shown that, depending on the degree of perturbation to inflammation-regulating pathways, such exposure can differentially affect microglial cytokine expression, sometimes in the direction opposite to that expected. For example, under chronic immune stimulation, trem2+/- microglia, which are heterozygous for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression (as expected), whereas trem2-/- (null) microglia under the same conditions not only fail to show increases but actually show decreased expression of some pro-inflammatory cytokines (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). In several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). Indeed, the APP cytoplasmic domain is also known to bind to and signal through several other proteins, including FE65, Mena, and TIP60 (Cao & Sudhof, Science. 2001;293:115-120). It is likely that in microglia, too, Ric8a-dependent heterotrimeric G proteins mediate only a subset of the signaling downstream of APP. As such, app knockout in microglia may have more severe effects on microglial anti-inflammatory regulation than ric8a knockout. As a result, upon chronic immune activation, app knockout may lead to a microglial phenotype similar to the trem2 null phenotype discussed above, while ric8a knockout leads to a phenotype similar to the trem2+/- phenotype. This may explain the subdued TNF and IL6 secretion by cultured app (but not ric8a) mutant microglia.

      Since amyloid beta (Ab) is one of the molecules that bind APP, the authors showed that Ab40 monomers (they did not test Ab40 oligomers) partially inhibit cytokine (TNFa, IL6, IL1b, MCP-1, IL23a, IL10) secretion in vitro by microglia stimulated with LPS, but do not affect secretion by microglia from app-cx3cr1-cre (tested for TNFa, IL6, IL1b, IL23a, IL10) (Fig 4, Suppl fig 10) (but still do in aplp2-cx3cr1-cre), and do not affect secretion by ric8a-cx3cr1-cre microglia (tested for TNFa and IL6, but still suppress IL1b). (Therefore, here is another difference between app and ric8a KO microglia.)

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and have included the data (new panel j in Supplemental Fig. 10). As mentioned above, in several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). We expect that this is also true in microglia, and that Ric8a-dependent heterotrimeric G proteins mediate only a subset of the signaling downstream of APP. This may explain why app and ric8a knockout differ in abolishing the anti-inflammatory effects of Abeta monomers on IL-1b vs TNF/IL-6. This difference also suggests that TNF/IL-6 and IL-1b secretion are regulated by different mechanisms in microglia. Indeed, it is well established in immunology that the secretion of IL-1b, but not of TNF or IL-6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit, Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found it suppressed neuronal ectopia (Fig 5, Suppl fig 11). It is not clear whether it suppresses immune stimulation from neuronal cells or immune reaction from microglia cells.

      We agree that, at present, the pharmacological approaches we have taken cannot distinguish between these possibilities. However, in either case, our results still implicate excessive microglial activation in the formation of cortical ectopia and support the conclusion of the study. Thus, while worthy of further investigation, this question does not affect the conclusion of the current study. Furthermore, as mentioned, we plan to determine in future studies how ric8a mutation in neural cells induces immune activation. These results will likely enable us to address this question more specifically.

      Finally, the authors examined the activities of MMP2 and MMP9 in the developing cortex using gelatin gel zymography. The activity and protein levels of MMP9, but not MMP2, in the ric8a-emx1-cre cortex were reported to be significantly increased (Fig 5, Suppl fig 12). Unfortunately, they did not show this in the app-cx3cr1-cre + LPS mouse. They make a connection between ric8a deletion and MMP9 but do not make the connection between app deletion and MMP9 (APP being at the center of the pathway claimed to be important here). They then injected BB94, a broad-spectrum MMP inhibitor, or an inhibitor specific for MMP9 and MMP13. Both significantly suppress the number and size of the ectopia in ric8a mutants (Fig 5).

      For all gelatin gel zymography analyses, we quantified protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and loaded the same amount of protein per lane, so results across lanes are directly comparable. The quantification clearly shows that MMP9 activity levels are increased in the mutants (we have now included whole-gel images and quantification in a new Supplemental Figure 13). The similar levels of MMP2 in all lanes also provide an internal control, further supporting a specific change in MMP9. For this analysis, we focused on the ric8a-emx1-cre mutants because app-cx3cr1-cre + LPS animals show ectopia only in a subset of mutants and, in most cases, in only one hemisphere. Experiments examining potential changes in MMP9 in these animals are therefore unlikely to yield meaningful results. On the other hand, we have clearly shown that administration of different classes of MMP inhibitors significantly suppresses ectopia in ric8a-emx1-cre mutants. This strongly implicates a functional contribution of MMPs.

      After reading the manuscript, I still do not know how ric8a in neural cells is involved in the immune inhibition. Is it through the control of Ab monomers? In addition, the authors did not show in vivo data supporting that Ab monomers are the key players here. As the authors said, this is not the only APP interactor. Finally, I still do not know how ric8a is linked to APP in microglia in the model.

      As detailed above, there are several possibilities, including potential deficits in the clearance of apoptotic cells and associated debris, that may trigger microglial activation in ric8a-emx1-cre mutants. We will investigate these possibilities in future studies, and we have now incorporated them into the revised text. As for the role of Abeta monomers, we have indicated that we currently have no evidence that Abeta monomers inhibit microglia in the developing cortex. We have also indicated in the manuscript that our conclusion is that a microglial signaling pathway activated by Abeta monomers in vitro regulates normal brain development in vivo, not that Abeta monomers themselves regulate brain development. Regarding the link between Ric8a and APP, the reviewer has missed several major lines of supporting evidence. For example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines, including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10). This inhibition is abolished when either the app or the ric8a gene is deleted from microglia, indicating that app and ric8a act in the same genetic pathway (the pathway activated by Abeta monomers) in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokines in microglia. This inhibition is likewise abolished when either app or ric8a is deleted from microglia, reinforcing the conclusion that app and ric8a act in the same pathway in microglia. Furthermore, cell type-specific deletion of app or ric8a from microglia in vivo results in similar cortical ectopia phenotypes. Together, these results strongly support the conclusion that app and ric8a act in the same pathway, activated by Abeta monomers in vitro, in microglia. This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      While several of the findings presented in this manuscript are of potential interest, there are a number of shortcomings. Here are some suggestions that could improve the manuscript and help substantiate the conclusions:

      (1) As the title suggests it, the focus is on Ab and APP functions in microglia. However, the analysis is more focused on ric8a. The connection between ric8a and APP in this study is not investigated, besides the fact that their deletion induces somewhat similar but not identical phenotypes. Showing a similar phenotype is not enough to conclude that they are working on the same pathway. The authors should find a way to make that connection between ric8a and app in the cells investigated here.

      As discussed above, the reviewer misses several major lines of evidence showing that APP and Ric8a act in the same pathway in microglia. Besides the similarity of the ectopia phenotypes, for example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-11). These inhibitory effects are abolished when either the app or the ric8a gene is deleted from microglia. This clearly indicates that app and ric8a act in the same genetic pathway in microglia, a pathway that is activated by Abeta monomers in vitro. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokine genes in microglia. These effects are again abolished when either the app or the ric8a gene is deleted from microglia, further reinforcing the conclusion that app and ric8a act in the same pathway in microglia. Moreover, we show that the same results hold in macrophages. Together, these results strongly support the conclusion that app and ric8a act in the same genetic pathway in microglia. This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins biochemically bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      (2) It would help to show the appearance of breaches in the pial basement membrane leading to neuronal ectopia, and to investigate laminin debris, cell identity, and the Wnt pathway in app-cx3cr1-cre + LPS animals, as you did for ric8a-emx1-cre.

      We have now provided further data on pial basement membrane breaches in the app-cx3cr1-cre + LPS animals (new panels e-f'' in supplemental Fig 9). We have not observed any changes in cell identity or Wnt pathway activity in ric8a-emx1-cre mutants. It is thus of limited value to examine potential changes in these areas in the app-cx3cr1-cre + LPS animals.

      (3) As a control, it would help to show that app-cx3cr1-cre mice without LPS injection do not display the phenotype.

      We have the data on app-cx3cr1-cre mutants without LPS injection, which show no ectopia.  We have now included the data in the revised supplemental Fig. 9 (new panels c-d).

      (4) This would help to show the activity and protein levels of MMP9 and MMP2 and perform the rescue experiments with the inhibitors in the app-cx3cr1-cre cortex +LPS.

      As discussed above, we focus our analysis on the ric8a-emx1-cre mutants, since app-cx3cr1-cre +LPS animals show ectopia in only a subset of mutants and in most cases only in one of the hemispheres. Determining potential changes in MMP9 levels and effects of MMP inhibitors is therefore not likely to yield meaningful data. On the other hand, we have shown that MMP9 levels are increased and that administration of different classes of MMP inhibitors eliminates cortical ectopia in ric8a-emx1-cre mutants. We have also shown a similar break in the basement membrane in app-cx3cr1-cre +LPS animals (new panels e-f'' in supplemental Fig 9). Together, these results strongly implicate a role for MMPs.

      (5) Is MMP9 secreted by microglia cells or neural cells?

      Our in situ hybridization data show MMP9 is most highly expressed in a sparse microglia-like cell population in the embryonic cortex, suggesting that microglia may be a major source of MMP9. We have incorporated these data in a new supplemental Fig. 12 (panel a). The precise identity of these cells, however, requires further validation.

      (6) The in vitro evidence indicates that one of the multiple APP interactors, i.e., Ab40 monomers, is less effective in suppressing the expression of some cytokines by microglia cells mutant for ric8a (TNFa and IL6, while IL1b is still suppressed) or APP (TNFa, IL6, IL1b, IL23a, IL10) when compared to WT. But there are other interactors for APP. To support the claim, it seems crucial to have in vivo data showing that Ab40 monomers are the molecules involved in preventing the breach in the pial basement membrane.

      As addressed in detail above, we have indicated that our conclusion is that a microglial signaling pathway that is activated by Abeta monomers in vitro regulates normal brain development in vivo, not that Abeta monomers themselves regulate brain development in vivo.  We currently do not have evidence that the Abeta monomers play a role in inhibiting microglia during cortical development.  There are candidate ligands for the pathway in the developing cortex, the functional study of which, however, is a major undertaking beyond the scope of the current study.

      (7) In order to claim that this is specific to Ab40 monomers and not oligomers, it is necessary to show that the Ab40 oligomers do not have the same effect in vitro and in vivo. Also, an assay should be done to show that your Ab preparations are pure monomers or oligomers.

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and have included these data in the revision as a new panel j in supplemental Fig. 10. The protocols we use to prepare the monomers and oligomers are standard protocols employed in the field of Alzheimer’s disease research. They have been repeatedly optimized and validated over the past decades.

      (8) Most of the cytokine secretion assays used microglia cells in culture. Two results draw my attention. Ric8a deletion increases TNFa and IL6 secretion by microglia cells after LPS stimulation in vitro, while app deletion decreases their secretion. Later, the paper shows that the decrease in IL1b induced by Ab in microglia cells is prevented by APP deletion but not by ric8a deletion. These two pieces of data suggest that ric8a and APP might not be in the same pathway. In addition, the phenotypes from app-cx3cr1-cre + LPS injection and ric8a-cx3cr1-cre + LPS injection are not exactly the same. This could be due to the level of LPS, as the authors suggest, or it might not be. More experiments are needed to prove they are in the same pathway.

      As discussed above, the reviewer misses several major lines of evidence, which strongly support the conclusion that APP and Ric8a act in the same pathway activated by Abeta monomers in microglia (see detailed discussion in point 1 above).  The differential response of TNFa/IL-6 of app and ric8a mutant microglia likely results from chronic immune stimulation during in vitro culturing, which is known to alter microglial cytokine response (see detailed discussion in point 9 below). We have demonstrated that this is indeed the case by showing that, without culturing, acutely isolated app and ric8a mutant macrophages both display elevated TNFa/IL-6 secretion (Figure 4). 

      Regarding the different regulation of TNF/IL-6 vs IL-1b by APP and Ric8a, as discussed above, in several systems Ric8a-dependent heterotrimeric G proteins (which are degraded in ric8a mutant cortices, see new supplemental Fig. 9) have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). This is likely also the case in microglia, where Ric8a-dependent heterotrimeric G proteins may mediate only a subset of the anti-inflammatory signaling activated by APP. As such, app mutation may abolish all the inhibitory effects of Abeta monomers (both those on TNF/IL-6 and those on IL-1b), whereas ric8a mutation may abolish only a subset (those on TNF/IL-6 but not those on IL-1b). This also suggests that the secretion of TNF/IL-6 and that of IL-1b must be regulated by different mechanisms in microglia. Indeed, it is well established in immunology that the secretion of IL-1b, but not that of TNF or IL-6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit, Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      (9) How do the authors reconcile the reduced TNFa and IL6 secretion upon stimulation of app mutant microglia with the model where app is attenuating immune response in vivo? Line 213 says that microglia exhibit attenuated immune response following chronic stimulation but I don't know if 3 hours of LPS in vitro is a chronic stimulation.

      The reviewer has misunderstood. The microglia used in this study have all been cultured in vitro for approximately two weeks before assay. They have thus been under chronic stimulation, exposed to dead cells and debris in the culture dish. Depending on the degree of perturbation to the inflammation-regulating pathways, such exposures are known to change microglial cytokine expression, sometimes in the opposite direction from that expected. For example, under chronic immune stimulation, trem2+/- microglia, which are heterozygous for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression, whereas trem2-/- (null) microglia under the same conditions show no such increase and, for some pro-inflammatory cytokines, actually show decreased expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177). As mentioned, in several systems Ric8a-dependent heterotrimeric G proteins have also been shown to bind to APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). Thus, it is likely that in microglia, Ric8a-dependent heterotrimeric G proteins also mediate only a subset of the anti-inflammatory signaling activated by APP. If so, app knockout in microglia may have more severe effects than ric8a knockout on microglial immune activation, resembling the relationship between trem2 null and heterozygous mutations discussed above. Accordingly, chronic immune stimulation such as in vitro culturing is predicted to result in attenuated pro-inflammatory cytokine expression in app mutant microglia but elevated cytokine expression in ric8a mutant microglia.
      This may explain why TNF and IL6 secretion by cultured app mutant microglia is subdued, whereas acutely isolated app mutant macrophages instead show increased cytokine secretion. The latter may be more representative of the response of app mutant microglia in the absence of chronic stimulation.

      (10) Line 119: In their model, the authors suggest that there is a breach in the pial basement membrane but that the phenotype is different from the retraction of the radial fibers due to reduced adhesion. Could the authors discuss what substrate the radial fibers are attached to in their model, where the pial surface is destroyed?

      Radial glial endfeet normally bind to the basement membrane via cell surface receptors including the integrin and the dystroglycan protein complexes. We observe free radial glial endfeet at the breach sites, apparently without attachment to any basement membrane.  However, we cannot exclude the possibility that there may be residual, broken-off basement membrane components bound to the endfeet that are not detected by the methodology employed. 

      (11) The authors should show that the increased cytokine secretion observed in vitro also occurs in vivo in ric8a-emx1-cre mice compared to WT mice and to ric8a-nestin-cre mice, or when app is deleted in microglia (app-cx3cr1-cre) + LPS injection compared to WT mice + LPS.

      Unfortunately, this is not technically feasible, since it is not possible to extract the extracellular (secreted) fractions of cytokines from an embryonic brain without causing cell lysis and the release of the intracellular pool. This, however, does not affect our conclusion that the Abeta monomer-regulated microglial pathway plays a key role in regulating normal brain development, since its genetic disruption by different approaches clearly results in brain malformation.

      (12) The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found that it suppressed neuronal ectopia (Fig 5, Suppl fig 11). Does it suppress immune stimulation from neuronal cells or immune reaction from microglia cells?

      As discussed above, we agree that at present the pharmacological approaches we have taken cannot distinguish between these two possibilities. However, whichever is true, it does not affect our conclusion. We also plan to determine in future studies how ric8a mutation in neural cells induces immune activation. These results will likely enable us to adopt specific approaches to address this question.

      (13) Fig 5 and Supplementary fig 12: Please show a tubulin loading control in Fig 5i as you did in suppl fig 12d (gel zymography). Please provide a gel zymography showing control, mutant, and mutant + DM/S3I treatment side by side. The same request applies to the MMP9 staining. Please provide statistics for control vs mutant for suppl fig 12c and d.

      We have now included whole gel zymography images with four control and four mutant individual samples as well as quantification in a new supplemental Fig.13 (panels b-c). This clearly shows increases in MMP9, while the MMP2 levels appear similar between controls and mutants. For all of the experiments of gelatin gel zymography, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amounts of proteins per lane. The results across lanes are thus all comparable.  The MMP9 staining images for the controls and mutants have also all been taken with the same parameters on the microscope and can be directly compared.  The statistics have now been provided as suggested.

      (14) Please provide the name and the source of the MMP9/13 inhibitor used in this study.

      This inhibitor is MMP-9/MMP-13 inhibitor I (CAS 204140-01-2), from Santa Cruz Biotechnology. This information has been included in revision.

      (15) The results show that deletion of ric8a in both microglia and neural cells induced pial membrane breaches, but no phenotype is apparent when ric8a is deleted in microglia or neural cells alone. The results then showed that intraperitoneal injection of LPS induced the phenotype in ric8a-cx3cr1-cre mutants. As a control supporting the model, it would be beneficial to show that the insult induced by LPS injection does not induce the phenotype in ric8a-foxg1-cre mice.

      We agree it may potentially be useful to show that LPS injection does not induce ectopia in ric8a-foxg1-cre mice.  Unfortunately, since the ric8a-foxg1-cre mutation shows no phenotype, we are no longer in possession of this line.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - The information in the abstract and the introduction relates only to app. The manuscript therefore shifts very abruptly to studying the role of Ric8a, with no information at all about this protein or why the authors want to investigate its role in microglial activation. Later in the manuscript, the authors try to link Ric8a with app to study the role of app in the inflammatory response and ectopia formation. This link is quite weak as well.

      In the last paragraph of the Introduction, we explain the use of the ric8a mutant and how it led to the discovery of the Abeta monomer-regulated pathway. We have now improved the writing in revision to make these points, especially the link between APP and Ric8a-regulated G proteins, clearer. In the Results section, we have also improved the writing on the potential link of Ric8a to APP by highlighting, among others, the fact that ric8a and app pathway mutants belong to a unique group of a few mouse mutants (ric8a, app/aplp1/2, and apbb1/2) that show cortical ectopia exclusively in the lateral cortex, while all other cortical ectopia mutants also show severe ectopia at the cortical midline. This suggests that similar mechanisms may underlie ectopia formation in this small group of mutants.

      -In order to validate the mouse model, double immunofluorescence or immunofluorescence+in situ hybridization should be performed to show that microglia express ric8a and that is eliminated in the Emx1-Ric8a mutant mice.

      As mentioned above, we have additional lines of evidence showing that ric8a is deleted from microglia in emx1-cre mutants. This includes data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a mRNA expression in microglia cells isolated from emx1-cre mutants.  These data have now been included in revised supplemental Fig. 8.

      -In Supplemental Fig. 6, the authors claimed that cell proliferation is normal in Ric8a mutant mice without doing any quantification. They also quantified the angle of mitotic division of progenitors in the ventricular zone, but there are no images for the spindle orientation quantification, and no description of how they did it. In addition, this data is contrary to what has already been published in conditional Ric8a mutant mice (Kask et al., 2015). The Vimentin staining should be improved.

      We have provided quantification of cell proliferation (phospho-histone 3 staining at the ventricular surface) in revised supplemental Fig. 6g, which shows no significant differences in the number of positive cells. We have also provided details on the definition of the angle of cleavage plane orientation in revised supplemental Fig. 6h and in the Methods section. We are not sure why the results are different from the other study. We were indeed anticipating deficits in mitotic spindle orientation and spent major efforts in the analysis of this potential deficit. However, the data did not allow us to draw this conclusion.

      -Analysis of the MMP9 expression should be done by western blot and not by immunofluorescence. In fact, the MMP9 expression shown in Figure 5g,h, does not correspond with RNA expression shown in gene expression atlas like genepaint or the allen atlas, doubting the specificity of the antibody. The expression of Mmp9 is quite low or absent in the cortex at E13.5-E14.5, making this protein very unlikely to be responsible for laminin degradation during development.

      We have performed gelatin gel zymography on MMP2/9, which shows increased MMP9 activity levels in the mutant cortex. This is similar to Western blot analysis (all lanes are loaded with the same amounts of cortical lysates). We have now included whole gel zymography images with four control and four mutant individual samples as well as quantification in a new supplemental Fig.13 (panels b-c). The immunofluorescence staining of MMP9, a different type of analysis, was designed as a complementary approach, the results of which also support the interpretation of increases in MMP9 protein. Regarding MMP9 RNA expression, please also note that MMP9 is secreted, and the protein expression pattern is expected to be different from that of RNA. We have performed wholemount in situ hybridization using dissected E13.5 mouse forebrains. Our data (in new supplemental Fig.13a) show that MMP9 mRNA is strongly expressed in a sparse population of cells, many of which appear to align along blood vessels. We suspect these are microglial lineage cells populating the embryonic cortex at this stage (see, for example, Squarzoni et al., Cell Rep. 2014 Sep 11;8(5):1271-9. doi: 10.1016/j.celrep.2014.07.042). Our control in situ hybridization using a Tnc5 probe also shows that the MMP9 signal is not a result of nonspecific probe binding. Since the MMP9-expressing cells are very sparse even in the wholemount specimens, while most database RNA in situ expression data are obtained using thin sections, we suspect this may be why the signal was missed in the databases. As for functional contributions, we agree that we cannot rule out roles played by other MMPs. However, based on the ectopia suppression data, our results clearly indicate a critical contribution by MMP9/13.

      For MMP9 activity, the authors should show the whole membrane with a minimum of three control and three mutant individual samples, together with the quantification.
      - The graphs should be improved, including individual values and titles of the Y axes.

      We have included whole membrane zymography images with four control and four mutant individual samples as well as quantification in a new supplemental Fig.13b-c.  The graphs have also been improved as suggested.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are grateful to the reviewers for their positive assessment of the revised version of the article.

      Please find below our answers to the last, minor comments of the reviewers.

      We thank the reviewer for this important comment. In our live imaging experiments, we actually tracked the dorsal and ventral borders of the omp:yfp positive clusters in control and sly mutant embryos. These measurements showed that the omp:yfp positive clusters are more elongated along the DV axis in mutants as compared with control siblings, as seen on fixed samples (data not shown), suggesting that this difference in tissue shape is not due to fixation.

      Reviewer #4 (Public review):

      Summary:

      In this elegant study XX and colleagues use a combination of fixed tissue analyses and live imaging to characterise the role of Laminin in olfactory placode development and neuronal pathfinding in the zebrafish embryo. They describe Laminin dynamics in the developing olfactory placode and adjacent brain structures and identify potential roles for Laminin in facilitating neuronal pathfinding from the olfactory placode to the brain. To test whether Laminin is required for olfactory placode neuronal pathfinding they analyse olfactory system development in a well-established laminin-gamma-1 mutant, in which the laminin-rich basement membrane is disrupted. They show that while the OP still coalesces in the absence of Laminin, Laminin is required to contain OP cells during forebrain flexure during development and maintain separation of the OP and adjacent brain region. They further demonstrate that Laminin is required for growth of OP neurons from the OP-brain interface towards the olfactory bulb. The authors also present data describing that while the Laminin mutant has partial defects in neural crest cell migration towards the developing OP, these NCC defects are unlikely to be the cause of the neuronal pathfinding defects upon loss of Laminin. Altogether the study is extremely well carried out, with careful analysis of high-quality data. Their findings are likely to be of interest to those working on olfactory system development, or with an interest in extracellular matrix in organ morphogenesis, cell migration, and axonal pathfinding.

      Strengths:

      The authors describe for the first time Laminin dynamics during the early development of the olfactory placode and olfactory axon extension. They use an appropriate model to perturb the system (lamc1 zebrafish mutant), and demonstrate novel requirements for Laminin in pathfinding of OP neurons towards the olfactory bulb.

      The study utilises careful and impressive live imaging to draw most of its conclusions, really drawing upon the strengths of the zebrafish model to investigate the role of laminin in OP pathfinding. This imaging is combined with deep learning methodology to characterise and describe phenotypes in their Laminin-perturbed models, along with detailed quantifications of cell behaviours, together providing a relatively complete picture of the impact of loss of Laminin on OP development.

      Weaknesses:

      Some of the statistical tests are performed on experiments where n=2 for each condition (for example the measurements in Figure S2) - in places the data is non-significant, but clear trends are observed, and one wonders whether some experiments are under-powered.

      We initially planned the electron microscopy experiments in order to analyse 3 embryos per genotype per stage. However, because of technical issues we could not perform the measurements in all the cases, explaining why we have n = 2 in some of the graphs. The trends were quite clear, so we chose to keep these data in the article. We believe they nicely complement the immunostaining data assessing basement membrane integrity in control and mutant embryos.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe the dynamic distribution of laminin in the olfactory system and forebrain. Using immunohistochemistry and transgenic lines, they found that the olfactory system and adjacent brain tissues are enveloped by BMs from the earliest stages of olfactory system assembly. They also found that laminin deposits follow the axonal trajectory of axons. They performed a functional analysis of the sly mutant to analyse the function of laminin γ1 in the development of the zebrafish olfactory system. Their study revealed that laminin enables the shape and position of placodes to be maintained late in the face of major morphogenetic movements in the brain, and its absence promotes the local entry of sensory axons into the brain and their navigation towards the olfactory bulb. 

      Strengths: 

      - They showed that in the sly mutants, no BM staining of laminin and Nidogen could be detected around the OP and the brain. The authors then elegantly used electron microscopy to analyse the ultrastructure of the border between the OP and the brain in control and sly mutant conditions. 

      - To analyse the role of laminin γ1-dependent BMs in OP coalescence, the authors used the cluster size of Tg(neurog1:GFP)+ OP cells at 22 hpf as a marker. They found that the mediolateral dimension increased specifically in the mutants. However, proliferation did not seem to be affected, although apoptosis appeared to increase slightly at a later stage. This increase could therefore be due to a dispersal of cells in the OP. To test this hypothesis, the authors then analysed the cell trajectories and extracted 3D mean square displacements (MSD), a measure of the volume explored by a cell in a given period of time. Their conclusion indicates that although brain cell movements are increased in the absence of BM during coalescence phases, overall OP cell movements occur within normal parameters and allow OPs to condense into compact neuronal clusters in sly mutants. The authors also analysed the dimensions of the clusters composed of OMP+ neurons. Their results show an increase in cluster size along the dorso-ventral axis. These results were to be expected since, compared with BM, early neurog1+ neurons should compact along the medio-lateral axis, and those that are OMP+ essentially along the dorso-ventral axis. In addition to the DV elongation of OP tissue, the authors show the existence of isolated and ectopic (misplaced) YFP+ cells in sly mutants. 

      - To understand the origin of these phenotypes, the authors analysed the dynamic behaviour of brain cells and OPs during forebrain flexion. The authors then quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, and proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected. 

      - They then analysed the dynamic behaviour of the axon using live imaging. Olfactory axon migration is drastically impaired in sly mutants, demonstrating that Laminin γ1-dependent BMs are essential for the growth and navigation of axons from the OP to the olfactory bulb. 

      - The authors therefore performed a quantitative analysis of the loss of function of Laminin γ1. They propose that the BM of the OP prevents its deformation in response to mechanical forces generated by morphogenetic movements of the neighbouring brain. 

      Weaknesses: 

      - The authors did not analyse neurog1+ axonal migration at the level of the single cell and instead made a global analysis. An analysis at the cell level would strengthen their hypotheses.  

      - Rescue experiments by locally inducing Laminin expression would have strengthened the paper. 

      - The paper lacks clarity between the two neuronal populations described (early EONs and late OSNs).  

      - The authors quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected. 

      - A missing point in the paper is the effect of Laminin γ1 on the migration of cranial NCCs that interact with OP cells. The authors could have analysed the dynamic distribution of neural crest cells in the sly mutant. 

      We thank the reviewer for the overall positive assessment of our work, and we carefully responded to all her/his insightful comments below. Live imaging experiments to (1) visualise exit and entry point formation with only a few axons labelled, (2) characterise the behaviour of single neurog1:GFP-positive neurons/axons during OP coalescence and to (3) analyse the migration of cranial NCC are now included in the revised manuscript to address the reviewer’s questions, and reinforce our initial conclusions.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript addresses the role of the extracellular matrix in olfactory development. Despite the importance of these extracellular structures, the specific roles and activities of matrix molecules are still poorly understood. Here, the authors combine live imaging and genetics to examine the role of laminin gamma 1 in multiple steps of olfactory development. The work comprises a descriptive but carefully executed, quantitative assessment of the olfactory phenotypes resulting from loss of laminin gamma. Overall, this is a constructive advance in our understanding of extracellular matrix contributions to olfactory development, with a well-written Discussion with relevance to many other systems. 

      Strengths: 

      The strengths of the manuscript are in the approaches: the authors have combined live imaging, careful quantitative analyses, and molecular genetics. The work presented takes advantage of many zebrafish tools including mutants and transgenics to directly visualize the laminin extracellular matrix in living embryos during the developmental process. 

      Weaknesses: 

      The weaknesses are primarily in the presentation of some of the imaging data. In certain cases, it was not straightforward to evaluate the authors' interpretations and conclusions based on the single confocal sections included in the manuscript. For example, it was difficult to assess the authors' interpretation of when and how laminin openings arise around the olfactory placode and brain during olfactory axon guidance. 

      We thank the reviewer for the overall positive assessment of our work, and we have responded carefully to all her/his insightful comments below. To address these comments, we have added to the revised manuscript live imaging data visualising exit and entry point formation with sparse labelling of axons, and z-stacks showing how exit and entry points are organised in 3D.

      Reviewer #3 (Public Review): 

      This is a beautifully presented paper combining live imaging and analysis of mutant phenotypes to elucidate the role of laminin γ1-dependent basement membranes in the development of the zebrafish olfactory placode. The work is clearly illustrated and carefully quantified throughout. There are some very interesting observations based on the analysis of wild-type, laminin γ1, and foxd3 mutant embryos. The authors demonstrate the importance of a Laminin γ1-dependent basement membrane in olfactory placode morphogenesis, and in establishing and maintaining both boundaries and neuronal connections between the brain and the olfactory system. There are some very interesting observations, including the identification of different mechanisms for axons to cross basement membranes, either by taking advantage of incompletely formed membranes at early stages, or by actively perforating the membrane at later ones. 

      This is a valuable and important study but remains quite descriptive. In some cases, hypotheses for mechanisms are stated but are not tested further. For example, the authors propose that olfactory axons must actively disrupt a basement membrane to enter the brain and suggest alternative putative mechanisms for this, but these are not tested experimentally. In addition, the authors propose that the basement membrane of the olfactory placode acts to resist mechanical forces generated by the morphogenetic movement of the developing brain, and thus to prevent passive deformation of the placode, but this is not tested anywhere, for example by preventing or altering the brain movements in the laminin γ1 mutant. 

      We thank the reviewer for the overall positive assessment of our work and for suggesting interesting experiments to attempt in the future, and we have responded carefully to all her/his constructive comments below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In general, it would be easier to draw conclusions and compare data if the authors used similar stages throughout the article. 

      Throughout the article we tried to focus on a series of stages that cover both the coalescence of the OP (up to 24 hpf) and later stages of olfactory system development spanning the brain flexure process (28, 32, 36 hpf). However, for technical reasons it was not always possible to stick to these precise stages in some of our experiments. Also, in Fig. 1E-J, we selected images from the movies that illustrate specific cell or axonal behaviours, so the corresponding stages could not exactly match the stage series used in Fig. 1A-D and elsewhere in the article. Nevertheless, this stage heterogeneity does not affect our main conclusions.

      It would be useful to schematise the olfactory placode and the brain in an insert to clearly visualise the system in each figure. 

      We hope that the schematic initially presented in Fig. 1K already helps the reader to understand how the system is organised. Although we have not added more schematic views to represent the system in each figure (we think this would make the figures overcrowded), we have added labels pointing to the OP and the brain in the pictures in order to clarify the localisation of each tissue.

      In the Summary, the authors refer to the integrity of the basement membrane. I don't think there is any attempt to affect basement membrane integrity in the article. It would be important to do so to look at the effect on CNS-PNS separation and axonal elongation. 

      In the Summary, we use the term « integrity of the basement membrane » to indicate that we have analysed this integrity in the sly mutant. Given the results of our immunostainings against three main components of the basement membrane (Laminin, Collagen IV and Nidogen), as well as our EM observations, we regard the sly mutant as a condition in which the integrity of the basement membrane is strongly affected.

      Rescue experiments by locally inducing Laminin expression would have strengthened the paper. 

      We have attempted to rescue the sly mutant phenotypes by introducing the mutation in the transgenic TgBAC(lamC1:lamC1-sfGFP) background, in which Laminin γ1 tagged with sfGFP is expressed under the control of its own regulatory sequences (Yamaguchi et al., 2022). To do so, we crossed sly+/-;Tg(omp:yfp) fish with sly+/-; Tg(lamC1:LamC1-sfGFP) fish. Surprisingly, while the global embryo morphology was rescued, no clear rescue of the olfactory system defects could be detected at 36 hpf. This could be because the expression level of LamC1-sfGFP obtained with one copy of the transgene is not sufficient to rescue the olfactory system phenotypes, or because the sfGFP tag specifically affects the function of the Laminin γ1 chain during the development of the olfactory system, making it unable to rescue the defects. Given the results of these first attempts, we decided not to continue in this direction.

      (1) Developing OP & brain are surrounded by laminin-containing BM (already described by Torres-Paz & Whitlock in 2014). 

      "we first noticed the appearance of a continuous Laminin-rich BM surrounding the brain from 14-18 hpf, while around the OP, only discrete Laminin spots were detected at this stage (Fig. 1A, A'). " 

      Around 8ss for Torres-Paz & Whitlock (before 14 hpf). Can you modify the text, or show an 8ss stage embryo? As far as I know, the authors do not show images at 14 hpf. Please correct this sentence or show a 14 hpf picture. 

      The reviewer is right, we do not show any 14 hpf stage in the images and thus have removed this stage in the text and replaced it by 17 hpf.

      In Figure 1A, the labelling of laminin 111 does not appear to be homogeneous along the brain.

      Is this true? 

      At this stage the brain’s BM revealed by the Laminin immunostaining appears fairly continuous (while the OP’s one is clearly dotty and less defined), but, as the reviewer noticed, very small local interruptions of the signal can indeed be seen along the structure. We have modified the text to mention these small interruptions.

      How is the Laminin antibody used by the authors specific to laminin 111?  

      We thank the reviewer for raising this important point. The immunogen used to produce this rabbit polyclonal antibody is the Laminin protein isolated from the basement membrane of a mouse Engelbreth-Holm-Swarm (EHS) sarcoma. It is thus likely to recognise several Laminin isoforms and not only Laminin 111. We have therefore replaced Laminin 111 by Laminin when mentioning this antibody in the text and Figures.

      Please schematise in Figure 1K the stages you have tested and shown here in the article i.e. stages 18 - 22 - 28 -36 hpf using immunohistochemistry and 17-26-27-29-33 and 38 hpf using transgenics for laminin 111 and LamC1 respectively.  

      As suggested by the reviewer, we changed the stages in the schematics for stages we have presented in Figure 1 (analysed either with immunostaining or in live imaging experiments). We chose to represent 17 - 22 - 26 - 33 hpf (and thus adapted some of the schematics for them to match these stages).  

      Please specify in the Figure 1 legend for panels A to D whether this is a 3D projection or a z-section.

      We indicated in the Figure 1 legend that all these images are single z-sections (as well as for panels E-J).

      Furthermore, the schematisation in Fig. 1K does not reflect what the authors show: at 22 hpf laminin 111 labelling appears to be present only near the brain, and no labelling lateral to the olfactory placode and anteriorly and posteriorly. Thus, the schematisation in Figure 1K needs to be modified to reflect what the authors show.

      We agree with the reviewer that the Laminin staining at this stage is observed around the medial region of the OP, but not more laterally. We modified the schematic view accordingly in Figure 1K. Anterior and posterior sides of the OP are not represented in this schematic because we chose to represent a frontal view rather than a dorsal view.

      The authors suggest that" the laminin-rich BM of OP assembles between 18 and 22 hpf, during the late phase of OP coalescence". However, their data indicate that this BM assembles around 28hpf (Figure 1C). Can they clarify this point?

      What we meant with this sentence is that we clearly see two distinct BMs from 22 hpf. However, as noticed by the reviewer, the OP’s BM is only present around the medial/basal regions of the OP and does not surround the whole OP tissue at this stage. We modified the text to clarify this point (in particular by mentioning that the OP’s BM starts to assemble between 18 and 22 hpf), and replaced the image shown in Figure 1B, B’ with a more representative picture (the previous z-section was taken in very dorsal regions of the OP).

      It would be useful to disrupt these cells that have a cytoplasmic expression of Laminin-sfGFP, to analyse their contribution to BM and OP coalescence.

      Indeed it will be interesting in the future to test specifically the role of the cells expressing cytoplasmic Laminin-sfGFP around and within the OP, as proposed by the reviewer. Laser ablation of these cells could be attempted, but due to their very superficial localisation, close to the skin, we believe these ablations (with the protocol/set-up we currently use in the lab) would impair the skin integrity, preventing us from drawing conclusions. We consider the optimisation of this experiment to be out of the scope of the present work.

      Tg(-2.0ompb:gapYFP)rw032 marks ciliated olfactory sensory neurons (OSNs) (Sato et al., 2005). The authors should mention this. 

      Please see our detailed response to the next point below.

      Points to be clarified: 

      -Tg(-2.0ompb:gapYFP)rw032 marks ciliated olfactory sensory neurons (OSNs) (Sato et al., 2005). The authors should mention this here. Moreover, the authors refer to "OP neurons" throughout the article. In the development of the olfactory organ, two types of neurons have been described in the literature: early EONs (12hpf-26hpf) and later OSNs. Each could have a specific role in the establishment and maintenance of the BM described by the authors. The authors need to clarify this point as, in Figure 1 for example, they use a marker for Tg(neurog1:GFP) EONs and a marker for ciliated OSNs without distinction. The distinction between EONs and OSNs comes a little late in the text and should be placed higher up. 

      As mentioned by the reviewer, according to the initial view of neurogenesis in the OP, OP neurons are born in two waves. A transient population of unipolar, dendrite-less pioneer neurons would differentiate first, in the ventro-medial region of the OP and elongate their axons dorsally out of the placode, along the brain wall. These pioneer axons would then be used as a scaffold by later born OSNs located in the dorso-lateral rosette to outgrow their axons towards the olfactory bulb (Whitlock and Westerfield, 1998). 

      Another study further characterised OP neurogenesis and showed that the first neurons to differentiate in the OP (the early olfactory neurons or EONs) express the Tg(neurog1:GFP) transgene (Madelaine et al., 2011). As mentioned by the authors in the discussion of this article, neurog1:GFP+ neurons appear much more numerous than the previously described pioneer neurons, and may thus include pioneers but also other neuronal subtypes.

      We would like here to share additional, unpublished observations from our lab that further suggest that the situation is more complex than the pioneer/OSN and EON/OSN nomenclatures. First, in many of our live imaging experiments, we can clearly visualise some neurog1:GFP+ unipolar neurons, initially located in a medial position in the OP, which intercalate and contribute to the dorsolateral rosette (where OSNs are proposed to be located) at the end of OP coalescence, from 22-24 hpf. Second, in fixed tissues, we observed that most neurog1:GFP+ neurons located in the rosette at 32 hpf co-express the Tg(omp:meRFP) transgene (Sato et al., 2005). These observations suggest that at least a subpopulation of neurog1:GFP+ neurons could incorporate in the dorsolateral rosette and become ciliated OSNs during development. We can share these results with the reviewer upon request. Further studies are thus needed to clarify and describe the neuronal subpopulations and lineage relationships in the OP, but this detailed investigation is out of the scope and focus of the present study. 

      An additional complication comes from the fact that, as shown and acknowledged by the authors in Miyasaka et al., 2005, the Tg(omp:meYFP) line (6 kb promoter) labels ciliated OSNs in the rosette but also some unipolar, ventral neurons (around 10 neurons at 1 dpf, Miyasaka et al. 2005, Figure 3A, white arrowheads). This was also observed using the 2 kb promoter Tg(omp:meYFP) line (see for instance Miyasaka et al., 2007), and in our study we can indeed detect these ventro-medial neurons labelled in the Tg(omp:meYFP) line (2 kb promoter), see for instance Figure 1C’, D’ or Movie 6. It is unclear whether these unipolar omp:meYFP-positive cells are pioneer neurons or EONs expressing the omp:meYFP transgene, or OSN progenitors that would be located basally/ventrally in the OP at these stages.

      For all these reasons, we decided to present in the text the current view of neurogenesis in the OP but instead of attributing a definitive identity to the neurons we visualise with the transgenic lines, we prefer to mention them in the manuscript (and in the rest of the response to the reviewers) as neurons expressing neurog1:GFP or omp:meYFP transgenes (or cells/axons/neurons expressing RFP in the Tg(cldnb:Gal4; UAS:RFP) background).

      We also made the following changes to the text to clarify this point:

      - we moved higher up in the text, as suggested by reviewer 1, the description of the current model of neurogenesis in the OP,

      - we mentioned that neurog1:GFP+ neurons are more numerous than the initially described pioneer neurons, as discussed in Madelaine et al., 2011,

      - we wrote more clearly that the Tg(omp:meYFP) line labels ciliated OSNs but also a subset of unipolar, ventral neurons (Miyasaka et al., 2005), and pointed to these ventral neurons in Figure 1C’, D’,

      - in the initial presentation of the current view of OP neurogenesis, we referred to neurog1:GFP+ neurons as EONs to be consistent with Madelaine et al., 2011.

      - To visualise pioneer axons, the authors should use an EONS marker such as neurog1 because, to my knowledge, OMP only marks OSN axons and not pioneer axons.  

      To visualise neurog1:GFP+ axons during OP coalescence, we performed live imaging upon injection of the neurog1:GFP plasmid (Blader et al., 2003) in the Tg(cldnb:Gal4; UAS:RFP) background (n = 4 mutants and n = 4 controls from 2 independent experiments). We observed some GFP+ placodal neurons exhibiting retrograde axon extension in both controls and sly mutants. In such experiments it is very difficult to quantify and compare the number of neurons/axons showing specific behaviours between different experimental conditions/genetic backgrounds. Indeed, due to the cytoplasmic localisation of GFP, the axons can only be seen in neurons expressing high levels of GFP, and because of the injection, the number of such neurons varies a lot between embryos, even in a given condition. Nevertheless, our qualitative observations reinforce the idea that the basement membrane is not absolutely required for mediolateral movements and retrograde axon extension of neurog1:GFP+ neurons in the OP. We added examples of images extracted from these new live imaging experiments in the revised Fig. S5A, B.

      - The authors should analyse the presence of laminin in the OP and forebrain in conjunction with neural crest cell dynamics (using a Sox10 transgenic line for example) to refine their entry and exit point hypotheses. 

      As described in the answer to the next point, we performed new experiments in which we visualised NCC migration in the Tg(neurog1:GFP) background, which allowed us to analyse the localisation of NCC at the forebrain/OP boundary, in ventral and dorsal positions, both in sly mutant embryos and control siblings.

      - A dynamic analysis of the distribution of neural crest cells in the sly mutant over time and during OP coalescence would be important. 

      The dynamics of zebrafish cranial NCC migration in the vicinity of the OP has been previously analysed using sox10 reporter lines (Harden et al., 2012, Torres-Paz and Whitlock, 2014, Bryan et al., 2020). To address the point raised by the reviewer, we performed live imaging from 16 to 32 hpf on sly mutants and control siblings carrying the Tg(neurog1:GFP) and Tg(UAS:RFP) transgenes and injected with a sox10(7.2):KalTA4 plasmid (Almeida et al., 2015). This allows the mosaic labelling of cells that express or have expressed sox10 during their development which, in the head region at these stages, represents mostly NCC and their derivatives. 3 independent experiments were carried out (n = 4 mutant embryos in which 8 placodes could be analysed; n = 6 control siblings in which 10 placodes could be analysed). A new movie (Movie 9) has been added to the revised article to show representative examples of control and mutant embryos.

      From these new data, we could make the following observations:

      - As expected from previous studies (Harden et al., 2012, Torres-Paz and Whitlock, 2014, Bryan et al., 2020), in control embryos a lot of NCC had already migrated to reach the vicinity of the OP when the movies begin at 16 hpf, and were then seen invading mainly the interface between the eye and the OP (10/10 placodes). Surprisingly, in sly mutants, a lot of motile NCC had also reached the OP region at 16 hpf in all the analysed placodes (8/8), and populated the eye/OP interface in 7/8 placodes (10/10 in controls). Counting NCC or tracking individual NCC during the whole duration of the movies was unfortunately too difficult to achieve in these movies, because of the low level of mosaicism (a high number of cells were labelled) and of the high speed of NCC movements (as compared with the 10 min delta t we chose for the movies). 

      - In some of the control placodes we could detect a few NCC that populated the forebrain/OP interface, either ventrally, close to the exit point of the axons (4/10 placodes), or more dorsally (8/10 placodes). By contrast, in sly mutants, NCC were observed in the dorsal region of the brain/OP boundary in only 2/8 placodes, and in the ventral brain/OP boundary in only 2/8 placodes as well. Interestingly, in these last 2 samples, NCC that had initially populated the ventral region of the brain/OP interface were then expelled from the boundary at later stages.

      We reported these observations in a new Table that is presented in revised Fig. S6B. In addition, instances of NCC migrating at the eye/OP or forebrain/OP interfaces are indicated with arrowheads on Movie 9. Previous Figure S6 was split into two parts presenting NCC defects in sly mutants (revised Figure S6) and in foxd3 mutants (revised Figure S7).

      Altogether, these new data suggest that the first postero-anterior phase of NCC migration towards the OP, as well as their migration in between the eye and OP tissues, is not fully perturbed in sly mutants. The subset of NCC that populate the OP/forebrain seem to be more specifically affected, as these NCC show defects in their migration to the interface or the maintenance of their position at the interface. Since the crestin marker labels mostly NCC at the OP/forebrain interface at 32 hpf (revised Fig. S6A), this could explain why the crestin ISH signal is almost lost in sly mutants at this stage.

      (2) Laminin distribution suggests a role in olfactory axon development 

      "Laminin 111 immunostaining revealed local disruptions in the membrane enveloping the OP and brain, precisely where YFP+ axons exit the OP (exit point) and enter the brain (entry point) (Fig. 1C-D')." Can the authors quantify this situation? It would be important to analyse this behaviour on the scale of a neuron and thus axonal migration to strengthen the hypotheses. 

      As suggested by the reviewer, to better visualise individual axons at the exit and entry point, we used mosaic red labelling of OP axons. To achieve this sparse labelling, we took advantage of the mosaic expression of a red fluorescent membrane protein observed in the Tg(cldnb:Gal4; UAS:lyn-TagRFP) background. The unpublished Tg(UAS:lyn-TagRFP) line was kindly provided by Marion Rosello and Shahad Albadri from the lab of Filippo Del Bene. We crossed the Tg(cldnb:Gal4; UAS:lyn-TagRFP) line with the TgBAC(lamC1:lamC1-sfGFP) reporter and performed live imaging on 2 embryos/4 placodes, in a frontal view. A new movie (Movie 3 in the revised article) shows examples of exit and entry point formation in this context. This allowed us to visualise the formation of the exit and entry points in more samples (6 embryos and 12 placodes in total when we pool the two strategies for labelling OP axons) and through the visualisation of a small number of axons, and it reinforces our initial conclusions. 

      (3) The integrity of BMs around the brain and the OP is affected in the sly mutant 

      Why do the authors analyse the distribution of collagen IV and Nidogen and not proteoglycans and heparan sulphate? 

      We attempted to label more ECM components such as proteoglycans and heparan sulfate, but whole-mount immunostainings did not work in our hands.

      A dynamic analysis of the distribution of neural crest cells in the sly mutant over time and during OP coalescence would be important. 

      See our detailed response to this point above.  

      (4) Role of Laminin γ1-dependent BMs in OP coalescence 

      The authors use the size of the Tg(neurog1:GFP)+ OP cell cluster at 22 hpf as a marker.  The authors should count the number of cells in the OP at the indicated time using a nuclear dye to check that in the sly mutant the number of cells is the same over time. Two time points as analysed in Figure S2 may not be sufficient to quantify proliferation which at these stages should be almost zero according to Whitlock & Westerfield and Madelaine et al.

      Counting the neurog1:GFP+ cell numbers in our existing data was unfortunately impossible, due to the poor quality of the DAPI staining. We are nevertheless confident that the number of cells within neurog1:GFP+ clusters is fairly similar between controls and sly mutants at 22 hpf, since the OP dimensions are the same for AP and DV dimensions, and only slightly different for the ML dimension. In addition, we analysed proliferation and apoptosis within the neurog1:GFP+ cluster at 16 and 21 hpf and observed no difference between controls and mutants.

      (5) Role of Laminin γ1-dependent BMs during the forebrain flexure 

      In Figure 4F at 32hpf, the presence of 77% ectopic OMP+ cells medially should result in an increase in dimensions along the M-L? This is not the case in the article. The authors should clarify this point. 

      As we explained in the Material and Methods, ectopic fluorescent cells (cells that are physically separated from the main cluster) were not taken into account for the measurement of the OP dimensions. This is now also mentioned in the legends of the Figures (4 and S3) showing the quantifications of OP dimensions.

      Cell distribution also seems to be affected within the OMP+ cluster at 36 hpf, with fewer cells laterally and more medially. The authors should analyse the distribution of OMP+ cells in the clusters in sly mutants and controls to understand whether the modification corresponds to the absence of BM function. 

      On the pictures shown in Figure 4F,G, we agree that omp:meYFP+ cells appear to be more medially distributed in the mutant, however this is not the case in other sections or samples, and is rather specific to the z-section chosen for the Figure. We found that the ML dimension is unchanged in mutants as compared with controls, except for the 28 hpf stage where it is smaller, but this appears to be a transient phenomenon, since no change is detected at earlier or later stages (Figure 4A-D and Figure S3A-L). The difference we observe at 28 hpf is now mentioned in the revised manuscript.

      The conclusions of Figures 4 and S3 would rather be that laminin allows OMP+ cells to be oriented along the medio-lateral axis whereas it would control their position along the dorsoventral axis. The authors should modify the text. It would be useful to map the distribution of OMP+ cells along the dorsoventral and mediolateral axes. The same applies to Neurog1+ cells. An analysis of skin cell movements, for example, would be useful to determine whether the effects are specific.  

      We are confident that the measurements of OP dimensions in AP, DV and ML are sufficient to describe the OP shape defects observed in the sly mutants. Analysing cell distribution along the 3 axes as well as skin cell movements will be interesting to perform in the future but we consider these quantifications as being out of the scope of the present work.

      (6) Laminin γ1-dependent BMs are required to define a robust boundary between the OP and the brain 

      The authors must weigh this conclusion "Laminin γ1-dependent BMs serve to establish a straight boundary between the brain and OP, preventing local mixing and late convergence of the two OPs towards each other during flexion movement." Indeed, they don't really show any local mixing between the brain and OP cells. They would need to quantify in their images (Figure 5A-A' and Figure S4 A-A') the percentage of cells co-labelled by HuC and Tg(cldnb:GFP). 

      We agree with the reviewer and thus replaced « reveal » by « suggest » in the conclusion of this section. 

      (7) Role of Laminin γ1-dependent BMs in olfactory axon development 

      An analysis of the retrograde extension movement in the axons of OMP+ ectopic neurons in the sly1 mutant condition would be useful to validate that the loss of laminin function does not play a role in this event. 

      Indeed, even though we can visualise instances of retrograde extension occurring normally in sly mutants, we cannot rule out that this process is affected in a subset of OP neurons, for instance in ectopic cells, which often show no axon or a misoriented axon. We added a sentence to mention this in the revised manuscript.

      Minor comments and typos: 

      Please check and mention the D-V/L-M or A-P/L-M orientation of the images in all figures. 

      This has been checked.

      Legend Figure 1: "distalmost" is missing a space "distal most". 

      We checked and this word can be written without a space.

      Figure 1 panel C: check the orientation (I am not sure that Dorsal is up). 

      We double-checked and confirm that dorsal is up in this panel.

      Movie 1 Legend: "aroung "the OP should be around the OP. 

      Thanks to the reviewer for noticing the typo, we corrected it.

      Reviewer #2 (Recommendations For The Authors):

      The comments below are relatively minor and mostly raise questions regarding images and their presentation in the manuscript. 

      • Figure 1, visualization of exit and entry points: It is a bit difficult to visualize the axon exit and entry points in these images, and in particular, to understand how the exit and entry points in C and D correspond to what is seen in F, F', H, and H'. There appears to be one resolvable break in the staining in C and D, whereas there are two distinct breaks in F-H'. Are these single optical sections? Is it possible to visualize these via 3-dimensional rendering? 

      All the images presented in Figure 1 are single z-sections, which is now indicated in the Figure legend. As noticed by the reviewer, Laminin immunostainings on fixed embryos at 28 and 36 hpf suggested that the exit and entry points are facing each other, as shown in Figure 1C-D’. However, in our live imaging experiments we always observed that the exit point is slightly more ventral than the entry point (by about 10 to 20 µm). This discrepancy could be due to the fixation that precedes the immunostaining procedure, which could slightly modify the size and shape of cells/tissues. We added a sentence on this point in the text. In addition, we added new movies of the LamC1-sfGFP reporter with sparse red axonal labelling (Movie 3, see response to reviewer 1), as well as z-stacks presenting the organisation of exit and entry points in 3D (Movie 4), which should help to better illustrate the mechanisms of exit and entry point formation.

      • Movie 2, p. 6, "small interruptions of the BM were already present near the axon tips, along the ventro-medial wall of the OP." This is a bit difficult to assess since the movie seems to show at least one other small interruption in the BM in addition to the exit point, in particular, one slightly dorsal to the exit point. Was this seen in other samples, or in different optical sections? 

      Indeed the exit and entry points often appear as regions with several, small BM interruptions, rather than single holes in the BM. We now show in revised Movie 4 the two z-stacks (the merge and the single channel for green fluorescence) corresponding to the last time points of the movies showing exit and entry point formation in Movie 2, where several BM interruptions can be seen for both the exit and entry points. We had already mentioned this observation in the legend of Movie 2, and we added a sentence on this point in the main text of the revised manuscript. This is also represented for both exit and entry points in the new schematics in revised Fig. 1K and its legend. 

      • Movie 2, p. 6, "The opening of the entry point through the brain BM was concomitant with the arrival of the RFP+ axons, suggesting that the axons degrade or displace BM components to enter the brain." Similar to the questions regarding the exit point, it was a bit difficult to evaluate this statement. There appears to be a broader region of BM discontinuity more dorsal to the arrowhead in Movie 2. A single-channel movie of just the laminin fluorescence might help to convey the extent of the discontinuity. As with above, was this seen in other samples, or in different optical sections?  

      See our response to the previous comment.

      • Figure 1H, I, "the distal tip of the RFP+ axons migrated in close proximity with the brain's BM." This is again a bit difficult to see, and quite different than what is seen in Figure 4A, in which the axons do not seem close to the BM in this section. Is it possible to visualize this via 3-dimensional rendering? 

      In fixed embryos or in live imaging experiments, we observed that, once they have entered the brain, the distal tips (the growth cones) of the axons lie close to the BM of the brain. However, this is not the case for the axon shafts, which, as development proceeds, lie further away from the BM. This can clearly be seen at 36 hpf in Figure 1D’ and Figure 4A, as noted by the reviewer. We modified the text to clarify this point.

      • Figure 2J, J', p. 7, the gap between the OP and brain cells of sly mutants "was most often devoid of electron-dense material." It is difficult to see this loss of electron-dense material in 2J'. The thickness of the space is quantified well and is clearly smaller, but the change in electron-dense material is more difficult to see.  

      We looked at Figure 2 again and it seems clear to us that there is electron-dense material between the plasma membranes in controls, which is practically not seen (rare spots) in the mutants. We added a sentence mentioning that we rarely see electron-dense spots in sly mutants.

      • Figure 5E-F': There are concerns about evaluating the shape of a tissue based on nuclear position. Is there a way to co-stain for cell boundaries (maybe actin?), and then quantify distortion of the dlx+ cell population using the cell boundaries, rather than nuclear staining? 

      We agree with the reviewer that it is not ideal to evaluate the shape of the OP/brain boundary based on a nuclear staining. As explained in the text, we could not use the Tg(eltC:GFP) or Tg(cldnb:Gal4; UAS:RFP) reporter lines for this analysis, due to ectopic or mosaic expression. However, we are confident that the segmentation of the Dlx3b immunostaining reflects the organisation of the cells at the OP/brain tissue boundary: in other data sets, in which we performed Dlx3b staining with membrane labelling independently of the present study and in the wild-type context, we clearly see that cell membranes are juxtaposed to the Dlx3b nuclear staining (in other words, the cytoplasm volume of OP cells is very small). 

      • Figure S5E: It would be helpful to see representative images for each of the categories (Proper axon bundle; Ventral projections; Medial projections) or a schematic to understand how the phenotypes were assessed. 

      To address this point we added a schematic view to illustrate the phenotypes assessed in each column of the table in revised Figure S5G.

      • Figure 6, p. 12, "Laminin gamma 1-dependent BMs are essential for growth and navigation of the axons...": What fraction of the tracked axons managed to exit the OP? Given the quantitative analyses in Figure 6, one might interpret this to mean that laminin gamma 1 is not essential for axon growth (speed and persistence are largely unchanged), but rather, primarily for navigation. 

      As noticed by the reviewer, the speed and persistence of axonal growth cones are largely unchanged in the sly mutants (except for the reduced persistence in the 200-400 min window, and an increased speed in the 800-1000 min window), showing that the growth cones are still motile. However, as shown by the tracks, they tend to wander around within the OP, close to the cell bodies, which results in the end in a perturbed growth of the axons. The navigation issues are rather revealed by the analysis of fixed Tg(omp:meYFP) embryos presented in the table of Figure S5G. We modified the text to separate more clearly the conclusions of the two types of experiments (fixed, transgenic embryos versus live, mosaically labelled embryos).

      Reviewer #3 (Recommendations For The Authors):

      Testing the hypotheses mentioned in the public review will be interesting experiments for a follow-up study, but are not essential revisions for this manuscript. 

      I have only a few minor suggestions for revisions: 

      P8 subheading 'Role of Laminin γ1-dependent BMs in OP coalescence' - since no major role was demonstrated here, this heading should be reworded.  

      We agree with the reviewer and replaced the previous title by « OP coalescence still occurs in the sly mutant ».

      P11, line 3 - the authors conclude that the forebrain is smaller 'due to' the inward convergence of the OPs. I do not think it is possible to assign causation to this when the mutant disrupts Laminin γ1 systemically - it is equally possible that the OPs move inward due to a failure of the brain to form in the normal shape. Thus, the wording should be changed here. (In the Discussion on p15, the authors mention the 'apparent distortion' of the brain, and say that it is 'possibly due' to the inward migration of the placodes', but again this could be toned down.) 

      We agree with the reviewer’s comment and changed the wording of our conclusions in the Results section.

      P11 and Fig. S5 - The table and text seem to be saying opposite things here. The text on p11 (3rd paragraph) indicates that the normal exit point is ventral and that this is disrupted in the mutant, with axons exiting dorsally. However, in the table, at each time point there is a higher % of axons exiting ventrally in the mutant. Please clarify. The table does not provide a % value for axons exiting dorsally - it might help to add a column to show this value. 

      We are grateful to the reviewer for pointing this out, and we apologize for the lack of clarity in the first version of the manuscript. We have modified the text and Figure S5 in order to clarify the different points raised by the reviewer in this comment. The Table in Fig. S5G does not represent the % of axons showing defects, but the % of embryos showing the phenotypes. In addition, an embryo is counted in the ventral or medial projection category if it shows at least one ventral or medial projection (even if it shows a proper bundle). This is now clearly indicated in the column titles of the table itself and in the legend. The embryos in which the axons exit dorsally in sly mutants are actually those counted in the left column of the Table (they exit dorsally and form a bundle), as shown by the new schematics added below the table. We also added this information to the title of the left column, and the legend now mentions the pictures in which this dorsal exit can be observed in the article (Figures 4B and S3E’). Having more sly mutant embryos with axons exiting dorsally is thus compatible with more embryos showing at least one ventral projection.

      Fig. S6, shows the lack of neural crest cells between the olfactory placode and the brain in both laminin γ1 mutants (without a basement membrane) and foxd3 mutants (which retain the membrane). Comparison of the two mutants here is a neat experiment and the result is striking, demonstrating that it is the basement membrane, and not the neural crest, that is required for correct morphology of the olfactory placode. I think this figure should be presented as a main figure, rather than supplementary.  

      Our new live imaging characterisation of NCC migration in sly mutants and control siblings (Movie 9) revealed that at 32 hpf, in the vicinity of the OP, NCC (or their derivatives) are much more numerous than the subset of NCC showing crestin expression by in situ hybridisation (compare, for instance, the end of our control movie at 32 hpf with the crestin ISH shown in Figure S6A). 

      Thus, the extent of the NCC migration defects should be analysed in more detail in the foxd3 mutant in the future (using live imaging or other NCC markers), and for this reason we chose to keep this dataset in the supplementary Figures.

      One of the first topics covered in the Discussion section is the potential role of Collagen. I was surprised to see the description on P15 'the dramatic disorganization of the Collagen IV pattern observed by immunofluorescence in the sly mutant', as I hadn't picked this up from the Results section of the paper. I went back to the relevant figure (Fig. 2) and description on p7, which does not give the same impression: 'in sly mutants, Collagen IV immunoreactivity was not totally abolished'. This suggested to me that there was only minor (not dramatic) disorganisation of the Collagen IV. This needs clarification.  

      The linear, BM-like Collagen IV staining was lost in sly mutants, but not the fibrous staining which remained in the form of discrete patches surrounding the OP. We modified the text in the Results section as well as in the Figure 2 legend to clarify our observations made on embryos immunostained for Collagen IV.

      Typos etc 

      P5 - '(ii) above of the neuronal rosette' - delete the word 'of'. 

      P5 two lines below this - ensheathed. 

      P10 - '3 distinct AP levels' (delete s from distincts). 

      P10 - distortion (not distorsion) . 

      P12 - 'From 14 hpf, they' should read 'From 14 hpf, neural crest cells'. 

      P15, line 1 - 'is a consequence of' rather than 'is consecutive of'? 

      P22 'When the data were not normal,' should read 'When the data were not normally distributed,'. 

      We thank the reviewer for noticing these typos and have corrected them.

      General 

      Please number lines in future manuscripts for ease of reference. 

      This has been done.

    1. Author response:

      Thank you for the positive and constructive feedback on our manuscript. We appreciate you highlighting the importance of our work advancing our understanding of the molecular etiology of acquired immunodeficiency syndrome (AIDS). To extend and further substantiate the observation that the CARD8 inflammasome is activated in response to viral protease during HIV-1 cell-to-cell transmission, we are in the process of completing additional experiments that are responsive to reviewer feedback, including:

      • Primary CD4+ T cell to monocyte-derived macrophage (MDM) transmission:  We have now repeated the cell-to-cell experiments with HIV-1 transfer from primary CD4+ T cells to primary monocyte-derived macrophages, and our findings are consistent with CARD8-dependent IL-1β release from HIV-1-infected macrophages in this more physiologic context. We are in the process of repeating these experiments with additional donors and will add these results to the revised manuscript.

      • Heterogeneity amongst blood donors: We have now repeated the cell-to-cell transfer and CARD8 knockout in MDMs with additional donors. While we continue to observe heterogeneity amongst donors, the key observation that CARD8 is required for inflammasome responses to HIV-1 infection is consistent. We note that some donors, including the one individual reported in the first submission, have markedly diminished CARD8 activity (to both HIV-1 and VbP).

      • Time course experiments: We did conduct a time course experiment when initially establishing these assays. We have now repeated these experiments with additional timepoints and in the presence or absence of the RT inhibitor nevirapine. The results of these experiments will be included in the revised manuscript.

      • The role of Gasdermin D: We are mostly interested in the release of IL-1β from the infected macrophages due to its potential contribution to myeloid-driven inflammation in PLWH. To date, there is no evidence that any pore-forming protein other than GSDMD can initiate IL-1β release (and pyroptosis) downstream of CARD8. Nonetheless, we will attempt this experiment with the Gasdermin D inhibitor disulfiram. 

      We believe these and other experiments will further support the importance of the CARD8 inflammasome in myeloid-driven inflammation in PLWH and look forward to submitting the revision.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors test whether the archerfish can modulate the fast response to a falling target.

      We have not tested whether archerfish can 'modulate the fast response'. We quantitatively test specific hypotheses on the rules used by the fish. For this, the accuracy of the decisions is analyzed with respect to specific points that can be calculated precisely in each experiment. The ill-defined term 'modulate' in no way captures what is done here. This assessment might explain the question, raised by the reviewer, of 'what is the difference of this study and Reinel, 2016' (i.e. Reinel and Schuster, 2016). In that study, all objects fell strictly ballistically, and the latency and accuracy of the turn decisions were determined when the initial motion was not only horizontal but had an additional vertical component of speed. The question of that study was whether the need to account for an additional variable (vertical speed) in the decision would affect its latency or accuracy. The study showed that archerfish then also rapidly turn to the later impact point. It also showed that accuracy and latency (defined in that study exactly as in the present study) were not changed by the added degree of freedom. This is a completely different question and by its very nature does not leave the realm of ballistics.

      By manipulating the trajectory of the target, they claim that the fish can modulate the fast response.

      While it is clear from the result that the fish can modulate the fast response, the experimental support for the argument that the fish can do it for a reflex-like behavior is inadequate. 

      This is disturbing: The manuscript is full of data that directly report response latency (a parameter that is critical in all experiments), and there are even graphical displays of the distribution of latency (Figs. 2, 5). How fast the responses are can also already be seen in the first video. Most importantly, the nature of the 40 ms limit was discovered and reported by our group in 2008 (Schlegel and Schuster, 2008, Fig. 4). For easy reference, we attach Schlegel and Schuster, 2008 with the relevant passages marked in yellow. But later studies, also using high-speed video (i.e. typically 500 fps) and simultaneously evaluating accuracy and kinematics (in the same ways as used here!) to address various questions, repeatedly report and even graphically represent minimum latencies of 40 ms, e.g. Krupczynski and Schuster, 2013 (e.g. Fig. 2); Reinel and Schuster, 2014; Reinel and Schuster, 2016; Reinel and Schuster, 2018a, b (e.g. see Fig. 7 in the first part), and report how latency increases as urgency decreases (if the fish are too close or the time of falling is increased), as temperature decreases, or as viewing conditions and their homogeneity across the tank change. Moreover, even a field study is available (Rischawy, Blum and Schuster, 2015) that shows why the speed is needed: competition is massive, with at least some of the competitor fish also being able to turn to the later impact point. So, speed is an absolute necessity if competitors are around. Interestingly, when the fish are isolated, latency goes up and eventually the fish no longer respond with C-starts (Schlegel and Schuster, 2008).

      Another aspect: considering the introduction, it would not even have mattered if 150 ms instead of 40 ms were the time needed for an accurate start (which is not the case). That would still be faster than an Olympic sprinter responds to a gunshot. Moreover, please also note that we carefully speak of reflex speed, not of a reflex behavior (which is as easy to verify as any other of the false statements made).

      Strengths: 

      Overall, the question that the authors raised in the manuscript is interesting. 

      Given the statement of no difference between the present study and Reinel and Schuster, 2016, it is not clear what this assessment refers to.

      Weaknesses: 

      (1) The argument that the fish can modulate reflex-like behavior relies on the claim that the archerfish makes the decision in 40 ms. There is little support for the 40 ms reaction time.

      The 'little support' is a paper in Science in which this important aspect is directly analyzed (Fig. 4 of that paper) and that has been praised by folks like Yadin Dudai (e.g. in Faculty 1000). The support also includes the latency data presented in the present paper. Furthermore, additional publications on the reaction time are available (see above).

      The reaction time for the same behavior in Schlegel 2008, is 60-70 ms, and in Tsvilling 2012 about 75 ms, if we take the half height of the maximum as the estimated reaction time in both cases. If we take the peak (or average) of the distribution as an estimation of reaction time, the reaction time is even longer. This number is critical for the analysis the authors perform since if the reaction time is longer, maybe this is not a reflex as claimed.

      See above.

      In addition, mentioning the 40 ms in the abstract is overselling the result.

      See above.

      Just for completeness: Considering a very interesting point raised by reviewer 2 we add an additional panel to further emphasize the exciting point that accuracy and latency are unrelated in the start decisions. That point was already made in Fig.4 of the paper in Science but can be directly addressed.  

      The title is also not supported by the results. 

      No: the title is clearly supported by the results that are reported in the paper.

      (2) A critical technical issue of the stimulus delivery is not clear.

      The stimulus delivery is described in detail. Most importantly we emphasize (even mentioning frame rate) that all VR setups require experimental confirmation that they work for the species and for the behavior at hand. Ideally, they should elicit the same behavior (in all aspects) as a real stimulus does that the VR approach intends to mimic. Whether VR works in a given animal and for the behavior at hand in that animal cannot be known or postulated a priori. It must be shown in direct critical experiments. Such experiments and the need to perform them are described in detail in Figure 2 and in the text that is associated with that figure.

      The frame rate is 120 FPS and the target horizontal speed can be up to 1.775 m/s. This produces a target jumping on the screen 15 mm in each frame. This is not a continuous motion. Thus, the similarity between the natural system where the target experiences ballistic trajectory and the experiment here is not clear. Ideally, another type of stimulus delivery system is needed for a project of this kind that requires fast-moving targets (e.g. Reiser, J. Neurosci.Meth. 2008).
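      The reviewer's 15 mm figure follows directly from dividing the maximum horizontal target speed by the display frame rate (both values as stated above):

```latex
\Delta x = \frac{v_{\max}}{f}
         = \frac{1.775\ \mathrm{m\,s^{-1}}}{120\ \mathrm{s^{-1}}}
         \approx 0.0148\ \mathrm{m}
         \approx 15\ \mathrm{mm\ per\ frame}
```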

      See above. It is quite funny that one of the authors of the present study had been involved in developing a setup with a complete panorama of 6000 LEDs (Strauss, Schuster and Götz, 1997; and appropriately cited in Reiser) that has been the basis for Reiser. This panorama was also used to successfully implement VR in freely walking Drosophila (Schuster et al., Curr. Biol., 2002). However, an LED based approach was abandoned because of insufficient spatial resolution (that, in archerfish, is very different from that of Drosophila).

      But the crucial point really is this: Just looking at Figure 2 shows that our approach could not have worked better in any way - it provided the input needed to cause turn decisions that are in all aspects just as those with real objects. Achieving this was not at all trivial and required enormous effort and many failed attempts. But it allows addressing our questions for the first time after 20 years of studying these interesting decisions.

      In addition, the screen is rectangular and not circular, so in some directions, the target vanishes earlier than others. It must produce a bias in the fish response but there is no analysis of this type. 

      Why 'must' it produce a bias? Is it not conceivable that you can only use a circular part of the screen? Briefly, and as could have been checked by quickly looking into the methods section, this is what we did. But still, why would it have mattered in our strictly randomized design? It could have mattered only in a completely silly way of running the experiments in which exclusively long trajectories are shown in one condition and exclusively short ones in another.

      (3) The results here rely on the ability to measure the error of response in the case of a virtual experiment. It is not clear how this is done since the virtual target does not fall.

      Well, of course it does not fall!!! That is the whole point that enables the study, and this is explained in connection with the glass plate experiment of Fig. 1 and quite some text is devoted to say that this is the starting point for the present analysis. The ballistic impact point is calculated (just as explained in our very first paper on the start decisions, Rossel, Corlija and Schuster, 2002) from the initial speed and height of the target, using simple high-school physics and the justification for that is also in that paper. This has been done already more than 20 years ago. How else could that paper have arrived at the conclusion that the fish turned to the virtual impact point even though nothing is falling? We also describe this for the readers of the present study, illustrate how accuracy is determined in Figures, in all videos and in an additional Supplementary Figure. Consulting the paper reveals that orientation of the fish is determined immediately at the end of stage 2 of its C-start and the error directly reports how close continuing in that direction would lead the fish to the (real or virtual) impact point. This measure has also been used since the first paper in 2002 in our lab and it is very useful because it provides an invariant measure that allows pooling all the different conditions (orientation and position of responding fish as well as direction, speed and height of target).
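      The calculation of the virtual impact point described above can be sketched as follows. This is a minimal sketch, assuming drag-free projectile motion, g = 9.81 m/s², and a target that starts at a known height above the water surface with a known initial velocity. The function names, and in particular the aiming-error definition (closest distance between the impact point and the straight path the fish would follow if it kept its heading at the end of stage 2), are hypothetical illustrations, not the exact measure used in the cited papers.

```python
import math

G = 9.81  # assumed gravitational acceleration (m/s^2)


def fall_time(height, vz=0.0, g=G):
    """Time (s) for a target released `height` metres above the water surface,
    with initial vertical speed `vz` (m/s, positive = upward), to reach it.
    Air drag is neglected."""
    # Positive root of: height + vz*t - g*t**2/2 = 0
    return (vz + math.sqrt(vz * vz + 2.0 * g * height)) / g


def impact_point(x0, y0, height, vx, vy, vz=0.0, g=G):
    """Ballistic landing point on the water surface for a target starting
    above (x0, y0) with horizontal velocity (vx, vy)."""
    t = fall_time(height, vz, g)
    return x0 + vx * t, y0 + vy * t


def aiming_error(fish_x, fish_y, heading_rad, target_x, target_y):
    """Hypothetical error measure: closest distance between the predicted
    impact point and the straight path the fish would follow if it kept
    its heading."""
    dx, dy = target_x - fish_x, target_y - fish_y
    ux, uy = math.cos(heading_rad), math.sin(heading_rad)
    along = dx * ux + dy * uy  # projection of the target offset onto the heading
    if along <= 0.0:
        # Impact point lies behind the fish: the closest point of the path is its start.
        return math.hypot(dx, dy)
    return abs(dx * uy - dy * ux)  # perpendicular distance to the heading line
```

      For example, under these assumptions a target released 0.5 m above the surface with a purely horizontal speed of 1 m/s, `impact_point(0.0, 0.0, 0.5, 1.0, 0.0)`, lands roughly 0.32 m from the point below its release. Because the impact point depends only on the initial height and velocity, it can be computed even when, as in the VR experiments, nothing actually falls.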

      How do the authors validate that the fish indeed perceives the virtual target as the falling target?

      See above. The fish produce C-starts (whose kinematics are analyzed and reported in Figures), whose latency is measured (from onset of target motion to onset of C-start) and whose accuracy in aligning them to the calculated virtual impact point is measured (see above). Additionally, the errors are also analyzed to other points of interest, for instance landmarks, the ballistic landing point in the re-trained fish or points calculated on the basis of specific hypotheses in the generalization experiments.

      Since the deflection is at a later stage of the virtual trajectory, it is not clear what is the actual physics that governs the world of the experiment.

      As explained in the text, what we need is to substitute the ballistic connection with another fixed relation between initial target motion and the landing point. This other relation needs to produce a large error in the aims when they remain based on the ballistic virtual landing point. The key experiments directly show that the fish need not see the deflection but can respond appropriately to the initial motion after training (Figs. 3, 5 and corresponding paragraphs in the text as well as additional movies). Please also note that after training the decision is based on the initial movement. This is shown in the interspersed experiments in which nothing other than the initial (pre-deflection) movement was shown.

      Overall, the experimental setup is not well designed. 

      It is obviously designed well enough to mimic the natural situation in every aspect needed (see Fig. 2) and well enough to answer the questions we have asked.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript studies prey capture by archer fish, which observe the initial values of motion of aerial prey they made fall by spitting on them, and then rapidly turn to reach the ballistic landing point on the water surface. The question raised by the article is whether this incredibly fast decision-making process is hardwired and thus unmodifiable or can be adjusted by experience to follow a new rule, namely that the landing point is deflected by a certain amount from the expected ballistic landing point. The results show that the fish learn the new rule and use it afterward in a variety of novel situations that include height, side, and speed of the prey, and which preserve the speed of the fish's decision. Moreover, a remarkable finding presented in this work is the fact that fish that have learned to use the new rule can relearn to use the ballistic landing point for an object based on its shape (a triangle) while keeping simultaneously the 'deflected rule' for an object differing in shape (a disc); in other words, fish can master simultaneously two decision-making rules based on the different shape of objects. 

      Strengths: 

      The manuscript relies on a sophisticated and clever experimental design that allows changing the apparent landing point of a virtual prey using a virtual reality system. Several robust controls are provided to demonstrate the reliability and usefulness of the experimental setup. 

      Overall, I very much like the idea conveyed by the authors that even stimuli triggering apparently hardwired responses can be relearned in order to be associated with a different response, thus showing the impressive flexibility of circuits that are sometimes considered mediating pure reflexive responses.

      Thank you so much for this precise assessment of what we have shown!

      This is the case - as an additional example - of the main component of the Nasanov pheromone of bees (geraniol), which triggers immediate reflexive attraction and appetitive responses, and which can, nevertheless, be learned by bees in association with an electric shock so that bees end up exhibiting avoidance and the aversive response of sting extension to this odorant (1), which is a fully unnatural situation, and which shows that associative aversive learning is strong enough to override preprogrammed responding, thus reflecting an impressive behavioral flexibility. 

      That's very interesting, thanks.

      Weaknesses: 

      As a general remark, there is some information that I missed and that is mandatory in the analysis of behavioral changes. 

      Firstly, the variability in the performances displayed. The authors mentioned that the results reported come from 6 fish (which is a low sample size). How were the individual performances in terms of consistency? Were all fish equally good in adjusting/learning the new rule? How did errors vary according to individual identity? It seems to me that this kind of information should be available as the authors reported that individual fish could be recognized and tracked (see lines 620-635) and is essential for appreciating the flexibility of the system under study. 

      Secondly, the speed of the learning process is not properly explained. Admittedly, fish learn in an impressive way the new rule and even two rules simultaneously; yet, how long did they need to achieve this? In the article, Figure 2 mentions that at least 6 training stages (each defined as a block of 60 evaluated turn decisions, which actually shows that the standard term 'Training Block' would be more appropriate) were required for the fish to learn the 'deflected rule'. While this means 360 trials (turning starts), I was left with the question of how long this process lasted. How many hours, days, and weeks were needed for the fish to learn? And as mentioned above, were all fish equally fast in learning? I would appreciate explaining this very important point because learning dynamics is relevant to understanding the flexibility of the system. 

      First, it is very important to keep in mind the question that we wanted to clarify: Does the system have the potential to re-tune the decisions to other non-ballistic relations between the input variables and the output? This would have been established if one fish was found capable of doing that. However, we do have sufficient evidence to say that all six fish learned the new law and that at least one individual (actually four) was capable of simultaneously handling the two laws. We will explain this much better (hopefully) in our revised version. We also have to stress that not all archerfish might actually be able to do this and that not all archerfish might learn in the same way, at the same speed, or using the same strategies. These questions are extremely interesting and we therefore will definitely include all the evidence that we have. If some individuals are better than others at quickly adjusting, then even observational learning could become a part of the story. However, we needed to make and document the first steps. Understanding these is essential and apparently is difficult enough.

      Reference: 

      (1) Roussel, E., Padie, S. & Giurfa, M. Aversive learning overcomes appetitive innate responding in honeybees. Anim Cogn 15, 135-141, doi:10.1007/s10071-011-0426-1 (2012). 

      Thanks for this reference!

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critics.

      (1.1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      Currently, the Methods indeed explain that groups are compared by testing differences in the distributions of residuals of treatment and control groups around the Deming regression of the control groups: “To test if treatments altered the relationship between initial performance vs learning or daily vs overnight learning, we compared the distribution of signed distance to the control Deming regression line between groups.” But this shall indeed be explained in more detail.

      The performance on a given day depends on a cumulative process, so the average measure of performance is not fully informative about what is learned or what is changed by a treatment (this is further explained in the text, pp. 9-10). The challenge is to deal with multivariate relationships in which initial performance, daily learning, and consolidated learning are interdependent. While these quantities show linear relationships in the control groups, this is far less the case in the treatment groups; this may be due to the variability of the treatment effect (efficacy of viral injections), which adds to the intrinsic variability present in the absence of treatment.

      Our approach to detecting a shift in these relationships following treatment is to test to what extent the treatment points in bivariate comparisons (initial performance x daily learning, daily learning x consolidated learning) are evenly distributed around the control-group regression line. We take a significant difference between the distributions of residuals of the control and treatment groups as an indication that the process represented in that comparison is disrupted by the treatment. For example, if the residuals of the treatment group are lower than those of the control group in the initial performance x daily learning comparison, learning is slower (or smaller) for a given initial performance. If the residuals of the treatment group are lower than those of the control group in the daily learning x consolidated learning comparison, consolidation is lower. This will be clarified in the revised version.
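      The residual-based comparison described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes equal error variances on both axes (δ = 1, i.e. orthogonal Deming regression), measures signed perpendicular distances to the control line (positive = above the line), and uses a two-sided permutation test on the difference in mean signed distance as a stand-in for whichever distribution test the Methods specify. The function names and example numbers are hypothetical.

```python
import math
import random
from statistics import mean


def deming_fit(xs, ys):
    """Orthogonal Deming regression (assumes equal error variances, delta = 1).

    Returns (intercept, slope). Requires a non-zero x-y covariance.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    syy = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form Deming slope with delta = 1.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return y_bar - slope * x_bar, slope


def signed_distances(xs, ys, intercept, slope):
    """Signed perpendicular distance of each point to a line (positive = above)."""
    norm = math.sqrt(1 + slope ** 2)
    return [(y - (intercept + slope * x)) / norm for x, y in zip(xs, ys)]


def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean signed distance."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical values (initial performance, daily learning); not study data.
ctrl_x, ctrl_y = [40, 55, 60, 72, 80, 95], [30, 38, 45, 50, 58, 66]
trt_x, trt_y = [45, 60, 70, 85], [20, 28, 35, 40]

a, b = deming_fit(ctrl_x, ctrl_y)  # regression fitted on controls only
d_ctrl = signed_distances(ctrl_x, ctrl_y, a, b)
d_trt = signed_distances(trt_x, trt_y, a, b)
print(mean(d_ctrl), mean(d_trt), permutation_pvalue(d_ctrl, d_trt, n_perm=2000))
```

      With δ = 1 the control line passes through the control centroid, so the control residuals average zero by construction; a treatment group whose signed distances are shifted negative corresponds to the "lower residuals" case described above. A rank-based test could replace the permutation test without changing the logic.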

      (1.2a) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex is functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and the cerebellum are distinct, and in Proville et al. 2014 we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). We do not claim that the two pathways are fully segregated; there is indeed a known degree of collateralization (see below).

      (1.2b) The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei?; how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      The study does not assume that CL-projecting and VAL-projecting neurons are entirely separate populations (it is in fact known that they overlap); rather, it shows that inhibiting neurons retrogradely infected from the CL and from the VAL does not produce identical results.

      There is indeed a paragraph devoted to the discussion of this point (middle paragraph, p20): “Interestingly, both Dentate and Interposed nuclei contain some neurons with collaterals in both VAL and CL thalamic structures (Aumann and Horne 1996, Sakayori, Kato et al. 2019), suggesting that the effect on learning could be mediated by a combined action on the learning process in the striatum (via the CL thalamus) and in the cortex (via the VAL thalamus). However, consistent with (Sakayori, Kato et al. 2019), we found that the manipulations of cerebellar neurons retrogradely targeted either from the CL or from the VAL produced different effects in the task. This indicates that either the distinct functional roles of VAL-projecting or CL-projecting neurons reported in our study are carried by a subset of pathway-specific neurons without collaterals, or that our retrograde infections in VAL and CL preferentially targeted different cerebello-thalamic populations even if these populations had axon terminals in both thalamic regions.”

      In other words, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL; see the references cited above), but, as the reviewer says, it does not seem logically possible for the exact same population to produce different effects, and these effects are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL retrograde infections recruit somewhat different populations of neurons. This could be due to differences in the density of CL and VAL collaterals among neurons projecting to both regions, or to the presence, alongside the (established) population of neurons with collaterals in both regions, of CL-projecting neurons without collaterals in VAL and of VAL-projecting neurons without collaterals in CL. The lesional approach targeting CN-thalamus neurons in Sakayori et al. 2019 also revealed separate effects of CL and VL injections, consistent with the differential recruitment of CN populations by retrograde infections.

      This should be improved in a revised version of the manuscript.

      (1.3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      We do not have washout data from the same day, but there was no significant change in baseline firing rate across recording days.

      (1.4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range ("Days 1-3 vs 4-7" or similar) to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc., and seems unlikely to be uniform across all animals, so it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      This will indeed be corrected in the revised version.

      (1.5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task". I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This will indeed be corrected in the revised version.

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (2.1) While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed-speed rotarod, which provides the conditions closest to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation [+0.12 rpm per s] in the accelerating version).

      In the CN experiments, we found clear deficits in learning and consolidation while there was no effect on the fixed-speed rotarod (the performance of the DREADD-CNO group was even slightly better than that of some control groups), consistent with a separation between the effects on learning/consolidation and those on locomotion on a rotarod. However, small but measurable deficits were found at the highest fixed speed in the CN-VAL group. There was no significant fixed-speed effect in the CN-CL group, even though this group already shows lower performance on the accelerating rotarod from the second day of learning; we believe this supports our claim that CN-CL inhibition affected the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4 of the accelerating rotarod, consistent with intact learning abilities. Of note, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm on the fixed-speed rotarod, while on average they fell from the accelerating rotarod as soon as it reached ~19 rpm (130 s).
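      The latency-to-speed conversion used in this argument can be checked with a short sketch. It assumes the rod speed increases linearly at the +0.12 rpm per s quoted above. The starting speed is not given in this response, so here it is back-calculated from the "~19 rpm at 130 s" figure (about 3.4 rpm); treat it as an assumption, not the protocol value. The function names are hypothetical.

```python
ACCEL_RPM_PER_S = 0.12  # acceleration quoted in the response (+0.12 rpm per s)


def implied_start_speed(latency_s, speed_rpm, accel=ACCEL_RPM_PER_S):
    """Back-calculate the starting rod speed from one (latency, speed) pair."""
    return speed_rpm - accel * latency_s


def speed_at_fall(latency_s, start_rpm, accel=ACCEL_RPM_PER_S):
    """Rod speed (rpm) at the moment of a fall, assuming linear acceleration."""
    return start_rpm + accel * latency_s


# Assumption: start speed inferred from "~19 rpm at 130 s", not a protocol value.
v0 = implied_start_speed(130, 19)
print(f"implied start: {v0:.1f} rpm; speed at 100 s: {speed_at_fall(100, v0):.1f} rpm")
```

      With that back-calculated start, a ~100 s latency corresponds to roughly 15 rpm, in line with the conversion given in the response to point 3.1.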

      The text currently states: “The inhibition of CN-VAL neurons during the task also yielded lower levels of performance in the Maintenance stage [NB: days 5-7], suggesting that these neurons contribute also to learning and retrieval of motor skills, although the mild defect in fixed speed rotarod could indicate the presence of a locomotor deficit, only visible at high speed.” Following the reviewer’s comment, we will, however, revise this sentence in the revised manuscript to state that we cannot fully disambiguate execution effects from learning/retrieval effects at high speed in these mice.

      (2.2a) Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel.

      As explained above (point 1.2a), it is already known that these pathways overlap to some degree (Discussion, p20), yet targeting them differentially affects behavior, consistent with separate contributions. A similar finding was reported with a lesional (irreversible) approach in Sakayori et al. 2019.

      (2.2b) The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      While we agree that after 3-4 days of learning the difference in performance between the groups becomes elusive, we respectfully disagree that these differences are negligible in the early stages: the impacts of inhibition on "learning rate" (i.e., amount of learning for a given daily initial performance) and on consolidation (i.e., overnight retention of the daily gain in performance) exhibit different profiles for the two groups (Fig. 3h vs 3k).

      Reviewer #3 (Public review)

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) cerebellothalamic connections are important for learning motor skills

      (2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) that once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (3.1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is also discussed in point 2.1 above. In our view, the fixed-speed rotarod is a control very close to the accelerating rotarod condition, with very similar requirements in the two tasks (yet, unfortunately, it is rarely tested in accelerating rotarod studies). We do not exclude the presence of motor deficits, but our main argument is that these do not suffice to explain the differences observed on the accelerating rotarod. No detectable fixed-speed deficit was found in the CN group, while very clear deficits in learning/consolidation were observed. A mild fixed-speed deficit is significant only in the CN-VAL group; it is not significant in the CN-CL group, which shows the strongest deficit on the accelerating rotarod during the first days. For example, on day 2 the CN-CL group is already below the control group, with latencies to fall of ~100 s (corresponding to falling as soon as the rod reaches ~15 rpm), while at 15 rpm on the fixed-speed rotarod both the control and CNO-treated groups are able to stay for more than 1 min. The text will be improved to clarify this point.

      (3.2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      There is indeed published evidence for some degree of anatomical overlap, but also for distinct contributions of CN-VAL and CN-CL neurons to the task. The answer to this point is developed in points 1.2a and 2.2a above. Although this point was addressed in the Discussion (p20), the text will be improved in the revised version of the manuscript to clarify our statement.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors successfully detected distinct mechanisms signalling prediction violations in the auditory cortex of mice. For this purpose, an auditory pure-tone local-global paradigm was presented to awake and anaesthetised mice. In awake rodents, the authors also evaluated the interneuron cell types involved in responses to the interruption of the regularity imposed by local-global sequences. By performing two-photon calcium imaging and single-unit electrophysiology, the authors disentangled three phenomena underlying responses to violations of the distinct local-global regularity levels: stimulus-specific adaptation, surprise, and surprise adaptation. Both stimulus-specific adaptation and surprise- or deviant-evoked responses are observable under anaesthesia. Altogether, this work advances our understanding of distinct predictive processes that compute prediction violations depending on the complexity of the regularity imposed by the auditory sequence.

      Strengths:

      It is an elegant study, beautifully executed.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Oddball responses are increases in sensory responses when a stimulus is encountered in an unexpected location in a sequence of predictable stimuli. There are two computational interpretations for these responses: stimulus-specific adaptation and prediction errors. In recent years, evidence has accumulated that a significant part of these sequence violation responses cannot be explained simply by stimulus-specific adaptation. The current work elegantly adds to this evidence by using a sequence paradigm based on two levels of sequence violations: "local" sequence violations of repetitions of identical stimuli, and "global" sequence violations of stimulus sequence patterns. The authors demonstrate that both local and global sequence violation responses are found in L2/3 neurons of the mouse auditory cortex. Using sequences with different inter-stimulus intervals, they further demonstrate that these sequence violation responses cannot be explained by stimulus-specific adaptation.

      Strengths:

      The work is based on a very clever use of a sequence violation paradigm (local-global paradigm) and provides convincing evidence for the interpretation that there are at least two types of sequence violation responses and that these cannot be explained by stimulus-specific adaptation. Most of the conclusions are based on a large dataset, and are compelling.

      Weaknesses:

      The final part of the paper focuses on the responses of VIP and PV-positive interneurons. The responses of VIP interneurons appear somewhat variable and difficult to interpret (e.g. VIP neurons exhibit omission responses in the A block, but not the B block). The conclusions based on these data appear less solid.

      We agree with the referee that the response modulations observed in VIP and PV-positive interneurons are weak and variable. This is indicated in the manuscript. Larger-scale recordings are probably necessary to fully ascertain the presence and distribution of omission responses.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled "Parallel mechanisms signal a hierarchy of sequence structure violations in the auditory cortex", Jamali et al. provide evidence for cellular-level mechanisms in the auditory cortex of mice for the encoding of predictive information on different temporal and contextual scales. The study design separates more clearly than previous studies between the effects of local and global deviants and separates their respective effects on the neuronal responses clearly through the use of various contextual conditions and short and long time scales. Further, it identifies a contribution from a small set of VIP interneurons to the detection of omitted sounds, and shows the influence of isoflurane anesthesia on the neural responses.

      Strengths:

      (1) The study provides a rather encompassing set of experimental techniques to study the cellular level responses, using two complementary recording techniques in the same animal and similar cortical location.

      (2) Comparisons between awake and anesthetized states are conducted in the same animals, which allows for a rather direct comparison of populations under different conditions, thus reducing sampling variability.

      (3) The set of paradigms is well developed and specifically chosen to provide appropriate and meaningful controls/comparisons, which were missing from previous studies.

      (4) The addition of cell-type specific recordings is valuable and in particular in combination with the contrast of awake and anesthetized animals provides novel insights into the cellular level representation of deviant signals, such as surprise, prediction error, and general adaptation.

      (5) The analysis and presentation of the data are clear and quite complete, yet remain succinct and perform insightful contrasts.

      (6) The study will have an impact on multiple levels, as it introduces important variations in the paradigm and analytical contrasts that both human and animal researchers can pick up and improve their studies. The cell-type-specific results are particularly intriguing, although these would likely require replication before being completely reliable. Further, the study provides a substantial and diverse dataset that others can explore.

      Weaknesses:

      (1) The responses from cells recorded via Neuropixel and 2p differ qualitatively, as noted by the authors, with NP-recorded cells showing much more inhibited/reduced responses between acoustic stimulations. The authors briefly qualify these differences as potentially indicating a sampling issue; however, this matter deserves more detailed consideration in my opinion. Specifically, the authors could compare the different depths at which these neurons were sampled or relate their locations in the cortex to each other (as the Neuropixel recordings were collected in the same animals, a subset of the 2p recordings could be compared to the Neuropixel recordings).

      We agree with the referee that the sampling issue, which we propose as a possible explanation for the large difference between our Neuropixel and 2P imaging recordings, must be investigated more thoroughly. This is, however, largely outside the scope of this study. We have reported the depth and location of Neuropixel recordings in Figure S2. The depth is similar for both techniques, covering mostly layers 2, 3 and 4. For both two-photon imaging and Neuropixel recordings, the locations mostly span the primary auditory cortex. One Neuropixel recording is located in the ventral secondary auditory cortex. We could not find any evidence that the response to global violations in the Neuropixel data stems specifically from this particular recording.

      (2) The current study did not monitor the attentional state of the mouse in relation to the stimulus by either including a behavioral component or pupil monitoring, which could influence the neural responses to deviant stimuli and omissions.

      As reported by Bekinschtein et al. 2009, the attentional state influences responses to global violations in human subjects. It is extremely difficult to precisely compare attentional states in mice and human subjects. We have performed recordings in mice that had to attend to sounds in order to detect a white noise sound presented in between the sequences and obtain a reward. This did not lead to increased global violation responses. However, as the sequences themselves did not predict reward in this context, they may divert attention. Therefore, this result is inconclusive and not worth including in our manuscript. If the sequences predict rewards, there is a potential confound between violation responses and reward expectation or motor preparation signals. Pupil monitoring could be an alternative, which we did not investigate.

      (3) Given the complexity and variety of the paradigms, conditions, and analyzed cell-types, the manuscript could profit from a more visual summary figure that provides an easy-to-access overview of what was found.

      This is an excellent suggestion, although given the complexity and diversity of our observations it may be hard to fit everything in one understandable figure.

    1. Author response:

      We appreciate the insightful comments and suggestions, which will significantly improve our work. We will revise the manuscript to address the reviewers’ concerns. Here, we list some of the key aspects of those concerns and our preliminary plans to address them.

      Both reviewers pointed out that we did not sufficiently justify the chosen optogenetic stimulation frequencies. We acknowledge and concur with their assessment, and will discuss this more extensively from a biological perspective (e.g., the neural firing rates in the olfactory bulb, OB, anterior olfactory nucleus, AON, and piriform cortex, Pir, under natural odor stimulation and respiration rhythm). Reviewer #1 suggested using beta values (β) rather than the area under the BOLD signal profile (AUC) to quantify the fMRI activations, as they are more conventional in general linear model (GLM) analysis. We are aware of β and have used it to quantify the amplitude of fMRI activations in our previous rodent fMRI studies (1-3). However, in this study, we chose to use AUC as it offers a more comprehensive measure of BOLD signal change over time, including shape, duration, and magnitude, thereby capturing the bulk of neural activity and its dynamics throughout the stimulation period. β primarily represents the peak amplitude of BOLD responses (i.e., the % BOLD signal change) (4) and can be constrained by the assumptions and limitations of the GLM analysis, such as the shape of the hemodynamic response function (HRF). AUC provides greater flexibility in capturing different aspects of neural responses across various brain regions, such as transient peaks and sustained responses.
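      The contrast between the two measures can be illustrated with a minimal sketch (hypothetical time courses, not data from the study). AUC is computed here with the trapezoidal rule and needs no response model, whereas β is the least-squares slope of the signal on an assumed regressor, so the same signal yields different β values depending on the assumed response timing. For simplicity the regressors are unconvolved boxcars; a real GLM would convolve the stimulus with an HRF. All names and numbers are illustrative assumptions.

```python
from statistics import mean


def auc_trapezoid(signal, dt):
    """Area under a %BOLD-change time course (trapezoidal rule), in percent-seconds."""
    return sum((a + b) / 2 * dt for a, b in zip(signal[:-1], signal[1:]))


def glm_beta(signal, regressor):
    """Least-squares slope of the signal on one regressor (with an intercept)."""
    s_bar, r_bar = mean(signal), mean(regressor)
    cov = sum((s - s_bar) * (r - r_bar) for s, r in zip(signal, regressor))
    var = sum((r - r_bar) ** 2 for r in regressor)
    return cov / var


# Hypothetical 20-sample run at 1 s per sample; stimulation on samples 5-14.
n_samples, dt = 20, 1.0
boxcar = [1.0 if 5 <= t < 15 else 0.0 for t in range(n_samples)]
# Alternative assumed response timing: the same boxcar delayed by 3 samples.
delayed = [1.0 if 8 <= t < 18 else 0.0 for t in range(n_samples)]
# Hypothetical sustained response: 2 % signal change throughout stimulation.
sustained = [2.0 * v for v in boxcar]

for label, regressor in (("boxcar", boxcar), ("delayed", delayed)):
    print(label, auc_trapezoid(sustained, dt), glm_beta(sustained, regressor))
```

      The AUC of the sustained response is the same whichever regressor is assumed, while β drops when the assumed regressor is shifted in time; this is the sense in which β depends on the response shape assumed by the GLM.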

      As mentioned by reviewer #1, correlating the adaptation of BOLD and electrophysiology signals at the brain-region level would strengthen our findings. We will pursue additional analyses to address this in our forthcoming responses. Reviewer #2 asked us to clarify the image and signal quality of our echo planar imaging (EPI)-based fMRI data, especially in regions close to the air-tissue interface such as the OB, Pir, entorhinal cortex and amygdala, as well as the methodology for some of the experimental protocols implemented in our study. We will show the raw EPI fMRI images from a representative animal and revise the results, discussion, and methods sections of the manuscript to address reviewer #2's concerns.

      In our forthcoming detailed responses to the reviewers' comments and recommendations, we will revise the text, figures, and captions accordingly to address and clarify the questions brought up by both reviewers.

      References

      (1) Gao, P.P., Zhang, J.W., Chan, R.W., Leong, A.T.L. & Wu, E.X. BOLD fMRI study of ultrahigh frequency encoding in the inferior colliculus. Neuroimage 114, 427-437 (2015).

      (2) Leong, A.T.L., Wong, E.C., Wang, X. & Wu, E.X. Hippocampus Modulates Vocalizations Responses at Early Auditory Centers. Neuroimage 270, 119943 (2023).

      (3) Gao, P.P., Zhang, J.W., Fan, S.J., Sanes, D.H. & Wu, E.X. Auditory midbrain processing is differentially modulated by auditory and visual cortices: An auditory fMRI study. Neuroimage 123, 22-32 (2015).

      (4) Goddard, E. & Mullen, K.T. fMRI representational similarity analysis reveals graded preferences for chromatic and achromatic stimulus contrast across human visual cortex. Neuroimage 215, 116780 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of Drosophila EM connectomes has revealed numerous neurons within the associative learning circuit. However, these neurons are inaccessible for functional assessment or genetic manipulation in the absence of cell-type-specific drivers. Addressing this knowledge gap, Shuai et al. have screened over 4000 split-GAL4 drivers and correlated them with identified neuron types from the "Hemibrain" EM connectome by matching light microscopy images to neuronal shapes defined by EM. They successfully generated over 800 split-GAL4 drivers and 22 split-LexA drivers covering a substantial number of neuron types across layers of the mushroom body associative learning circuit. They provide new labeling tools for olfactory and non-olfactory sensory inputs to the mushroom body; interneurons connected with dopaminergic neurons and/or mushroom body output neurons; potential reinforcement sensory neurons; and expanded coverage of intrinsic mushroom body neurons. Furthermore, the authors have optimized the GR64f-GAL4 driver into a sugar sensory neuron-specific split-GAL4 driver and functionally validated it as providing a robust optogenetic substitute for sugar reward. Additionally, a driver for putative nociceptive ascending neurons, potentially serving as optogenetic negative reinforcement, is characterized by optogenetic avoidance behavior. The authors also use their very large dataset of neuronal anatomies, covering many example neurons from many brains, to identify neuron instances with atypical morphology. They find many examples of mushroom body neurons with altered neuronal numbers or mistargeting of dendrites or axons and estimate that 1-3% of neurons in each brain may have anatomic peculiarities or malformations. Significantly, the study systematically assesses the presence of MBON08 in individual brains for the first time. This neuron is a variant shape that sometimes occurs instead of one of two copies of MBON09, and this variation is more common than that in other neuronal classes: 75% of hemispheres have two MBON09s, and 25% have one MBON09 and one MBON08. These newly developed drivers not only expand the repertoire for genetic manipulation of mushroom body-related neurons but also empower researchers to investigate the functions of circuit motifs identified from the connectomes. The authors generously make these flies available to the public. In the foreseeable future, the tools generated in this study will allow important advances in the understanding of learning and memory in Drosophila.

      Strengths:

      (1) After decades of dedicated research on the mushroom body, a consensus has been established that the release of dopamine from DANs modulates the weights of connections between KCs and MBONs. This process updates the association between sensory information and behavioral responses. However, understanding how the unconditioned stimulus is conveyed from sensory neurons to DANs, and the interactions of MBON outputs with innate responses to sensory context remains less clear due to the developmental and anatomic diversity of MBONs and DANs. Additionally, the recurrent connections between MBONs and DANs are reported to be critical for learning. The characterization of split-GAL4 drivers for 30 major interneurons connected with DANs and/or MBONs in this study will significantly contribute to our understanding of recurrent connections in mushroom body function.

      (2) Optogenetic substitutes for real unconditioned stimuli (such as sugar taste or electric shock) are sometimes easier to implement in behavioral assays due to the spatial and temporal specificity with which optogenetic activation can be induced. GR64f-GAL4 has been widely used in the field to activate sugar sensory neurons and mimic sugar reward. However, the authors demonstrate that GR64f-GAL4 drives expression in other neurons not necessary for sugar reward, and the potential activation of these neurons could introduce confounds into training, impairing training efficiency. To address this issue, the authors have elaborated on a series of intersectional drivers with GR64f-GAL4 to dissect subsets of labeled neurons. This approach successfully identified a more specific sugar sensory neuron driver, SS87269, which consistently exhibited optimal training performance and triggered ethologically relevant local searching behaviors. This newly characterized line could serve as an optimized optogenetic tool for sugar reward in future studies.

      (3) MBON08 was first reported by Aso et al. 2014, exhibiting dendritic arborization in both the ipsilateral and contralateral γ3 compartments. However, this neuron could not be identified in the previously published Drosophila brain connectomes. In the present study, the existence of MBON08 is confirmed, occurring in one hemisphere of 35% of imaged flies. In brains where MBON08 is present, its dendritic arborization shares the contralateral γ3 compartment with MBON09 in a non-overlapping manner. This remarkable phenotype may serve as a valuable resource for understanding the stochasticity of neurodevelopment and the molecular mechanisms underlying mushroom body lobe compartment formation.

      Weaknesses:

      There are some minor weaknesses in the paper that can be clarified:

      (1) In Figure 8, the authors trained flies with a 20s, weak optogenetic conditioning first, followed by a 60s, strong optogenetic conditioning. The rationale for using this training paradigm is not explicitly provided.

      These experiments were designed to test if flies could maintain consistent performance with repetitive and intense LED activation, which is essential for experiments involving long training protocols or coactivation of other neurons inside a brain.

      In Figure 8E, if data for training with GR64f-GAL4 using the same paradigm are available, it would be beneficial for readers to compare the learning performance of the newly generated split-GAL4 lines with that of the original GR64f-GAL4, which has been used in many previous studies. It is noteworthy that in previously published work, repeated training-test sessions typically led to an increase in learning performance in discrimination assays. However, this augmentation is not observed in any of the split-GAL4 lines presented in Figure 8E. The authors may need to discuss possible reasons for this.

      As the reviewer pointed out, many previous studies, including ours, used the original Gr64f-GAL4 in olfactory conditioning. Figure 1H of Yamada et al., 2023 (https://doi.org/10.7554/eLife.79042) showed such a result, where first- and second-order olfactory conditioning were assayed. Indeed, the first-order conditioning scores gradually increased over repeated training. In that experiment, we used low red LED intensity for the optogenetic activation. In Figure 8E of the present paper, the first memory test came after three pairings of a 20 s odor with five 1 s red LED pulses, without intermediate tests. Therefore, flies were already sufficiently trained to show a plateau memory level in “Test1”. In the revision of another recent report (Figure 1C-F of Aso et al., 2023; https://doi.org/10.7554/eLife.85756), we included the learning curve data of our best Gr64f-split-GAL4, SS87269. Under a less saturating training protocol, SS87269 did show learning augmentation over repeated training.

      (2) In line 327, the authors state that in all samples, the β'1 compartment is arborized by MBON09. However, in Figure 11J, the probability of having at least one β'1 compartment not arborized is inferred to be 2%. The authors should address and clarify this conflict in the text to avoid misunderstanding.

      The chance of visualizing MBON08 in MCFO images was 21/209 in total (Figure 11I). If we assume that each of the four cells independently adopts the MBON08 developmental fate with this probability, we can calculate the probability of each MBON08/09 cell-type composition. From this calculation, we inferred that approximately 2% of flies would lack innervation of the β'1 compartment in at least one hemisphere. However, we did not observe a lack of β'1 arborization in any of the 169 sample flies. If these MBONs independently develop into MBON08 with probability 21/209, the chance of never observing two MBON08s in either hemisphere across all 169 samples is 3.29%. Therefore, some developmental mechanism may prevent the emergence of two MBON08s in the same hemisphere.

      In the revised manuscript, we displayed these estimated probabilities for each case separately and annotated the actual observations on the right side.
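      The arithmetic behind these estimates can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes, as the text does, that each of the four cells (two per hemisphere) independently adopts the MBON08 fate with probability 21/209, and the function names are hypothetical. The per-fly probability comes out at about 2.0%. Plugging in the rounded 2% figure reproduces the 3.29% quoted above, while carrying the unrounded value gives about 3.24%; either way, the observed outcome would be roughly a 3% event under independence.

```python
# Sketch of the MBON08/MBON09 fate-composition estimate described above.
# Assumption (taken from the response text): each of the four cells, two per
# hemisphere, independently becomes MBON08 with probability 21/209.

P_MBON08 = 21 / 209  # per-cell chance of the MBON08 fate (MCFO count)
N_FLIES = 169        # flies examined for beta'1 arborization


def p_hemisphere_lacks_b1(p):
    """A hemisphere lacks beta'1 innervation only if both of its cells are MBON08."""
    return p * p


def p_fly_lacks_b1(p):
    """Probability that at least one of the two hemispheres has two MBON08s."""
    q = p_hemisphere_lacks_b1(p)
    return 1 - (1 - q) ** 2


def p_never_observed(p_fly, n_flies):
    """Probability that none of n independent flies shows the phenotype."""
    return (1 - p_fly) ** n_flies


if __name__ == "__main__":
    p_fly = p_fly_lacks_b1(P_MBON08)
    print(f"P(fly lacks beta'1 in >=1 hemisphere): {p_fly:.4f}")
    # Rounded 2% per fly, as quoted in the response text.
    print(f"P(never observed | 2%):    {p_never_observed(0.02, N_FLIES):.4f}")
    # Unrounded per-fly estimate.
    print(f"P(never observed | exact): {p_never_observed(p_fly, N_FLIES):.4f}")
```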

      (3) In general, are the samples presented male or female? This sample metadata will be shown when the images are deposited in FlyLight, but it would be useful in the context of this manuscript to describe in the methods whether animals are all one sex or mixed sex, and in some example images (e.g. mAL3A) to note whether the sample is male or female.

      The samples presented in this study are of mixed sex, except in Figure 11I, where the sexes are specified. We provided metadata for the presented images in Supplementary File 7, and we added a paragraph to the Methods section:

      “Most samples were collected from females, though typically at least one male fly was examined for each driver line. While we noticed that certain lines, such as SS48900, exhibited distinct expression patterns in females and males, we did not particularly focus on sexual dimorphism, which is analyzed elsewhere (Meissner et al. 2024). Therefore, unless stated otherwise, the presented samples are of mixed gender.

      Detailed metadata, including gender information and the reporter used, can be found in Supplementary File 7.”

      Reviewer #2 (Public Review):

      Summary:

      The article by Shuai et al. describes a comprehensive collection of over 800 split-GAL4 and split-LexA drivers, covering approximately 300 cell types in Drosophila, aimed at advancing the understanding of associative learning. The mushroom body (MB) in the insect brain is central to associative learning, with Kenyon cells (KCs) as primary intrinsic neurons and dopaminergic neurons (DANs) and MB output neurons (MBONs) forming compartmental zones for memory storage and behavior modulation. This study focuses on characterizing sensory input as well as direct upstream connections to the MB both anatomically and, to some extent, behaviorally. Genetic access to specific, sparsely expressed cell types is crucial for investigating the impact of single cells on computational and functional aspects within the circuitry. As such, this new and extensive collection significantly extends the range of targeted cell types related to the MB and will be an outstanding resource to elucidate MB-related processes in the future.

      Strengths:

      The work by Shuai et al. provides novel and essential resources to study MB-related processes and beyond. The resulting tools are publicly available and, together with the linked information, will be foundational for many future studies. The importance and impact of this tool development approach, along with previous ones, for the field cannot be overstated. One of many interesting aspects arises from the anatomical analysis of cell types that are less stereotypical across flies. These discoveries might open new avenues for future investigations into how such asymmetry and individuality arise from development and other factors, and how it impacts the computations performed by the circuitry that contains these elements.

      Weaknesses:

      Providing such an array of tools leaves little to complain about. However, despite the comprehensive genetic access to diverse sensory pathways and MB-connected cell types, the manuscript could be improved by discussing its limitations. For example, the projection neurons from the visual system seem to be underrepresented (or almost absent) in the tools produced. A discussion of these omissions could help prevent misunderstandings.

      At Janelia Research Campus, the work of producing split-GAL4 lines was divided among groups. The recent preprint (Nern et al., 2024; doi: https://doi.org/10.1101/2024.04.16.589741) described the full collection of split-GAL4 driver lines in the optic lobe, including the visual projection neurons to the mushroom body. We cited this preprint in the revised manuscript by adding a short paragraph of discussion.

      “Although less abundant than the olfactory input, the MB also receives visual information from the visual projection neurons (VPNs) that originate in the medulla and lobula and are targeted to the accessory calyx (Vogt et al. 2016; Li et al. 2020). A recent preprint described the full collection of split-GAL4 driver lines in the optic lobe, which includes the VPNs to the MB (Nern et al. 2024).”

      Additionally, more details on the screening process, particularly the selection of candidate split halves and stable split-GAL4 lines, would provide valuable insights into the methodology and the collection's completeness.

      The details of our split-GAL4 design and screening procedures were described in previous studies (Aso et al., 2014; Dolan et al., 2019). The data and tools available for designing split-GAL4 lines changed over time, and we took different approaches accordingly. Many of the split-GAL4 lines presented in this study were designed and screened in parallel with the lines for MBONs and DANs in 2010-2014, when MCFO images of GAL4 drivers and the EM connectome were not yet available. Knowing where MBONs and DANs project, I (Y.A.) manually examined and annotated thousands of confocal stacks (Jenett et al., 2012; https://doi.org/10.1016/j.celrep.2012.09.011) to find candidate cell types that may contact them.

      Later, I used more advanced computational tools (Otsuna et al., 2018; doi: https://doi.org/10.1101/318006) and MCFO images aligned to the standard brain volume (Meissner et al., 2023; doi: 10.7554/eLife.80660). Now, to generate additional split-GAL4 lines for a cell type identified in EM connectome data, the NeuronBridge website (https://neuronbridge.janelia.org/) can be very helpful, as it provides a list of GAL4 drivers that may label the neuron of interest.

      Reviewer #3 (Public Review):

      Summary:

      Previous research on the Drosophila mushroom body (MB) has made this structure the best-understood example of an associative memory center in the animal kingdom. This is in no small part due to the generation of cell-type-specific driver lines that have allowed consistent and reproducible genetic access to many of the MB's component neurons. The manuscript by Shuai et al. now vastly extends the number of driver lines available to researchers interested in studying learning and memory circuits in the fly. It is an 800-plus collection of new cell-type-specific drivers that target neurons that either provide input (direct or indirect) to MB neurons or receive output from them. Many of the new drivers target neurons in sensory pathways that convey conditioned and unconditioned stimuli to the MB. Most drivers are exquisitely selective, and researchers will benefit from the fact that whenever possible, the authors have identified the targeted cell types within the Drosophila connectome. Driver expression patterns are beautifully documented and are publicly available through the Janelia Research Campus's Flylight database where full imaging results can be accessed. Overall, the manuscript significantly augments the number of cell type-specific driver lines available to the Drosophila research community for investigating the cellular mechanisms underlying learning and memory in the fly. Many of the lines will also be useful in dissecting the function of sensorimotor circuits.

      Strengths:

      The manuscript represents a huge amount of careful work and leverages numerous important developments from the last several years. These include the thousands of recently generated split-Gal4 lines at Janelia and the computational tools for pairing them to make exquisitely specific targeting reagents. In addition, the manuscript takes full advantage of the recently released Drosophila connectomes. Driver expression patterns are beautifully illustrated side-by-side with corresponding skeletonized neurons reconstructed by EM. A comprehensive table of the new lines, their split-Gal4 components, their neuronal targets, and other valuable information will make this collection eminently useful to end-users. In addition to the anatomical characterization, the manuscript also illustrates the functional utility of the new lines in optogenetic experiments. In one example, the authors identify a specific subset of sugar reward neurons that robustly promotes associative learning.

      Weaknesses:

      While the manuscript succeeds in making a mass of descriptive detail quite accessible to the reader, the way the collection is initially described, and the new lines categorized, in the text is sometimes confusing. Most of the details can be found elsewhere, but it would be useful to know how many of the lines are being presented for the first time and have not been previously introduced in other publications/contexts.

      We revised the text as below.

      “Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibit highly specific and non-redundant expression patterns and are likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2 (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      And where can the lines be found at Flylight? Are they listed as one collection or as many?

      They are listed as a single collection, the “Aso 2021” release. It is named “2021” because we released the images and started sharing lines in December 2021, without a descriptive paper. We added a sentence to the Methods section.

      “All split-GAL4 lines can be found in the FlyLight database under the “Aso 2021” release, and fly strains can be requested from Janelia or the Bloomington stock center.”

      Also, the authors say that some of the lines were included in the collection despite not necessarily targeting the intended type of neuron (presumably one that is involved in learning and memory). What percentage of the collection falls into this category?

      We did not keep records of the split-GAL4 screening detailed enough to calculate how often unintended cell types were intersected, but this was rather rare. Those unintended cell types can still be part of circuits for associative learning (e.g. olfactory projection neurons) or can be totally unrelated cell types. For instance, among a new collection of split-LexA lines using the Gr43a-LexADBD hemidriver (Figure 7-figure supplement 2), one line specifically intersected T1 neurons in the optic lobe even though the AD line was selected to intersect sugar sensory neurons. We suspect that this is due to ectopic expression of Gr43a-LexADBD. Nonetheless, we included it in the paper because a cell-type-specific split-LexA driver for T1 will be useful irrespective of whether the Gr43a gene is expressed in T1.

      And what about the lines that the authors say they included in the collection despite a lack of specificity? How many lines does this represent?

      In short, about 100 lines in the collection lack the specificity needed for behavioral experiments.

      We ranked the specificity of the split-GAL4 drivers in Supplementary File 1. Rank 2 lines are ideal, Rank 1 lines are less ideal but acceptable, and Rank 0 lines are not suitable for activation screening in behavioral experiments. Of the 828 split-GAL4 lines reported here, 413, 305, and 103 lines are in the Rank 2, Rank 1, and Rank 0 categories, respectively. Seven lines are not ranked for specificity because only flip-out expression data are available.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      As mentioned elsewhere and in addition to the minor points below, it is advisable for the authors to elaborate on the details of the screening process. Furthermore, a discussion about the circuits not targeted by their research, such as the visual projection neurons, would be beneficial.

      See the response above to Reviewer #2’s public review.

      Lines 32-33: The citations are very fly-centric. The authors might want to consider reviews on the MB of other insect species regarding learning and memory.

      We additionally cited the 2017 book chapter by Rybak and Menzel on the honey bee mushroom body.

      Line 43-44: Citations should be added, e.g. Séjourné et al. (2011), Pai et al. (2013), Plaçais et al. (2013).

      Citations added

      Line 50-52: Citation Hulse et al. (2021) should be added.

      Citation added

      Line 162: In this part, it might be valuable for the reader to understand which of these PNs are actually connecting with KCs.

      A full list of cell types within the MB is provided in Supplementary File 4 of the revised manuscript. See also the response to Reviewer #3, Lines 150-1.

      Line 179: Citation Burke et al. (2012) should be mentioned.

      Citation added

      Line 181: Thermogenic might be thermogenetic.

      Corrected

      Line 189: Citations add Otto et al. (2020) and Felsenberg et al. (2018).

      Citations added

      Line 208ff: The authors should consider discussing why they did not use other GR and IR promoters. For example, Gr5a is prominent in sugar-sensing, while Ir76b could be a reinforcement signal related to yeast food (Steck et al., 2018; Ganguly et al., 2017; see also Corfas et al., 2019 for local search).

      We focused on the Gr64f promoter because of its relatively broad expression and the successful use of Gr64f-GAL4 in fictive reward experiments. We added split-LexA lines with the Gr43a and Gr66a promoters (Figure 7-figure supplement 2). Other gustatory sensory neurons also have the potential to provide reinforcement signals, but we did not have the bandwidth to cover them all.

      Line 319: Consider citing Linneweber et al. (2020) for a neurodevelopmental account of such individuality.

      We added a sentence and cited this reference.

      “On the other hand, the neurodevelopmental origin of neuronal morphology appeared to have functional significance for behavioral individuality (Linneweber et al. 2020).”

      Line 352: Citation add Hulse et al. (2021).

      Citations added

      Line 356ff: The utility and value of Split-LexA may not be apparent to non-expert readers. Moreover, how were LexADBDs chosen for creating these lines?

      We have added an introductory sentence at the beginning of the paragraph and explained that these split-LexA lines were a conversion of split-GAL4 lines that were published in 2014 and frequently used in studying the mushroom body circuit.

      “Split-GAL4 lines enable cell-type-specific manipulation, but some experiments require independent manipulation of two cell types. Split-GAL4 lines can be converted into split-LexA lines by replacing the GAL4 DNA binding domain with that of LexA (Ting et al., 2011). To broaden the utility of the split-GAL4 lines that have been frequently used since the publication in 2014 (Aso et al., 2014a), we have generated over 20 LexADBD lines to test the conversions of split-GAL4 to split-LexA. The majority (22 out of 34) of the resulting split-LexA lines exhibited very similar expression patterns to their corresponding original split-GAL4 lines (Figure 12).”

      Line 374: Italicize Drosophila melanogaster.

      Revised as suggested.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      As mentioned in the Public Review, the drivers are nicely classified in the various subsections of the manuscript, but the statements in the text summarizing how many lines there are in specific categories are often confusing. For example, line 129 refers to "drivers encompassing 111 cell types that connect with the DANs and MBONs", but Figure 1E indicates that 46 new cell types downstream of MBONs and upstream of DANs have been generated. This seems like a discrepancy.

      The 46 cell types in Figure 1E consider only the CRE/SMP/SIP/SLP area, where neurons downstream of MBONs and upstream of DANs are highly enriched, while the 111 cell types include all areas. To avoid confusion, we removed the “MBON downstream and DAN upstream” count from Figure 1E in the revised manuscript.

      Also, at line 75 the MBON lines previously generated by Rubin and Aso (2023) are referred to as though they are separate from the 828 described "In this report." Supplementary file 1 suggests, however, that they are included as part of this report.

      Twenty-five lines generated in Rubin and Aso (2023) were initially included in Supplementary File 1 for the convenience of users, but they were not counted among the 828 new lines described in this report. To avoid confusion, we removed these 25 lines in the revised manuscript. Now all lines listed in Supplementary File 1 were generated in this study (“Aso 2021” release), and if a line has been used in earlier studies or introduced in other contexts, for example the accompanying omnibus preprint (Meissner 2024, doi: 10.1101/2024.01.09.574419), the citations are listed in the reference column.

      More generally, in lines 94-102 "828 useful lines based on their specificity, intensity and non-redundancy" are referred to, but they are subsequently subdivided into categories of lines with lower specificity (i.e. with off-target expression) and lines that did not target intended cell types (presumably ones unlikely to be involved in learning and memory). It would be useful to know how many lines (at least roughly) fall into these subcategories.

      See the response above to Reviewer #3’s public review.

      Finally, Figures 3B & C indicate cell types connected to DANs and MBONs and the number for which Split-Gal4 lines are available. The text (lines 136-7) states that the new collection "covers 30 of these major cell types (Figure 3C)," but Figure 3C clearly has more than 30 dots showing the drivers available. Presumably existing and new driver lines are being pooled, but this should either be explained or the two should be distinguished.

      “(Figure 3C)” was replaced with “(Supplementary File 3)” in the revised manuscript to correct the reference. Figures 3B & C plot all MB interneurons, not just the major cell types.

      Minor Comments:

      Although the paper is generally well written there are minor grammatical errors throughout (e.g. dropped articles, odd constructions, etc.) that somewhat detract from an otherwise smooth and enjoyable reading experience. A quick editing pass by a native speaker (i.e. any of several of the authors) could clean up these and numerous other small mistakes. A few examples: line 138 "presented" should be present; line 204: "contain off-targeted expressions" should be "have off-target expression;" line 219: "usage to substitute reward" is awkward at best and could be something like "use in generating fictive rewards"; line 326 "arborize[s]"; l. 331 "Based on the likelihood" should be something like "based on these observations"'; line 349 "[is] likely to appear"; l. 352 "extensive connection[s]"; line 353 "has [a] strong influence;" l. 963 "Projections" should be singular; etc.

      All the mentioned examples have been corrected, and we have asked a native speaker to edit through the revised manuscript.

      Lines 81-3: Is the lookup table referred to Suppl. File 1? A reference is desirable.

      Yes, the lookup table refers to Supplementary File 1, and a reference was added.

      Lines 111-2: what is a "non-redundant set of...cell types?" Cell types that are represented by a single cell (or bilateral pair)? Or does this sentence mean that of the 828 lines, 355 are specific to a single cell type, and in total 319 cell types are targeted? The statement is confusing.

      We revised the text as below.

      “Figure 1E provides an overview of the categories of covered cell types. Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibit highly specific and non-redundant expression patterns and are likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2 (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      Line 148: "MB major interneurons" is a confusing descriptor for postsynaptic partners of MBONs.

      We added a sentence to clarify the definition of the “MB major interneurons”.

      “In the hemibrain EM connectome, there are about 400 interneuron cell types that have over 100 total synaptic inputs from MBONs and/or synaptic outputs to DANs. Our newly developed collection of split-GAL4 drivers covers 30 types of these ‘major interneurons’ of the MB (Supplementary File 3).”

      Lines 150-1: Not sure what is meant by "have innervations within the MB." Sounds like cells are presynaptic to KCs, DANs, and MBONs, but Figure 3 Figure Supplement 1 indicates they include neurons that both provide and receive innervation to/from MB neurons. Please clarify.

      For clarification, in the revised manuscript we have included a full list of cell types within the MB in Supplementary File 4. It includes all neurons with ≥50 presynaptic connections or ≥250 postsynaptic connections in the MB ROI of the hemibrain (excluding the accessory calyx). The cell types include KCs, MBONs, DANs, PNs, and a few other cell types. The coverage ratio was updated based on this list.

      Also, in line 152, what does it mean that they "may have been overlooked previously?" this seems unnecessarily ambiguous. Were they overlooked or weren't they?

      We changed the text to “These lines offer valuable tools to study cell types that were previously not genetically accessible. Notably, SS85572 enables the functional study of LHMB1, which forms a rare direct pathway from the calyx and the lateral horn (LH) to the MB lobes (Bates et al., 2020).”

      Line 158 refers to PN cells within the MB, which are not mentioned anywhere else as MB components.

      What are these PNs and how do they differ from MBONs?

      See responses to Lines 150-1 for clarification of cell types within the MB.

      Line 188: not clear what is meant by "more continual learning tasks".

      We rephrased it as “more complex learning tasks” to avoid jargon.

      Line 235: Not clear why "extended training with high LED intensity" wouldn't promote the formation of robust memories. Is this for some reason unexpected based on previous experiments? Please explain.

      See the response to weakness #1 of Reviewer #1.

      Lines 317-9: It would be useful to state here that MBON08 and MBON09 are the two neurons labeled by MB083C.

      Revised as suggested.

      Line 368: Presumably the "lookup table" referred to is Supplementary File 1, but a reference here would be useful.

      Yes, it is Supplementary File 1, and a reference was added.

      Comments on Figures:

      Figure 1C The "Dopamine Neurons" label position doesn't align with the Punishment and Reward labels, which is a bit confusing.

      They are intentionally not aligned, because dopamine neurons are not reward/punishment per se. We intend to use the schematic to show that punishment and reward are conveyed to the MB through the dopamine neuron layer, just as the output from the MB output neuron layer is used to guide further integration and actions. To keep the labels “Dopamine neurons” and “MB Output Neurons” in symmetrical positions, we decided to keep the original figure unchanged. But we thank the reviewer for the kind suggestion.

      Figure 1F and Figure 1 - Figure Supplement 1: the light gray labels presumably indicate the (EM-identified) neuron labeled by each line, but this should be explicitly stated in the figure legends. It would also be useful in the legends to direct the reader to the key (Supplementary File 1) for decoding neuronal identities.

      Revised as suggested.

      Figure 2: For clarity, I'd recommend titling this figure "LM-EM Match of the CRE011-specific driver SS45245". This reduces the confusion of mixing and matching the driver and cell-type names. Also, it would be helpful to indicate (e.g. with labels above the figure parts) that A & B represent the MCFO characterization step and C & D represent the LM-EM matching step of the pipeline.

      Revised as suggested.

      Figure 6: For clarity, it would be useful to separately label the PN and sensory neuron groups. Also, for the sensory neurons at the bottom, what is the distinction between the cell names in gray and black font?

      Figure 6 was updated to separate the non-olfactory PN and sensory neuron groups. The gray was intended for olfactory receptor neuron cell types that are additionally labeled in the driver lines. To avoid confusion, the gray cell types were removed in the revised figure, and a clarification sentence was added to the legend.

      “Other than thermo-/hygro-sensory receptor neurons (TRNs and HRNs), SS00560 and MB408B also label olfactory receptor neurons (ORNs): ORN_VL2p and ORN_VC5 for SS00560, ORN_VL1 and ORN_VC5 for MB408B.”

      Figure 7A: It's unclear why the creation of 6 Gr64f-LexADBD lines is reported. Aren't all these lines the same? If not, an explanation would be useful.

      These six Gr64f-LexADBD lines differ in their insertion sites and in the presence or absence of the p10 translational enhancer. An explanation was added to the legend. The enhanced expression level with p10 can help compensate for the general tendency of split-LexA to be weaker than split-GAL4. Different insertions will be useful to avoid transvection with split-GAL4s, which are mostly in attP40 and attP2.

      Figure 8F: It would help to include in the legend a brief description of each parameter being measured, essentially defining the y-axis label on the graphs as in Figure Supplement 2. Also, how is the probability of return calculated, and what behavioral parameter does the change of curvature refer to?

      We added a brief description of the behavioral parameters to the legend of Figure 8F.

      “Return behavior was assessed within a 15-second time window. The probability of return (P return) is the percentage of flies that made an excursion (>10 mm) and then returned to within 3 mm of their initial position. Curvature is the ratio of angular velocity to walking speed.”
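      To make these two definitions concrete, here is a minimal Python/NumPy sketch. It is not the authors' analysis pipeline: it assumes each trajectory is an array of x/y positions in millimeters sampled at a fixed frame rate, that the 15 s window starts at the first frame, and that curvature uses the absolute angular velocity of the walking heading; the function names are hypothetical. A real pipeline would typically also smooth positions and handle stationary frames more carefully.

```python
import numpy as np


def made_return(xy, fps, window_s=15.0, excursion_mm=10.0, return_mm=3.0):
    """True if, within the window, the fly went >excursion_mm from its start
    and afterwards came back to within return_mm of it."""
    xy = np.asarray(xy, dtype=float)
    xy = xy[: int(round(window_s * fps)) + 1]      # keep frames inside the window
    dist = np.linalg.norm(xy - xy[0], axis=1)      # distance from initial position
    out = np.flatnonzero(dist > excursion_mm)
    if out.size == 0:
        return False                               # never made an excursion
    # Only count a return that happens after the first excursion.
    return bool(np.any(dist[out[0]:] <= return_mm))


def p_return(trajectories, fps, **kwargs):
    """Percentage of flies that made an excursion and then returned."""
    flags = [made_return(t, fps, **kwargs) for t in trajectories]
    return 100.0 * float(np.mean(flags))


def curvature(xy, fps, min_speed=1e-6):
    """Per-frame |angular velocity| / walking speed, in rad/mm.
    Frames slower than min_speed (mm/s) are returned as NaN."""
    xy = np.asarray(xy, dtype=float)
    step = np.diff(xy, axis=0)
    speed = np.linalg.norm(step, axis=1) * fps                  # mm/s
    heading = np.unwrap(np.arctan2(step[:, 1], step[:, 0]))     # rad
    ang_vel = np.abs(np.diff(heading)) * fps                    # rad/s
    v = speed[1:]
    return np.where(v > min_speed, ang_vel / np.maximum(v, min_speed), np.nan)
```

      As a sanity check on the definitions, a fly walking steadily around a circle of radius 5 mm gives a curvature close to 1/5 = 0.2 rad/mm, and a fly that walks 12 mm straight out and straight back within the window counts as a return.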

      Figure 9E: What are the parenthetical labels for lines SS49267, SS49300, and SS35008?

      They are EM bodyIDs. The figure legend was revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study compiles a wide range of results on the connectivity, stimulus selectivity, and potential role of the claustrum in sensory behavior. While most of the connectivity results confirm earlier studies, this valuable work provides incomplete evidence that the claustrum responds to multimodal stimuli and that local connectivity is reduced across cells that have similar long-range connectivity. The conclusions drawn from the behavioral results are weakened by the animals' poor performance on the designed task. This study has the potential to be of interest to neuroscientists.

      We thank the editor and the reviewers for their feedback on our work, which we have incorporated to help improve interpretation of our findings as outlined in the response below. While we agree with the editor that further work is necessary to provide a comprehensive understanding of claustrum circuitry and activity, this is true of most scientific endeavors, and we therefore feel that describing this work as “incomplete” unfairly mischaracterizes the intent of the experiments performed, which provide fundamental insights into this poorly understood brain region. Additionally, as identified in the main text, methods section, and our responses to the comments below, we disagree that the behavioral results are “weakened” by the performance of the animals. Our goal was to assess what information animals learned and used in an ambiguous sensory/reward environment, not to shape them toward a particular behavior and interpret the results solely based on their accuracy in performing the task.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper by Shelton et al. investigates some of the anatomical and physiological properties of the mouse claustrum. First, they characterize the intrinsic properties of claustrum excitatory and inhibitory neurons and determine how these different claustrum neurons receive input from different cortical regions. Next, they perform in vitro patch clamp recordings to determine the extent of intraclaustrum connectivity between excitatory neurons. Following these experiments, in vivo axon imaging was performed to determine how claustrum-retrosplenial cortex neurons are modulated by different combinations of auditory, visual, and somatosensory input. Finally, the authors perform claustrum lesions to determine if claustrum neurons are required for performance on a multisensory discrimination task.

      Strengths:

      An important potential contribution the authors provide is the demonstration of intra-claustrum excitation. In addition, this paper provides the first experimental data where two cortical inputs are independently stimulated in the same experiment (using 2 different opsins). Overall, the in vitro patch clamp experiments and anatomical data provide confirmation that claustrum neurons receive convergent inputs from areas of the frontal cortex. These experiments were conducted with rigor and are of high quality.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      The title of the paper states that claustrum neurons integrate information from different cortical sources. However, the authors did not actually test or measure integration in the manuscript. They do show physiological convergence of inputs on claustrum neurons in the slice work. Testing integration through simultaneous activation of inputs was not performed. The convergence of cortical input has been recently shown by several other papers (Chia et al.), and the current paper largely supports these previous conclusions. The in vivo work did test for integration because simultaneous sensory stimulations were performed. However, integration was not measured at the single-cell (axon) level because it was unclear how activity in a single claustrum ROI changes in response to (for example) visual, tactile, and visual-tactile stimulations. Reading the discussion, I also see the authors speculate that the sensory responses in the claustrum could arise from attentional or salience-related inputs from an upstream source such as the PFC. In this case, claustrum cells would not integrate anything (but instead respond to PFC inputs).

      We thank the reviewer for raising this point. In response, we have provided a definition of “integration” in the manuscript text (lines 112-114, 353-354):

      “...single-cell responsiveness to more than one input pathway, e.g. being capable of combining and therefore integrating these inputs.”

      The reviewer’s point about testing simultaneous input to the claustrum is well made but not possible with the dual-color optogenetic stimulation paradigm used in our study as noted in the Results and Discussion sections (see also Klapoetke et al., 2014, Hooks et al., 2015). The novelty of our paper comes from testing these connections in single CLA neurons, something not shown in other studies to-date (Chia et al., 2020; Qadir et al., 2022), which average connectivity over many neurons.

      Finally, we disagree with the reviewer regarding whether integration was tested at the single-axon level and provide data and supplementary figures to this effect (Fig. 6, Supp. Fig. S14, lines 468-511). Although the possibility remains that sensory-related information may arise in the prefrontal cortex, as we note, there is still a large collection of studies (including this one) that document and describe direct sensory inputs to the claustrum (Olson & Graybiel, 1980; Sherk & LeVay, 1981; Smith & Alloway, 2010; Goll et al., 2015; Atlan et al., 2017; etc.). We have updated the wording of these sections to note that both direct and indirect sensory input integration is possible.

      The different experiments in different figures often do not inform each other. For example, the authors show in Figure 3 that claustrum-RSP cells (CTB cells) do not receive input from the auditory cortex. But then, in Figure 6 auditory stimuli are used. Not surprisingly, claustrum ROIs respond very little to auditory stimuli (the weakest of all sensory modalities). Then, in Figure 7 the authors use auditory stimuli in the multisensory task. It seems that these experiments were done independently and were not used to inform each other.

      The intention behind the current manuscript was to provide a deep characterisation of claustrum to inform future research into this enigmatic structure. In this case, we sought to test pathways in vivo that were identified as being weak or absent in vitro to confirm and specifically rule out their influence on computations performed by claustrum. We agree with the reviewer’s assessment that it is not surprising that claustrum ROIs respond weakly to auditory stimuli. Not testing these connections in vivo because of their apparent sparsity in vitro would have represented a critical gap in our knowledge of claustrum responses during passive sensory stimulation.

      One novel aspect of the manuscript is the focus on intraclaustrum connectivity between excitatory cells (Figure 2). The authors used wide-field optogenetics to investigate connectivity. However, the use of paired patch-clamp recordings remains the ground truth technique for determining the rate of connectivity between cell types, and paired recordings were not performed here. It is difficult to understand and gain appreciation for intraclaustrum connectivity when only wide-field optogenetics is used.

      We thank the reviewer for acknowledging the novelty of these experiments. We further acknowledge that paired patch-clamp recordings are the gold standard for assessing synaptic connectivity. Typically such experiments are performed in vitro, a necessity given the ventral location of claustrum precluding in vivo patching. In vitro slice preparations by their very nature sever connections and lead to an underestimate of connectivity as noted in our Discussion. Kim et al. (2016) have done this experiment in coronal slices with the understanding that excitatory-excitatory connectivity would be local (<200 μm) and therefore preserved. We used a variety of approaches that enabled us to explore connectivity along the longitudinal axis of the brain (the rostro-caudal, i.e. “long” axis of the claustrum), providing fresh insight into the circuitry embedded within this structure that would be challenging to examine using dual recordings. Further, our optogenetic method (CRACM, Petreanu et al., 2007), has been used successfully across a variety of brain structures to examine excitatory connectivity while circumventing artifacts arising from the slice axis.

      In Figure 2, CLA-rsp cells express Chrimson, and the authors removed cells from the analysis with short latency responses (which reflect opsin expression). But wouldn't this also remove cells that express opsin and receive monosynaptic inputs from other opsin-expressing cells, therefore underestimating the connectivity between these CLA-rsp neurons? I think this needs to be addressed.

      The total number of opsin-expressing CLA neurons in our dataset is 4/46 tested neurons. Assuming all of these neurons project to RSP, they would have accounted for 4/32 CLA-RSP neurons. Given the rate of monosynaptic connectivity observed in this study, these neurons would only contribute 2-3 additional connected neurons. Therefore, the exclusion of these neurons does not significantly impact the overall statistical accuracy of our connectivity findings.

      In Figure 5J the lack of difference in the EPSC-IPSC timing in the RSP is likely due to 1 outlier EPSC at 30 ms, which most likely reflects polysynaptic communication. Therefore, I do not feel the argument being made here with differences in physiology is particularly striking.

      We thank the reviewer for their attention to detail about this analysis. We have performed additional statistics and found that leaving this neuron out does not affect the significance of the results (new p-value = 0.158, original p-value = 0.314, Mann-Whitney U test). We have removed this datapoint from the figure and our analysis.

      In the text describing Figure 5, the authors state "These experiments point to a complex interaction ....likely influenced by cell type of CLA projection and intraclaustral modules in which they participate". How does this slice experiment stimulating axons from one input relate to different CLA cell types or intra-claustrum circuits? I don't follow this argument.

      We have removed this speculation from the Results section.

      In Figure 6G and H, the blank condition yields a result similar to many of the sensory stimulus conditions. This blank condition (when no stimulus was presented) serves as a nice reference to compare the rest of the conditions. However, the remainder of the stimulation conditions were not adjusted relative to what would be expected by chance. For example, the response of each cell could be compared to a distribution of shuffled data, where time-series data are shuffled in time by randomly assigned intervals and a surrogate distribution of responses generated. This procedure is repeated 200-1000x to generate a distribution of shuffled responses. Then the original stimulus-triggered response (1s post) could be compared to shuffled data. Currently, the authors just compare pre/post-mean data using a Mann-Whitney test from the mean overall response, which could be biased by a small number of trials. Therefore, I think a more conservative and statistically rigorous approach is warranted here, before making the claim of a 20% response probability or 50% overall response rate.

      We appreciate the reviewer's thorough analysis and suggestion for a more conservative statistical approach. We acknowledge that responses on blank trials occur about 10% of the time, indicating that response probabilities around this level may not represent "real" responses. To address this, we will include the responses to the blank condition in the manuscript (lines 505-509). This will allow readers to make informed decisions based on the presented data.
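      A minimal sketch of the shuffle control the reviewer describes, for a single ROI. It assumes a ΔF/F trace sampled at `fps`, stimulus-onset frame indices that lie at least one window away from either end of the trace, and equal 1 s pre- and post-stimulus windows. Circular time shifts are used so that the trace's own autocorrelation is preserved in the null distribution. The function name and window choices are illustrative, not the procedure used in the manuscript.

```python
import numpy as np

def shuffle_test(trace, onsets, fps, win_s=1.0, n_shuffles=1000, seed=0):
    """Compare one ROI's mean stimulus-triggered response with a null
    distribution obtained by circularly shifting the trace in time."""
    rng = np.random.default_rng(seed)
    trace = np.asarray(trace, dtype=float)
    onsets = np.asarray(onsets)
    w = int(round(win_s * fps))  # window length in frames

    def mean_response(x):
        # mean over trials of (post-window mean minus pre-window mean)
        post = np.array([x[o:o + w].mean() for o in onsets])
        pre = np.array([x[o - w:o].mean() for o in onsets])
        return float(np.mean(post - pre))

    observed = mean_response(trace)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shift = rng.integers(w, trace.size - w)  # random time offset
        null[i] = mean_response(np.roll(trace, shift))
    # one-sided p-value; the +1 terms keep p strictly above zero
    p = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, float(p)
```

      Because the shifted traces keep the same noise structure and trial count as the real data, a response only counts as significant if it exceeds what randomly timed "onsets" would produce, which addresses the concern that a pre/post comparison could be driven by a few trials.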

      Regarding Figure 6, a more conventional way to show sensory responses is to display a heatmap of the z-scored responses across all ROIs, sorted by their post-stimulus response. This enables the reader to better visualize and understand the claims being made here, rather than relying on the overall mean which could be influenced by a few highly responsive ROIs.

      We apologize to the reviewer that our data in this figure was challenging to interpret. We have included an additional supplemental figure (Supp. Fig. S15) that displays the requested information.
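      The heatmap requested above can be assembled from a trial-averaged ROI × time matrix. The sketch assumes a hypothetical array `resp` of shape (n_rois, n_frames) whose first `n_pre` frames precede stimulus onset; it z-scores each ROI against its own baseline and orders rows by mean post-stimulus z-score. Plotting is left out (matplotlib's `imshow` on the returned matrix would be one option).

```python
import numpy as np

def sorted_zscore_matrix(resp, n_pre):
    """Z-score each ROI against its pre-stimulus baseline and sort ROIs by
    their mean post-stimulus z-score, strongest responders first."""
    resp = np.asarray(resp, dtype=float)
    base = resp[:, :n_pre]
    mu = base.mean(axis=1, keepdims=True)
    sd = base.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                      # avoid division by zero on flat baselines
    z = (resp - mu) / sd
    order = np.argsort(z[:, n_pre:].mean(axis=1))[::-1]  # descending response
    return z[order], order
```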

      For Figure 6, it would also help to display some raw data showing responses at the single ROI level and the population level. If these sensory stimulations are modulating claustrum neurons, then this will be observable on the mean population vector (averaged df/f across all ROIs as a function of time) within a given experiment and would add support to the conclusions being made.

      We appreciate the reviewer’s desire to see more raw data – we would have included this in the figure given more space. However, the average df/f across all ROIs is shown as a time series with 95% confidence intervals in Fig. 6D.

      As noted by the authors, there is substantial evidence in the literature showing that motor activity arises in mice during these types of sensory stimulation experiments. It is foreseeable that at least some of the responses measured here arise from motor activity. It would be important to identify to what extent this is the case.

      While we acknowledge that some responses may arise from motor-related activity, addressing this comprehensively is beyond the scope of this paper. Given the extensive number of trials and recorded axonal segments, we believe that motor-related activity is unlikely to significantly impact the average response across all trials. Future studies focusing specifically on motor activity during sensory stimulation experiments would be needed to elucidate this aspect in detail.

      All claims in the results for Figure 6 such as "the proportion of responsive axons tended to be highest when stimuli were combined" should be supported by statistics.

      We have provided additional statistics in this section (lines 490-511) to address the reviewer’s comment.

      In Figure 7, the authors state that mice learned the structure of the task. How is this the case, when the number of misses is 5-6x greater than the number of hits on audiovisual trials (S Figure 19)? I don't get the impression that mice perform this task correctly. As shown in Figure 7I, the hit rate is exceptionally low on the audiovisual port in controls. I just can't see how control and lesion mice can have the same hit rate and false alarm rate yet have different d'. Indeed, I might be missing something in the analysis. However, given that both groups of mice are not performing the task as designed, I fail to see how the authors' claim regarding multisensory integration by the claustrum is supported. Even if there is some difference in the d' measure, what does that matter when hits are the least likely trial outcome here for both groups?

      We thank the reviewer for their comments and hope the following addresses their confusion about the performance of animals during our multimodal conditioning task.

      Firstly, as pointed out by the reviewer, the hit rate (HR) is lower than the false-alarm rate (FR), but crucially only when assessed explicitly within-condition (e.g. just auditory or just visual stimulation). Given the multimodal nature of the assay, HR and FR could also be evaluated across different trials, unimodal and multimodal, for both auditory and visual stimuli. Doing so resulted in a net positive d', as observed by the reviewer. From this perspective, and as documented in the Methods (Multimodal Conditioning and Reversal Learning) and Supplemental Figures, mice do indeed learn the conditioning task and perform at above-chance levels.

      Secondly, as raised in the Discussion, an important caveat of this assay was that it was unnecessary for mice to learn the task structure explicitly but, rather, that they respond to environmental cues in a reward-seeking manner that indicated perception of a stimulus. "Performance" as it is quantified here demonstrates a perceptual difference between conditions that is observed through behavioral choice and timing, not necessarily the degree to which the mice have an understanding of the task per se.

      In the discussion, it is stated that "While axons responded inconsistently to individual stimulus presentations, their responsivity remained consistent between stimuli and through time on average...". I do not understand this part of the sentence. Does this mean axons are consistently inconsistent?

      The reviewer’s interpretation is correct – although recorded axons tended to have a preferred stimulus or combination of stimuli, they displayed variability in their responses (response probability), though little or no variability in their likelihood to respond over time (on average).

      In the discussion, the authors state their axon imaging results contrast with recent studies in mice. Why not actually do the same analysis that Ollerenshaw did, so this statement is supported by fact? As pointed out above, the criteria used to classify an axon as responsive to stimuli were very liberal in this current manuscript.

      While we appreciate this comment from the reviewer, we feel that it was not necessary to perform similar analyses to those of Ollerenshaw et al in order to appreciate that methodological differences between these studies would have confounded any comparisons made, as we note in the Discussion.

      I find the discussion wildly speculative and broad. For example, "the integrative properties of the CLA could act as a substrate for transforming the information content of its inputs (e.g. reducing trial-to-trial variability of responses to conjunctive stimuli...)". How would a claustrum neuron responding with 10% reliability to a stimulus (or set of stimuli) play any role in reducing trial-to-trial variability of sensory activity in the cortex?

      We thank the reviewer for their feedback. We acknowledge the reviewer's concern regarding the speculative nature of our discussion. To address the specific point raised, while a neuron with a 10% reliability might appear limited in reducing trial-to-trial variability in sensory activity, it's possible that such neurons are responsive to a combination of stimuli or conditions not fully controlled or recorded in our current setup. For instance, variables like the animal’s attentional or motivational states could influence the responsiveness of claustrum neurons, thus integrating these inputs could theoretically modulate cortical processing. We have refined this section to clarify these points (now lines 810-813).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shelton et al. explore the organization of the Claustrum. To do so, they focus on a specific claustrum population, the one projecting to the retrosplenial cortex (CLA-RSP neurons). Using an elegant technical approach, they first described electrophysiological properties of claustrum neurons, including the CLA-RSP ones. Further, they showed that CLA-RSP neurons (1) directly excite other CLA neurons, in a 'projection-specific' pattern, i.e. CLA-RSP neurons mainly excite claustrum neurons not projecting to the RSP and (2) receive excitatory inputs from multiple cortical territories (mainly frontal ones). To confirm the 'integrative' property of claustrum networks, they then imaged claustrum axons in the cortex during single- or multi-sensory stimulations. Finally, they investigated the effect of CLA-RSP lesion on performance in a sensory detection task.

      Strengths:

      Overall, this is a really good study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. The in-vitro part is impressive, and the results are compelling.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      One noteworthy concern arises from the terminology used throughout the study. The authors claimed that the claustrum is an integrative structure. Yet, integration has a specific meaning, i.e. the production of a specific response by a single neuron (or network) in response to a specific combination of several input signals. In this study, the authors showed compelling results in favor of convergence rather than integration. On a lighter note, the in-vivo data are less convincing, and do not entirely support the claim of "integration" made by the authors.

      We thank the reviewer for their clarity on this issue. We absolutely agree that without clear definition in the study, interpretation of our data could be misconstrued for one of several possible meanings. We have updated our Introduction, Results, and Discussion text to reflect the definition of ‘integration’ we used in the interpretation of our work and hope this clarifies our intent to the reader.

      Reviewer #3 (Public Review):

      The claustrum is one of the most enigmatic regions of the cerebral cortex, with a potential role in consciousness and integrating multisensory information. Despite extensive connections with almost all cortical areas, its functions and mechanisms are not well understood. In an attempt to unravel these complexities, Shelton et al. employed advanced circuit mapping technologies to examine specific neurons within the claustrum. They focused on how these neurons integrate incoming information and manage the output. Their findings suggest that claustrum neurons selectively communicate based on cortical projection targets and that their responsiveness to cortical inputs varies by cell type.

      Imaging studies demonstrated that claustrum axons respond to both single and multiple sensory stimuli. Extended inhibition of the claustrum significantly reduced animals' responsiveness to multisensory stimuli, highlighting its critical role as an integrative hub in the cortex.

      However, the study's conclusions at times rely on assumptions that may undermine their validity. For instance, the comparison between RSC-projecting and non-RSC-projecting neurons is problematic due to potential false negatives in the cell labeling process, which might not capture the entire neuron population projecting to a brain area. This issue casts doubt on the findings related to neuron interconnectivity and projections, suggesting that the results should be interpreted with caution. The study's approach to defining neuron types based on projection could benefit from a more critical evaluation or a broader methodological perspective.

      We thank the reviewer for their attention to the methods used in our study. We acknowledge that there is an inherent bias introduced by false-negatives as a result of incomplete labeling but contend that this is true of most modern tracing experiments in neuroscience, irrespective of the method used. Moreover, if false-negative biases are affecting our results, then they likely do so in the direction of supporting our findings – perfect knowledge of claustrum connectivity would likely enhance the effects seen by increasing the pool of neurons for which we find an effect. For example, our cortico-claustral connectivity findings in Figure 3 likely would have shown even larger effects should false-negative CLA-RSP neurons have been positively identified.

      Where appropriate we have provided estimates of variability and certainty in our experimental findings and do not claim any definitive knowledge of the true rate and scope of claustrum connectivity.

      Nevertheless, the study sets the stage for many promising future research directions. Future work could particularly focus on exploring the functional and molecular differences between E1 and E2 neurons and further assess the implications of the distinct responses of excitatory and inhibitory claustrum neurons for internal computations. Additionally, adopting a different behavioral paradigm that more directly tests the integration of sensory information for purposeful behavior could also prove valuable.

      We thank the reviewer for their outlook on the future directions of our work. These avenues for study, we believe, would be very fruitful in uncovering the cell-type-specific computations performed by claustrum neurons.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors):

      The editor recommends addressing the issues raised by the reviewers about the statistical significance of sensory response with respect to blank stimuli, and solving the issue generated by the exclusion of monosynaptically connected neurons in the connectivity study, to raise the assessment strength of evidence from incomplete to solid. Moreover, as the reported result stands, the behavioral task does not seem to be learned by the animals as the animals are above chance for visual and auditory but largely below chance level for multisensory. It seems that the animals do not perform a multisensory task. The authors should clarify this.

      Reviewer #1 (Recommendations For The Authors):

      Several references in which mouse CLA-retrosplenial or CLA-frontal neurons were investigated are missing from the manuscript; they would be highly relevant both to the discussion of claustrum function and to the context of the methodologies used here (Wang et al., 2023 Nat Comm; Nair et al., 2023 PNAS; Marriott et al., 2024 Cell Reports; Faig et al., 2024 Current Biology).

      Reviewer #2 (Recommendations For The Authors):

      Let me be clear: this is an excellent study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. However, the study is somewhat disconnected, with a fantastic in-vitro part and, in my opinion, a less convincing in-vivo one.

      As stated in the public review, I'm concerned about the use of the term "integration", as, in my opinion, the data presented in this study (which I repeat are of excellent level) do not support that claim.

      Below are my main points regarding the article:

      (1) My main comment relates to the use of the term 'integration'. It might be a semantic debate, but I think that this is an important one. In my opinion, neural integration is the "summing of several neural input signals by a single neuron to produce an output signal that is some function of those inputs". As the authors state in the discussion, they were not able to "assess the EPSP response magnitude to the conjunction of stimuli due to photosensitivity of ChrimsonR opsins to blue light". Therefore, the authors did not specifically prove integration, but rather input convergence. This does not mean that the results presented are not important or of excellent quality, but I encourage the authors to either tone down the part on integration or to give a clear definition of what they call integration.

      (2) The in vivo imaging data are somewhat confusing. First, the authors image two claustral populations simultaneously (the CLA-RSP and the CLA-ACA axons). I may be missing the information, but there is no evidence that these cells overlap in the CLA (no data in the supplement and existing literature only support partial overlap). Second, in the results part, the authors claim that 96% of the sensory-responsive axons displayed multisensory response. This, combined with the 47% of axons responsive to at least one stimulus, should lead to a global response of around 45% of the axons in multisensory trials. Yet, in Figures 6F-G, one can see that the response probability is actually low (closer to 20%). To be honest, I cannot really understand how to make sense of these results. At first, I thought that most of the multisensory responsive axons show no response during multisensory stimulus (but one in the unimodal stimulus). This hypothesis is however unlikely, as response AUC is biased toward positivity in Figure 6H. Overall, I'm not totally convinced by the imaging data, and I think that the authors should be more cautious about interpreting their results (as they are in the discussion part, but less in the results part).

      (3) The TetTox approach used in the study ablates all neurons expressing Cre in the CLA. If the hypothesis proposed by the authors is true, then ablating one subpopulation should not greatly impact the functioning of the whole CLA, as other neurons will likely "integrate" information coming from multiple cortices (Figures 3 and 4), and the local divergence (Figure 1) will then allow the broadcasting of this information back to multiple cortices. Do the authors think that such an approach substantially modified intra-claustral network connectivity? If this is not the case, shouldn't we expect less effect after lesioning a specific sub-population of CLA neurons?

      (4) The behavioral protocol is also confusing. If I understand correctly, the aim of the task was to probe the D-Prime factor, as all trials are rewarded, whatever the response of the animal. From Figure 7I, one can see that the mice cannot properly respond to the audiovisual cues, clearly indicating that both groups show impaired responses to this type of trial. The whole conclusion of the authors is therefore drawn from the D-Prime calculation. However, even if D-Prime should represent a measure of sensitivity (i.e. is unaffected by response bias), two assumptions need to be met: (1) the signal and noise distributions should be both normal, and (2) the signal and noise distributions should have the same standard deviation. However, these assumptions cannot be tested in the task used by the authors (one would need rating tasks). The authors might want to use nonparametric measures of sensitivity such as A' (see Pollack and Norman 1964).
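      For comparison, both sensitivity indices can be computed from a single hit rate H and false-alarm rate F. The sketch below uses the standard equal-variance d' = z(H) − z(F) and the commonly used closed-form expression for A'; it is an illustration of the reviewer's suggestion, not the analysis in the manuscript, and how H and F are pooled across unimodal and multimodal trials is left to the caller.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(F); assumes Gaussian signal and noise distributions
    with equal variance. Rates of exactly 0 or 1 are rejected."""
    if not (0 < hit_rate < 1 and 0 < fa_rate < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' (0.5 = chance, 1.0 = perfect,
    values below 0.5 = more false alarms than hits)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

      For example, H = 0.8 and F = 0.2 give d' ≈ 1.68 and A' = 0.875. Because A' does not rely on the Gaussian equal-variance assumptions, reporting it alongside d' would show whether a group difference survives without them.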

      Reviewer #3 (Recommendations For The Authors):

      While the study is comprehensive, some of its conclusions are based on assumptions that potentially weaken their validity. A significant issue arises in the comparison between neurons that project to the retrosplenial cortex (RSC) and those that do not. This differentiation is based on retrograde labeling from a single part of the RSC. However, CTB labeling, the technique used, does not capture 100% of the neurons projecting to a brain area. The study itself demonstrates this by showing that injecting the dye into three sections of the RSC results in three overlapping populations of neurons in the claustrum. Therefore, limiting the injection to just one of these areas inevitably leads to many false negatives: neurons that project to the RSC but are not marked by the CTB. This issue recurs in the analysis of neurons projecting to both the RSC and the prelimbic cortex (PL), where assumptions about interconnectivity are made without a thorough examination of overlap between these populations. The incomplete labeling complicates the interpretation of the data and makes it difficult to draw firm conclusions from it.

      Minor.

      There is a reference to Figure 1D where claustrum->cortical connections are described. This should be 5D.

      This is a correct reference pointing back to our single-cell characterizations of CLA morphoelectric types.

      End of Page 22. Implies should be imply.

      This has been resolved in the manuscript text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting and valuable study that uses multiple approaches to understand the role of bursting involving voltage-gated calcium channels within the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. Given its unique functional roles and connectivity pattern, the idea that the mediodorsal thalamus may have a fundamental role in regulating alcohol-induced transitions in consciousness state would be both important for researchers investigating thalamocortical dynamics and more broadly interesting for understanding brain function. In addition, the authors' examination of the role of the voltage-gated calcium channel Cav3.1 provides some evidence that burst-firing mediated by this channel in the thalamus is functionally important for behavioral-state transitions. While many previous studies have suggested an analogous role for sleep-state regulation, the evidence for an analogous role of this type of bursting in sedative-induced transitions is more limited. Despite the importance of these results, however, there is some concern that the manipulations and recording approaches employed by the authors may affect other thalamic nuclei adjacent to the MD, such as the central lateral nucleus, which has also been implicated in controlling state transitions. The evidence for a specific role of the mediodorsal thalamus is therefore somewhat incomplete, and so additional validation is needed.

      Strengths:

      This study employs multiple, complementary research approaches including behavioral assays, shRNA-based localized knockdown, single-unit recordings, and patterned optogenetic interventions to examine the role of activity in the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. Experiments and analyses included in the manuscript generally appear well conceived and well executed. Sample sizes are sufficiently large and statistical analysis appears generally appropriate, though in some cases additional quantification would be helpful. The findings presented are novel and provide some interesting insight into the role of the thalamus, as well as of voltage-gated calcium channels within this region, in controlling behavioral state transitions induced by alcohol. In particular, the observed effects of selective knockdown, along with recordings in total knockout of the voltage-gated calcium channel Cav3.1, which has previously been implicated in bursting dynamics as well as state transitions, particularly in sleep, together suggest that the transition of thalamic neurons from a more constant firing pattern to a bursting pattern is important for the transition to the sedated state produced by ethanol intoxication. While previous studies have similarly implicated Cav3.1 bursting in behavioral state transitions, the direct optogenetic interventions and single-unit recordings provide valuable new insight. These findings may also have interesting implications for understanding the sleep disruption associated with ethanol dependence, although the authors do not appear to examine this directly or extensively discuss these implications of their findings.

      Weaknesses:

      A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit facilitates these effects due to a loss of a more constant tonic firing pattern. Despite the generally clear observed effects across the included experiments, however, the evidence presented does not fully support that the mediodorsal thalamus, in particular, is involved. This distinction is important because previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity, so distinguishing which region is impacted is important for understanding the findings in the manuscript. While shRNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown (Figure 2), this is rather minimal evidence; it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript), and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach. Similarly, while an example is shown for the expression of ChR2 (Fig. 5), there seems to be some spread of expression outside of the mediodorsal thalamus even in this example, raising a concern about how regionally specific this effect is.

      The recordings targeting the mediodorsal thalamus could provide evidence of a direct association between changes in activity specifically in this part of the thalamus with the behavioral measures but there are currently some issues with making this link. One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location compared to the reference diagram. A larger image and improved visualization of the overall set of lesion locations that includes multiple anterior/posterior coronal sections would be helpful. Moreover, even for these example images, it is difficult to evaluate whether these are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus. The lack of these direct links, in combination with the histological issues, reduces the insight that can be gained from this study.

      In addition to the key experimental issues mentioned above, there are often problems in the text of the manuscript with reasoning, or at least explanation, as well as numerous minor editing issues. The most substantial such issue is the lack of clarity in distinguishing the mediodorsal thalamus from adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalmann, Front. Sys. Neuro. 2014) has directly claimed that the central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as being medial. As part of this discussion, it would be important to consider additional relevant literature, including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020, which are quite critical but currently do not appear to be cited. Considering additional literature relevant to the function of the mediodorsal thalamus would also be beneficial. While the methods employed generally seem sound, the description in the methods section is lacking in detail and is often difficult to follow. Analysis methods such as the burst index appear to be given only a brief explanation in the text and are not mentioned in the methods section. Similarly, the staining method used in Figure 2 does not appear to be described in the methods section. The most substantial case is the UMAP approach used in Figure 4-E, which does not appear to be described in the methods or even in the main text. The lack of detailed descriptions makes it difficult to evaluate the applicability and quality of the experimental and analytical approaches. Citations justifying the use of methods such as the approach to separate regular spiking and narrow spiking neuron subtypes are also needed.

      Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper. For example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice", but it is difficult to parse what they mean by this statement. Similarly, the next sentence, "These results support that the maintenance...", while clearer, is not well phrased. Though individually minor, issues like this recur throughout the manuscript and sometimes make it difficult to follow, so the text should be revised to correct them. There are also some problems with labels, such as the labels of A1/A2 in Figure 4, which appear to be incorrect. Also, S7 has no label on the B panels. Finally, some references are not included (only a label of [ref]).

      Reviewer #2 (Public Review):

      In the current study, Latchoumane and collaborators focus on the Cav3.1 calcium channels in the mediodorsal thalamic nucleus as critical players in the regulation of brain states and ethanol resistance in mice. By combining behavioural, electrophysiological, and genetic techniques, they report three main findings. First, Cav3.1 KO mice exhibit resistance to ethanol-induced sedation and sustained tonic firing in thalamocortical units. Second, Cav3.1 knock-down mice reproduce the same behaviour when the mediodorsal, but not the ventrobasal, thalamic nucleus is targeted. Third, either optogenetic or electric stimulation of the mediodorsal thalamus reduces ethanol-induced sedation in control animals.

      Overall, the study is well designed and performed, correctly controlled for confounds, and properly analysed. Nonetheless, it is important to address some aspects of the report. The results support the conclusions of the study. These results are likely to be relevant in the field of systems neuroscience, as they increase the molecular evidence showing how the thalamus regulates brain states.

      Reviewer #1 (Recommendations For The Authors):

      Aside from the additional quantification and clarification of the analysis discussed in the weakness section, in general, the experiments included in the manuscript seem reasonable. However, I would suggest one additional experiment as well as one control, both of which are relatively straightforward optogenetic experiments, that I feel would be helpful to further improve the study. First, as the authors note, the optogenetic interventions used do not directly address the relevance of the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, with the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO. Localizing this inhibition to the mediodorsal thalamus would also lend further credence to their claim that this nucleus is the relevant circuit for their observed effects. For the control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would be beneficial to rule out the possibility that the observed effect would occur with any thalamic nucleus. In addition to these experiments, I did not note the strategy for sharing data obtained through this study, so this should be added.

      R1-1: A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit facilitates these effects due to a loss of a more constant tonic firing pattern. Despite the generally clear observed effects across the included experiments, however, the evidence presented does not fully support that the mediodorsal thalamus, in particular, is involved. This distinction is important because previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity, so distinguishing which region is impacted is important for understanding the findings in the manuscript.

      R1-A1: The reviewer is right that CL has been pointed to as another candidate structure with a causal influence on arousal and consciousness. We included only single units recorded from tetrodes located in the MD, as identified by the lesion code explained in the Methods section and in our response to R1-3. We also quantified the Cav3.1 knock-down, which clearly demonstrates that the KD was specific to the MD, bilaterally, and that CL and CM were minimally affected (Fig. 2C and D). Moreover, the optogenetic (fiber incidence of 30 degrees, ensuring central rather than lateral coverage; fiber optic NA = 0.22) and electric stimulation (bipolar twisted electrodes, 50 µA) experiments were also highly selective for the MD (Fig. S5). MD may well not be the sole structure involved in controlling the brain-state transition toward sedation and "anesthetic states", and CL might be a significant contributor as well; however, our data show that off-target effects on CL were minimal in our experiments (Fig. 2, S5, S9 and S11).

      R1-2: While sh-RNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown, (Figure 2) this is rather minimal evidence and it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript) and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach.

      R1-A2: To address this important question, we added a quantification panel to Fig. 2D. We quantified the intensity per area of Cav3.1 expression in subzones of 4 regions of interest: MD (left, right; 2 subzones each), centromedial nucleus (CM; 1 subzone in total), centrolateral/paracentral nucleus (CL/PCN; left, right; 2 subzones each) and the submedial nucleus (SMT; left, right; used as a control for intensity normalization; 1 subzone in total). This panel clearly illustrates that MD was knocked down bilaterally (p<0.001). CM (p<0.05) and CL (p<0.01) were also partially and unilaterally knocked down. This analysis confirms that our KD was highly specific to MD.

      We added the relevant figure caption and text:

      [Result section, Cav3.1 silencing in the MD, but not VB, increased ethanol resistance in mice, paragraph 3]

      “We then characterized the change in Cav3.1 expression following the shControl and shCav3.1 knockdown injections in three test regions, MD (left and right), CM (centromedial nucleus) and CL (centrolateral nuclei, left and right side), and in a negative control region, SMT (submedial thalamic nuclei, left and right side). The average intensity was obtained from two coronal brain slices for each mouse used in the experiment (see Methods section, Cav3.1 Intensity quantification). Our results show that the knockdown was highly specific to the bilateral MD (p<0.001; Fig. 2D). We also observed a knock-down in the CM (p<0.05) and a marginal unilateral knock-down in the CL (p<0.01). Notably, we tested the correlation between the level of knock-down in MD and the total time in LOM and observed a significant association (Fig. 2D inset; R = 0.599, p = 0.018). This result highlights that the Cav3.1 knock-down was specific to MD, with an intensity associated with ethanol-induced loss of motion.”
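      The normalization and correlation steps described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB/ImageJ code: intensity per area in a region is expressed relative to the SMT control region, and the knock-down level is assumed (hypothetically) to be the fractional reduction relative to the mean shControl value. All function names are illustrative.

```python
import math
from statistics import mean


def normalized_intensity(roi_sum, roi_area, smt_sum, smt_area):
    """Cav3.1 intensity per area in a region, expressed relative to the SMT control region."""
    return (roi_sum / roi_area) / (smt_sum / smt_area)


def knockdown_level(sh_value, control_values):
    """Hypothetical knock-down measure: fractional reduction vs. the mean shControl value.

    0 means no reduction; 1 means complete loss of signal.
    """
    return 1.0 - sh_value / mean(control_values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

      In practice, `pearson_r` would be applied to per-mouse knock-down levels and total LOM times; if a p-value is also needed, `scipy.stats.pearsonr` returns both the coefficient and a two-sided p-value.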

      R1-3: One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location compared to the reference diagram. A larger image and improved visualization of the overall set of lesion locations that includes multiple anterior/posterior coronal sections would be helpful. Moreover, even for these example images, it is difficult to evaluate whether these are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus.

      R1-A3: Regarding Fig. S5, we re-plotted the recording positions, derived from the tetrode lesion sites, over 3 representative coronal planes that best represent the implant positions. We also provided additional snapshots of tetrode locations. To identify the positions of the four tetrodes in each animal, we encoded each tetrode with a different electrical lesion pattern: 1 lesion (tetrode 1); 2 lesions at 100 µm intervals, made while withdrawing the tetrode (tetrode 2); 3 lesions at 200 µm intervals (tetrode 3); and 4 lesions at 50 µm intervals (tetrode 4). Tetrodes found outside the delimited MD region were discarded from the analysis. Unfortunately, a direct relationship between electrode position and behavioral measures cannot be established with tetrode recordings; a linear silicon probe, which preserves the spacing between recording sites, would have been a better approach in this case, but it was not used in our study.
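      The lesion code described above can be illustrated with a short Python sketch that maps a lesion track to a tetrode and then keeps only units from tetrodes localized to MD. This is a hypothetical helper, not the authors' pipeline: lesion depths are assumed to be measured in µm along the track, and `tolerance_um` is an assumed allowance for tissue shrinkage and measurement error.

```python
# Lesion code from the response: number of lesions -> (tetrode id, expected spacing in µm).
LESION_CODE = {1: (1, None), 2: (2, 100.0), 3: (3, 200.0), 4: (4, 50.0)}


def decode_tetrode(lesion_depths_um, tolerance_um=25.0):
    """Identify which tetrode left a lesion track from the depths of its lesions.

    tolerance_um is a hypothetical allowance for shrinkage and measurement error.
    """
    depths = sorted(lesion_depths_um)
    if len(depths) not in LESION_CODE:
        raise ValueError(f"unexpected number of lesions: {len(depths)}")
    tetrode, spacing = LESION_CODE[len(depths)]
    if spacing is not None:
        # The lesion count already identifies the tetrode; spacing is a consistency check.
        gaps = [b - a for a, b in zip(depths, depths[1:])]
        if any(abs(gap - spacing) > tolerance_um for gap in gaps):
            raise ValueError(f"gaps {gaps} do not match the {spacing} um code of tetrode {tetrode}")
    return tetrode


def keep_md_units(units, tetrode_in_md):
    """Keep only units recorded on tetrodes whose lesion track was located in MD."""
    return [u for u in units if tetrode_in_md.get(u["tetrode"], False)]
```

      Because each tetrode has a unique lesion count, the count alone is enough to decode identity; checking the spacing guards against merged or missing lesions being misread as a different tetrode.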

      R1-4: In addition to the key experimental issues mentioned above, there are often problems in the text of the manuscript with reasoning, or at least explanation, as well as numerous minor editing issues. The most substantial such issue is the lack of clarity in distinguishing the mediodorsal thalamus from adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalmann, Front. Sys. Neuro. 2014) has directly claimed that the central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as being medial. As part of this discussion, it would be important to consider additional relevant literature, including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020, which are quite critical but currently do not appear to be cited. Considering additional literature relevant to the function of the mediodorsal thalamus would also be beneficial.

      R1-A4: We thank the reviewer for these comments and suggestions. We agree that the references mentioned by the reviewer are highly relevant and should be integrated into the manuscript. We have integrated them and expanded the discussion of the role of MD relative to other thalamic nuclei (ILN and CL in particular). We believe that this better-referenced and clarified text greatly improves the manuscript.

      [introduction section, paragraph 3]

      “The centrolateral (CL) thalamic nucleus has been implicated in the modulation of arousal, behavioral arrest 31, and improvement of the level of consciousness during seizures 32. Notably, direct electrical stimulation of the intralaminar nuclei (ILN), and in particular CL, promoted hallmarks of arousal and awakening in primates under propofol and ketamine propofol anesthesia.”

      [Discussion section, paragraph 1]

      “In this work, we found that neural activity in MD plays a causal role in the maintenance of consciousness. Whole-body Cav3.1 KO and MD-specific Cav3.1 KD mice showed resistance to loss of consciousness induced by a hypnotic dose of ethanol. In WT mice, MD neurons showed a reduced firing rate in natural (sleep) and ethanol-induced unconscious states compared to awake states. This reduction in neural activity was impaired in KO mice. In particular, the transition to an unconscious state was accompanied by a switch of firing mode from tonic to burst firing in WT mice, whereas this mode shift was absent in KO mice. Finally, optogenetic or electric stimulation of the MD after ethanol injection was sufficient to induce resistance to loss of motion, suggesting that the level of neural firing in the MD is critical for maintaining the conscious state and delaying the unconscious state. We showed that the expression of Cav3.1 T-type calcium channels in MD is a cellular modulator associated with this effect.”

      [Discussion section, MD is a modulator of consciousness, paragraph 2 and 3]

      “The MD is known to innervate limbic regions, the basal ganglia and the medial prefrontal cortex 50, and increased activity in MD might modulate the stability of cortical UP states (e.g. awake, aroused and attentive states) and synchronization 9,26. Thus, MD might be a major hub involved in cortical state control and brain state stabilization.

      Supporting the brain state stabilization theory and the ethanol resistance of Cav3.1 mutants, Choi et al.34 demonstrated that the loss of the Cav3.1 T-type calcium channel reduced the bilateral coherence between PFC and MD under ketamine anesthesia and ethanol hypnosis, especially in the delta frequency band. More importantly, under propofol anesthesia, Bastos et al.35 showed that stimulation of the intralaminar nuclei and MD led to an increased wake-up subscore and arousal, together with an increase in cortico-cortical and thalamo-cortical slow (delta) frequency power.

      In the present study, we observed that MD KD (Fig. 2A), but not VB KD (Fig. S3), of Cav3.1 increased ethanol resistance in mice, and that the extent of MD knock-down was associated with this resistance (Fig. 2D). We found that MD neurons in Cav3.1 mutant mice exhibited tonic firing within the range of wakefulness (Fig. 3 and 4), indicative of resistance to ethanol and a wake-like brain state. In addition, we found a strong association between the normalized tonic firing in MD and arousal across brain states (i.e. from walk to wake to sleep states), suggesting that MD tonic firing could be interpreted both as a thalamic readout and as a modulator of the brain state 11 (Fig. 3). Finally, direct optogenetic and electric MD stimulation increased resistance to loss of consciousness in WT mice (Fig. 5 and Fig. S10). To our knowledge, this is the first report demonstrating the causal involvement of the mediodorsal thalamic nucleus in the modulation of wakefulness and in resistance to ethanol-induced loss of consciousness in mice.”

      R1-5: While the methods employed generally seem sound, the description in the methods section is lacking in detail and is often difficult to follow. Analysis methods such as the burst index appear to only be given a brief explanation in the text and appear not to be mentioned in the methods section.

      R1-A5: We have added a clear definition to the supplementary methods, following the original work from which the index was adopted:

      [Supplementary Method section, Single Unit recording, sorting and analysis, last paragraph]

      “The bursting index was derived as described in (Royer et al. 2012). Namely, the burst index was estimated from the spike auto-correlogram (1-ms bin size) by subtracting the mean value between 40 and 50 ms (baseline) from the peak measured between 0 and 10 ms. Positive burst amplitudes were normalized to the peak and negative amplitudes were normalized to the baseline to obtain indexes ranging from −1 to 1.” We also edited its mention in the text for clarity:

      [Result section, Lack of Cav3.1 in MD neurons removes thalamic burst in NREM sleep, paragraph 2]

      “[…] and a clear reduction in total bursting, quantified as the bursting index (Fig. 3-B; peak of the spike autocorrelogram between 0 and 10 ms relative to its 40 to 50 ms baseline; see Supplementary Methods).”
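      The burst-index definition quoted from the supplementary methods (Royer et al. 2012) can be sketched as follows. This is a minimal Python illustration assuming spike times in milliseconds for a single unit; it is not the authors' MATLAB code, and the function names are hypothetical. Only positive lags are counted because the autocorrelogram is symmetric, so the 0 to 10 ms peak and the 40 to 50 ms baseline are unaffected.

```python
from bisect import bisect_left


def autocorrelogram(spike_times_ms, max_lag_ms=50, bin_ms=1.0):
    """Positive-lag spike autocorrelogram: pair counts in bins of bin_ms up to max_lag_ms.

    Zero-lag self-pairs are excluded; the negative-lag half is a mirror image and is omitted.
    """
    times = sorted(spike_times_ms)
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    for i, t in enumerate(times):
        # Later spikes that fall strictly within the lag window.
        end = bisect_left(times, t + max_lag_ms, i + 1)
        for j in range(i + 1, end):
            k = int((times[j] - t) / bin_ms)
            if k < n_bins:  # guard against floating-point edge cases at max_lag_ms
                counts[k] += 1
    return counts


def burst_index(counts, bin_ms=1.0):
    """Burst index in [-1, 1]: (peak at 0-10 ms) minus (mean at 40-50 ms), normalized."""
    peak = max(counts[: int(10 / bin_ms)])
    base_bins = counts[int(40 / bin_ms): int(50 / bin_ms)]
    baseline = sum(base_bins) / len(base_bins)
    amplitude = peak - baseline
    if amplitude > 0:
        return amplitude / peak        # positive amplitudes normalized to the peak
    if amplitude < 0:
        return amplitude / baseline    # negative amplitudes normalized to the baseline
    return 0.0
```

      For example, a unit that fires 3-spike bursts with 4 ms intra-burst intervals, separated by 200 ms, gives an index of 1, whereas a unit firing regularly every 45 ms gives an index of -1.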

      R1-6: Similarly, the staining method used in Figure 2 does not appear to be described in the methods section.

      R1-A6: The staining method is described in the supplementary methods of the paper [Supplementary Methods, Immunohistochemistry].

      R1-7: The most substantial case is for the UMAP approach used in Figure 4-E which does not appear to be described in the methods or even described in the main text.

      R1-A7: The UMAP approach is described in the supplementary methods document [Uniform Manifold Approximation and Projection (UMAP)]. We believe that only a succinct description is needed there, considering the extent of the analysis. Regarding the main text, we agree that it lacked a description of these results, and we have amended it to clearly describe this result and its implications. The following paragraph was added:

      [Result section, Under ethanol, MD neurons lacking Cav3.1 show no burst and a wake state-like neural activity, second to last paragraph]

      “Finally, we asked whether the firing modes and properties (tonic firing rate, burst firing rate; see supplementary methods) of single MD neurons would form distinct qualitative representations of “brain states” using a low-dimensional UMAP representation (Uniform Manifold Approximation and Projection 42). We observed that for the awake and active (i.e. walk) states, the representation formed two adjacent clusters that included both wild-type and mutant neurons (Fig. 4E, left panel). For the REM and NREM states, the wild-type neurons formed 2 additional interconnected clusters, whereas the mutant neurons tended to overlap with the clusters attributed to the “awake” brain state (Fig. 4E, second panel from left). Ethanol-induced fLOM, similarly to the REM and NREM clusters, was distinct from the awake clusters in wild-type mice and overlapped with the NREM clusters (Fig. 4E, third panel from left). Here also, mutant MD neurons overlapped with the awake clusters rather than with the “low consciousness” brain states. These results indicate that firing mode and properties can define a brain state representation that distinguishes levels of consciousness. Moreover, in the mutant, the representation of “low consciousness” states overlapped with wild-type “awake” states, consistent with the hypothesis of resistance to loss of consciousness.”

      R1-8: Citations justifying the use of methods such as the approach to separate regular spiking and narrow spiking neuron subtypes are also needed.

      R1-A8: We have added two references related to the observation of the two subpopulations of spiking neurons [Schiff and Reyes, 2012; Destexhe, 2008].

      R1-9: Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper (for example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice" but it is difficult to parse what they mean by this statement.

      R1-A9: We addressed this issue by editing and revising the manuscript for clarity and flow.

      R1-10: Similarly, the next sentence "These results support the maintenance..." while clearer, is not well phrased. Though individually minor, issues like this re-occur throughout the manuscript and sometimes make it difficult to follow so the text should be revised to correct them.

      R1-A10: We thank the reviewer for highlighting this point. We have edited the overall text to improve clarity and flow.

      [abstract] 

      These results suggest that maintaining MD neural firing at a wakeful level is sufficient to induce resistance to ethanol-induced hypnosis in WT mice.

      R1-11: There are also some problems with labels such as the labels of A1/A2 in Figure 4, which appear to be incorrect.

      R1-A11: We noted this issue and have rectified the figure for clarity.

      R1-12: Also, S7 has no label on the B panels.

      R1-A12: We thank the reviewer for pointing out this omission. We have added the y-label to the panel for clarity.

      R1-13: Finally, some references are not included (only a label of [ref]).

      R1-A13: We have added the missing references and thank the reviewer for pointing that out.

      Additional comments

      R1-14: Aside from the additional quantification and clarification of the analysis discussed in the weakness section, in general, the experiments included in the manuscript seem reasonable. However, I would suggest one additional experiment as well as one control, both of which are relatively straightforward optogenetic experiments, that I feel would be helpful to further improve the study. First, as the authors note, the optogenetic interventions used do not directly address the relevance of the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, with the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO.

      R1-A14: The reviewer proposes an interesting experiment, which we attempted to perform; however, it poses several technical challenges. First, the KO mice do not exhibit burst firing, as they lack the Cav3.1 low-threshold calcium channel. Therefore, under ethanol, even if a rhythmic inhibition that would activate Cav3.1 channels and cause a rebound burst were present, KO neurons would be unable to produce that burst. An optogenetic inhibition would therefore only accentuate the total inhibition and could potentially induce an overall decrease in MD firing, resulting in an increase in LOM features. Alternatively, we showed that in WT mice given a low ethanol dose (where LOM induction is harder), increased rhythmic inhibition does significantly increase LOM duration and marginally decreases the latency to LOM (Fig. S12), supporting the hypothesis that “the stronger the decrease in MD firing, the faster and longer the LOM.” The only caveat of using WT mice here is that optogenetic inhibition might also induce rebound bursts after inhibition. Injecting bursts alone did not alter the response to ethanol (Fig. S10). These results point to the loss of firing in MD as a main factor for LOM, with any contribution of bursting potentially requiring a concurrent inhibition/loss of firing.

      We agree that inhibition in KO mice would further validate this hypothesis by controlling for the role of bursting. We regret that we are not in a position to perform additional experiments involving the KO mice.

      R1-15: For the control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would be beneficial to rule out the possibility that the observed effect would occur with any thalamic nucleus.

      R1-A15: We agree with the reviewer that we could have added an additional region control to the gain/loss-of-function experiments. We would go further and suggest that a better control nucleus would be a higher-order nucleus such as PO or an unrelated sensory relay nucleus such as LGN. VB, a somatosensory relay nucleus, could also influence movement initiation, which could be hard to interpret. Since a complete control study of Cav3.1 KD across all thalamic nuclei is outside the scope of this study, we opted not to redo these experiments and to keep the focus of the manuscript on the manipulation of MD activity rather than on the various other thalamic nuclei. We also do not claim that MD is the sole center able to initiate a switch to loss of consciousness, and a more in-depth study on that matter is clearly needed.

      R1-16: In addition to these experiments, I did not note the strategy for sharing data obtained through this study so this should be added.

      R1-A16: We have uploaded data and code for most figures at the following repository and provided a clearer statement regarding data sharing. We thank the reviewer for pointing out this missing element.

      The link for the repository is the following:

      It contains:

      - Excel spreadsheet file of all behavior values, including the newly quantified Cav3.1 expression in MD/CL/SMT

      - Excel spreadsheet follow-up of all MD cells (single unit; tetrode) analyzed

      - Folders for all groups studied with representative figures showing EEG power over time and normalized activity (WT vs KO for 2, 3 and 4 g/kg; MDshKD vs shCTR, VBshKD vs shCTR; CHR2 NOSTIM vs STIM; ESTIM Groups and ARCH NOSTIM vs STIM)

      - A1G LORRvsLOM and OPEN FIELD Matlab data

      - Matlab and ImageJ Codes: single unit analysis, characterization, brain state characterization, sleep stages, LOM, open field analysis and statistical analysis.

      We have added the data sharing subsection in the acknowledgements:

      “Part of the analyzed data and codes are available on the open access platform Mendeley:

      Latchoumane, Charles-francois (2024), “Mediodorsal thalamic nucleus mediates resistance to ethanol through Cav3.1 T-type Ca2+ regulation of neural activity”, Mendeley Data, V1, doi: 10.17632/7fr427426m.1

      Additional data (large size recording and images) can be provided upon reasonable requests.”

      Reviewer #2 (Recommendations For The Authors):

      R2-1. Consciousness is a contentious subject. Even in humans, there is still intense research on the topic, not to mention animals, about which we still know very little. Moreover, consciousness is not quantified in this study, as there is no standard metric to do so. Accordingly, talking about 'modulation', 'transition', 'level', or 'reduction' of consciousness can be misleading. Hence, it is probably safer to strictly refer to brain-states and/or stages of the sleep-wake cycle in this study and reframe it entirely around these concepts.

      R2-A1. The reviewer raises an important point, which we appreciate. We agree that the definition of consciousness is rather loose and arguably difficult to pinpoint. Here, we settle on an operational definition that relies on loss of motion (LOM) and loss of righting reflex (LORR). This definition is widely accepted as the “verified” state in which unresponsiveness (to continuous stimuli, reflex-inducing stimuli or discomfort) is observed without interruption by jerks or spurious movements. Additional supporting metrics are EMG recordings to quantify atonia and EEG recordings to confirm the settling of a dominant slow-wave frequency (~4 Hz; ethanol-induced sedation at theta rhythm), as shown in Fig S1A. The driver of this 4 Hz rhythm and its correlates have been investigated previously (e.g. Choi et al, PNAS, 2012), leading to the accepted link between LOM/LORR and loss of consciousness. Our data have the advantage of including single-neuron recordings, which show that LOM is the state with the lowest firing activity (Fig S7AB), comparable to deep sleep activity (Fig3D). The first LOM is the most important, as it marks the deepest loss of consciousness before ethanol starts to be metabolized and cleared, which should be consistent across animals.

      As a result, we have edited the manuscript to clarify all mentions related to brain states and states of unconsciousness.

      R2-2. It is not clear why the authors focus on the mediodorsal nucleus. This should be better explained in the introduction and developed in the discussion.

      R2-A2. This comment converges with Reviewer 1's comments, and we have addressed this gap in the discussion as suggested (see our response to Reviewer 1). We believe the rationale is now clearer.

      R2-3. The discussion mentions that 'increased activity in MD might modulate the stability of cortical UP state and synchronization' (pg 21). This point should be either further developed and put into context, or removed. In its current state, it does not seem to contribute much to the discussion of results.

      R2-A3. We understand that the wording “UP state” might not be clear enough. We have modified this sentence as follows to clarify that an UP state refers to a state in which the animal is awake, aroused or attentive:

      [Discussion section, MD is a modulator of consciousness, first paragraph]

      “The MD is known to innervate limbic regions, basal ganglia and medial prefrontal cortex 50, and increased activity in MD might modulate the stability of cortical UP states (e.g. awake, aroused and attentive states) and synchronization 9,26. Thus, MD might be a major hub involved in cortical state control and brain state stabilization.”

      R2-4. The discussion states that 'mutant mice did not exhibit a decreased arousal level (i.e. increased locomotor activity)' (pg 23). This is confusing as decreased arousal should be reflected in decreased locomotor activity.

      R2-A4. We understand that the formulation of this sentence may be confusing, and we have edited this portion of the text in the revised manuscript. To clarify, mutant mice do not exhibit reduced or increased arousal (this was not quantified, only observed), but they do show a hyperlocomotion phenotype. This contrasts with their lower basal firing rate in the MD, which, in our interpretation, is not synonymous with lower arousal. We believe that the relative change in MD firing determines the change in arousal, and that absolute firing is not indicative of arousal in itself, only in comparison across states.

      [Discussion section, The lower variability in MD Firing reflects Ethanol Resistance in Cav3.1 mutant mice, paragraph 2]

      “Mutant RS neurons in MD showed an overall lower excitability and variability of firing in various natural conscious and unconscious states compared to wild type mice. Remarkably, Cav3.1 mutant mice exhibited clearly increased locomotor activity and increased resistance to ethanol. The generally lower firing rate and the high “arousal” observed in mutant mice suggest that the relative change in MD tonic firing from state to state, and not the absolute value of firing, might be a better correlate of changes in brain state in mice.”

      R2-5. The methods (pg 27) state that two genetic backgrounds (129/svjae and C57BL/6J ) were used in the study. Authors should show whether there were significant differences between those backgrounds in the key parameters assessed in the study (particularly resistance to ethanol sedation).

      R2-A5. As mentioned in the Methods section, we only used F1-background mice, which are the first-generation offspring produced by crossing the 129/svjae and C57BL/6J strains. To produce F1 KO mice, we maintained heterozygous mice on both strains. Unfortunately, we did not study differences between KO mice on these two backgrounds; however, pure C57BL/6J KO mice have been used in other studies by our group (Kim et al 2001; Na et al, 2008; Park et al., 2010). The F1 background allows us to work with mice that are less aggressive and can be handled with less inherent stress.

      R2-6. It would be convenient to produce a supplementary figure associated with Figure 1C to show the same data with averages per mouse. That is, 9 points for control and 9 points for KO mice. This also applies to all cases where data is not presented per mouse but pooled between animals.

      R2-A6. We have added a panel C to Figure S1 showing the individual values for all mice corresponding to Figure 1C. We have also generalized this presentation to all behavior graphs, showing all animals in a scatter plot next to the boxplot. We believe this presentation further increases the transparency of the manuscript. Accordingly, scatter plots for all mice have been added to Fig1, Fig2, Fig5, Fig.S2, Fig.S3, Fig.S10 and Fig.S12.

      R2-7. It would be informative to make a supplementary figure associated with Figure 1D to compare baseline raw activity levels (i.e., baseline walking recording) between control and KO mice. That is, do KO and control mice cover comparable distances and at similar speeds during baseline conditions? Figure 1D and Figure 4A suggest that the variability of locomotor activity is larger in KO mice. Hence, this parameter should be quantified and reported.

      R2-A7. We thank the reviewer for this comment. We have addressed this question in the manuscript in two ways:

      - We first measured the overall locomotion of the mice using the total distance travelled in the open field by our mouse cohorts (FigS4C). KO mutants showed hyperlocomotion, whereas MD or VB knock-down mice did not, which indicates that the hyperlocomotion component is not specific to the two thalamic nuclei studied.

      - In the forced walking task, the animal is required to keep a steady pace of roughly 6 cm/s. This assay normalizes general walking behavior to a relatively fixed pace, making it comparable across all animals.

      The reviewer suggested reporting the mean and variance of walking in WT and KO mice during baseline (prior to the ethanol I.P. injection). We believe that the two points above are sufficient to describe the WT vs KO locomotion differences quantitatively. Moreover, by construction, normalized locomotion in the forced walking task returns similar baseline means for all animals; the standard deviation could potentially show differences, but these would remain inconclusive.
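      The "similar means by construction" argument can be made concrete with a minimal Python sketch, assuming that each animal's locomotion trace is divided by the mean of its own pre-injection window. The helper name, the 4-sample baseline window and the traces are hypothetical and do not reproduce the study's actual normalization code.

```python
from statistics import mean, stdev


def normalize_to_baseline(trace, n_baseline):
    """Divide a locomotion trace by the mean of its first n_baseline samples.

    Hypothetical helper: the baseline window length is an assumption.
    """
    baseline = mean(trace[:n_baseline])
    return [x / baseline for x in trace]


# Made-up traces (cm/s); the first 4 samples are the pre-injection baseline.
wt = [5.8, 6.1, 6.0, 6.1, 2.0, 0.1]
ko = [4.0, 8.0, 5.0, 7.0, 6.0, 3.0]

wt_norm = normalize_to_baseline(wt, 4)
ko_norm = normalize_to_baseline(ko, 4)

# Baseline means equal 1.0 for every animal by construction...
print(round(mean(wt_norm[:4]), 6), round(mean(ko_norm[:4]), 6))
# ...whereas the baseline spread can still differ between animals.
print(round(stdev(wt_norm[:4]), 3), round(stdev(ko_norm[:4]), 3))
```

      Only the dispersion of the normalized baseline carries information about between-animal differences in walking, which is why it, rather than the mean, would be the quantity left to compare.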

      R2-8. The legend in Figure 1 states that 'the loss of consciousness is evaluated using normalized moving index using either video analysis (differential pixel motion), on-head accelerometer-based motion, or neck electromyograms'. Authors should clarify whether these methods are equivalent and support it with data.

      R2-A8. We understand the reviewer's point and have modified the method description to better align with what was done. For most mice, video analysis was used to obtain the moving index. When video recording was not available (2 mice), an accelerometer attached to the animal's head stage was used to derive a moving index similar to the video moving index. The neck electromyogram was instead used in animals implanted with tetrodes to identify sleep stages based on local field potential frequency and muscle tone. We have therefore clarified the Methods and the Figure 1 legend to avoid this confusion. Since no concurrent video and accelerometer recordings were performed, we do not have the data to compute the correlation between the two measures; however, no noticeable deviation in loss of motion was observed between the two methods. We realize that this is a weak argument; nevertheless, in our observations, video and accelerometer recordings returned very similar timings for loss of motion (only a few comparative instances, insufficient for a statistical comparison).
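      For readers unfamiliar with differential pixel motion, the following Python sketch shows one common way such a moving index can be computed: the fraction of pixels whose grey value changes between consecutive frames, which is bounded between 0 and 1. The pixel threshold, the loss-of-motion criterion (a run of consecutive low-motion samples) and all names are assumptions for illustration; the response does not specify the parameters actually used.

```python
def moving_index(frames, pixel_threshold=10):
    """Fraction of pixels whose grey value changes by more than pixel_threshold
    between consecutive frames (differential pixel motion).

    frames: equal-length flat sequences of grey values, one per video frame.
    """
    index = []
    for prev, curr in zip(frames, frames[1:]):
        changed = sum(1 for a, b in zip(prev, curr) if abs(b - a) > pixel_threshold)
        index.append(changed / len(curr))
    return index


def first_loss_of_motion(index, threshold=0.01, min_samples=3):
    """Start of the first run of min_samples consecutive values below threshold.

    Hypothetical LOM criterion; returns None if motion is never lost.
    """
    run = 0
    for i, value in enumerate(index):
        run = run + 1 if value < threshold else 0
        if run == min_samples:
            return i - min_samples + 1
    return None


# Toy 4-pixel "video": the animal moves for two frame transitions, then stops.
frames = [[0, 0, 0, 0], [50, 0, 0, 0], [50, 50, 0, 0],
          [50, 50, 0, 0], [50, 50, 0, 0], [50, 50, 0, 0]]
idx = moving_index(frames)
print(idx, first_loss_of_motion(idx))
```

      An accelerometer-based index could feed the same `first_loss_of_motion` step, which is one way the two sources can yield comparable LOM timings even though they are not recorded together.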

      R2-9. How were spike bursts defined? The authors should try different criteria and verify the consistency of results.

      R2-A9. For in vivo single-unit recordings, we opted for a definition validated in our previous work and by others: a silent period of at least 100 ms followed by a minimum of 3 spikes, with:

      - an interspike interval of less than 4 ms for the first spike pair

      - interspike intervals of less than 20 ms for the remaining spike pairs

      We also performed this analysis using a minimum of 2 spikes and silent periods varied between 50 and 100 ms, without observing significant deviation in the results. As shown in Figure S6B, with this approach the majority of bursts contained <10 spikes. Figure S6C shows a clear distribution of first-spike ISIs within 2-4 ms, as observed in previous work (Desai and Varela, 2021; Alitto et al, 2019); importantly, this distribution is not abruptly capped at 4 ms, suggesting that the first ISI of thalamic bursts might indeed lie below 4 ms. Within-burst spike waveforms can become very variable, and the choice of a minimum of 3 rather than 2 spikes per burst stems from the aim of reducing false-positive detection of ultra-short bursts, which remains controversial in single-unit recordings (Gray et al. 1995).
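      The burst criteria listed above (preceding silence of at least 100 ms, first ISI < 4 ms, subsequent ISIs < 20 ms, at least 3 spikes) can be sketched in Python as follows, with the parameters exposed so that the sensitivity analysis described (2-spike minimum, 50-100 ms silence) amounts to changing arguments. This is an illustrative reimplementation, not the authors' Matlab code; skipping the very first spike (whose preceding silence cannot be verified) is an assumption.

```python
def detect_bursts(spike_times_ms, min_silence=100.0, first_isi_max=4.0,
                  next_isi_max=20.0, min_spikes=3):
    """Return (first, last) spike indices of each burst.

    A burst starts at a spike preceded by >= min_silence ms of silence whose
    first ISI is < first_isi_max ms; it is extended while successive ISIs are
    < next_isi_max ms and kept only if it contains >= min_spikes spikes.
    """
    t = spike_times_ms
    bursts = []
    i = 1  # spike 0 is skipped: the silence before it cannot be verified
    while i < len(t) - 1:
        silent_before = t[i] - t[i - 1] >= min_silence
        if silent_before and t[i + 1] - t[i] < first_isi_max:
            j = i + 1
            while j + 1 < len(t) and t[j + 1] - t[j] < next_isi_max:
                j += 1
            if j - i + 1 >= min_spikes:
                bursts.append((i, j))
                i = j + 1
                continue
        i += 1
    return bursts


# Made-up spike train (ms).
spikes = [0.0, 150.0, 153.0, 160.0, 170.0, 400.0, 402.0, 500.0, 503.0, 506.0]
print(detect_bursts(spikes))                   # default criteria
print(detect_bursts(spikes, min_silence=50))   # shorter silent period
print(detect_bursts(spikes, min_spikes=2))     # 2-spike minimum
```

      In this toy train, the triplet at 500 ms is rejected under the default criteria only because it follows 98 ms of silence, and the doublet at 400 ms is accepted only with a 2-spike minimum, illustrating how each parameter gates detection.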

      Minor:

      R2-10: Figure 4A2 'Cav3.1(+/+)' should presumably be Cav3.1(-/-).

      R2-A10: The reviewer is correct; we have changed the figure label to Cav3.1(-/-).

      R2-11: Figure S2C legend states 'Post-hoc group comparison was performed using.' The sentence seems to be incomplete.

      R2-A11: We have completed the sentence for clarity.

      R2-12: In the methods (pg 29) virus concentration is reported as '107 TU/ul', which probably refers to 10e7.

      R2-A12: We have corrected the value to 10^7 TU/µl by superscripting the exponent.

      R2-13: Verify Fig 1C1 and correct Y-axis overlap between title and units.

      R2-A13: We edited the figure for clarity, thank you.

      R2-14: On page 24 there is a '[ref]' that probably stands for (a missing) reference.

      R2-A14: The missing reference has been added.

    1. Author response:

      We are glad that the reviewers found our work to be interesting and appreciate its contribution to enhancing ecological validity of attention research. We also agree that much more work is needed to solidify this approach, and that some of the results should be considered “exploratory” at this point, but appreciate the recognition of the novelty and scientific potential of the approach introduced here.

      We will address the reviewers’ specific comments in a revised version of the paper, and highlight the main points here:

      · We agree that the use of multiple different neurophysiological measures is both an advantage and a disadvantage, and that the abundance of results can make it difficult to tell a “simple” story. In our revision, we will make an effort to clarify what (in our opinion) are the most important results and provide readers with a more cohesive narrative.

      · Important additional points raised by the reviewers, which will be discussed in the revised version, are: a) the similarities and differences between virtual and real classrooms; b) the utility of the methods and data to the community; and c) the implications of these results for educational neuroscience and ADHD research.

      · In the revision, we will also clarify several methodological aspects of the data analysis, as per the reviewers’ requests.

      · After final publication, the data will be made available for other researchers to use.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The heatmaps (for example, Figure 3A, B) are challenging to read and interpret due to their size. Is there a way to alter the visualization to improve interpretability? Perhaps coloring the heatmap by general anatomical region could help? We feel that these heatmaps are critical to the utility of the registration strategy, and hence, clear visualization is necessary.

      We thank the reviewers for this point on aesthetic improvement, and we agree that clearer visualization of our correlation heatmaps is important. To address this point, we have incorporated the capability of grouping “child” subregions in anatomical order by their more general “parent” region into the package function, plot_correlation_heatmaps(). Parent regions will be visually represented as smaller sub-facets in the heatmaps, and we will be submitting our full revised manuscript with these visual changes.
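      The idea of grouping child subregions under their parent region can be illustrated with a short Python sketch: reorder the rows and columns of a correlation matrix so that children of the same parent are contiguous, and record the index span of each parent so a plotting layer can draw it as a sub-facet. This is not the R implementation of plot_correlation_heatmaps(); the region names, parent assignments and correlation values are hypothetical.

```python
def group_by_parent(regions, parent_of, parent_order):
    """Reorder child regions so that children of the same parent are contiguous.

    Returns the new region order and (parent, start, stop) spans that a plotting
    layer could draw as sub-facets. Regions whose parent is not listed in
    parent_order are omitted.
    """
    ordered, spans = [], []
    for parent in parent_order:
        children = [r for r in regions if parent_of[r] == parent]
        if children:
            spans.append((parent, len(ordered), len(ordered) + len(children)))
            ordered.extend(children)
    return ordered, spans


def reorder_matrix(matrix, regions, ordered):
    """Permute rows and columns of a square correlation matrix to `ordered`."""
    idx = [regions.index(r) for r in ordered]
    return [[matrix[i][j] for j in idx] for i in idx]


# Hypothetical regions and made-up correlations (rows/columns follow `regions`).
regions = ["CA1", "PL", "CA3", "IL"]
parent_of = {"CA1": "HPF", "CA3": "HPF", "PL": "PFC", "IL": "PFC"}
corr = [[1.0, 0.1, 0.2, 0.3],
        [0.1, 1.0, 0.4, 0.5],
        [0.2, 0.4, 1.0, 0.6],
        [0.3, 0.5, 0.6, 1.0]]

ordered, spans = group_by_parent(regions, parent_of, ["PFC", "HPF"])
print(ordered, spans)
print(reorder_matrix(corr, regions, ordered))
```

      Because only the order of rows and columns changes, the correlations themselves are untouched; the spans simply tell the plot where each parent block begins and ends.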

      (2) Additional context in the Introduction on the use of immediate early genes to label ensembles of neurons that are specifically activated during the various behavioral manipulations would enable the manuscript and methodology to be better appreciated by a broad audience.

      We thank the reviewers for this suggestion and will be revising parts of our Introduction to reflect the broader use and appeal of immediate early genes (IEGs) for studying neural changes underlying behavior.

      (3) The authors mention that their segmentation strategies are optimized for the particular staining pattern exhibited by each reporter and demonstrate that the manually annotated cell counts match the automated analysis. They mention that alternative strategies are compatible, but don't show this data.

      We thank the reviewers for this comment. We also appreciate that integration with alternative strategies is a major point of interest to readers, given that others may be interested in compatibility with our analysis and software package, rather than completely revising their own pre-existing workflows.

      This specific point on segmentation refers to the import_segmentation_custom() function in the package. As the field has not yet adopted a standard cell segmentation export format, this function still requires some data wrangling into an import format saved as a .txt file. However, we chose not to visually demonstrate this capability in the paper for a few reasons.

      i. A figure broadly testing many different segmentation algorithms (e.g., Cellpose, Vaa3d, Trainable Weka Segmentation) would mainly demonstrate the segmentation efficacy of these alternative approaches, which is already well documented. Demonstrating import compatibility, by contrast, is a demonstration of the API interface, which is better shown in website documentation and tutorial notebooks.

      ii. Additionally, showing import from one well-established segmentation approach would still demonstrate only a single use case. Establishing import compatibility with all potential alternative platforms would carry a major burden of proof, given their specific export formats (which may differ slightly depending on post-processing choices) and the needs of the experimenters (e.g., exporting one vs many channels, different naming conventions, different export formats). For example, Cellpose output can take the form of a NumPy file (_seg.npy), a .png, or a native ImageJ ROI archive, and users may have chosen up to four channels. Until the field adopts a standardized file format flexible enough to account for all variables of experimental interest, we believe it is more efficient to advise external groups on how to transform their specific data to be compatible with our generic import function.

      Internally, in collaborative efforts, we have validated the ability to import datasets generated from completely different workflows for segmentation and registration. We intend to release this documentation in upcoming updates to our package website, which we believe will better demonstrate how to take advantage of our analysis package without adopting our entire workflow.
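      As an example of the kind of data transformation described above, the Python sketch below converts a 2-D label mask (0 = background, 1..N = cell IDs) into per-cell centroids and areas written as tab-separated text. A Cellpose _seg.npy file can, per the Cellpose documentation, be loaded with `numpy.load(path, allow_pickle=True).item()` and its label image read from the "masks" key; the code itself accepts any 2-D sequence of integer labels. The column layout (id, x, y, area) is a hypothetical placeholder, not the actual format expected by import_segmentation_custom().

```python
def mask_centroids(masks):
    """Centroid (x, y) and pixel area of each labelled cell in a 2-D label mask
    where 0 is background and 1..N are cell IDs."""
    sums = {}
    for y, row in enumerate(masks):
        for x, label in enumerate(row):
            label = int(label)
            if label == 0:
                continue
            sx, sy, n = sums.get(label, (0, 0, 0))
            sums[label] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n, n) for lab, (sx, sy, n) in sorted(sums.items())}


def to_tab_text(centroids):
    """Hypothetical tab-separated layout: id, x, y, area (one cell per line)."""
    lines = ["id\tx\ty\tarea"]
    for lab, (x, y, area) in centroids.items():
        lines.append(f"{lab}\t{x:.2f}\t{y:.2f}\t{area}")
    return "\n".join(lines) + "\n"


# Toy 3x3 mask with two cells.
masks = [[0, 1, 1],
         [0, 1, 0],
         [2, 0, 0]]
print(to_tab_text(mask_centroids(masks)))
```

      Only the final formatting step would need to change to match whichever column names and order the import function documents.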

      (4) The authors provided highly detailed information for their segmentation strategy, but the same level of detail was not provided for the registration algorithms. Additional details would help users achieve optimal alignment.

      We apologize for this lack of detail. The registration strategy depends upon the WholeBrain package for registration to the Allen Mouse Common Coordinate Framework. While this strategy has been published and documented elsewhere, we will be revising our methods to better incorporate details of this approach.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) While I was able to install the SMARTR package, after trying for the better part of one hour, I could not install the "mjin1812/wholebrain" R package as instructed in OSF. I also could not find a function to load an example dataset to easily test SMARTR. So, unfortunately, I was unable to test out any of the packages for myself. Along with the currently broken "tractatus/wholebrain" package, this is a good example of why I would strongly encourage the authors to publish SMARTR on either Bioconductor or CRAN in the future. The high standards set by Bioc/CRAN will ensure that SMARTR is able to be easily installed and used across major operating systems for the long term.

      We thank reviewers for pointing out this weakness; long-term maintenance of this package is certainly a mutual goal. Loading an .RData file is accomplished by either double-clicking directly on the file in a directory window, or by using the load() function (e.g., load("directory/example.RData")). We will explicitly outline these directions in the online documentation and in our full revision.

      Moreover, we will submit our package to CRAN. Currently, SMARTR is not dependent on the WholeBrain package, which remains optional for the registration portion of our workflow. Ultimately, this independence will allow us to maintain the analysis and visualization portion of the package independently, and allow for submission to a more centralized software repository such as CRAN.

      (2) The package is quite large (several thousand lines, including comments and whitespace). While impressive, this does inherently make the package more difficult to maintain, and the authors currently have not included any unit tests. The authors should add unit tests to cover a large percentage of the package to ensure code stability.

      We appreciate this feedback and will add unit testing to improve the reliability of our package in the full revision.

      (3) Why do the authors choose to perform image segmentation outside of the SMARTR package using ImageJ macros? Leading segmentation algorithms such as CellPose and StarMap have well-documented APIs that would be easy to wrap in R. They would likely be faster as well. As noted in the discussion, making SMARTR a one-stop shop for multi-ensemble analyses would be more appealing to a user.

      We appreciate this feedback. Parts of our response to Reviewer 1, comment 3, are relevant here. The interfaces for Cellpose and ClusterMap (which processes in situ transcriptomic data such as STARmap) are both in Python, and Python can currently be called from within R (https://rstudio.github.io/reticulate/index.html). We will certainly explore incorporating these APIs into R. However, we anticipate that this capability would amount to “translation” between programming languages and would not spare users from needing some familiarity with the capabilities of these Python packages, and thus with Python syntax.

      (4) Given the small number of observations for correlation analyses (n=6 per group), Pearson correlations would be highly susceptible to outliers. The authors chose to deal with potential outliers by dropping any subject per region that was >2 SDs from the group mean. Another way to get at this would be using Spearman correlation. How do these analyses change if you use Spearman correlation instead of Pearson? It would be a valuable addition for the authors to include Spearman correlations as an option in SMARTR.

      We thank reviewers for this suggestion and will provide a supplementary analysis of our results using Spearman correlations.
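      Why rank-based correlation is more robust at n=6 can be seen in a small Python sketch with made-up data: a single extreme value turns an otherwise decreasing relationship into a moderately positive Pearson r, while Spearman's rho, computed as the Pearson correlation of average ranks, stays slightly negative. In R, base `cor(x, y, method = "spearman")` offers the same option; the code below is only an illustration, not SMARTR code.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up counts for two regions in 6 animals; the last animal is extreme in y.
x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 30]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```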

      (5) I see the authors have incorporated the ability to adjust p-values in many of the analysis functions (and recommend the BH procedure) but did not use adjusted p-values for any of the analyses in the manuscript. Why is this? This is particularly relevant for the differential correlation analyses between groups (Figures 3P and 4P). Based on the un-adjusted p-values, I assume few if any data points will still be significant after adjusting. While it's logical to highlight the regional correlations that strongly change between groups, the authors should caution which correlations are "significant" without adjusting for multiple comparisons. As this package now makes this analysis easily usable for all researchers, the authors should also provide better explanations for when and why to use adjusted p-values in the online documentation for new users.

      We appreciate the feedback and will state more explicitly in our paper that our dataset is presented as a demonstrative and exploratory resource for readers; as such, we accept a high tolerance for false positives in exchange for a lower risk of missing potentially interesting findings. As noted by Reviewer #2, it is still “logical to highlight the regional correlations that strongly change between groups.” We will further clarify in our methods that we present uncorrected p-values when speaking of significance. We will also include more statistical detail on FDR correction in our online documentation. Ultimately, the decision to correct for multiple comparisons, and the choice of FDR threshold, should be informed by standard statistical theory and by the user-defined tolerance for false positives and false negatives. This will be influenced by factors such as the nature and purpose of the study and the quality of the dataset.
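      To show what the recommended BH adjustment does, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure applied to made-up p-values; base R's `p.adjust(p, method = "BH")` computes the same adjusted values. It illustrates how raw p-values just under 0.05 can exceed 0.05 once adjusted for the number of comparisons.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values from four regional comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

      Here the comparisons with raw p = 0.04 and 0.03 would no longer pass a 0.05 threshold after adjustment, which is the trade-off between false positives and missed findings described above.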

      (6) The package was developed in R3.6.3. This is several years and one major version behind the current R version (4.4.3). Have the authors tested if this package runs on modern R versions? If not, this could be a significant hurdle for potential users.

      We thank reviewers for pointing out concerns regarding versioning. The analysis and visualization capabilities are currently supported in R version 4.1+. The recommendation for R 3.6.3 is primarily for users interested in the full workflow, which requires installation of the WholeBrain package. We anticipate supporting the visualization and network analysis capabilities with updated packages and R versions, while maintaining a legacy version for the full workflow presented in this paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a new application of the high-content image-based morphological profiling assay Cell Painting (CP) to single cell type classification in heterogeneous, mixed induced pluripotent stem cell-derived neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that focusing on the nucleus and its surroundings contains sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process from iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations holds great promise to characterize mixed-cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including an in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) Explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) Generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) Application to multiple classification tasks.

      I especially liked the generalization of classification from mono- to co-cultures (Figure 4C), and quantitatively following the gradual transition from NPC to Neurons (Figure 5H).

      The manuscript is well-written and easy to follow.

      Thank you for the positive appreciation of our work and constructive comments. 

      Weaknesses:

      I am not certain how useful/important the specific application demonstrated in this study is (quality control of iPSC cultures), this could be better explained in the manuscript. 

      To clarify the importance we have added an additional explanation to the introduction (page 3) and also come back to it in the discussion (page 17).

      Text from the introduction:

      “However, genetic drift, clonal and patient heterogeneity cause variability in reprogramming and differentiation efficiency10,11. The differentiation outcome is further strongly influenced by variations in protocol12. This can significantly impact experimental outcomes, leading to inconsistent and potentially misleading results and consequently, it hinders the use of iPSC-derived cell systems in systematic drug screening or cell therapy pipelines. This is particularly true for iPSC-derived neural cultures, as their composition, purity and maturity directly affect gene expression and functional activity, which is essential for modelling neurological conditions13,14. Thus, from a preclinical perspective, there is the need for a fast and cost-effective QC approach to increase experimental reproducibility and cell type specificity15. From a clinical perspective in turn, robust QC is required for safety and regulatory compliance (e.g., for cell therapeutic solutions). This need for improved standardization and QC is underscored by large-scale collaborative efforts such as the International Stem Cell Banking Initiative16, which focusses on clinical quality attributes and provides recommendations for iPSC validation testing for use as cellular therapeutics, or the CorEuStem network, aiming to harmonize iPSC practices across core facilities in Europe.”

      Text from the discussion: 

      “Many groups highlight the difficulty of reproducible neural differentiation and attribute this to culture conditions, cultivation time and variation in developmental signalling pathways in the source iPSC material43,44. Spontaneous neural differentiation has previously been shown to require approximately 80 days before mature neurons arise that can fire action potentials and show neural circuit formation. Although these differentiation processes display a stereotypical temporal sequence34, the exact timing and duration might vary. This variation negatively affects the statistical power when testing drug interventions and thus prohibits the application of iPSC-culture derivatives in routine drug screening. Current solutions (e.g., immunocytochemistry, flow cytometry, …) are often cost-ineffective, tedious, and incompatible with longitudinal/multimodal interrogation. CP is a much more cost-effective solution and ideally suited for this purpose. Routine CP-based QC could add confidence to and save costs for the drug discovery pipeline. We have shown that CP can be leveraged to capture the morphological changes associated with neural differentiation.”

      Another issue that I feel should be discussed more explicitly is how far can this application go - how sensitively can the combination of cell painting and machine learning discriminate between cell types that are more subtly morphologically different from one another?

      Thank you for this interesting question. The fact that an approach based on a subregion not encompassing the whole cell (the “nucleocentric” approach) can predict cell types equally well suggests that cell shape as such is not the defining factor for accurate cell type profiling. While neural progenitors, neurons and glia clearly have vastly different cell shapes, we have shown that cells with closer phenotypes, such as 1321N1 vs. SH-SY5Y or astrocytes vs. microglia, can be distinguished with equal performance. However, prompted by the reviewer's question, we have now tested additional conditions with more subtle phenotypic differences, including the classification of 1321N1 cells vs. two related retinal pigment epithelial cell lines with much more similar morphology (ARPE and RPE1 cells). The CNN could discriminate these cells equally well; we have added these results on page 8 and in Fig. 3D. To address this question from a different angle, we have also performed an experiment in which we changed cell states to assess whether discriminatory power remains high. Concretely, we exposed co-cultures of neurons and microglia to LPS to trigger microglial activation (visible, more subtly, as cytoskeletal changes and vacuole formation). Our approach still discriminated both cell types (neurons vs. microglia) with high accuracy, regardless of microglial state. Furthermore, using a two-step approach, we could also distinguish LPS-treated (assumed to be activated) from unchallenged microglia (assumed to be more homeostatic), albeit with lower accuracy. This experiment has been added as an extra results section (Cell type identification can be applied to mixed iPSC-derived neuronal cultures regardless of activation state, p12) and Fig. 7c. Finally, we have also added our view on possible future applications in even more complex contexts, such as tissue slices, 3D and live-cell applications (pages 17-18).

      Regarding evaluations, the use of accuracy, which is a measure that can be biased by class imbalance, is not the most appropriate measurement in my opinion. The confusion matrices are a great help, but I would recommend using a measurement that is less sensitive to class imbalance for cell-type classification performance evaluations.

      Across all CNNs trained in this manuscript, the sample sizes of the input classes were always equalized, ruling out any effects of class imbalance. Nevertheless, following the reviewer’s recommendation, we now report the F-score, which is less sensitive to such imbalance, to document performance. For clarity, we have also added the input number (ROIs/class) to every figure.
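
      As a minimal illustration of why a per-class F-score is more informative than accuracy when classes are unbalanced, the pure-Python sketch below compares both on a hypothetical, deliberately imbalanced example (90 cells of one type, 10 of another) in which every cell is assigned to the majority class. The labels and counts are invented for illustration, and averaging the per-class scores (macro-F1) is our assumption about how a single summary value could be reported; it is not necessarily how the manuscript aggregates them.

```python
def per_class_f1(y_true, y_pred):
    """One-vs-rest F1 score for every label that occurs in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced example: a classifier that always predicts "neuron".
y_true = ["neuron"] * 90 + ["microglia"] * 10
y_pred = ["neuron"] * 100
print(accuracy(y_true, y_pred))            # 0.9
print(round(macro_f1(y_true, y_pred), 3))  # 0.474
```

      Accuracy looks high even though no microglia are detected, whereas the microglia F1 of 0 pulls the macro-F1 down; with equalized classes, as used in the manuscript, the two measures typically diverge less.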

      Another issue is that the performance evaluation is calculated on a subset of the full cell population - after exclusion/filtering. Could there be a bias toward specific cell types in the exclusion criteria? How would it affect our ability to measure the cell type composition of the population?

      As explained in the Materials and Methods section, filtering was performed based on three criteria:

      (1) Nuclear size: objects with values below a threshold of 160 are considered to represent debris;

      (2) DAPI intensity: values below a threshold of 500 represent segmentation errors;

      (3) IF staining intensity: gates were set on the intensity of the fluorescent markers used for post hoc IF, to retain only cells that are unequivocally positive for either marker and to avoid including double-positive (or double-negative) cells in the ground truth used for training.
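
      The three filtering criteria above can be sketched as a single labelling function. This is a hedged illustration, not the authors' code: the thresholds of 160 (nuclear size) and 500 (DAPI intensity) come from the text, but their units are those of the manuscript's measurements, the marker fields and gate values are hypothetical, and we assume a value exactly at a threshold is kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROI:
    nuclear_size: float    # same units as the manuscript's threshold of 160
    dapi_intensity: float  # same units as the manuscript's threshold of 500
    marker_a: float        # post hoc IF intensity of a marker for cell type A (hypothetical)
    marker_b: float        # post hoc IF intensity of a marker for cell type B (hypothetical)


def ground_truth_label(roi: ROI, gate_a: float, gate_b: float,
                       min_size: float = 160, min_dapi: float = 500) -> Optional[str]:
    """Return "A" or "B" for unambiguous cells, or None if the ROI is filtered out."""
    if roi.nuclear_size < min_size:
        return None  # criterion 1: too small, treated as debris
    if roi.dapi_intensity < min_dapi:
        return None  # criterion 2: too dim, treated as a segmentation error
    positive_a = roi.marker_a >= gate_a
    positive_b = roi.marker_b >= gate_b
    if positive_a == positive_b:
        return None  # criterion 3: double positive or double negative, ambiguous
    return "A" if positive_a else "B"


rois = [ROI(250, 900, 1200, 40), ROI(250, 900, 30, 800),
        ROI(120, 900, 1200, 40), ROI(250, 900, 1200, 900)]
print([ground_truth_label(r, gate_a=300, gate_b=300) for r in rois])
# ['A', 'B', None, None]
```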

      One could argue that the last criterion introduces a certain bias in that it does not consider part of the cell population. However, this is also not the purpose of our pioneering study, which aims at identifying unique cell types for which the ground truth is as pure and reliable as possible. Not filtering out cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs, so cells of a subsequent test set will only be classified according to these labels. For example, in the neuronal differentiation experiment (Fig. 6G-H), cells are characterized either as NPC or as neurons, which forces transitioning (or undefined) cells into one of the two categories. Despite this simplification, the model adequately predicted the increase in neuron/NPC ratio with culture age. In future iterations, one could envision defining more refined cell (sub-)types in a population based on richer post hoc information (e.g., through cyclic immunofluorescence or spatial single-cell transcriptomics) or longitudinal follow-up of cell-state transitions using live imaging. This notion has been added to page 17 of the manuscript.

      I am not entirely convinced by the arguments regarding the superiority of the nucleocentric vs. the nuclear representations. Could it be that this improvement is due to not being sensitive to/influenced by nucleus segmentation errors?

      The reviewer has a valid point that segmentation errors may occur. However, the algorithm we have used (the StarDist classifier) is very robust to nuclear segmentation errors. To verify its performance, we have now quantified segmentation errors in 20 images for 3 different densities and found a consistently low error rate (0.6-1.6%) without correlation to culture density. Moreover, these errors include partial imperfections (e.g., a missed protrusion or bleb) as well as over-segmentations (one nucleus detected as several) and under-segmentations (several nuclei detected as one). The latter two affect both the nuclear and nucleocentric predictions and should thus not affect their relative prediction performance. Imperfect segmentations may have a specific impact on the nucleus-based predictions (which rely on blanking the non-nuclear part), but this alone cannot explain the significantly higher gain in accuracy for nucleocentric predictions (>5%). Therefore, we conclude that segmentation errors may contribute in part, but not exclusively, to the overall improved performance of nucleocentric input models. We have added this notion to the discussion (pages 14-15 and Suppl. Fig. 1E).

      GRADCAM shows cherry-picked examples and is not very convincing.

      To help convince the reviewer of the representativeness of the selected images, we have now randomly selected 10 images for each condition and density (using random seeds to avoid cherry-picking) and added these in Suppl. Fig. 3.
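
      A seeded random selection of this kind can be sketched as follows. The grouping keys, image identifiers and seed value are hypothetical; the point is only that re-using one fixed seed for every (condition, density) group makes the choice reproducible and independent of the person selecting.

```python
import random


def pick_examples(groups, n=10, seed=0):
    """Draw n image IDs per (condition, density) group with one fixed seed.

    groups maps a (condition, density) tuple to a list of image IDs.
    IDs are sorted first so the draw does not depend on input order.
    """
    picks = {}
    for key in sorted(groups):
        rng = random.Random(seed)  # same seed for every group
        ids = sorted(groups[key])
        picks[key] = rng.sample(ids, k=min(n, len(ids)))
    return picks


groups = {
    ("nucleocentric", "low"): [f"img{i:03d}" for i in range(50)],
    ("nucleocentric", "high"): [f"img{i:03d}" for i in range(50, 80)],
}
first = pick_examples(groups)
assert first == pick_examples(groups)  # identical on every run
```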

      There are many missing details in the figure panels, figure legend, and text that would help the reader to better appreciate some of the technical details, see details in the section on recommendations for the authors.

      Please see below for our specific adaptations.

      Reviewer #2 (Public Review):

      This study uses an AI-based image analysis approach to classify different cell types in cultures of different densities. The authors demonstrate the superiority of the CNN strategy combined with the nucleocentric cell profiling approach for classifying a variety of cell types. The paper is very clear and well-written. I just have a couple of minor suggestions and clarifications needed for the reader.

      The entire prediction model is based on image analysis. Could the authors discuss the minimal spatial resolution of images required to allow a good prediction? Along the same line, it would be interesting to the reader to know which metrics related to image quality (e.g. signal to noise ratio) allow a good accuracy of the prediction.

      Thank you for the positive and relevant feedback.

      The reviewer has a good point that it is important to portray the imaging conditions that are required for accurate predictions. To investigate this further, we have performed additional experiments that give a better view of the operating window in terms of resolution and SNR (manuscript pages 7-8 and new figure panels Fig. 3B-C). The initial image resolution was 0.325 µm/pixel. To understand the dependency on resolution, we performed training and classification on image data sets that were progressively binned. We found that a two-fold reduction in resolution did not significantly affect the F-score, but further degradation decreased the performance. At a resolution of 6.0 µm/pixel (20-fold binning), the F-score dropped to 0.79±0.02, comparable to the performance when only the DAPI (nuclear) channel was used as input. The effect of reduced image quality was assessed in a similar manner, by iteratively adding more Gaussian noise to the images. We found that above an SNR of 10 the prediction performance remains consistent, whereas below it, performance starts to degrade. While this exercise provides a first impression of the current confines of our method, we believe it is plausible that its performance can be extended to even lower-quality images, for example by using image restoration algorithms. We have added this notion to the discussion (page 14).
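
      The two degradations described above can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: we assume binning averages non-overlapping n x n blocks (so the pixel size grows n-fold, e.g. from 0.325 to 0.65 µm/pixel for two-fold binning) and that SNR is defined as the mean image intensity divided by the standard deviation of the added zero-mean Gaussian noise; the manuscript may define SNR differently.

```python
import numpy as np


def bin_image(img, factor):
    """Average non-overlapping factor x factor blocks (edges that do not fit are cropped)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def add_noise_for_snr(img, snr, rng):
    """Add zero-mean Gaussian noise so that mean(img) / noise_std equals snr (assumed definition)."""
    sigma = img.mean() / snr
    return img + rng.normal(0.0, sigma, size=img.shape)


img = np.arange(16, dtype=float).reshape(4, 4)
binned = bin_image(img, 2)  # block means: 2.5, 4.5, 10.5, 12.5
noisy = add_noise_for_snr(np.full((256, 256), 100.0), snr=10, rng=np.random.default_rng(0))
```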

      The authors show that nucleocentric-based cell feature extraction is superior to feeding the CNN-based model for cell type prediction. Could they discuss what is the optimal size and shape of this ROI to ensure a good prediction? What if, for example, you increase or decrease the size of the ROI by a certain number of pixels?

      To identify the optimal input, we varied the size of the square region around the nuclear centroid from 0.6 to 150 µm for the whole dataset. Within the nuclear-to-cell window (12-30 µm), the effect on the average F-score is limited, but an important observation is the increasing error and the growing difference between precision and recall with increasing nucleocentric patch size, which will become detrimental in cases of class imbalance. The F-score is maximal for a box of 12-18 µm surrounding the nuclear centroid. In this “sweet spot”, precision and recall are also in balance. Therefore, we have selected this region for the actual density comparison experiment. We have added our results to the manuscript (pages 9 and 15).
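
      Extracting a nucleocentric patch of a given physical size can be sketched as below. The 0.325 µm/pixel pixel size is taken from the response to the previous comment; the function name, the rounding of the half-width and the zero padding at image borders are our assumptions, and the authors' implementation in their repository may handle these details differently.

```python
import numpy as np


def nucleocentric_crop(img, centroid, box_um, pixel_size_um=0.325):
    """Square patch of side ~box_um centred on a nuclear centroid (row, col).

    img is a single-channel 2-D array; patches overlapping the image border
    are zero-padded so every patch has the same shape.
    """
    half = int(round(box_um / pixel_size_um / 2))  # half-width in pixels
    padded = np.pad(img, half, mode="constant", constant_values=0)
    # Shift the centroid into the coordinates of the padded image.
    r = int(round(centroid[0])) + half
    c = int(round(centroid[1])) + half
    return padded[r - half:r + half, c - half:c + half]


img = np.ones((100, 100))
patch = nucleocentric_crop(img, (50.2, 49.7), box_um=15)
print(patch.shape)  # (46, 46): 15 µm at 0.325 µm/pixel is about 46 pixels
```

      A patch-size sweep like the one described above would simply call this function repeatedly with different box_um values on the same centroids.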

      It would be interesting for the reader to know the number of ROI used to feed each model and know the minimal amount of data necessary to reach a high level of accuracy in the predictions.

      The figures have now been adjusted so that the number of ROIs used as input to feed the model is listed. The minimal number of ROIs required to obtain high accuracy is tested in Figure 2C. By systematically increasing the number of input ROIs for both RF and CNN, we found that a plateau is reached at 5000 input ROIs (per class) for optimal prediction performance. This is also documented in the Results section on page 6.
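
      The learning-curve experiment relies on drawing the same number of ROIs from each class. A minimal, hypothetical sketch of such class-balanced subsampling is shown below; the training step itself is omitted, and the ROI counts in the example are invented.

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(labels, n_per_class, seed=0):
    """Return sorted indices with n_per_class ROIs per class (fewer if a class is smaller)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    picked = []
    for label in sorted(by_class):
        idxs = by_class[label]
        picked.extend(rng.sample(idxs, k=min(n_per_class, len(idxs))))
    return sorted(picked)


labels = ["1321N1"] * 8000 + ["SH-SY5Y"] * 6000
for n in (500, 1000, 5000):
    subset = balanced_subsample(labels, n)
    counts = Counter(labels[i] for i in subset)
    print(n, dict(counts))  # e.g. 500 {'1321N1': 500, 'SH-SY5Y': 500}
```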

      In Figures 1 to 4, the authors show that the CNN-based approach is efficient in distinguishing 1321N1 from SH-SY5Y cell lines. The last two figures are dedicated to showing two different applications of the technique: identification of different stages of neuronal differentiation (Figure 5) and of different cell types (neurons, microglia, and astrocytes) in Figure 6. It would be interesting, for these two cases as well, to assess the superiority of the CNN-based approach compared to the more classical Random Forest classification. This would reinforce the universal value of the proposed method.

      To meet the reviewer’s request, we have now also compared CNN to RF for the classification of cells in iPSC-derived models (Figures 6 and 7). As expected, the CNN performed better in both cases. We have now added these results in Figs. 6D and 7C and on pages 12 and 13 of the manuscript.

      Reviewer #3 (Public Review):

      Induced pluripotent stem cells, or iPSCs, are cells that scientists can push to become new, more mature cell types like neurons. iPSCs have a high potential to transform how scientists study disease by combining precision medicine gene editing with processes known as high-content imaging and drug screening. However, there are many challenges that must be overcome to realize this overall goal. The authors of this paper solve one of these challenges: predicting cell types that might result from potentially inefficient and unpredictable differentiation protocols. These predictions can then help optimize protocols.

      The authors train advanced computational algorithms to predict single-cell types directly from microscopy images. The authors also test their approach in a variety of scenarios that one may encounter in the lab, including when cells divide quickly and crowd each other in a plate. Importantly, the authors suggest that providing their algorithms with just the right amount of information beyond the cells' nuclei is the best approach to overcome issues with cell crowding.

      The work provides many well-controlled experiments to support the authors' conclusions. However, there are two primary concerns: (1) the model may be relying too heavily on the background and thus on technical artifacts (instead of the cells) for making CNN-based predictions, and (2) the conclusion that their nucleocentric approach (including a small area beyond the nucleus) performs best is not well supported, and its advantage may just be due to random chance. If the authors were to address these two concerns (through additional experimentation), then the work may influence how the field performs cell profiling in the future.

      Thank you very much for confirming the potential value of our work and raising these relevant items. To better support our claims we have now performed additional validations, which we detail below. 

      (1) The model may be relying too heavily on the background and thus technical artifacts (instead of the cells) for making CNN-based predictions 

      To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2. 
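
      The two background treatments compared above (zero blanking vs. random noise outside the nuclear mask) can be sketched as follows. The sketch assumes a single-channel crop and a boolean nuclear mask; drawing the noise from a Gaussian matched to the mean and standard deviation of the nuclear pixels is our assumption, as the response does not specify the noise distribution.

```python
import numpy as np


def blank_background(crop, nuclear_mask, mode="zero", rng=None):
    """Keep pixels inside the nuclear mask and replace all others.

    mode="zero": background set to 0.
    mode="noise": background drawn from a Gaussian with the mean and
    standard deviation of the nuclear pixels (an assumed choice).
    """
    out = crop.astype(float)  # astype returns a copy, so crop is not modified
    background = ~nuclear_mask.astype(bool)
    if mode == "zero":
        out[background] = 0.0
    elif mode == "noise":
        rng = np.random.default_rng() if rng is None else rng
        nucleus = out[~background]
        out[background] = rng.normal(nucleus.mean(), nucleus.std(), size=int(background.sum()))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out


crop = np.arange(36, dtype=float).reshape(6, 6)
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True
zeroed = blank_background(crop, mask, "zero")
noisy = blank_background(crop, mask, "noise", np.random.default_rng(0))
```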

      (2) The conclusion that their nucleocentric approach (including a small area beyond the nucleus) performs best is not well supported, and its advantage may just be due to random chance.

      To address this second concern, which was also raised by reviewer 2, we have performed a more extensive analysis in which the patch size was varied from 0.6 to 120µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that there is little effect of in- or decreasing patch size on the average F-score within the nuclear to cell window, but that the imbalance between the precision and recall increases towards the larger box sizes (>18µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 15 of the manuscript.

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures, the whole-cell segmentation mask will contain regional input similar to that of the nuclear mask, whereas the nucleocentric crop will contain more perinuclear information, which contributes to the prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript.

      Additionally, the impact of this work will be limited, given the authors do not provide a specific link to the public source code that they used to process and analyze their data.

      The source code is now available on the Github page of the DeVos lab, under the following URL: https://github.com/DeVosLab/Nucleocentric-Profiling

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors):

      Evaluation summary

      The authors present a new application of the high-content image-based morphological profiling method Cell Painting (CP) to single cell type classification in heterogeneous, mixed induced pluripotent stem cell-derived neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels, replication biases) and computational (e.g., different models, different cell regions) parameters and argue that focusing on the nucleus and its surroundings provides sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize to cell type prediction in mixed co-cultures and describe intermediate states of the maturation process of iPSC-derived neural progenitors into differentiating neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations is an important application and holds great promise. The simple and high-content assay democratizes use and enables adoption by other labs. The manuscript is supported by comprehensive experimental and computational validations. The manuscript is well-written and easy to follow.

      Weaknesses:

      The conclusion that the nucleocentric approach (including a small area beyond the nucleus) performs best is not well supported, and its advantage may just be due to random chance. If better supported by additional experiments, this may influence how the field performs cell profiling in the future. The model interpretability (GradCAM) analysis is not convincing. The lack of a public source code repository also limits the impact of this study. There are missing details in the figure panels, figure legends, and text that would help the reader to better appreciate some of the technical details.

      Essential revisions:

      To reach a "compelling" strength of evidence the authors are requested to either perform a comprehensive analysis of the effect of ROI size on performance, or tune down statements regarding the superior performance of their "nucleocentric" approach. Further addition of a public and reproducible source code GitHub repository will lead to an "exceptional" strength of evidence.

      To answer the main comment, we have performed an experiment in which we varied the size of the nucleocentric patch and quantified CNN performance. We have also evaluated the operational window of our method by varying the resolution and SNR and we have experimented with different background blanking methods. We have expanded our examples of GradCAM images and now also made our source code and an example data set available via GitHub.

      Reviewer #1 (Recommendations For The Authors):

      I think that an evaluation of how the excluded cells affect our ability to measure the cell type composition of the population would be helpful to better understand the limitations and practical measurement noise introduced by this approach. A similar evaluation of the excluded cells can also help to better understand the benefit of nucleocentric vs. cell representations by more convincingly demonstrating the case for the nucleocentric approach. In any case, I recommend discussing in more depth the arguments for using the nucleocentric representation and why it is superior to the nuclear representation.

      The benefits of nucleocentric representation over nuclear and whole-cell representation are discussed more in depth at pages 14-15 of the manuscript. 

      “The nucleocentric approach, which is based on more robust nuclear segmentation, minimizes such mistakes whilst still retaining input information from the structures directly surrounding the nucleus. At higher cell density, the whole-cell body segmentation becomes more error-prone, while also losing morphological information (Suppl. Fig. 1D). The nucleocentric approach is more consistent as it relies on a more robust segmentation and does not blank the surrounding region. This way it also buffers for occasional nuclear segmentation errors (e.g., where blebs or parts of the nucleus are left undetected).”

      It is not entirely clear to me why Figure 5 moves back to "engineered" features after previous figures showed the superiority of the deep learning approach. Especially, where Figure 6 goes again to DL. Dimensionality reduction can be also applied to DL-based classifications (e.g., using the last layer).

      Following up on the reviewer’s interesting comment, we extracted the embeddings from the trained CNN and performed UMAP dimensionality reduction. The results are shown in Fig. 3D, 6F and Suppl. Fig. 1B and added to the manuscript on pages 6, 8 and 12.

      We concluded that unsupervised dimensionality reduction using the feature embeddings could separate cell type clusters, where the distance between the clusters reflected the morphological similarity between the cell lines. 

      I would recommend including more comprehensive GRADCAM panels in the SI to reduce the concern of cherry-picking examples. What is the interpretation of the nucleocentric area?

      A more extensive set of GradCAM images has now been included in the supplementary material (Suppl. Fig. 3) using the same random seeds for all conditions, thus avoiding any cherry-picking. We interpret the GradCAM maps on the nucleocentric crops as highlighting the structures surrounding the nucleus (reflecting ER, mitochondria, Golgi), indicating their importance for correct cell classification. This was added to the manuscript on pages 9 and 15.

      Missing/lacking details and suggestions in the figure panels and figure legend:

      - Scale bars missing in some of the images shown (e.g., Figure 2F, Figure 3D, Figure 4, Supplementary Figure 4), what are the "composite" channels (e.g., Figure 2F), missing x-label in Figure 3B. 

      These have now been added.

      - Terms that are not clear in the figure and not explained in the legend, such as FITC and cy3 energy (Figure 1C). 

      The figure has been adapted to better show the region, channel and feature. We have now added a Table (Table 5), detailing the definition of each morphological feature that is extracted. On page 27, information on feature extraction is noted.

      - Details that are missing or not sufficiently explained in the figure legends such as what each data point represents and what is Gini importance (Figure 1D) 

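      We have added these explanations to the figure legends. The Gini importance, or mean decrease in impurity, reflects how much the splits on a given feature reduce the Gini impurity of the decision tree nodes, weighted by the fraction of samples reaching those nodes and averaged across all trees of the random forest.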
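
      The quantity behind the Gini importance can be made concrete with a short pure-Python sketch: each split contributes the decrease in Gini impurity from parent to children, weighted by the fraction of training samples reaching that node. In scikit-learn's documented implementation, a feature's importance sums these contributions over all nodes that split on it, normalises them per tree and averages them over the forest; whether the manuscript's pipeline computes it in exactly this way is an assumption, and the labels below are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_importance(parent, left, right, n_total):
    """Weighted impurity decrease contributed by one split (parent = left + right)."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)


parent = ["neuron"] * 5 + ["NPC"] * 5
left, right = ["neuron"] * 5, ["NPC"] * 5  # a perfect split on some feature
print(split_importance(parent, left, right, n_total=10))  # 0.5
```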

      Is it the std shown in Figure 2C?

      Yes, this has now been added to the legend.  

      It is not fully clear what is single/mixed (Figure 2D)

      Clarification is added to the legend and in the manuscript on page 6.

      explain what is DIV 13-90 in the legend (Figure 5).

      DIV stands for days in vitro, here it refers to the days in culture since the start of the neural induction process. This has been added in the legend.

      and state what are img1-5 (Supplementary Figures 1B-C).

      Clarification has been added to the legend.

      - Supplementary Figure 1. What is the y-axis in panel C and how do the results align with the cell mask in panel B?

      The y-axis represents the intersection over union (IoU). The IoU quantifies the overlap between the ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of the intersection of both ROIs divided by the area of their union. This clarification has been added to the legend.
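
      For two binary masks of the same shape, the IoU can be computed as in this NumPy sketch; the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention we chose, not one stated in the manuscript.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(np.logical_and(a, b).sum() / union)


manual = np.zeros((4, 4), dtype=bool)
manual[:2, :] = True      # ground truth: rows 0-1 (8 pixels)
predicted = np.zeros((4, 4), dtype=bool)
predicted[1:3, :] = True  # algorithm: rows 1-2 (8 pixels)
print(round(iou(manual, predicted), 3))  # 0.333 (4 shared pixels / 12 in the union)
```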

      - Supplementary Figure 1 and Methods. Please explain when CellPose and when StarDist were applied.

      Added to supplementary figure and methods at page 24. In the case of nuclear segmentation (nucleus and nucleocentric crops), Stardist was used. For whole-cell crops, cell segmentation using Cellpose was used.

      - Supplementary Figure 4C - the color code is different between nuclear and nucleocentric - this is confusing.

      We have changed the color code so that it corresponds in both conditions in Fig. 1A.

      - Figure 3B - better to have a normalized measure in the x-axis (number of cells per area in um^2)

      We agree and have changed this.

      Suggestions and missing/lacking details in the text:

      • Line #38: "we then applied this" because it is the first time that this term is presented.

      This has been rephrased.

      • Line #88: a few words on what were the features extracted would be helpful.

      A short description has been added to pages 26-27, and a detailed definition of all features has been added in Table 5.

      -  Line #91: PCA analysis - the authors can highlight what (known) features were important to PC1 using the linear transformation that defined it.

      The 5 most important features of PC1 were (in order of decreasing importance): channel 1 dissimilarity, channel 1 homogeneity, nuclear perimeter, channel 4 dissimilarity and nuclear area.  
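
      Ranking features by their contribution to PC1 amounts to sorting the absolute loadings of the first principal axis. The NumPy sketch below assumes that features are z-scored before PCA (the response does not state this) and that no feature is constant; the synthetic data and feature names are hypothetical.

```python
import numpy as np


def top_pc1_features(X, feature_names, k=5):
    """Return the k feature names with the largest absolute loading on PC1.

    X has one row per cell and one column per feature.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each feature
    # Rows of Vt are the principal axes; Vt[0] holds the PC1 loadings.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    order = np.argsort(-np.abs(Vt[0]))
    return [feature_names[i] for i in order[:k]]


rng = np.random.default_rng(0)
size = rng.normal(size=500)  # a shared latent factor, e.g. cell size
X = np.column_stack([
    size + 0.05 * rng.normal(size=500),  # nuclear_area (hypothetical)
    size + 0.05 * rng.normal(size=500),  # nuclear_perimeter (hypothetical)
    rng.normal(size=500),                # unrelated_texture (hypothetical)
])
print(top_pc1_features(X, ["nuclear_area", "nuclear_perimeter", "unrelated_texture"], k=2))
```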

      - Line #92: Order of referencing Supplementary Figure 4 before referencing Supplementary Figure 13.

      The order of the Supplementary images was changed to follow the chronology. 

      • Line #96: Can the authors show the data supporting this claim?

      The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.

      - Line #108: what is "nuclear Cy3 energy"?

      This represents the local change of pixel intensities within the ROI in the nucleus in the 3rd channel dimension. This parameter reflects the texture within the nuclear region for the phalloidin and WGA staining. The definitions of all handcrafted features are added in table 5 of the manuscript.

      - Line #110-112: Can the authors show the data supporting this claim?

      The figure has been changed to include the results from a filtered and unfiltered dataframe (exclusion and inclusion of redundant features). Features could be filtered out if the correlation was above a threshold of 0.95. This has been added to page 6 of the manuscript and fig. 1D.  
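
      Removing redundant features with a correlation threshold can be sketched as a greedy pass over the feature columns. The 0.95 threshold comes from the response; using the absolute Pearson correlation, keeping the first feature of each correlated group in column order, and the example data are our assumptions. Constant features would produce undefined correlations and are assumed to have been removed beforehand.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.95):
    """Keep a feature only if |Pearson r| with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


rng = np.random.default_rng(1)
area = rng.normal(size=200)
X = np.column_stack([area, 2.0 * area + 1.0, rng.normal(size=200)])
print(drop_correlated(X, ["area", "area_scaled", "texture"]))  # ['area', 'texture']
```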

      - Line #115-116: please state the size of the mask.

      Added to the text (page 6). We used isotropic image crops of 60µm centred on individual cell centroids.

      - Lines 120-122: more details will make this more clear (single vs. mixed).

      This has been changed on page 6 of the manuscript.

      • Line #142: "(mimics)" - is it a typo?

      Tissue mimics refers to organoids/models that are meant to replicate the physiological behaviour.

      • Line #159: the bounding box for nucleocentric analysis is 15x15um (and not 60), as stated in the Methods.

      Thank you for pointing out this mistake. We have adapted this.

      - Line #165: what is the interpretation of what was important for the nucleocentric classification?

      The colour code in GradCAM images is indicative of the attention of the CNN (the more to the red, the more attention). In fig. 4D and Suppl. Fig. 3 the structures directly surrounding the nucleus receive high attention from the CNN trained on nucleocentric crops. This has been added to the manuscript page 9 and 15.

      • Section starting in line #172: not explicitly stated what model was used (nucleocentric?).

      Added in the legend of fig. 5. For these experiments, the full cell segmentation was still used. 

      - Section starting in line #199: why use a feature-based model rather than nucleocentric? A short sentence would be helpful.

      For CNN training, nucleocentric profiling was used. In response to a legitimate question from one of the reviewers, the feature-based UMAP analysis was replaced with one based on the feature embeddings from the CNN.

      - Line #213: Fig. 5B does not show transitioning cells.

      Thank you for pointing this out, this was a mistake and has been changed.

      Lines #218-220: not fully clear to some readers (culture condition as a weak label), more details can be helpful.

      We changed this at page 11 of the manuscript for clarity. 

      “This gating strategy resulted in a fractional abundance of neurons vs. total (neurons + NPC) of 36.4% in the primed condition and 80.0% in the differentiated condition (Fig. 6C). We therefore refer to the culture condition as a weak label as it does not take into account the heterogeneity within each condition (well).”

      -  Line #230: "increasing dendritic outgrowth" - what does it mean? Can you explicitly highlight this phenotype in Figure 5G?

      When the cells become more mature during differentiation, the cell body becomes smaller and the neurons form long, thin ramifications. This explanation has been added to page 12 of the manuscript.

      • Line #243: is it the nucleocentric CNN?

      Yes.

      • Lines #304-313, the authors might want to discuss other papers dealing with continuous (non-neural) differentiation state transitions (eg PMID: 38238594).  

      A discussion of the use of morphological profiling for longitudinal follow-up of continuous differentiation states has been added to the manuscript at page 18. 

      - Line #444: cellpose or stardist? How did the authors use both?

      Clarification has been added to supplementary figure 1 and methods at page 24. Stardist was used for nuclear segmentation, whereas Cellpose was used for whole-cell segmentation. 

      • Line #470-474: I would appreciate seeing the performance on the full dataset without exclusions.

      Cells were excluded based on three criteria: absence of DAPI intensity, too small a nuclear size and absence of ground-truth staining. The first two criteria are based on the assumption that ROIs that contain no DAPI signal or are too small result from errors in cell segmentation and should therefore not be included in the analysis. The third filtering step was based on the ground-truth IF signal. Not filtering out cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs, so cells of a subsequent test set will only be classified according to these labels, which might introduce bias. However, the model could predict an increase in the neuron/NPC ratio with culture age in the absence of ground-truth staining (and thus of IF-based filtering).

      Reviewer #2 (Recommendations For The Authors):

      Figure 1A: it would be interesting to the reader to see the SH-SY5Y data as well.

      This has been added in fig. 1A.

      Figure 3A: 95-100% image: showing images with the same magnification as the others would help to appreciate the cell density.

      Now fig. 4A. The figure has been changed to make sure all images have the same magnification. 

      Figure Supp 4 (line 132) is referred to before Figure Supp1 (line 152).

      The image order and numbering has been changed to solve this issue.

      Figure Supp 2 & 3 are not referred to in the text.

      This has been adjusted.

      Line 225: a statistical test would help to convince of the accuracy of these results (Figure 5C vs Figure 5F)?

      These figures represent the total ROI counts and thus represent a single number.

      Line 227: Could you explain to the reader, in a few words, what a dual SMAD inhibition is?

      This has been added to the manuscript at page 20. 

      “This dual blockade of SMAD signalling in iPSCs induces neural differentiation by synergistically causing the loss of pluripotency and a push towards the neuroectodermal lineage.”

      Reviewer #3 (Recommendations For The Authors):

      I have a few concerns and several comments that, if addressed, may strengthen conclusions, and increase clarity of an already technically sound paper.

      Concerns

      • The results presented in Figure 3 panel D, may indicate a critical error in data processing and interpretation that the authors must address. The GradCAM method highlights the background as having the highest importance. While it can be argued in the nucleocentric profiling method that GradCAM focuses on the nuclear membrane, the background is highly important even for the nuclear profiling method, which should provide little information. What procedure did the authors use for mask subtraction prior to CNN training? Could the segmentation algorithm be performing differently between cell lines? The authors interpret the GradCAM results to indicate a proxy for nuclear size, but then why did the CNN perform so much better than random forest using hand-crafted features that include this variable? The authors should also present size distributions between cell lines (and across seeding densities, in case one of the cell lines has different compaction properties with increasing density).

      Perhaps clarifying this sentence (lines 166-168) would help as well: "As nuclear area dropped with culture density, the dynamic range decreased, which could explain the increased error rate of the CNN for high densities unrelated to segmentation errors (Suppl. Fig. 4B)." What do the authors mean by "dynamic range" and it is not clear how Supplementary Figure 4B provides evidence for this? 

      The dynamic range refers to the difference between the minimum and maximum nuclear area. We expect this difference to decrease at higher density, because crowding forces all nuclei to take on a more similar (smaller) size.

      More clarification on this has been added to page 9 of the manuscript.

      I certainly understand that extrapolating the GradCAM concern to the remaining single-cell images using only four (out of tens of thousands of options) is also dangerous, but so is "cherry-picking" these cells to visualize. Finally, I also recommend that the authors quantitatively diagnose the extent of the background influence according to GradCAM by systematically measuring background influence in all cells and displaying the results per cell line per density.

      To avoid cherry-picking, we have now randomly selected 10 GradCAM images (using random seeds) for each condition and density and added these to Suppl. Fig. 3.

      In answer to this concern, we refer to the response above: 

      “To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2.”

      • The data supporting the conclusion about nucleocentric profiling outperforming nuclear and full-cell profiling is minimal. I am picking on this conclusion in particular, because I think it is a super cool and elegant result that may change how folks approach issues stemming from cell density disproportionately impacting profiling. Figures 3B and 3C show nucleocentric slightly outperforming full cell, and the result is not significant. The authors state in lines 168-170: "Thus, we conclude that using the nucleocentric region as input for the CNN is a valuable strategy for accurate cell phenotype identification in dense cultures." This is somewhat of a weak conclusion, that, with additional analysis, could be strengthened and add high value to the community. Additionally, the authors describe the nucleocentric approach insufficiently. In the methods, the authors state (lines 501-503): "Cell crops (60μm whole cell - 15μm nucleocentric/nuclear area) were defined based on the segmentation mask for each ROI." This is not sufficient to reproduce the method. What software did the authors use?

      Presumably, 60μm refers to a box size around cytoplasm? Much more detail is needed. Additionally, I suggest an analysis to confirm the impact of nucleocentric profiling, which would strengthen the authors' conclusions. I recommend systematically varying the subtraction (-30μm, -20μm, -10μm, 5μm, 0, +5μm, +10μm, etc.) and reporting the density-based analysis in Figure 3B per subtraction. I would expect to see some nucleocentric "sweet spot" where performance spikes, especially in high culture density. If we don't see this difference, then the non-significant result presented in Figures 3B and C is likely due to random chance. The authors mention "iterative data erosion" in the abstract, which might refer to what I am recommending, but do not describe this later.

      More detail was added to the methods describing the image crops given as input to the CNN (page 28 of the manuscript). 

      “Crops were defined based on the segmentation mask for each ROI. The bounding box was cropped out of the original image with a fixed patch size (60µm for whole cells, 18µm for nucleus and nucleocentric crops) surrounding the centroid of the segmentation mask. For the whole cell and nuclear crops, all pixels outside of the segmentation mask were set to zero. This was not the case for the nucleocentric crops. Each ROI was cropped out of the original morphological image and associated with metadata corresponding to its ground truth label.”
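      A minimal sketch of this cropping scheme is given below to make the difference between the crop types concrete. It assumes a single-channel image, an integer label image from the segmentation step, and a known pixel size in µm; the pixel size, the zero-padding at image borders and the function name `crop_roi` are illustrative assumptions, not the published pipeline.

```python
import numpy as np


def crop_roi(image, labels, label, patch_um, pixel_size_um, zero_outside=True):
    """Crop a fixed-size square patch centred on the centroid of one ROI.

    image: 2D single-channel array; labels: 2D integer label image (same shape).
    zero_outside=True mimics whole-cell/nuclear crops (pixels outside the ROI set to 0);
    zero_outside=False mimics nucleocentric crops (surrounding pixels are kept).
    Hypothetical helper for illustration only.
    """
    rows, cols = np.nonzero(labels == label)
    cy, cx = int(round(rows.mean())), int(round(cols.mean()))
    half = int(round(patch_um / pixel_size_um)) // 2
    size = 2 * half
    # Zero-pad so that ROIs near the image border still yield full-size patches.
    padded_img = np.pad(image, half, mode="constant")
    padded_lab = np.pad(labels, half, mode="constant")
    # In padded coordinates the centroid is (cy + half, cx + half), so the
    # patch spans rows [cy, cy + size) and columns [cx, cx + size).
    patch = padded_img[cy:cy + size, cx:cx + size].copy()
    if zero_outside:
        inside = padded_lab[cy:cy + size, cx:cx + size] == label
        patch[~inside] = 0
    return patch
```

      Calling it on a nuclear label image with `zero_outside=False` gives the nucleocentric variant: the patch is positioned on the nucleus, but the perinuclear pixels are retained rather than set to zero.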

      To address this concern, we also refer to the answer above. 

      “We have performed a more extensive analysis in which the patch size was varied from 0.6 to 120µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that there is little effect of in- or decreasing patch size on the average F-score within the nuclear to cell window, but that the imbalance between the precision and recall increases towards the larger box sizes (>18µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 12 of the manuscript.

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures, the whole-cell segmentation mask will contain regional input similar to that of the nuclear mask, whereas the nucleocentric crop will contain more perinuclear information, which contributes to the prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript.”

      Comments

      • There is a disconnect between the abstract and the introduction. The abstract highlights the nucleocentric model, but then it is not discussed in the introduction, which focuses on quality control. The introduction would benefit from some additional description of the single-cell or whole-image approach to profiling.

      We highlight the importance of QC of complex iPSC-derived neural cultures as an application of morphological profiling. We used single-cell profiling to facilitate cell identification in these mixed cultures, where the whole-image approach would be unable to deal with the heterogeneity within the field of view. In the introduction, we added a description of the whole-image vs. single-cell approach to profiling (page 4). In the discussion (page 18), we further highlight the application of this single-cell profiling approach for QC purposes.

      - Comments on Figure 1. It is unclear how panel B shows "without replicate bias". 

      In response to this comment, we refer to the answer above: “The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.” We added this notion to page 5 of the manuscript.

      The paper would benefit from a description of how features were extracted sooner.

      Information on the feature extraction was added to the manuscript at page 27. An additional table (table 5) has been added with the definition of each feature.  

      - Comments on Supplementary Figure 4. The clustering with PCA is only showing 2 dimensions, so it is not surprising UMAP shows more distinct clustering.

      We used two components for UMAP dimensionality reduction, so the data was also visualized in two dimensions. However, we agree that UMAP can show more distinct clustering as this method is non-linear.

      Why is Figure S4 the first referenced Supplementary Figure?

      This has been changed. 

      • Comments on Figure 2. Need discussion of the validation set - how was it determined? Panel E might have the answer I am looking for, but it is difficult to decipher exactly what is being done. The terminology needs to be defined somewhere, or maybe it is inconsistent. It is tough to tell. For example, what exactly are the two categories of model validation (cross-validation and independent testing)?

      Additional clarification has been added to the manuscript at pages 6-7 and figure 2.

      The metric being reported is accuracy for the independent replicate if the other two are used to train?

      Yes. 

      Panel C is a very cool analysis. Panel F needs a description of how those images were selected, randomly?

      Added in the methods section (page 29). GradCAM analysis was used to visualize the regions used by the CNN for classification. This map is specific to each cell. Images were selected randomly out of the full dataset for visualization.

      They also need scale bars.

      Added to the figures. 

      Panel G would benefit from explicit channel labels (at least a legend would be good!).

      An explanation has been added to the legend. All color codes and channel numbering are consistent with fig. 1A.

      What do the dots and boxplots represent? The legend says, "independent replicates", but independent replicates of, I assume, different model initializations?

      Clarification has been added to the figure legends. For plots showing the performance of a CNN or RF classifier, each dot represents a different model initialization. Each classifier has been initialized at least 3 times. When indicated, the model training was performed with different random seeds for data splitting.

      • Comments on Figure 3. Panel A needs scale bar. See comment on Panel D in concern #1 described above. 

      This has been added.

      • Comments on Supplementary Figure 1. A reader will need a more detailed description in panel C. I assume that the grey bar is the average of the points, and the points represent different single cells?

      How many cells? How were these cells selected? 

      This information on the figure (now Suppl. Fig. 1D), has been added to the legend.

      “Left: Representative images of 1321N1 cells with increasing density alongside their cell and nuclear masks, produced using Cellpose and Stardist, respectively. Images are numbered from 1-5 with increasing density. Upper right: The number of ROIs detected in comparison to the ground truth (manual segmentation). A ROI was considered undetected when the intersection over union (IoU) was below 0.15. Each bar refers to the image number on the left. The IoU quantifies the overlap between the ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of the overlapping region divided by the total area covered by both ROIs (their union). The IoU for increasing cell density for cell and nuclear masks is given in the bottom right. Each point represents an individual ROI. Each bar refers to the image number on the left.”
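      For readers less familiar with the metric, the IoU and the 0.15 detection threshold described in this legend can be sketched as follows. This is an illustrative sketch assuming integer label images (0 = background) for the ground truth and the prediction; matching each ground-truth ROI to its best-overlapping predicted ROI is our assumption, not necessarily how the counts in the figure were computed, and the function names are hypothetical.

```python
import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def count_undetected(gt_labels, pred_labels, threshold=0.15):
    """Count ground-truth ROIs whose best-matching predicted ROI has IoU < threshold."""
    undetected = 0
    for g in np.unique(gt_labels):
        if g == 0:  # 0 is background
            continue
        gt = gt_labels == g
        # Only predicted labels that overlap this ROI can have a non-zero IoU.
        candidates = [p for p in np.unique(pred_labels[gt]) if p != 0]
        best = max((iou(gt, pred_labels == p) for p in candidates), default=0.0)
        if best < threshold:
            undetected += 1
    return undetected
```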

      • Comments on Figure 4. More details on quenching are needed for a general audience. The markers chosen (EdU and BrdU) are generally not specific to cell type but to biological processes (proliferation), so it is confusing how they are being used as cell-type markers. 

      The base analogues were incorporated into each cell line prior to mixing them, i.e., when they were still growing in monoculture, so that they could be labelled and identified after co-seeding and morphological profiling. Additional clarification has been added to the manuscript (page 26).

      It is also unclear why reducing CV is an important side-effect of finetuning. CV of what? The legend says, "model iterations", but what does this mean? 

      The dots in the violin plot are different CNN initializations. A lower variability between model initializations indicates greater certainty in the results. Prior to finetuning, the results of the CNN were highly variable, leading to a high CoV between the different CNNs. This means that the outcome after finetuning is more robust.
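      The coefficient of variation (CoV) referred to here is the standard deviation of a performance score across CNN initializations divided by its mean. A minimal sketch, assuming the sample standard deviation and a list of per-initialization scores (e.g., F-scores); the function name is hypothetical:

```python
import statistics


def coefficient_of_variation(scores):
    """Sample standard deviation of the scores divided by their mean."""
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("CoV is undefined for a zero mean")
    return statistics.stdev(scores) / mean
```

      Because the standard deviation is scaled by the mean, the CoV can be compared between conditions whose average performance differs; the sample standard deviation requires at least two initializations.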

      • Comments on Figure 5. This is a very convincing and well-described result, kudos! This provides another opportunity to again compare other approaches (not just nucleocentric). Additionally, since the UMAP space uses hand-crafted features, the authors could consider interpreting the specific morphology features impacted by the striking gradual shift to neuron population by fitting a series of linear models per individual feature. This might confirm (or discover) how exactly the cells are shifting morphology.

      The supervised UMAP on the handcrafted features did not highlight any features contributing to the separation. Using the supervised UMAP, the clustering is dominated by the known cell type. Unsupervised UMAP on the handcrafted features does not show any clustering. In response to a previous comment, we adapted the figure to show UMAP dimensionality reduction using the feature embeddings from the cell-based CNN. This unsupervised UMAP does show good cell type separation, but it does not use any directly interpretable shape descriptors.

      • General comments on Methods. The section on "ground truth alignment" needs more details. Why was this performed? 

      Following sequential staining and imaging rounds, multiple images were captured representing the same cell with different markers. Lifting the plate off the microscope stage and imaging in sequential rounds after several days results in small linear translations in the exact location of each image. These translations need to be corrected to align (register) the morphological and ground-truth image data of the same ROI. This notion has been added to the manuscript at page 26.
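      The response does not specify how the translations were estimated; one common approach for recovering a pure integer-pixel shift between two imaging rounds is phase correlation. The sketch below uses NumPy only and assumes two same-sized single-channel images of the same field of view; the function name `estimate_translation` is hypothetical (scikit-image offers a subpixel implementation as `skimage.registration.phase_cross_correlation`).

```python
import numpy as np


def estimate_translation(reference, moving):
    """Integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) matches reference."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    # Normalised cross-power spectrum: keeps only the phase difference.
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.abs(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, correlation.shape))
```

      Because the FFT treats the image as periodic, content that moves out of the field of view between rounds is not matched; for small stage offsets this is usually minor, but the non-overlapping border can be cropped after applying the shift.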

      Handcrafted features extracted using what software? 

      The complete analysis was performed in python. All packages used are listed in table 4. Handcrafted features were extracted using the scikit-image package (regionprops and GLCM functions). This has been added to the manuscript at page 27.

      Software should be cited more often throughout the manuscript. 

      Lastly, the GitHub URL points to the DeVosLab organization, but should point to a specific repository. Therefore, I was unable to review the provided code. A well-documented and reproducible analysis pipeline should be included.

      A test dataset and source code are available on GitHub:  https://github.com/DeVosLab/Nucleocentric-Profiling

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1. In Figure 1, the MafB antibody (Sigma) was used to identify Renshaw cells at P5. However, according to the supplementary Figure 3D, the specificity of the MafB antibody (Sigma) is relatively low. The image of MafB-GFP, V1-INs, and MafB-IR at P5 should be added to the supplementary figure. The specificity of MaFB-IR-Sigma in V1 neurons at P5 should be shown. This image also might support the description of the genetically labeled MafB-V1 distribution at P5 (page 8, lines 28-32). 

      We followed the reviewer’s suggestion and moved analyses of the MafB-GFP mouse to a supplemental figure (Fig S3). The characterization of MafB immunoreactivities is now in supplemental Figure S2 and the related text in results was also moved to supplemental to reduce technicalities in the main text. We added confocal images of MafB-GFP V1 interneurons at P5 showing immunoreactivities for both MafB antibodies, as suggested by the reviewer (Fig S2A,B). We agree with the reviewer that this strengthens our comparisons on the sensitivity and specificity of the two MafB antibodies used in this study. 

      As explained in the preliminary response we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth. This is why we used tissues from late embryos to check MafB immunoreactivities (Figure S2C and S2D). We made this point clearer in the text and supplemental figure legends.

      Comment 2. The proportion of genetically labeled FoxP2-V1 in all V1 is more than 60%, although immunolabeled FoxP2-V1 is approximately 30% at P5. Genetically labeled Otp-V1 included other nonFoxP2 V1 clades (Fig. 8L-M). I wonder whether genetically labeled FoxP2-V1 might include the other three clades. The authors should show whether genetically labeled FoxP2-V1 expresses other clade markers, such as pou6f2, sp8, and calbindin, at P5. 

      We included the requested data in Figure 3E-G. Lineage-labeled Foxp2-V1 neurons in our genetic intersection do not include cells from other V1-clades.

      Reviewer 2:

      Comment 1. The current version of the paper is VERY hard to read. It is often extremely difficult to "see the forest for the trees" and the reader is often drowned in methodological details that provide only minor additions to the scientific message. Non-specialists in developmental biology, but still interested in the spinal cord organization, especially students, might find this article challenging to digest and there is a high risk that they will be inclined to abandon reading it. The diversity of developmental stages studied (with possible mistakes between text and figures) adds a substantial complexity in the reading. It is also not clear at all why authors choose to focus on the Foxp2 V1 from page 9. Naively, the Pou6f2 might have been equally interesting. Finally, numerous discrepancies in the referencing of figures must also be fixed. I strongly recommend an in-depth streamlining and proofreading, and possibly moving some material to supplement (e.g. page 8, and elsewhere).

      The whole text was re-written and streamlined with most methodological discussion (including the section referred to by the reviewer) transferred to supplemental data. Nevertheless, enough details on samples, stats and methods were retained to maintain the rigor of the manuscript. 

      The reasons justifying a focus on Foxp2-V1 interneurons were fully explained in our preliminary response. Briefly, we are trying to elucidate V1 heterogeneity, and prior data showed that this is the most heterogeneous V1 clade (Bikoff et al., 2016), making it a logical candidate for further study. We agree that the Pou6f2 clade is equally interesting and is in fact the subject of several ongoing studies.

      Comment 2. … although the different V1 populations have been investigated in detail regarding their development and positioning, their functional ambition is not directly investigated through gain or loss of function experiments. For the Foxp2-V1, the developmental and anatomical mapping is complemented by a connectivity mapping (Fig 6s, 8), but the latter is fairly superficial compared to the former. Synapses (Fig 6) are counted on a relatively small number of motoneurons per animal, that may, or may not, be representative of the population. Likewise, putative synaptic inputs are only counted on neuronal somata. Motoneurons that lack of axo-somatic contacts may still be contacted distally. Hence, while this data is still suggestive of differences between V1 pools, it is only little predictive of function.

      We fully answered the question on functional studies in the preliminary response. Briefly, we are currently conducting these studies using various mouse models that include chronic synaptic silencing using tetanus toxin, acute partial silencing using DREADDs, and acute cell deletion using diphtheria toxin. Each intervention reveals different features of Foxp2-V1 interneuron functions, and each model requires independent validation. Moreover, these studies are being carried out at three developmental stages: embryos, early postnatal period of locomotor maturation and mature animals. Obviously, this is all beyond the goals and scope of the present study. The present study is however the basis for better informed interpretations of results obtained in functional studies.

      Regarding the question on synapse counts, we fully explained in the preliminary response why we believe our experimental designs for synapse counting at the confocal level are among the most thorough that can be found in the literature. We counted a very large number of motoneurons per animal when adding all motor columns and segments analyzed in each animal. Statistical power was also sufficient to detect fundamental variation in synaptic density among motor columns.

      We focus our analyses on motoneuron cell bodies because analysis of full dendritic arbors on all motor columns present throughout all lumbosacral segments is not feasible. Please see Rotterman et al., 2014 (J. of Neuroscience; doi: 10.1523/JNEUROSCI.4768-13.2014) for an evaluation of what this entails for a single motoneuron. We agree with the reviewer that analyses of V1 synapses over full dendritic arbors in specific motoneurons will be very relevant in further studies. These should be carried out now that we know which motor columns are of high interest. Nevertheless, inhibitory synapses exert the most efficient modulation of neuronal firing when they are on cell bodies, and our analyses clearly suggest differences between V1 interneuron types in how they target motoneuron cell bodies with inhibitory synapses, which we find very relevant.

      Comment 3. I suggest taking with caution the rabies labelling (Figure 8). It is known that this type of Rabies vectors, when delivered from the periphery, might also label sensory afferents and their postsynaptic targets in the cord through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). Yet I am not sure authors have made all controls to exclude that labelled neurons, presumed here to be premotoneurons, could rather be anterogradely labelled from sensory afferents. 

      Over the years, we performed many extensive controls and validation of rabies virus transsynaptic tracing methods. These were presented at two SfN meetings (Gomez-Perez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Our validation of this technique was fully explained in our preliminary response. We also pointed out that the methods used by Pimpinella et al. have a very different design and therefore their results are not comparable to ours. In this study we injected the virus at P15 into leg muscles, and not directly into the spinal cord. In our hands, and as cited in Pimpinella et al., the rabies virus loses tropism for primary afferents with age when injected in muscle. The lack of primary afferent labeling in key lumbosacral segments (L4 and L5) is now illustrated in a new supplemental figure (Figure S6). This figure also shows some starter motoneurons. As explained in the text and in our previous response, these are few in number because of the reduced infection rate when using this method in mature animals (after P10).  

      Comment 4. The ambition to differentiate neuronal birthdate at a half-day resolution (e.g., E10 vs E10.5) is interesting but must be considered with caution. As the author explains in their methods, animals are caged at 7pm, and the plug is checked the next morning at 7 am. There is hence a potential error of 12h. 

      We agree with the reviewer, and we previously explicitly discussed these temporal resolution caveats. We have now further expanded on this in new text (see middle paragraph in page 5). Nevertheless, the method did reveal the temporal sequence of neurogenesis of V1 clades with close to 12-hour resolution.

      As explained in the text and in the preliminary response, this is because we analyzed a sufficient number of animals from enough litters and utilized very stringent criteria to count EdU-positive cells.

      Moreover, our results fit very well with the current literature. The data agree with previous conclusions from Andreas Sagner’s group (Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg) on the birthdates of spinal interneurons (including V1s), based on a different methodology (Delile J et al. Development. 2019 146(12):dev173807. doi: 10.1242/dev.173807. PMID: 30846445; PMCID: PMC6602353). In the discussion, we compared in detail both the data and the methods of the Delile article with our results. We also cite the Sagner 2024 review, as requested later in the reviewer’s detailed comments. Our results also confirmed our previous report on the birthdates of V1-derived Renshaw cells and Ia inhibitory interneurons (Benito-Gonzalez A, Alvarez FJ. J Neurosci. 2012 32(4):1156-70. doi: 10.1523/JNEUROSCI.3630-12.2012. PMID: 22279202; PMCID: PMC3276112). Finally, we recently received a communication notifying us that our neurogenesis sequence of V1s has been replicated in a different vertebrate species by Lora Sweeney’s group (Institute of Science and Technology Austria; direct email from this lab), and we shared our data with them for comparison. Their manuscript is currently close to submission. Therefore, we are confident that despite the limitations of EdU birthdating we discussed, the conclusions we offered are strong and are being validated by other groups using different methods and species. We also want to acknowledge the positive comments of reviewer 3 regarding our birthdating study, indicating that it is one of the most rigorous he or she has ever seen.

      Reviewer 3:

      Comment 1. My only criticism is that some of the main messages of the paper are buried in technical details. Better separation of the main conclusions of the paper, which should be kept in the main figures and text, and technical details/experimental nuances, which are essential but should be moved to the supplement, is critical. This will also correct the other issue with the text at present, which is that it is too long.

      Similar to our response to comment 1 from Reviewer 2, we followed the reviewers’ recommendations and greatly summarized, simplified and removed technical details from the main text, trying not to decrease rigor.

      Reviewer #1 (Recommendations For The Authors):

      In Figure 1, the definition of the area to analyze MafB ventral and MafB dorsal is unclear. It should be described.

      This has been clarified in both text and supplemental figure S3.

      “We focused the analyses on the brighter dorsal and ventral MafB-V1 populations defined by boxes of 100 µm dorsoventral width at the level of the central canal (dorsal) or the ventral edge of the gray matter (ventral) (Supplemental Figure S3B).”

      Problems with figure citation.

      We apologize for the mistakes. All have been corrected. 

      Reviewer #2 (Recommendations For The Authors):

      As indicated in the public review, I'd recommend to substantially revise the writing, for clarity. As such, the paper is extremely hard to read. I would also recommend justifying the focus on Foxp2 neurons.

      Also, the scope of the present paper is not clearly stated in the introduction (page 4).

      Done. We also modified the introduction such that the exact goals are more clearly stated.

      I would also recommend toning down the interpretation that V1 clades constitute "unique functional subsets" (discussion and elsewhere). Functional investigation is not performed, and connectomic data is partial and only very suggestive.

      We include the following sentence at the end of the 1st paragraph in the discussion:

      “This result strengthens the conclusion that these V1 clades defined by their genetic make-up might represent distinct functional subtypes, although further validation is necessary in more functionally focused studies.”

      Different post-natal stages are used for different sections of the manuscript. This is often confusing, please justify each stage. From the beginning even, why is the initial birthdating (Figure 1) done here at p5, while the previous characterization of clades was done at p0? I am not sure to understand the justification that this was chosen "to preserve expression of V1 defining TFs". Isn't the sooner the better?

      The birthdating study was carried out at P5. P5 is a good time point because there is little variation in TF expression compared to P0, as demonstrated in the results. Furthermore, later tissue harvesting allows higher replicability, since it is difficult to consistently harvest tissue on the day a litter is born (P0). Technically, it is also easier to handle P5 tissue than P0 tissue. The analysis of VGLUT1 synapses was also done at P5 rather than at later ages. This has two advantages: TF immunoreactivities are preserved at this age, and corticospinal projections have not yet reached the lumbar cord, reducing interpretation caveats on the origins of VGLUT1 synapses in the ventral horn (although VGLUT1 synapses are still maturing at this age, see below).

      Other parts of the study focus on different ages, each selected as the most adequate for its purpose. Synaptic connectivity is best studied in mature spinal cords, after the synaptic plasticity of the first postnatal week. For the tracing study, we explain in detail in the text the reasons for the experimental design (see also the detailed comments below). For counting Foxp2-V1 interneurons and comparing them to motor columns, we analyze mature animals. To test our lineage labeling, we use animals of all ages to confirm the consistency of the genetic targeting strategy throughout postnatal development and into adulthood.

      Figure 5: wouldn't it be worth quantifying and illustrating cellular densities, in addition to the average number of Foxp2 neurons, across lumbar segments (panels D and E)? Indeed, the size of each lumbar segment (and hence the total number of cells within it) might not be the same, with a significant "enlargement" from L2 to L4 (this is actually visible on the transverse sections). Hence, if the total number of cells is higher in these enlarged segments but the total number of Foxp2-V1 neurons is not, this class may be proportionally less abundant there.

      We believe the critical parameter is the ratio of Foxp2-V1s to motoneurons. This informs how Foxp2-V1 interneurons vary according to the size of the motor columns and the number of motoneurons overall.

      The question asked by the reviewer would best be answered by estimating the proportion of Foxp2-V1 neurons to all NeuN labeled interneurons. This is because interneuron density in the spinal cord varies in different segments. We are not sure what this additional analysis will contribute to the paper.

      Why, in the rabies tracing scheme (Fig 8), is the rabies injection performed at P15? As the authors explain in the text, rabies uptake at the neuromuscular junction is weak after P10. It is not clear to me why these experiments were not all done at early postnatal stages, with a "classical" co-injection of TVA and rabies.

      First, we do not need TVA in this experiment because we are using B19-G coated virus and injecting it into muscles, not into the spinal cord directly.

      Second, enhanced tracing occurs when the AAV is injected a few days before the rabies virus. This is because AAV transgene expression is delayed with respect to rabies virus infection and replication. We have performed full time courses and presented these data in an abstract to SfN: Gomez-Perez et al., 2015, Program No. 242. We believe a full description of these technical details is beyond the scope of this manuscript, which has already been considered too technical.

      Third, the justification of P15 timing of injections for anterograde primary afferent labeling and retrograde monosynaptic labeling of interneurons is fully explained in the text. 

      “To obtain transcomplementation of RVDG-mCherry with glycoprotein in LG motoneurons, we first injected the LG muscle with an AAV1 expressing B19-G at P4. We then performed RVDG and CTB injections at P15 to optimize muscle targeting and avoid cross-contamination of nearby muscles. Muscle specificity was confirmed post-hoc by dissection of all muscles below the knee. Analyses were done at P22, a timepoint after developmental critical windows through which Ia (VGLUT1+) synaptic numbers increase and mature on V1-IaINs (Siembab et al., 2010)” 

      Furthermore, CTB labeling starts to decrease in intensity 7 days after injection because of intracellular degradation, and rabies virus labeling disappears because of cell death. Both factors limit the post-injection time available for analyses.

      Likewise, I am surprised not to see a single motoneuron in the rabies tracing (Fig 8), either in the histology or in the graphs. How can the authors be certain that there was indeed rabies uptake from the muscle at this age, and that all labelled cells, presumed to be preMNs, are not actually sensory neurons? It is known that rabies vectors delivered from the periphery may also label sensory afferents and their postsynaptic targets through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). This potential bias must be considered.

      This is fully explained in our previous response to the second reviewer’s general comments. We have also added a confocal image showing starter motoneurons as requested (Figure S6A).

      Please carefully inspect the references to figures and figure panels, which I suspect are not always correct.

      Thank you. We carefully revised the manuscript to correct these deficiencies and we apologize for them.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1: The data here are absolutely beautiful and provide one of the most thorough EdU-based birth-timing studies of spinal cord neuron subtypes published so far, in terms of timepoints, number of animals analyzed, and precision of analysis. My only suggestion is to color code the early- and late-born populations (for example, different shades of green for early-born and blue for late-born neurons) to better emphasize the differences between them. It is very difficult to differentiate between the purple, red, and black colors in G-I, which this would also fix. The antibody staining for Pou6f2 (F) is also difficult to see; the gain could be increased on these images, or insets added for clarity.

      The color scheme was chosen for optimal visualization by people with different degrees of color blindness. Shades of a single color are always more difficult to discriminate. This has been personally verified by the senior corresponding author of this paper, who has some color discrimination deficits. Moreover, each line has a different symbol, for the same purpose of easing differentiation.

      Figure 2: This is also a picture-perfect figure, showing further diversity by birth time even within a clade. One small aesthetic comment is that the arrows are quite unclear and block the data. Perhaps the contours themselves could be subdivided by region and color coded by birth time, such that, for example, the dorsal contours that emerge in the MafB clade at E11 are highlighted in their own color. Some quantification of the shift in distribution, as well as of the relative number of neurons within each spatially localized group, would also be useful. For MafB, for example, the contour plots suggest that the ventral cells (likely Renshaw cells) are generated at all times, whereas the dot plots suggest that the most ventral cells are present at E10.5. This is likely because the contours measure fractional representations, not absolute numbers. An independent measure of the absolute number of ventral and dorsal cells, for example by subdividing the spinal cord into dorsoventral bins, would be very useful to address this ambiguity.

      We believe the density plots already convey the shift in position with birthdate, and we are not sure how we could quantify this more accurately than by showing the differences in cellular density plots. We used dorsoventral and mediolateral binning in our first paper about two decades ago (Alvarez et al., 2005). This approach has since been replaced by more rigorous density profiles that better describe cell distributions. Unfortunately, obtaining the most accurate density profiles requires pooling all cells from all animals, which precludes statistical comparisons. This is because some groups have very few cells per animal (for example, early-born Sp8 or Foxp2 cells).

      Figure 3 and Figure 4: These, and all figures that compare the lineage trace and antibody staining, should in my opinion be moved to the supplement, as they are not for generalist readers but rather for specialists interested in these exact tools. In addition, most of the text relating to these figures should be transferred to the supplement as well.

      Figure 5: Another great figure that sets the stage for the analysis of FoxP2V1-to-MN synaptic connectivity and provides basic information about the rostrocaudal distribution of this clade by analyzing settling position by level. I have only minor comments. The grid in B obscures the view of the cells and should be removed. The motor neuron cell bodies in C would be more visible if they were red.

      We moved some of the images to supplemental (see new supplemental Fig S4). However, we also added new data to the figure as requested by reviewers (Fig 3E-G). We preserved our analyses of Foxp2 and non-Foxp2 V1s across ages and spinal segments because we think this information is critical to the paper. Finally, we want to prevent misleading readers into believing that Foxp2 is a marker that is unique to V1s. Therefore, we also preserved Figures 3H to 3J showing the non-V1 Foxp2 population in the ventral horn. 

      Figure 6: A very careful and quantitative analysis of V1 synaptic input to motor neurons is presented here. For the reader, a summary figure (similar to B but with V1s too) that schematizes V1 FoxP2 versus Renshaw cell connectivity with LMC, MMC, and PGC motor neurons at one level would be useful.

      Thanks for the suggestion. A summary figure has now been included (Figure 5G). 

      Figure 7: The goal of this figure is to highlight intra-clade diversity at the level of transcription factor expression (or maintenance of expression), birth timing and cell body position culminating in the clear and concise diagram presented in G. In panels A-F however, it takes extra effort to link the data shown to these I-IV subtypes. The figure should be restructured to better highlight these links. One option might be to separate the figure into four parts (one for each type): with the individual spatial, birth timing and TF data for each population extracted and presented in each individual part.

      We agree with the reviewer that this is a very busy figure. We tried to re-structure the figure following the suggestions of the reviewer and also several alternative options. All resulted in designs that were more difficult to follow than the original figure. We apologize for its complexity, but we believe this is the best organization to describe all the data in the simplest form.

      Figure 8: in A-D, the main point of the figure - that V1FoxP2Otp preferentially receive proprioceptive synapses is buried in a bunch of technical details. To make it easier for the reader, please:

      (1) add a summary as in B of the %FoxP2-V1 Otp+ cells (82%) with Vglut1 synapses to make the point stronger that the majority of these cells have synapses.

      We added this graph by extending the previous graph to include lineage labeled Foxp2-V1s with OTP or Foxp2 immunoreactivity. It is now Figure 7B.

      (2) Additionally, add a representative example that shows large numbers of proximal synapses on an FoxP2-V1 Otp+.

      The image we presented before as Figure 8A was already immunostained for OTP, so we just added the OTP channel to the images. Now all this information is in panels that are subparts of Figure 7A.

      (3) Move the comparison between FoxP2-V1 and FoxP2AB+V1s to the supplement.

      We preserved the quantitative data on Foxp2-V1 lineage cells with Foxp2-immunoreactivity but made this a standalone figure, so it is not as busy.

      (4) Move J-M description of antibody versus lineage trace of Otp to supplement as ending with this confuses the main message of the paper (see comment above).

      All results for the Otp-V1 mouse model have now been placed in a supplemental figure (Figure 5S).

      Discussion: A more nuanced and detailed discussion of how the temporal pattern of subtype generation presented here aligns with the established temporal transcription factor code (nicely summarized in Sagner 2024) would be helpful to place their work in the broader context of the field.

      This aspect of the discussion was expanded on pages 20 and 21. We replaced the earlier cited review (Sagner and Briscoe, 2019, Development) with the updated Sagner 2024 review and further discussed the data in the context of the field and of neurogenesis waves throughout the neural tube, not only the spinal cord. We had previously compared our data carefully with the spinal cord data from Sagner's group (Delile et al., 2019, Development). We have now further expanded this comparison in the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study characterized the cellular and molecular mechanisms of spike timing-dependent long-term depression (t-LTD) at the synapses between excitatory afferents from lateral (LPP) and medial (MPP) perforant pathways to granule cells (GC) of the dentate gyrus (DG) in mice.

      Strengths:

      The electrophysiological experiments are thorough. The experiments are systematically reported and support the conclusions drawn.

      This study extends current knowledge by elucidating additional plasticity mechanisms at PP-GC synapses, complementing existing literature.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      To more conclusively define the pivotal role of astrocytes in modulating t-LTD at MPP and LPP GC synapses through SNARE protein-dependent glutamate release, as posited in this study, the authors could adopt additional methods, such as alternative mouse models designed to regulate SNARE-dependent exocytosis, as well as optogenetic or chemogenetic strategies for precise astrocyte manipulation during t-LTD induction. This would provide more direct evidence of the influence of astrocytic activity on synaptic plasticity.

      We thank the reviewer for the suggestion. As stated in the manuscript and shown in Figure 4, we had already used two different approaches to determine the involvement of astrocytes in the forms of LTD we describe: loading astrocytes with BAPTA (aBAPTA) to interfere with astrocytic calcium signalling, and dnSNARE mice, in which astrocytic vesicular release is impaired. Both approaches clearly indicated that astrocytes are required for t-LTD: t-LTD was prevented both when astrocytes were loaded with BAPTA and in dnSNARE mice. Notwithstanding this, and as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. First, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving vesicle-associated membrane protein, an important component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, t-LTD was clearly absent at both the lateral and medial pathways, confirming that both types of t-LTD require astrocytes, the SNARE complex, and vesicular release. Second, to test more directly whether glutamate is released by astrocytes, we blocked astrocytic glutamate release by loading astrocytes with Evans blue, which inhibits the vesicular glutamate transporter (VGLUT) and thus interferes with glutamate uptake into vesicles. Again, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes.

      Reviewer #2 (Public Review):

      Summary:

      This work reports the existence of spike timing-dependent long-term depression (t-LTD) of excitatory synaptic strength at two synapses of the dentate gyrus granule cell, which are differently connected to the entorhinal cortex via either the lateral or medial perforant pathway (LPP or MPP, respectively). Using patch-clamp electrophysiological recording of t-LTD in combination with either pharmacology or a genetically modified mouse model, the authors provide information on the differences in the molecular mechanisms underlying t-LTD at the two synapses.

      Strengths:

      The two synapses analyzed in this study have been understudied. This new data thus provides interesting new information on a plasticity process at these synapses, and the authors demonstrate subtle differences in the underlying molecular mechanisms at play. Experiments are in general well controlled and provide robust data that are properly interpreted.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      • Caution should be taken in the interpretation of the results to extrapolate to adult brain as the data were obtained in P13-21 days old mice, a period during which synapses are still maturing and highly plastic.

      We thank the reviewer for noticing this. In fact, our experiments were intentionally performed in young animals (P13-21), precisely because this is a critical period of plasticity. We state this in the methods, results, and discussion sections, and discuss it in some detail in the discussion.

      • In experiments where the drug FK506 or thapsigargin are loaded intracellularly, the concentrations used are as high as for extracellular application. Could there be an error of interpretation when stating that the targeted actors are necessarily in the post-synaptic neuron? Is it not possible for the drug to diffuse out of the cell as it is evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. Although these compounds could in principle cross cell membranes and reach other cells, this would be expected to take a relatively long time. Moreover, to have any effect, a concentration similar to the one in the pipette, or at least a relatively high one, would have to reach those other cells. Furthermore, even if a compound crossed the cell membrane during an experiment, it would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we do not consider this scenario very plausible. Notwithstanding this, and as suggested, we repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and obtained the same results. These data are now included in Figure 3 and in the text.

      • The experiments implicating glutamate release from astrocytes in t-LTD would require additional controls to better support the conclusions made by the authors. As the data stand, it is not clear, how the authors identified astrocytes to load BAPTA and if dnSNARE expression in astrocytes does not indirectly perturb glutamate release in neurons.

      We thank the reviewer for raising this point. We now indicate how astrocytes were identified for BAPTA loading. We reply to this in detail in our response to the "Recommendations for the authors" from reviewer 2.

      Significance:

      While this is the first report of t-LTD at these synapses, this plasticity process has been mechanistically well investigated at other synapses in the hippocampus and in the cortex. Nevertheless, this new data suggests that mechanistic differences in the induction of t-LTD at these two DG synapses could contribute to the differences in the physiological influence of the LPP and MPP pathways.

      Reviewer #3 (Public Review):

      Coatl et al. investigated the mechanisms of synaptic plasticity at two important hippocampal synapses: the excitatory afferents from the lateral and medial perforant pathways (LPP and MPP, respectively) of the entorhinal cortex (EC) onto granule cells of the hippocampal dentate gyrus (DG). They find that these two EC-DG synaptic connections in mice show a presynaptically expressed form of long-term depression (LTD) requiring postsynaptic calcium, eCB synthesis, CB1R activation, astrocyte activity, and metabotropic glutamate receptor activation. Interestingly, LTD at MPP-GC synapses requires ionotropic NMDAR activation, whereas LTD at LPP-GC synapses is NMDAR independent. Thus, they discovered two novel forms of t-LTD that require astrocytes at EC-GC synapses. Although plasticity of EC-DG granule cell (GC) synapses has been studied using classical protocols, this is the first analysis of synaptic plasticity induced by spike timing-dependent protocols at these synapses. Interestingly, the data also indicate that t-LTD at each type of synapse requires a different group I mGluR, with LPP-GC t-LTD dependent on mGluR5 and MPP-GC t-LTD requiring mGluR1.

      Using a detailed analysis of the coefficient of variation of the EPSP slopes, miniature responses, and several other approaches (failure rate, PPRs, CV, and mEPSP frequency and amplitude analysis), the authors demonstrate a decrease in the probability of neurotransmitter release and a presynaptic locus for these two forms of LTD at both types of synapses. Using elegant electrophysiological experiments and taking advantage of the conditional dominant-negative (dn) SNARE mice, in which doxycycline administration blocks exocytosis and impairs vesicle release by astrocytes, they demonstrate that both forms of LTD require the release of gliotransmitters from astrocytes. These data add in an interesting way to the ongoing discussion on whether LTD induced by STDP participates in refining synapses, potentially weakening excitatory synapses under the control of different astrocytic networks. The conclusions of this paper are mostly well supported by the data, but some aspects of the results must be clarified and extended.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      (1) It should be clarified whether the present results were obtained with or without functional inhibitory synaptic activation. It is not clear whether GABAergic synapses are blocked or not. If GABAergic synapses are not blocked, the authors must discuss whether the LTD of the EPSPs is due to a decrease in glutamatergic receptor activation or an increase in GABAergic receptor activation. Moreover, it is recommended to analyze not only the EPSPs but also the EPSCs, to address whether the decrease in synaptic transmission is caused by a decrease in input resistance or by a decrease in the space constant (lambda).

      We thank the reviewer for raising these points. GABAergic inhibition was not blocked in our experiments. As indicated in the manuscript, the observed forms of t-LTD appear to be due to a decrease in glutamate release probability, mediated by the mechanism we uncover and describe here. To determine whether GABA receptors play any role in these forms of t-LTD, we repeated the experiments in the presence of the GABAA and GABAB receptor antagonists bicuculline and SCH50911, respectively. Blocking GABA receptors did not prevent or affect t-LTD at LPP- or MPP-GC synapses, which was still present with a magnitude similar to that in controls. These results indicate that these receptors are not involved in these forms of t-LTD. They are now included in the results section (page 8) and as a new Figure S1. In our experiments, no changes in input resistance or space constant were observed and, importantly, no changes were observed in the EPSP amplitudes/slopes of the control pathway, which does not receive the plasticity protocol and which we routinely use in our experiments.

      (2) Authors show that Thapsigargin loaded in the postsynaptic neuron prevents the induction of LTD at both synapses. Analyzing the effects of blocking postsynaptic IP3Rs (Heparin in the patch pipette) and Ryanodine receptors (Ruthenium red in the patch pipette) is recommended for a deeper analysis of the mechanism implicated in the induction of this novel forms of LTD in the hippocampus.

      We thank the reviewer for this suggestion. We repeated the experiments loading the postsynaptic cell with either heparin or ruthenium red through the patch pipette. Under these conditions, t-LTD was not affected by heparin (ruling out a role for IP3Rs) but was prevented by ruthenium red (indicating a requirement for ryanodine receptors). These data are now included in the text (page 12) and in Figure 3a, b, e, f.

      (3) Authors nicely demonstrate that CB1R activation is required in these forms of LTD by blocking CB1Rs with AM251, however an interesting unanswered question is whether CB1R activation is sufficient to induce this synaptic plasticity. This reviewer suggests studying whether applying puffs of the CB1R agonist, WIN 55,212-2, could induce these forms of LTD.

      We thank the reviewer for this suggestion. We performed these experiments with WIN 55,212-2 as suggested. Activation of CB1Rs by puffs of the agonist WIN 55,212-2 applied to the astrocyte directly induced LTD at both LPP- and MPP-GC synapses. These data are now included in the text (page 14) and in Figure 3c, d, g, h.

      (4) Finally, adding a last figure with a cartoon summarizing the proposed model of action in these novel forms of LTD would add a positive value and would help the reading of the manuscript, especially in those aspects related with the discussion of the results.

      We thank the reviewer for the suggestion. We now include a figure showing the proposed mechanisms (Figure 5).

      The extension of these results would improve the manuscript, which provides interesting results showing two novel forms of presynaptic t-LTD in the brain synapses with different action mechanisms probably implicated in the different aspects of information processing.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are just a few aspects that could be clarified to bolster the authors' conclusions.

      The author centered the conclusion of their study on the role of astrocytic activity in regulating these two forms of plasticity (see title). To strengthen the evidence that astrocytes are key regulators of t-LTD at MPP and LPP GC synapses by regulating SNARE protein-dependent glutamate release, additional complementary approaches should be considered, such as other mouse models enabling the control of SNARE-dependent exocytosis and/or optogenetic/chemogenetic tools to selectively manipulate astrocytes during the induction of t-LTD, thereby directly assessing the impact of astrocytic activity on synaptic plasticity. Implementing calcium imaging or glutamate sensors to visualize the dynamics of astrocytic calcium signaling and glutamate release during t-LTD could be also considered.

      We thank the reviewer for the suggestion. As stated in the manuscript and shown in Figure 4, we had already used two different approaches to determine the involvement of astrocytes in the forms of LTD we describe: loading astrocytes with BAPTA (aBAPTA) to interfere with astrocytic calcium signalling, and dnSNARE mice, in which astrocytic vesicular release is impaired. Both approaches clearly indicated that astrocytes are required for t-LTD: t-LTD was prevented both when astrocytes were loaded with BAPTA and in dnSNARE mice. Notwithstanding this, and as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. First, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving vesicle-associated membrane protein, an important component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, t-LTD was clearly absent at both the lateral and medial pathways, confirming that both types of t-LTD require astrocytes, the SNARE complex, and vesicular release. Second, to test more directly whether glutamate is released by astrocytes, we blocked astrocytic glutamate release by loading astrocytes with Evans blue, which inhibits the vesicular glutamate transporter (VGLUT) and thus interferes with glutamate uptake into vesicles. Again, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (pages 14 and 15) and in Figure 4.

      • How were astrocytes identified to be loaded with BAPTA? The author should clarify this methodological aspect and provide confocal images of patched astrocytes situated 50-100 um from the recorded neuron.

      We thank the reviewer for the comment. This information is now included in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy and were characterized by a low membrane potential, low membrane resistance, and passive responses (no action potentials) to both negative and positive current injection.

      • Please provide confocal images of EGFP expression in the DG astrocytes of dnSNARE mice both on and off Dox, to verify transgene expression in astrocytes

      We thank the reviewer for this suggestion. We now include an image of GFP expression in DG astrocytes of off-Dox dnSNARE mice, shown in Fig. S3. None of the pups or mice were fed doxycycline at any point from birth, so the transgenes were expressed continuously and exocytosis should therefore be blocked in astrocytes.

      Minor points:

      Lines 250-253: It is mentioned that TTX is added at baseline, washed out for the t-LTD experiment, and then reapplied post t-LTD. I suggest clarifying the timing and rationale for this application for a broad audience.

      We thank the reviewer for the suggestion. We now include some information related to the timing and rationale of the experiment phases (page 9).

      The discussion is quite detailed and provides a comprehensive overview of the study's findings. To enhance clarity and impact, the authors might consider the following:

      • add subheadings and bullet points for key findings. This will improve readability.

      • this section could benefit from streamlining to avoid redundancy.

      • some sentences could be made more concise without losing meaning.

      We thank the reviewer for these suggestions. We now include subheadings in the discussion section to improve readability and have made some sentences more concise and simple without losing meaning.

      In figure legends, capitalization should be consistent, for example in the statistical significance notation ("***P < 0.001" versus "***p < 0.001").

      We now use "p < 0.001" in the legend of Figure 4 for consistency.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      • All results were obtained in young still quite immature synapses. To strengthen the significance of the findings, the authors could repeat some of the main experiments in adult mice (8 weeks and beyond). If not, they should state clearly that these mechanisms were only evidenced in early post-natal conditions.

      We thank the reviewer for noticing this. In fact, our experiments were intentionally performed in young animals (P13-21), precisely because this is a critical period of plasticity. As the reviewer suggests, we state this in the methods (page 5), results (page 8), and discussion (page 19) sections, and discuss it in some detail in the discussion.

      • Lines 246-249 and Fig 1f,p: The authors need to perform a statistical test on these two graphs to support their claim that 'A plot of CV^-2 versus the change in the mean evoked EPSP slope (M) before and after t-LTD mainly yielded points below the diagonal line at LPP-GC and MPP-GC synapses'.

      This may not have been clear in the previous version. We found an error in one of the graphs (some points were missing), which we have corrected. In addition, as suggested by the reviewer, we performed a regression analysis that confirms the stated conclusions; this is now included in the text (page 9). We have also added the mean values ± SEM and the linear regression of the data for LPP-GC synapses (Mean = 0.607 ± 0.054 vs 1/CV^2 = 0.439 ± 0.096, R^2 = 0.337; n = 14) and MPP-GC synapses (Mean = 0.596 ± 0.056 vs 1/CV^2 = 0.461 ± 0.090, R^2 = 0.168; n = 13). Points lying on the dotted horizontal line (1/CV^2 = 1) indicate no change in release probability, whereas points lying below the dotted diagonal line suggest a change in release probability (for review, see Brock et al., 2020, Front Synaptic Neurosci 12, 11).
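      To make the logic of the 1/CV^2 analysis easier to follow, here is a minimal illustrative sketch in Python; it is not the analysis code used in the manuscript. It assumes the textbook binomial model of release covered in reviews such as Brock et al. (2020): N independent release sites with uniform release probability p and quantal size q, so that mean = Npq and 1/CV^2 = Np/(1 - p). All function names and the input format are hypothetical. The sketch computes the post/baseline ratios of the mean (M) and of 1/CV^2 (R), and shows why a pure change in q places a point on the horizontal line (R = 1), a pure change in N places it on the diagonal (R = M), and a decrease in p places it below the diagonal (R < M).

```python
from statistics import mean, variance


def inv_cv2(samples):
    """Return 1/CV^2 = mean^2 / variance (sample variance, n - 1 denominator)."""
    return mean(samples) ** 2 / variance(samples)


def cv_ratios(baseline, post):
    """Return (M, R): post/baseline ratios of the mean and of 1/CV^2.

    `baseline` and `post` are hypothetical lists of EPSP slopes recorded
    before and after the plasticity protocol in one cell.
    """
    m = mean(post) / mean(baseline)
    r = inv_cv2(post) / inv_cv2(baseline)
    return m, r


def binomial_ratios(n, p0, p1, q0=1.0, q1=1.0, n1=None):
    """Expected (M, R) under a simple binomial release model.

    With N sites, release probability p and quantal size q:
    mean = N*p*q and variance = N*p*(1 - p)*q**2, so 1/CV^2 = N*p/(1 - p),
    which does not depend on q.
    """
    n1 = n if n1 is None else n1
    m = (n1 * p1 * q1) / (n * p0 * q0)
    r = (n1 * p1 / (1 - p1)) / (n * p0 / (1 - p0))
    return m, r


def position(m, r, tol=0.05):
    """Locate a depression point (M < 1) relative to the reference lines."""
    if abs(r - 1.0) <= tol:
        return "horizontal"      # R ~ 1: change in q only (postsynaptic)
    if abs(r - m) <= tol:
        return "diagonal"        # R ~ M: change in N only
    if r < m:
        return "below diagonal"  # R < M: consistent with reduced p
    return "other"               # between or above the reference lines


# Halving p (0.5 -> 0.25) at fixed N and q halves the mean but cuts 1/CV^2 to 1/3.
m, r = binomial_ratios(n=10, p0=0.5, p1=0.25)
print(round(m, 3), round(r, 3), position(m, r))  # 0.5 0.333 below diagonal
```

      With recorded data, `cv_ratios` would be applied per cell to the baseline and post-induction EPSP slopes. Because the binomial model is a simplification, individual points are expected to scatter around these idealized lines.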

      • We are not sure that the experiment with the MK801 provided in the patch pipette can be interpreted correctly (Figure 2 a,b and e,f). How sure are the authors that, when applying MK801 in the patch pipette, it can reach its binding site within the pore? The concentration of MK801 is also very high (500 microM) and the same concentration is used extracellularly and intracellularly. Why did the authors not use a lower concentration when applying it intracellularly?

      We thank the reviewer for raising this point. MK801 in the pipette does reach the pore when loaded postsynaptically: NMDA currents recorded from postsynaptic neurons loaded with MK801 are blocked. We now include a control experiment showing the effect of postsynaptic MK801 on NMDA currents in the text (page 10). NMDA currents were recorded at +40 mV, with AMPARs and GABARs blocked by NBQX and bicuculline. Regarding the concentration, it has been described that the affinity from the internal site is much lower (by several orders of magnitude) than from the extracellular side (Sun et al., 2018, Neuropharmacology 143, 122-129), and these concentrations have been used extensively in previous studies. The concentrations used in the present work clearly blocked NMDAR currents but did not prevent LTD.

      • Linked to the point above, for the intracellular application of FK506 and thapsigargin, the concentrations used extracellularly and intracellularly are identical. The authors could have used lower concentrations for the intracellular application. Also, how can they be sure of the correct interpretation of these data as the drug essentially reaching a post-synaptic target when applied intracellularly? If the drug can enter the neuron, why could it not diffuse out of the neuron especially when loaded at a high concentration? Maybe using a lower concentration when applied intracellularly could at least partially address this issue.

      Is it evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. Although these compounds could, in principle, cross cell membranes and reach other cells, this would require a relatively long time. Additionally, to have any effect, a concentration equal or close to that in the pipette would have to reach the other cells. Furthermore, even if a compound crossed a cell membrane during an experiment, it would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we do not consider this very plausible. Nevertheless, we have repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and obtained the same results. These data are now included in Figure 3, and the numbers in the text have been updated (pages 12-13).

      • The data supporting the possibility of glutamate release by astrocytes as a main source of glutamate to promote t-LTD need to be strengthened. In experiment Figure a-h, it is not clear how the authors recognize astrocytes to patch. No details are provided in the methods or in the main text. If we understand correctly, it is only by performing a current-step protocol to ensure that the patched cell did not produce action potentials. If this was the case, the authors need to be more specific and provide details of this protocol. More importantly, the one trace provided in Figures 4a and 4f suggests, by a rough estimation that we made with a ruler, that the highest current step only depolarized the cell to about -40 mV. This is not sufficient to ensure that the recorded cell is not a neuron. The authors should increase their steps to high depolarizing currents to ensure that the patched cell is not a neuron. Better yet, they should load the cell with a dye and process the slice after the electrophysiological recording for immunohistochemistry to ensure that it was indeed an astrocyte. Alternatively, they could aspirate the cell content at the end of the recording to perform qPCR for astrocyte markers, e.g., GFAP.

      We thank the reviewer for the comment. We now include information on how astrocytes were identified (a point also raised by Reviewer 1) in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy and by eGFP fluorescence (astrocytes from dnSNARE mice), and were characterized by a low membrane potential, low membrane resistance, and passive responses (no action potentials) to both negative and positive current injection.

      We agree with the reviewer that the step protocol in Figures 4a and 4f might not have been completely clear. We have revised these panels to show more clearly that we applied pulses that depolarized astrocytes beyond -20 mV, with no action potentials observed at any point. This is now also shown in Figure S3.

      • Related to the point above, the use of the model expressing dnSNARE in astrocytes is elegant. Yet, to really interpret the data obtained in these slices as a lack of vesicle release (and most importantly glutamate) we think that the authors should ensure that glutamate release from nearby neurons is not impacted. They could patch nearby neurons in dnSNARE slices and test PPR or synaptic fatigue when stimulating either the LPP or MPP. The authors should avoid overinterpretation of these results. As it stands, it is not evident that dnSNARE expression does not perturb other mechanisms within the astrocyte that in turn perturb pre-synaptic glutamate release. Adding back glutamate as puffs does not help to disentangle this issue.

      To gain further insight into whether glutamate is released by astrocytes, we blocked glutamate release from astrocytes by loading them with Evans blue, which interferes with glutamate uptake into vesicles by inhibiting the vesicular glutamate transporter (VGLUT). Under this condition, as indicated above, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This is included in the text (page 15) and in Figure 4d,e,i,j.

      In addition, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving vesicle-associated membrane protein, a key component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, t-LTD was clearly absent at both the lateral and medial pathways, confirming the requirement of astrocytes, the SNARE complex, and vesicular release for both types of t-LTD. These data indicate that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (page 14) and in Figure 4.

      Minor points:

      • Line 107: did the authors mean t-LTP and t-LTD? We do not understand why STDP is mentioned here.

      We meant to say t-LTP. This is now corrected.

      • Line 108: should STDP be replaced by t-LTD, as the authors only focused on this plasticity mechanism?

      We agree; we now indicate t-LTD.

      • Lines 131-132: it is not clear when the animals were fed doxycycline. If it was from birth, then the 'not' should be removed. Otherwise, the authors should clearly state when the doxycycline was provided.

      DOX was not provided, which means that the transgene was continuously expressed and exocytosis should therefore be blocked in astrocytes. We now state this more clearly in the Methods section (page 5).

      • Line 223: which hippocampal synapses? This needs to be stated.

      As suggested, this is now stated in the text, as are the cortical synapses: Schaffer collateral (SC-CA1) synapses for the hippocampus and layer L4-L2/3 synapses for the cortex (page 8).

      • Line 273: what do the authors mean by 'from'? We do not understand the data provided on this line.

      We thank the reviewer for noticing this. This refers to the average amplitude of NMDAR-mediated currents before and after D-AP5 or MK801. We now express this more clearly (page 10: from 57±8 pA to 6±5 pA).

      • Line 286: why do the authors point out work on GluN2B and GluN3A only here, when they first investigate the GluN2A contribution to t-LTD? What about previous data on GluN2A?

      We have rephrased this to make it clear. We wanted to indicate that the available data suggest that presynaptic NMDARs at MPP-GC synapses contain GluN2B and GluN3A subunits and that, to our knowledge, no data indicate that they contain GluN2A subunits.

      • Line 428: what do the authors mean by 'not least'?

      This is a typo and we have removed that from the text.

      Reviewer #3 (Recommendations For The Authors):

      My only suggestion for improving data presentation in the manuscript would be to split some figures of the paper. In my opinion, the figures are too dense and therefore difficult to follow for the broad audience of eLife readers. In addition, a real image of the recorded dentate granule cells in the slice showing also the location of the real stimulation electrodes would significantly improve the presentation of Figure 1.

      We thank the reviewer for the suggestion, but we would prefer to keep the current organization of the figures. While we agree that some of them are rather large, this layout makes it easier to compare the lateral and medial pathways, so we think it is better to keep the information on the two pathways in the same figure. Nevertheless, we have tried to make the figures clearer by using a columnar organization, with one column per pathway, which we think makes the pathways easier to compare. As the reviewer suggests, we now include in Figure 1 a real image of the recorded dentate granule cells in the slice, showing the location of the stimulation electrodes; we agree that this improves the presentation of this figure and thank the reviewer for the suggestion.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the Reviewer for all their effort and suggestions over multiple drafts. Their comments have encouraged us to read and think more deeply about the issue under discussion (BLA spiking in response to CS/US inputs), and to find the papers whose contents we think provide a potential solution. We agree that there is more to understand about the mechanisms underlying associative learning in the BLA. We offer our paper as providing a new way of understanding the role of circuit dynamics (rhythms) in guiding associative learning via STDP. As we pointed out in our response to the previous review, the issue highlighted by the Reviewer is an issue for the entire field of associative learning in the BLA: our discussion of the issue suggests why the experimentally observed BLA spiking in response to CS inputs, recorded in the absence of US inputs (as in the papers cited by the Reviewer), may not be what occurs in the presence of the US. Since our explanation involves the role of neuromodulators, such as ACh and dopamine, the suggestion is open to further testing.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Public Review’s only objection: “Deficient in this study is the construction of the afferent drive to the network, which does not elicit activities that are consistent with those observed to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.”

      Recommendations for the Authors: “The authors have successfully addressed most of my concerns. I commend them for their thorough response. The one nagging issue is the unrealistic activation used to drive CS and US activation in their network. While I agree that their stimulus parameters are consistent with a contextual fear task, or one that uses an olfactory CS, this was not the focus of their study as originally conceived. Moreover, the types of activation observed in response to auditory cues, which is the focus of their study, do not follow what is reported experimentally. Thus, I stand by the critique that the proposed mechanism has not been demonstrated to work for the conditioning task which the authors sought to emulate (Krabbe et al. 2019). Frustratingly, addressing this is simple: run the model with ECS neurons driven so that they fire bursts of action potentials every ~1 sec for 30 sec, and with the US activation noncontiguous with that. If the model does not produce plasticity in this case, then it suggests that the mechanisms embedded in the model are not sufficient, and more work is needed to identify them. While 'memory' effects are possible that could extend the temporal contiguity of the CS and US, the authors need to provide experimental evidence for this occurring in the BLA under similar conditions if they want to invoke it in their model. 

      (1) Fair response. I accept the authors arguments and changes. 

      (2) The authors rightly point out that the simulated afferents need not perfectly match the time courses of the peripheral inputs, since the amygdala receives them indirectly via the thalamus, cortex, etc. However, it is known how amygdala neurons respond to such stimuli, so it behooves the authors to incorporate that fact into their model.

      Quirk et al. 1997 show that the response to the tone plummets after the first 100 ms in Figs 5A and 6B. The Herry et al. 2007 paper emphasizes the transient response to tone pips, with spiking falling back to a Poisson low-firing-rate baseline outside of the time when the pip is delivered.

      Regarding potential metabotropic glutamate activation, the stimulus in Whittington et al. 1995 was electrical stimulation at 100 Hz that would synchronously activate a large volume of tissue, which is far outside the physiological norm. I appreciate that metabotropic glutamate receptors may play a role here, but ultimately the model depends upon spiking activity for the plastic process to occur, and to the best of my knowledge the spiking activity in BLA in response to a sustained, unconditioned tone is brief (see also Quirk, Repa, and LeDoux 1995). Perhaps a better justification for the authors would be Bordi and LeDoux 1992, which found that 18% of auditory responsive neurons showed a 'sustained' response, but the sustained response neurons appear to show much weaker responses than those with transient ones (Fig 2). I am willing to say that their paper IS relevant to contextual fear, but that is not what the authors set out to do.

      (3) Fair response. 

      (4) Very good response! 

      Minor points: All points were addressed.”

      We thank Reviewer 1 (R1) for the positive feedback and for pointing out that, in R1’s opinion, there is still a nagging issue related to the activation in response to the CS that we modeled. In (Krabbe et al., 2019), the CS is a pulsed input and the US is delivered right after CS offset. R1’s current objection is that we instead model the CS and US as continuous and overlapping. R1 suggested that we use the actual inputs and check whether they produce the desired outputs. The answer is simple: this will not work, because we need the effects of the CS and US on pyramidal cells to overlap. We note that the fear learning community appears to agree with us that such contingency is necessary for synaptic plasticity (Sun et al., 2020; Palchaudhuri et al., 2024). To the best of our understanding, the source of that overlap is not understood in the community, and the gap has been widely noticed (Sun et al., 2020). We do note, however, that STDP may not be the only kind of plasticity in fear learning (Li et al., 2009; Kim et al., 2013, 2016).

      It is important to emphasize that it is not the aim of our paper to model the origin of the overlap. Rather, our intent is to demonstrate the roles of brain rhythms in producing the appropriate timing for STDP, assuming that ECS and F cells can continue to be active after the offset of the CS and US, respectively. This assumption is very close to how the field now treats the plasticity, even for auditory fear conditioning (Sun et al., 2020). Thus, our methodology does not contradict known results. However, the question raised by R1 is indeed very interesting, even if it is not the focus of our paper. Hence, below we explain in detail why our hypothesis is reasonable.

      Several papers (Quirk, Repa and LeDoux, 1995; Herry et al., 2007; Bordi and LeDoux, 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate remains higher than baseline. As R1 points out, we did not model the transient increase in BLA spiking that occurs in response to each pip in the auditory fear conditioning paradigm. However, we did model the low-level sustained activity that occurs between pips of the CS in the absence of the US (Quirk, Repa and LeDoux, 1995, Fig. 2) and after CS offset (see the left-hand part of Fig. 2B in our manuscript). We read the data of Quirk et al., 1995 as suggesting that this low-level activity can be sustained for an indefinite time after a pip (recording was cut off at 500 ms with no noticeable decrease in activity). As such, even if the pips and the US do not overlap in time, as in (Krabbe et al., 2019), the spiking of the ECS can be sustained after CS offset and thus overlap with the US, a condition necessary in our model for plasticity through STDP. In Herry et al., 2007, Fig. 3 shows that BLA neurons respond to a pip at the population level with a transient increase in spiking followed by a return to a baseline Poisson firing rate. However, a subset of cells continues to fire at an above-baseline rate after the transient effect wears off (Fig. 3C, top few neurons), and this increased rate extends to the end of the recording time (here ~300 ms). These are the cells we consider to be ECS in our model. In Quirk et al., 1997, Fig. 5A also shows sustained low-level activity of BLA neurons in response to a pip. This low-level activity increases after fear learning, as is also the case in our model, since ECS now entrains F so that more pyramidal cells spike in response to the CS. The question remains whether the spiking is sustained long enough and at a high enough rate for STDP to take place when the US is presented some time after CS offset.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during the US due to recording interference from the shock. However, evidence suggests that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, pairing ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 to 10 s (Unal et al., 2015). This should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of the US, thus ensuring the concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Other modulators, including dopamine, may also play a role in producing the sustained activity. Activation of the US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002). D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, the activation of the US can lead to continued and higher firing rates of ECS and F. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation triggered by the US may lead to a temporary extension of firing that is then amplified and prolonged by dopaminergic effects.

      Hence, we suggest that the problem raised by R1 may be solved by considering the roles of ACh and dopamine in the BLA. The involvement of neuromodulators is consistent with the suggestion of (Sun et al., 2020). Our model may be considered a “minimal” model that puts in by hand the overlap in activity due to neuromodulation without explicitly modeling it. As R1 says, it is important for us to motivate our hypotheses. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap.

      To account for these points in the manuscript, we first specified that we consider the effects of the US and CS inputs on the neuronal network as overlapping, while the actual inputs may not overlap. To do that, we added the following text:

      (1) In the introduction: 

      “In this paper, we aim to show 1) how a variety of BLA interneurons (PV, SOM and VIP) lead to the creation of these rhythms and 2) how the interaction of the interneurons and the rhythms leads to the appropriate timing of the cells responding to the US and those responding to the CS to promote fear association through spike-timing-dependent plasticity (STDP). Since STDP requires overlap of the effects of the CS and US, and some conditioning paradigms do not have overlapping US and CS, we include as a hypothesis that the effects of the CS and US overlap even if the CS and US stimuli do not. In the Discussion, we suggest how neuromodulation by ACh and/or dopamine can provide such overlap. We create a biophysically detailed model of the BLA circuit involving all three types of interneurons and show how each may participate in producing the experimentally observed rhythms and interacting to produce the necessary timing for the fear learning.”

      (2) In the Result section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”:

      “The 40-second interval we consider has both ECS and F, as well as VIP and PV interneurons, active during the entire period: an initial bout of US is known to produce a long-lasting fear response beyond the offset of the US (Hole and Lorens, 1975) and to induce the release of neuromodulators. The latter, in particular acetylcholine and dopamine that are known to be released upon US presentation (Harmer and Phillips, 1999; Suzuki et al., 2002; Rajebhosale et al., 2024), may induce more sustained activity in the ECS, F, VIP, and PV neurons during and after the presentation of US, thus ensuring a concomitant activation of those neurons necessary for STDP to take place (see “Assumptions and predictions of the model” in the Discussion).”

      (3) In the Discussion section “Synaptic plasticity in our model”:

      “Synaptic plasticity is the mechanism underlying the association between neurons that respond to the neutral stimulus CS (ECS) and those that respond to fear (F), which instantiates the acquisition and expression of fear behavior. One form of experimentally observed long-term synaptic plasticity is spike-timing-dependent plasticity (STDP), which defines the amount of potentiation and depression for each pair of pre- and postsynaptic neuron spikes as a function of their relative timing (Bi and Poo, 2001; Caporale and Dan, 2008). All forms of STDP require that there be an overlap in the firing of the pre- and postsynaptic cells. In some fear learning paradigms, the US and the CS do not overlap. We address this below under “Assumptions and predictions of the model”, showing how the effects of US and CS on the spiking of the relevant neurons can overlap even in the absence of overlap of US and CS.”
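      As an illustration of the pair-based STDP rule described in this passage, the minimal Python sketch below sums a weight change over all pre/post spike pairs using exponential timing windows. The function names and the values of a_plus, a_minus, tau_plus and tau_minus are hypothetical placeholders, not the parameters of our BLA model, and all-to-all pairing is only one common convention. The values are chosen so that the rule is depression-dominated: the area of the depression lobe (a_minus * tau_minus) exceeds that of the potentiation lobe (a_plus * tau_plus), so uncorrelated coactivity yields net depression and only well-timed pre-before-post spiking yields potentiation.

```python
import math


def stdp_weight_change(pre_spikes, post_spikes,
                       a_plus=0.005, a_minus=0.006,
                       tau_plus=20.0, tau_minus=20.0):
    """All-to-all pair-based STDP; spike times in ms (hypothetical parameters).

    For each pair, dt = t_post - t_pre. Pre-before-post (dt > 0) potentiates,
    post-before-pre (dt < 0) depresses; exactly coincident spikes are ignored.
    """
    dw = 0.0
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            dt = t_post - t_pre
            if dt > 0:
                dw += a_plus * math.exp(-dt / tau_plus)
            elif dt < 0:
                dw -= a_minus * math.exp(dt / tau_minus)
    return dw


def window_integral(a_plus=0.005, a_minus=0.006, tau_plus=20.0, tau_minus=20.0):
    """Net area of the STDP window; negative means depression-dominated."""
    return a_plus * tau_plus - a_minus * tau_minus


print(stdp_weight_change([0.0], [5.0]))   # pre leads post by 5 ms: potentiation
print(stdp_weight_change([10.0], [5.0]))  # post leads pre by 5 ms: depression
print(window_integral())                  # negative: depression-dominated window
```

      With such a window, coactivity alone tends to depress the synapse, which is why fine spike timing (here provided by the interneuron rhythms) is needed for net potentiation.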

      To fully present our reasoning about the origin of the overlap of the effects of US and CS, we modified and added to the last paragraph of the Discussion section “Assumptions and predictions of the model”, which now reads as follows:

      “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning through STDP. Such a hypothesis, that learning uses spike-timing-dependent plasticity, is common in the modeling literature (Bi and Poo, 2001; Caporale and Dan, 2008; Markram et al., 2011). Current paradigms of fear conditioning include examples in which the CS and US stimuli do not overlap (Krabbe et al., 2019). Such a condition might seem to rule out the mechanisms in our paper. Nevertheless, the argument below suggests that the effects of the CS and US can cause an overlap in neuronal spiking of ECS, F, VIP, and SOM, even when CS and US inputs do not overlap.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during the US due to recording interference from the shock. However, evidence suggests that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, pairing ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 to 10 s (Unal et al., 2015). Other modulators, including dopamine, may also play a role in producing the sustained activity. Activation of the US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002). D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, neuromodulator release should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of the US, ensuring the concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. In this way, the activation of the US can lead to continued and higher firing rates of ECS and F. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation triggered by the US may lead to a temporary extension of firing that is then amplified and prolonged by dopaminergic effects.

      Hence, we suggest that the problem apparently posed by the non-overlapping US and CS in some paradigms of auditory fear conditioning (Krabbe et al., 2019) may be solved by considering the roles of ACh and dopamine in the BLA. Our model may be considered a “minimal” model that puts in by hand the overlap in activity due to neuromodulation without explicitly modeling it. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap. We note that, even though ECS and F neurons have the ability to fire continuously when ACh and dopamine are involved, the participation of the interneurons enforces the periodic silence needed for the depression-dominated STDP.”

      In the Discussion (in section “Involvement of other brain structures”), we also acknowledged that the overlap between the effects of US and CS in the BLA may be provided by other brain structures by writing the following:

      “In our model, the excitatory projection neurons and VIP and PV interneurons show sustained activity during and after the US presentation, thus allowing potentiation through STDP to take place. The medial prefrontal cortex and/or the hippocampus may provide the substrates for the continued firing of the BLA neurons after the 2-second US stimulation. We also discuss below that this network sustained activity may originate from neuromodulator release induced by US (see section “Assumptions and predictions of the model” in the Discussion).”

      We also improved our discussion about the (Grewe et al., 2017) paper, which questions Hebbian plasticity in the context of fear conditioning based on several critiques. We included a new section in the Discussion entitled “Is STDP needed in fear conditioning?” to discuss those critiques and how our model may address them, which reads as follows:

      “Is STDP needed in fear conditioning? The study in (Grewe et al., 2017) questions the validity of the Hebbian model in establishing associative learning during fear conditioning. There are several critiques we discuss here. The first critique is that Hebbian plasticity does not explain the experimental finding showing that both upregulation and downregulation of stimulus-evoked responses are present between coactive neurons. The upregulation is provided by our model, so the issue is the downregulation, which is not addressed by our model. However, our model highlights that coactivity alone does not create potentiation; the fine timing of the pre- and postsynaptic spikes determines whether there is potentiation or depression. Here, we find that PING networks are instrumental in setting up the fine timing for potentiation. We suggest that networks not connected to produce the PING may undergo depression when coactive.

      The second critique raised by (Grewe et al., 2017) is that Hebbian plasticity alone does not explain why most of the cells exhibiting enhanced responses to the CS did not react to the US before fear conditioning. They suggest that neuromodulators may provide a third condition (besides the activity of the pre- and postsynaptic neurons) that changes the plasticity rule. Our model also does not explicitly address this experimental finding since it requires F to be initially activated by US in order for the fear association to be established. We agree that the fear cells described in (Grewe et al. 2017) may be depolarized by the US without reaching the spiking threshold; however, with neuromodulation provided during the fear training, the same input can lead to spiking, enabling the conditions for Hebbian plasticity. Our discussions above about how neuromodulators affect excitability are relevant to this point. We do not exclude that other forms of plasticity may play a role during fear conditioning in cells not initially activated by the US, but this is not the topic of our modeling study.

      The third critique raised by (Grewe et al., 2017) is that Hebbian plasticity cannot explain why the majority of cells that were US- and CS-responsive before training have a reduced CS-evoked response afterward. The reduced response happens over multiple exposures to the CS without the US; this can involve processes similar to those present in fear extinction, which require plasticity in further networks, especially involving the infralimbic cortex (Milad and Quirk, 2002; Burgos-Robles et al., 2007). An extension of our model could investigate such mechanisms. In the fourth critique, Grewe et al. (2017) suggest that the Hebbian plasticity rule cannot easily account for the reduction of the responses of many CS+-responsive cells, but not of the CS−-responsive cells. We suggest that the circuits involved in paradigms similar to fear extinction do not involve the CS− cells.

      Overall, we agree with (Grewe et al., 2017) that neuromodulators play a crucial role in fear conditioning, especially in prolonging the US- and CS-encoding activity (see section “Assumptions and predictions of the model” in the Discussion), or even in changing the details of the plasticity rule. A possible follow-up of our work involves investigating how fear ensembles form and change through fear conditioning and later stages. This follow-up work may involve using a tri-conditional rule, as suggested in (Grewe et al., 2017), in which the potential role of neuromodulators is taken into account in the plasticity rule in addition to the pre- and postsynaptic neuron activity. Another direction is to investigate a possible relationship between neuromodulation and a depression-dominated Hebbian rule.”

      Finally, we made additional minor changes to the manuscript:

      (1) In the Result section “Interneurons interact to modulate fear neuron output”, we specified the following:

      “The US input on the pyramidal cell and VIP interneuron is modeled as a Poisson spike train at ~ 50 Hz and an applied current, respectively. In the rest of the paper, we will use the words “US” as shorthand for “the effects of US”.” 

      (2) In the Result section “Interneuron rhythms provide the fine timing needed for depression dominated STDP to make the association between CS and fear”, we also reported the following:

      “Similarly to the US, in the rest of the paper, we will use the words “CS” as shorthand for “the effects of CS”. In our simulations, CS is modeled as a Poisson spike train at ~ 50 Hz, independent of the US input. Thus, we hypothesize that the time structure of the inputs sometimes used for the training (e.g., a series of auditory pips) is not central to the formation of the plasticity in the network.”  

      Reviewer #2 (Public Reviews):

      The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule in the network, aiming to understand the mechanisms underlying fear learning in the BLA. 

      After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The authors added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 uM Gabazine to the BLA slices they observed rhythmic oscillations at theta frequencies. 10 uM Gabazine does not reduce the GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic population bursts driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in the intact brain or in acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extrahippocampal inputs, including, but not limited to, the entorhinal afferents and medial septal projections (see Buzsaki, 2002). Similarly, respiratory-related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled. 

      In our public reply to the Reviewer’s point, we reported the following:

      (1) We respectfully disagree that (Antonoudiou et al., 2022) contrasts with our study. (Antonoudiou et al., 2022) is a slice study showing that the BLA theta power (3-12 Hz) increases with gabazine compared to baseline. With all GABAergic currents omitted due to gabazine, the LFP is composed of excitatory currents and intrinsic currents. In our model, the high theta (6-12 Hz) comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. Thus, the model produces high theta in the presence of gabazine (see Fig. 1 in our replies to the Reviewers’ public comments). The model also shows that a PING rhythm is produced without gabazine, and that this rhythm disappears with gabazine because PING requires feedback inhibition from PV to fear cells. Thus, the high theta increase and gamma reduction with gabazine in the (Antonoudiou et al., 2022) paper can be reproduced in our model.

      (2) We agree that (Antonoudiou et al., 2022) alone is not sufficient evidence that the BLA can produce low theta (3-6 Hz); we discussed a new paper (Bratsch-Prince et al., 2024) that provides further evidence of BLA ability to produce low theta and under what circumstances. The authors reported that intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input) which, in vivo, would be provided by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003). We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. In future work, we will aim to show that ACh activates the BLA VIP cells, which are essential to the low theta generation in the network.

      In the manuscript, we added to and modified the Discussion section “Where the rhythms originate, and by what mechanisms”. This text aims to better discuss (Antonoudiou et al., 2022) and to introduce (Bratsch-Prince et al., 2024) and its connection to our hypothesis that the theta oscillations can be produced within the BLA. The new version is:

      “Where the rhythms originate, and by what mechanisms. A recent experimental paper (Antonoudiou et al., 2022) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings when inhibition is totally removed due to gabazine application. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. In our model, we note that when inhibition is removed, both AMPA and intrinsic currents contribute to the network dynamics and the LFP. Thus, interneurons with their specific intrinsic currents (i.e., D-current in the VIP interneurons, and NaP- and H- currents in SOM interneurons) can indeed affect the model LFP and support the generation of theta and gamma rhythms (Fig. 6G). 

      Another slice study, (Bratsch-Prince et al., 2024), shows that BLA is intrinsically capable of producing a low theta rhythm with ACh stimulation and without needing external glutamate input. ACh is produced in vivo by the basal forebrain in response to US (Rajebhosale et al., 2024). Although we did not explicitly include the BF and ACh modulation of BLA in our model, we implicitly include the effect of ACh in BLA by increasing the activity of the VIP cells, which then produce the low theta rhythm. Indeed, low theta in the BLA is known to depend on the muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the class of VIP neurons in our model (Mascagni and McDonald, 2003; Krabbe et al., 2018). 

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through the extensive communication between the BLA and the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper. However, we emphasize that there is also evidence (as discussed above) that these rhythms arise within the BLA.”

      Reviewer #2 (Recommendations for the Authors):

      (1) Three different types of VIP interneurons with distinct firing patterns have been revealed in the BLA (Rhomberg et al., 2018). Does the generation of rhythmic activities depend on the firing features of VIP interneurons? Does it matter whether VIP interneurons fire burst of action potentials or they discharge more regularly?  

      (2) The authors used data for modeling SST interneurons obtained e.g., in the hippocampus. However, there are studies in the BLA where the intrinsic characteristics of SST interneurons have been reported (Unal et al., 2020; Guthman et al., 2020; Vereczki et al., 2021). Have the authors considered using results of studies that were conducted in the BLA? 

      We thank the Reviewer for their questions, which echo similar queries from Reviewer 3 in the previous review round and have helped us further improve our manuscript. In more detail:

      (1) Although other electrophysiological types exist (Sosulina et al., 2010), we hypothesized that the electrophysiological type of VIP neurons that display intrinsic stuttering is the type that would be involved in mediating low theta oscillations during fear conditioning. This is because VIP intrinsic stuttering in cortical neurons is thought to involve the D-current, which helps create low theta bursting oscillations in the neuronal spiking patterns (Chartove et al., 2020). We think that the other subtypes of VIP interneurons are not essential for the low theta oscillatory dynamics observed during fear conditioning and, thus, did not provide an essential constraint for the phenomena we are trying to capture. VIP interneurons in our network must fire bursts at low theta to be effective in creating the pauses in ECS and F spiking needed for potentiation; single spikes at theta are not sufficient to create these pauses.

      (2) In our model, we used the results of a BLA study (Sosulina et al., 2010). SOM cells in the BLA display several physiological types. We chose to include in our model the type showing early adaptation in response to a depolarizing current and inward (outward) rectification upon the initiation (release) of a hyperpolarizing current. We hypothesize that this type can produce high theta oscillations, a prominently observed rhythm in the BLA. Unal et al. (2020) found two populations of SOM cells in the BLA, which had previously been recorded in (Sosulina et al., 2010), including the type we chose to model. This SOM cell type shows a low-threshold spiking profile characterized by spike frequency adaptation and a voltage sag indicative of the H-current used in our model. Guthman et al. (2020) also found a population of SOM cells with a hyperpolarization-induced sag.

      Our model also uses a NaP-current, for which there are no data in the BLA. However, this current is known to exist in hippocampal SOM cells, and NaP- and H- currents can produce such a high theta in hippocampal cells. It is standard practice in modeling to use the best available substitute for unknown currents. Of course, it is unfortunate to have to do this. We also note that models can be considered proofs of principle that can be confirmed or refuted by further experimental work. Both (Guthman et al., 2020) and (Vereczki et al., 2021) also uncover further heterogeneity among BLA SOM interneurons involving more than electrophysiology. We hypothesize that the level of heterogeneity revealed by these three studies is not key to the question we are asking (where the crucial ingredients are the rhythms) and, therefore, was not included in our minimal model.

      We modified the Discussion section titled “Assumptions and predictions of the model” as follows:

      “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute most biologically detailed models. For example, although there is considerable variability in the activity patterns of both VIP cells and SOM cells (Sosulina et al., 2010; Guthman et al., 2020; Ünal et al., 2020; Vereczki et al., 2021), our focus was specifically on those subtypes that generate critical rhythms within the BLA. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model is believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions, and these features may be critical for other questions. Our results hold when there is some degree of heterogeneity among cells of the same type, showing that homogeneity is not a necessary condition.”

      (3) The authors may double-check the reference list, as e.g., Cuhna-Reis et al., 2020 is not listed. 

      We thank the Reviewer for spotting this. We checked the reference list and all the references are now listed.

      Finally, we wanted to acknowledge that we made other changes to the manuscript unrelated to the reviewers’ questions with the purpose of gaining clarity. More specifically:

      (1) We included a section titled “Significance” after the abstract and keywords, which reads as follows:

      “Our paper accounts for the experimental evidence showing that amygdalar rhythms exist, suggests network origins for these rhythms, and points to their central role in the mechanisms of plasticity involved in associative learning. It is one of the few papers to address high-order cognition with biophysically detailed models, which are sometimes thought to be too detailed to be adequately constrained. Our paper provides a template for how to use information about brain rhythms to constrain biophysical models. It shows in detail, for the first time, how multiple interneurons help to provide time scales necessary for some kinds of spike-timing-dependent plasticity (STDP). It spells out the conditions under which such interactions between interneurons are needed for STDP and why. Finally, our work helps to provide a framework by which some of the discrepancies in the fear learning literature might be reevaluated. In particular, we discuss issues about Hebbian plasticity in fear learning; we show in the context of our model how neuromodulation might resolve some of those issues. The model addresses issues more general than that of fear learning since it is based on interactions of interneurons that are prominent in the cortex, as well as the amygdala.”

      (2) The Result section “Physiology of the interneuron types is critical to their role in depression-dominated plasticity”, which is now titled “Mechanisms by which interneurons contribute to potentiation in depression-dominated plasticity”, now reads as follows:

      “Mechanisms by which interneurons contribute to potentiation during depression-dominated plasticity. The PV cell is necessary to induce the correct pre-post timing between ECS and F needed for long-term potentiation of the ECS to F conductance. In our model, PV has reciprocal connections with F and provides lateral inhibition to ECS. Since the lateral inhibition is weaker than the feedback inhibition, PV tends to bias ECS to fire before F. This creates the fine timing needed for the depression-dominated rule to instantiate plasticity. If we used the classical Hebbian plasticity rule (Bi and Poo, 2001) with gamma frequency inputs, this fine timing would not be needed and ECS to F would potentiate over most of the gamma cycle, and thus we would expect random timing between ECS and F to lead to potentiation (Fig. S4). In this case, no interneurons are needed (see Discussion “Synaptic plasticity in our model” for the potential necessity of the depression-dominated rule). 

      In this network configuration, the pre-post timing for ECS and F is repeated robustly over time due to coordinated gamma oscillations (PING, as shown in Fig. 4A, Fig. 1C) arising through the reciprocal interactions between F and PV (Feng et al., 2019). PING can arise only when PV is in a sufficiently low excitation regime such that F can control PV activity (Börgers et al., 2005), as in Fig. 4A. However, although such a low excitation regime establishes the correct fine timing for potentiation, it is not sufficient to lead to potentiation (Fig. 4A, Fig. S2C): the depression-dominated rule leads to depression rather than potentiation unless the PING is periodically interrupted. During the pauses, made possible only in the full network by the presence of VIP and SOM, the history-dependent build-up of depression decays back to baseline, allowing potentiation to occur on the next ECS/F active phase. (The detailed mechanism of how this happens is in the Supplementary Information, including Fig. S2). Thus, a network without the other interneuron types cannot lead to potentiation. Though a low excitation level for a PV cell is necessary to produce a PING, a higher excitation level is necessary to produce a pause in the ECS and F. This higher excitation level is consistent with the experimental literature showing a strong activation of PV after the onset of CS (Wolff et al., 2014). The higher excitation happens when the VIP cell is silent, whereas a low excitation level is achieved when the VIP cell fires and partially inhibits the PV cell (Fig. 4B, Fig. S2D). The interruption in the ECS and F activity requires the participation of another interneuron, the SOM cell (Figs. 2B, S2): the pauses in inhibition from the VIP periodically interrupt ECS and F firing by releasing PV and SOM from inhibition and thus indirectly silencing ECS and F. 
Without these pauses, depression dominates (see SI section “ECS and F activity patterns determine overall potentiation or depression”).”
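      The contrast drawn above between a classical Hebbian rule and a depression-dominated rule can be illustrated with a simple pair-based exponential STDP kernel. This is a hypothetical sketch, not the rule used in the model: it ignores the history-dependent build-up of depression that, according to the text, makes the VIP/SOM-driven pauses necessary, and every parameter (amplitudes, time constants, the 40 Hz gamma period) is an illustrative assumption. It averages the weight change over pre/post lags spread uniformly across one gamma cycle (random timing) and compares that with a fixed lag at which the presynaptic ECS spike leads F.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration (not taken from the model).
# Times are in milliseconds; amplitudes are in arbitrary units.
CLASSICAL = dict(a_plus=1.0, tau_plus=17.0, a_minus=0.5, tau_minus=34.0)
DEPRESSION_DOMINATED = dict(a_plus=1.0, tau_plus=20.0, a_minus=1.5, tau_minus=20.0)


def stdp_kernel(lag_ms, a_plus, tau_plus, a_minus, tau_minus):
    """Weight change for one spike pair, with lag_ms = t_post - t_pre.

    A positive lag (presynaptic ECS spike before postsynaptic F spike)
    potentiates; a zero or negative lag depresses.
    """
    lag = np.asarray(lag_ms, dtype=float)
    return np.where(
        lag > 0,
        a_plus * np.exp(-lag / tau_plus),
        -a_minus * np.exp(lag / tau_minus),
    )


def mean_change_random_timing(params, gamma_hz=40.0, n_lags=10001):
    """Average weight change when the lag is spread uniformly over one gamma cycle."""
    period_ms = 1000.0 / gamma_hz
    lags = np.linspace(-period_ms / 2, period_ms / 2, n_lags)
    return float(np.mean(stdp_kernel(lags, **params)))


if __name__ == "__main__":
    for name, params in [("classical", CLASSICAL), ("depression-dominated", DEPRESSION_DOMINATED)]:
        print(name, "random timing:", round(mean_change_random_timing(params), 3))
    # Pre leads post by 5 ms, the ordering that PV lateral inhibition is said to favor.
    print("depression-dominated, pre leads by 5 ms:",
          round(float(stdp_kernel(5.0, **DEPRESSION_DOMINATED)), 3))
```

      With these assumed values, random timing yields a net positive change under the classical-style kernel but a net negative change under the depression-dominated kernel, while a consistent pre-before-post lag of a few milliseconds still potentiates under the depression-dominated kernel. This is the sense in which fine timing, rather than random timing, is needed; the role of the pauses in resetting history-dependent depression is not captured by this pairwise sketch.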

      We also removed a supplementary figure (Fig. S2).

      (3) We wanted to clarify and motivate our choice to extend the low theta range to 2-6 Hz and the high theta range to 6-14 Hz, compared with the 3-6 Hz and 6-12 Hz ranges, respectively, used in the BLA experimental literature. Our main reason for extending the ranges was that the peaks of low and high theta power in the VIP and SOM cells, respectively (the cells that generate these oscillations), occurred at the borders of the experimental ranges. Thus, to include the peaks of the model LFP, we lowered the lower bound of the low theta range by 1 Hz and raised the upper bound of the high theta range by 2 Hz.

      We present a new supplementary figure (Fig. S1) containing the power spectra of the VIP interneuron, which is the source of low theta in our model, and of the SOM interneuron, which is the source of high theta.

      We mention Fig. S1 in the Result section “Rhythms in the BLA can be produced by interneurons”, where we added the following text:

      “In the baseline condition, the condition without any external input from the fear conditioning paradigm (Fig. 1B, top), our VIP neurons exhibit short bursts of gamma activity (~38 Hz) at low theta frequencies (~2-6 Hz) (peaking at ~3.5 Hz) (see Fig. S1A).”

      “In our baseline model, SOM cells have a natural frequency of ~12 Hz (Fig. 1B, middle; Fig. S1B), which is at the upper limit of the experimental high theta range; this motivates our choice to extend the high theta range up to 14 Hz in order to include the peak.” 
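      To illustrate why short gamma bursts repeating at a low theta rate produce a spectral peak in the low theta band, the sketch below builds an idealized spike train of bursts (intraburst rate ~38 Hz, burst rate 3.5 Hz, following the text) and finds the largest spectral bin between 2 and 6 Hz. The number of spikes per burst, the 1 ms sampling, and the 10 s duration are hypothetical choices, and this deterministic spike generator is only a stand-in: in the model, VIP bursting arises from intrinsic currents (the D-current), not from a prescribed schedule.

```python
import numpy as np


def burst_train(duration_s=10.0, dt_s=0.001, burst_hz=3.5, intra_hz=38.0, spikes_per_burst=3):
    """Binary spike train of short gamma-rate bursts repeated at a low theta rate."""
    n_samples = int(round(duration_s / dt_s))
    train = np.zeros(n_samples)
    n_bursts = int(np.ceil(duration_s * burst_hz))
    for b in range(n_bursts):
        burst_start = b / burst_hz  # onset of burst b, in seconds
        for k in range(spikes_per_burst):
            idx = int(round((burst_start + k / intra_hz) / dt_s))
            if idx < n_samples:
                train[idx] = 1.0
    return train


def peak_frequency(signal, dt_s, lo_hz, hi_hz):
    """Frequency of the largest power-spectrum bin within [lo_hz, hi_hz]."""
    x = signal - signal.mean()  # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt_s)
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return float(freqs[in_band][np.argmax(power[in_band])])


if __name__ == "__main__":
    spikes = burst_train()
    print("low theta peak (Hz):", peak_frequency(spikes, 0.001, 2.0, 6.0))
```

      The same spike train also carries power near the intraburst gamma frequency; restricting the search to the 2-6 Hz band mirrors the peak-based quantification described in the next paragraph.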

      Knowing the natural frequencies of VIP and SOM interneurons from the Result section “Rhythms in the BLA can be produced by interneurons”, we specified more clearly that we quantify the change of power in the low and high theta range around the power peaks in those ranges. Specifically, we changed some sentences in the first paragraph of the Result section “Increased low-theta frequency is a biomarker of fear learning” as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E).”
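      The LFP proxy described above (a linear sum of the network's synaptic and intrinsic currents) and the peak-based comparison within each theta band can be sketched as follows. The current traces are synthetic sinusoids standing in for simulated AMPA, GABA, NaP-, D-, and H- currents; their frequencies, amplitudes, and the sampling step are hypothetical and serve only to show the computation. The band limits follow the 2-6 Hz and 6-14 Hz ranges used in the manuscript.

```python
import numpy as np


def model_lfp(currents):
    """LFP proxy: the linear sum of equally sampled current traces."""
    return np.sum(np.vstack(list(currents.values())), axis=0)


def band_peak(signal, dt_s, lo_hz, hi_hz):
    """Return (frequency, power) of the largest spectral bin within [lo_hz, hi_hz]."""
    x = signal - np.mean(signal)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=dt_s)
    in_band = (freqs >= lo_hz) & (freqs <= hi_hz)
    i = int(np.argmax(power[in_band]))
    return float(freqs[in_band][i]), float(power[in_band][i])


dt_s = 0.001                    # 1 ms sampling (hypothetical)
t = np.arange(10_000) * dt_s    # 10 s of synthetic "simulation"
currents = {                    # synthetic stand-ins for the model currents
    "AMPA": 2.0 * np.sin(2 * np.pi * 3.5 * t),
    "GABA": 1.0 * np.sin(2 * np.pi * 12.0 * t),
    "NaP": 0.1 * np.ones_like(t),
    "D": 0.3 * np.sin(2 * np.pi * 3.5 * t + 0.5),
    "H": 0.2 * np.sin(2 * np.pi * 12.0 * t + 1.0),
}
lfp = model_lfp(currents)
low_f, low_p = band_peak(lfp, dt_s, 2.0, 6.0)     # low theta band
high_f, high_p = band_peak(lfp, dt_s, 6.0, 14.0)  # high theta band
print(f"low theta peak {low_f:.1f} Hz, high theta peak {high_f:.1f} Hz, ratio {low_p / high_p:.1f}")
```

      Comparing peak power within each band, rather than total band power, follows the choice described in the text of quantifying changes around the power peaks; a pre- versus post-conditioning comparison would apply `band_peak` to both LFP traces.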

      Finally, we made a few other small changes:

      In the Introduction, we mention the following: “We also note that there is no uniformity in the exact frequencies associated with low and high theta (e.g., Lorétan et al., 2004, used 2-6 Hz for low theta). Here, we use 2-6 Hz for the low theta range and 6-14 Hz for the high theta range.”

      In Fig. 6D,E, we reran the statistics using a narrower interval for high theta (11.5-13 Hz) to focus on the peak. Our initial result, showing a significant change in low theta between pre- and post-fear conditioning and no change in high theta, still holds.

      In Fig. 6 of the Result section “Increased low-theta frequency is a biomarker of fear learning”, we switched the order of panels F and G. This change allows us to first focus on the AMPA currents, which are the major contributors to the low theta power increase, and to specify which AMPA current drives that increase. After that, we present the power spectrum of the GABA currents as well.

      The corresponding text in the Result section now reads as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E). These results are consistent with the experimental findings in (Davis et al., 2017). Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6F). It is the AMPA currents to the PV interneurons that are directly responsible for the low theta increase; it is the newly potentiated ECS to F synapse that paces the AMPA currents in the PV interneurons to go at low theta. Thus, the low theta increase is due to added excitation provided by the new learned pathway.”

      (4) In the Discussion section “Assumptions and predictions of the model”, we specified the following:

      “Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”

      (5) Finally, to broaden the potential interest of our study, we added the following sentences:

      At the conclusion of the abstract:

      “The model makes use of interneurons commonly found in the cortex and, hence, may apply to a wide variety of associative learning situations.”

      At the conclusion of the introduction:

      “Finally, we note that the ideas in the model may apply very generally to associative learning in the cortex, which contains similar subcircuits of pyramidal cells and interneurons: PV, SOM and VIP cells.” 

      Also, changes in the emphasis of the paper led us to remove the following from the abstract: “Finally, we discuss how the peptide released by the VIP cell may alter the dynamics of plasticity to support the necessary fine timing.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following issues.

      (1) Fig. 3: The analgesic effects after astrocyte ablation appear to recover after one week. Is this due to repopulation of astrocytes?

      Although we did not detect proliferation of astrocytes, we hypothesize that the recovery is likely related to microglial phagocytosis of astrocyte debris after astrocyte ablation. Microglia are known to phagocytose cell debris. Diphtheria toxin-mediated cell ablation caused the death and fragmentation of AAV2/5-GfaABC1D-Cre-labeled astrocytes. We hypothesize that microglia phagocytose the astrocyte fragments and are thereby stimulated to activate type I interferon signaling. Once microglial phagocytosis of the debris ends, type I interferon signaling also declines, and this reduced signaling may be accompanied by a recurrence of pain.

      (2) Fig. 3: Please justify the large sample size of n=30-36. Is this sample size based on previous studies or statistical estimation?

      The number of mice was based on our previous report [1], and the larger sample also helps ensure that the pain data are reliable. In addition to exploring differences between the sexes, we needed to collect samples at different time points for different experiments.

      (3) Please try to plot individual data points for some critical time points to demonstrate data distribution. It is also helpful to plot male and female data points separately for some time points.

      Individual data points have been plotted as requested and added to the supplementary material.

      (4) It is unclear if the same number of males and females were used in this study, as females were typically used for SCI studies. I wonder if you can use repeated measures Two-Way ANOVA for statistical analysis.

      The numbers of males and females were not the same, but both groups were sufficient for statistical analysis. In addition, breeding transgenic mice yields both male and female mice, and using both makes rational use of the animals. Indeed, previous studies have shown that female mice are more commonly used in pain studies. Although we did not observe a sex difference in this study, previous studies have reported that sex is one of the factors contributing to differences in pain. Following your suggestion, we adopted two-way ANOVA for statistical analysis and updated the statistical methods section; the statistical results were consistent with the previous results, so we did not modify the statistical results shown in the figures.

      (5) Fig. 3C, D: The effects of astrocyte ablation on mechanical pain are mild, compared to thermal pain. Electronic von Frey apparatus may be difficult for mice. It works very well for rats and large animals.

      Since all animals in this study were mice, we cannot comment on how the electronic von Frey apparatus performs in rats and larger animals. In our experience, however, the electronic von Frey apparatus is well suited to mouse experiments. In particular, our apparatus can measure forces as small as 0.01 g, which allows us to detect subtle changes in pain sensitivity and may be more accurate and intuitive than previous methods.

      (6) Fig. 2B: In the figure legend it states n = 3 biological repeats. There are many more dots in each column. Are these individual animals or spinal cord sections?

      As described in our Methods, n = 3 biological repeats means three mice per group, with three immunofluorescence (IF) samples per mouse. We took three or more values in each ascending tract (depending on the size of the different ascending tracts in the lumbar enlargement), which is why Figure 2 shows more data points than biological repeats; this may also make the estimates more reliable.

      (7) Fig. 4C: It appears that GFAP is increased by toxin treatment. Please explain this result.

      This figure was calculated for astrocyte activation in the lesion area (T9-10), but not for the lumbar enlargement.

      Reviewer #2 (Recommendations For The Authors):

      Specific Comments:

      RNA-Sequencing Analysis: The strength of the RNA-sequencing data in elucidating the impact of astrocyte elimination is compelling. While the focus on IFN signaling is well-supported, the manuscript overlooks other differentially expressed genes. A deeper analysis or at least a discussion of these genes could enrich the study's conclusions, offering a more holistic view of the underlying mechanisms.

      Although we did not analyze the other differentially expressed genes in depth, we focused on the most significantly differentially expressed genes, because we consider these genes to have a greater effect on pain.

      Q2: Figure Presentation: Consolidating Figures 1-3 could increase the clarity of the result presentation, reducing distractions from the main narrative. Certain aspects, such as the comparison of different tracts in Figure 2B and the body weight data in Figure 3C, seem tangential and might be better suited for supplementary materials.

      The comparison of astrocyte activation in the different ascending tracts of the lumbar enlargement explains the relationship between astrocyte activation and pain and lays the foundation for the subsequent astrocyte elimination. The body weight data are also important: they reflect not only changes in overall recovery after spinal cord injury but also the effect of astrocyte elimination on the overall condition of the mice. Together with the pain test results, the weight data help the reader understand how the overall condition of the mice changes after astrocyte elimination.

      Q3: Schematic Clarity: The schematic in Figure 1A is confusing, particularly in distinguishing between transgenic mice and viral constructs. The inconsistent naming of Cre recombinase (alternatively referred to as Cre, CRE, and sometimes DRE) further complicates understanding. Standardizing these elements would greatly enhance clarity for the readers.

      As described in the Methods, Gt(ROSA)26Sorem1(CAG-LSL-RSR-tdTomato-2A-DTR)Smoc mice contain both a Loxp-stop-Loxp sequence and a Rox-stop-Rox sequence. During breeding, crossing Gt(ROSA)26Sorem1(CAG-LSL-RSR-tdTomato-2A-DTR)Smoc mice with C57BL/6JSmoc-Tg(CAG-Dre)Smoc mice removes the Rox-stop-Rox sequence; the offspring can then be crossed with mice carrying Cre recombinase, or treated with AAV2/5-GfaABC1D-Cre, to remove the Loxp-stop-Loxp sequence and induce the expression of tdTomato and DTR.

      Q4: Pathway Analysis: The discussion of the signal pathway analysis in Figure 8 leans heavily on speculation without direct evidence from the study. Distinguishing clearly between findings and literature-derived hypotheses is crucial. A more detailed discussion that properly cites sources for each pathway element would strengthen the manuscript.

      Following your suggestion, we have moved this figure to the supplementary material.

      Q5: Statistical Analysis: The use of one-way ANOVA, despite presenting data in groups, is misaligned with the data's structure. Employing two-way ANOVA followed by post-hoc comparisons is appropriate for statistical analysis.

      Following your suggestion, we re-analysed the data using two-way ANOVA and updated the statistical methods section accordingly. The results were consistent with the previous ones, so we did not modify the statistical results shown in the figures.
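For reference, the partition of variance behind a two-way ANOVA with interaction can be sketched as below. This is a minimal sketch assuming a balanced, fully between-subjects design (the same number of animals in every cell, one value per animal); the factor names "A" and "B" are placeholders. Repeated measurements on the same mice, such as pain tests at successive time points, would instead call for a repeated-measures or mixed-effects model, and unbalanced designs need Type II/III sums of squares.

```python
import numpy as np


def two_way_anova(y):
    """Two-way ANOVA with interaction for a balanced design.

    y has shape (a, b, n): a levels of factor A, b levels of factor B and
    n replicates per cell. Returns {effect: (SS, df, F)}; F is None for error.
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)          # (a, b) cell means
    mean_a = y.mean(axis=(1, 2))   # (a,) marginal means of factor A
    mean_b = y.mean(axis=(0, 2))   # (b,) marginal means of factor B

    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    # Interaction: what remains of the cell means after removing both main effects
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((y - cell[:, :, None]) ** 2)

    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err

    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_err),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_err),
        "AxB": (ss_ab, df_ab, (ss_ab / df_ab) / ms_err),
        "error": (ss_err, df_err, None),
    }
```

P-values then follow from the F distribution, for example `scipy.stats.f.sf(F, df_effect, df_error)`; post-hoc comparisons are a separate step.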

  2. Oct 2024
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      […] Strengths:

      The study has several important strengths: (i) the work on GDA stability and competition of GDA with point mutations is a very promising area of research and the authors contribute new aspects to it, (ii) rigorous experimentation, (iii) very clearly written introduction and discussion sections. To me, the best part of the data is that deletion of lon stimulates GDA, which has not been shown with such clarity until now.

      Weaknesses:

      The minor weaknesses of the manuscript are a lack of clarity in parts of the results section (Point 1) and the methods (Point 2).

      We thank the reviewer for their comments and suggestions on our manuscript. We also appreciate the succinct summary of key findings in the Reviewer's assessment, in particular the association of the Lon protease with the propensity for GDAs and its impact on their eventual fate. We plan to revise the manuscript for greater clarity, as suggested by Reviewer #1.

      Reviewer #2 (Public review):

      […] The study does what any bold and ambitious study should: it contains large claims and uses multiple sorts of evidence to test those claims.

      Weaknesses:

      While the general argument and conclusion are clear, this paper is written for a bacterial genetics audience that is familiar with the manner of bacterial experimental evolution. From the language to the visuals, the paper is written in a boutique fashion. The figures are even difficult for me - someone very familiar with proteostasis - to understand. I don't know if this is the fault of the authors or the modern culture of publishing (where figures are increasingly packed with information and hard to decipher), but I found the figures hard to follow with the captions. But let me also consider that the problem might be mine, and so I do not want to unfairly criticize the authors.

      For a generalist journal, more could be done to make this study clear, and in particular, to connect to the greater community of proteostasis researchers. I think this study needs a schematic diagram that outlines exactly what was accomplished here, at the beginning. Diagrams like this are especially important for studies like this one that offer a clear and direct set of findings, but conduct many different sorts of tests to get there. I recommend developing a visual abstract that would orient the readers to the work that has been done.

      Next, I will make some more specific suggestions. In general, this study is well done and rigorous, but doesn't adequately address a growing literature that examines how proteostasis machinery influences molecular evolution in bacteria.

      While this paper might properly test the authors' claims about protein quality control and evolution, the paper does not engage a growing literature in this arena and is generally not very strong on the use of evolutionary theory. I recognize that this is not the aim of the paper, however, and I do not question the authors' authority on the topic. My thoughts here are less about the invocation of theory in evolution (which can be verbose and not relevant), and more about engagement with a growing literature in this very area.

      The authors mention Rodrigues 2016, but there are many other studies that should be engaged when discussing the interaction between protein quality control and evolution.

      A 2015 study demonstrated how proteostasis machinery can act as a barrier to the usage of novel genes: Bershtein, S., Serohijos, A. W., Bhattacharyya, S., Manhart, M., Choi, J. M., Mu, W., ... & Shakhnovich, E. I. (2015). Protein homeostasis imposes a barrier to functional integration of horizontally transferred genes in bacteria. PLoS genetics, 11(10), e1005612

      A 2019 study examined how Lon deletion influenced resistance mutations in DHFR specifically: Guerrero RF, Scarpino SV, Rodrigues JV, Hartl DL, Ogbunugafor CB. The proteostasis environment shapes higher-order epistasis operating on antibiotic resistance. Genetics. 2019 Jun 1;212(2):565-75.

      A 2020 study did something similar: Thompson, Samuel, et al. "Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme." Elife 9 (2020): e53476.

      And there's a new review (preprint) on this very topic that speaks directly to the various ways proteostasis shapes molecular evolution:

      Arenas, Carolina Diaz, Maristella Alvarez, Robert H. Wilson, Eugene I. Shakhnovich, and C. Brandon Ogbunugafor. "Proteostasis is a master modulator of molecular evolution in bacteria."

      I am not simply attempting to list studies that should be cited, but rather, this study needs to be better situated in the contemporary discussion on how protein quality control is shaping evolution. This study adds to this list and is a unique and important contribution. However, the findings can be better summarized within the context of the current state of the field. This should be relatively easy to implement.

      We thank the reviewer for their encouraging assessment of our manuscript. We appreciate that the manuscript may not be accessible for a general readership in its present form. We plan to revise the manuscript, in part by modifying figures and adding schematics, to afford greater clarity. We also appreciate the concern regarding situating this study in the context of other published work that relates proteostasis and molecular evolution. Indeed, this was a particularly difficult aspect for us given the different kinds of literature that were needed to make sense of our study. We plan on revising the manuscript by incorporating the references that the Reviewer has pointed out.

      Reviewer #3 (Public review):

      […] Strengths:

      The major strength of this paper is identifying an example of antibiotic resistance evolution that illustrates the interplay between the proteolytic stability and copy number of an antibiotic target in the setting of antibiotic selection. If the weaknesses are addressed, then this paper will be of interest to microbiologists who study the evolution of antibiotic resistance.

      Weaknesses:

      Although the proposed mechanism is highly plausible and consistent with the data presented, the analysis of the experiments supporting the claim is incomplete and requires more rigor and reproducibility. The impact of this finding is somewhat limited given that it is a single example that occurred in a lon strain and compensatory mutations for evolved antibiotic resistance mechanisms are described. In this case, it is not clear that there is a functional difference between the evolution of copy number versus any other mechanism that meets a requirement for increased "expression demand" (e.g. promoter mutations that increase expression and protein stabilizing mutations).

      We thank the reviewer for their in-depth assessment of our work and appreciate their concerns regarding reproducibility and rigor in analysis of our data. We will incorporate this feedback and provide the necessary clarifications in the revised version of our manuscript.

    1. Author Response:

      We would like to thank you for your comprehensive and insightful reviews of our manuscript. We highly value your constructive comments and suggestions, and we are preparing revisions that will enhance both the clarity and the robustness of our study. Below is an outline of the changes we will implement in response to the points you raised.

      All three reviewers expressed concerns regarding the robustness of our conclusions about the relationship between task-related theta activity and aperiodic changes. We will revise the manuscript to present these conclusions more cautiously, stating that the findings indicate a potential contribution of aperiodic activity to what is traditionally interpreted as theta activity. While our results emphasize the importance of distinguishing between periodic and aperiodic components, further research is necessary to fully understand this relationship. We will conduct additional control analyses, including a comparison of the scalp topographies of theta and aperiodic components, to better understand the relationship between aperiodic and periodic (theta) activity.

      In response to Reviewer #1's request for greater transparency in our reporting of methodological details, we will provide key clarifications. We will add a clear statement noting that the primary results are based on data from middle-aged to older adults, some of whom had subjective cognitive complaints (SCC). However, it is important to note that no differences were observed between the SCC group and the control group regarding periodic or aperiodic changes in power. Additionally, the main findings were replicated in a sample of middle-aged adults.

      To address potential confounding factors, we will include an analysis contrasting response-related ERPs with the identified aperiodic components. However, we do not entirely agree with the assertion that this will necessarily clarify the results. ERPs are not inherently distinct from aperiodic (or periodic) activity; they may reflect changes in aperiodic (or periodic) power. In our view, examining aperiodic and periodic power, ERPs, or time-frequency decomposition with baseline correction provides different perspectives on the same data. Nonetheless, the combined analyses and their results are intended to guide future researchers toward the most suitable approach for interpreting this data.

      Reviewer #3 raised concerns regarding the task's effectiveness in evoking theta power and the ability of the spectral parameterization method (specparam) to adequately quantify background activity around theta bursts. To address these concerns, we will include additional visualizations demonstrating that the task reliably elicited theta (and delta) activity. Regarding the reviewer's concerns about specparam and theta bursts, it is important to clarify that specparam, in the form we used, does not incorporate time information; rather, it can be applied to any power spectral density (PSD), independent of how the PSD is derived. Specparam’s performance therefore depends on the method used to estimate frequency content. For time-frequency decomposition, we employed superlets (https://doi.org/10.1038/s41467-020-20539-9), which have been shown to resolve short bursts of activity more effectively than other methods. To our knowledge, superlets provide the highest resolution in terms of both time and frequency. Moreover, to improve stability, we performed spectral parameterization on trial-averaged power (in contrast to the approach in https://doi.org/10.7554/eLife.77348). Nonetheless, we will conduct a simulation to test whether specparam can reliably resolve low-frequency peaks over the 1/f activity.
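The kind of simulation described above can be sketched as follows: build a log10-power spectrum from an aperiodic component, offset minus exponent times log10(f) (the "fixed" aperiodic form used in specparam), plus a Gaussian theta peak, and then ask whether the peak is recovered above an aperiodic fit. The fitting step here is a deliberately simplified stand-in (a line in log-log space, refitted on the points with the lowest residuals), not specparam's actual algorithm; the frequency grid, the 4-8 Hz band, the 0.1 detection threshold and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_spectrum(freqs, offset, exponent, peak_cf, peak_amp, peak_sd,
                      noise_sd=0.0, seed=0):
    """log10 power = offset - exponent * log10(f) + Gaussian peak (+ optional noise)."""
    rng = np.random.default_rng(seed)
    aperiodic = offset - exponent * np.log10(freqs)
    peak = peak_amp * np.exp(-((freqs - peak_cf) ** 2) / (2 * peak_sd ** 2))
    return aperiodic + peak + rng.normal(0.0, noise_sd, size=freqs.shape)


def detect_peak(freqs, log_power, band=(4.0, 8.0), threshold=0.1, keep_fraction=0.5):
    """Simplified robust aperiodic fit, then a peak search in `band`.

    Returns (peak_found, estimated_exponent).
    """
    x = np.log10(freqs)
    # Initial straight-line fit in log-log space (pulled upwards by any peak)
    slope, intercept = np.polyfit(x, log_power, 1)
    residual = log_power - (intercept + slope * x)
    # Refit on the points with the lowest residuals, which are least affected by peaks
    mask = residual <= np.quantile(residual, keep_fraction)
    slope, intercept = np.polyfit(x[mask], log_power[mask], 1)
    flattened = log_power - (intercept + slope * x)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return bool(flattened[in_band].max() > threshold), -slope


if __name__ == "__main__":
    freqs = np.arange(1.0, 40.5, 0.5)
    for amp in (0.0, 0.4):
        spectrum = simulate_spectrum(freqs, offset=1.0, exponent=1.5,
                                     peak_cf=6.0, peak_amp=amp, peak_sd=1.0)
        print(amp, detect_peak(freqs, spectrum))
```

Repeating such runs over a grid of peak amplitudes, widths and noise levels would show where a low-frequency peak stops being separable from the 1/f background.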

      Reviewer #2 suggested that the manuscript would benefit from a more detailed account of the effects. In response, we will include more detailed quantifications of the analyzed effects, such as model error and R² values.
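The two fit-quality measures mentioned above could be computed as in the sketch below. It assumes that R² is the squared Pearson correlation between the measured and modelled log-power spectra and that model error is the mean absolute error between them, which we take to correspond to specparam's default outputs; this should be checked against the specparam documentation, and the function name is hypothetical.

```python
import numpy as np


def goodness_of_fit(log_power, model_log_power):
    """Return (r_squared, mean_absolute_error) between measured and modelled log-power."""
    log_power = np.asarray(log_power, dtype=float)
    model_log_power = np.asarray(model_log_power, dtype=float)
    r = np.corrcoef(log_power, model_log_power)[0, 1]      # Pearson correlation
    mae = np.mean(np.abs(log_power - model_log_power))    # mean absolute error
    return r ** 2, mae
```

Because R² defined this way ignores constant offsets, reporting it together with the absolute error is more informative than either alone.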

      We believe that the planned revisions will strengthen the manuscript and address the primary concerns raised by the reviewers. We sincerely appreciate your thoughtful feedback and look forward to submitting an improved version of the manuscript soon.

      Once again, thank you for your time and expertise in reviewing our work.

      Sincerely,

      Andraž Matkovič & Tisa Frelih

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate Reviewer 2's insightful and clearly reasoned assessment of this study, including the much-appreciated reframing and evaluation of the study’s advances in the sleep field. It is a constructive review that adds considerable value to this study by better defining the biological significance of the findings, including both their advances and their limitations.

      Reviewer 2 nicely summarized the work as “…highlight(ing) the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons.” The reviewer succinctly placed one of the main electrophysiological findings in the context of one of the sleep field’s most prevalent views, “that LTP associated with wake, leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity.” It has been speculated that “This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon (and its restoration by recovery sleep) by introducing the concept of silent synapses.” We want to emphasize that sleep need and its resolution involve more than just homeostasis of excitatory synaptic strength; they may also extend to homeostasis of the capacity of excitatory synapses to undergo LTP (a homeostasis of meta-plasticity), with implications for learning and memory.

      Reviewer 2 also identified another advance made by this study, summarized as, “The new snRNAseq dataset indicates the sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies.” References for these studies are helpfully provided by the reviewer. Our analysis of these data extends the evidence for sleep-need-driven transcriptional changes in excitatory neurons, observed by us and others, to implicate more specifically the excitatory neurons of layers 2-5, in particular intra-telencephalic neurons.

      Reviewer 2 importantly noted that “New snRNAseq analysis indicates that SD drives the expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function”, and that “SD-induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes”. These comments are much appreciated, as they emphasize that, beyond identification of the major target cell type of sleep function, the gene-ontological characteristics of the major sleep targets are starting to be addressed.

      Reviewer 2 commented on the molecular sleep model, making the key observation that “SD-induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Figure 4),” and accurately discussed its significance with respect to the proposed model.

      We are in complete agreement with the observation that the molecular sleep model presented is not “definitively supported by the new data and in this regard should be viewed as a perspective…”. One of the more glaring gaps in the supporting evidence is our lack of understanding of the role of HDAC4/5 (part of the SIK3-HDAC4/5 pathway) in the sleep-need modulation of excitatory synapses. This issue might be resolved by assessing the synaptic effects of constitutively nuclear HDAC4/5. The current study provides a first step in this assessment by showing a correlation between HDAC4/5 and MEF2c target genes and a subset of differentially expressed synaptic shaping component (SSC) genes that modulate excitatory synapse strength and phenotype. However, the functional studies have yet to be completed. Complementary studies on the SD-induced SSC-DEGs identified in this study are also needed to characterize their sleep-need-induced functional impact (on both strength and meta-plasticity) on the most relevant excitatory synapses (as identified in the current study).

      We agree with both reviewers 1 and 2 that, “Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity”. Reviewer 2 clarifies the key unresolved issue as, “cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022). Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020)”. One may conclude with reviewer 2, “These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes.”

      An understanding of the mechanism(s) responsible for the relationship between sleep need and SWA is critical to evaluating how sleep need correlates with sleep DEGs and synaptic transmission, including the “additional factors” suggested by reviewer 2. SWA might result from a decrease of cortical glutamatergic neurotransmission below some threshold, which might occur in response to prolonged waking (possibly in response to local increases of adenosine induced by waking activity), rather than being a cause of, or being intimately involved in, resolving sleep need.

      An increase of SWA in association with SD can result directly from an acute SD-induced increase in local adenosine concentration. This will elicit an ADORA1-mediated down-regulation of glutamate excitatory neurotransmission in the cortex (Bjorness et al., 2016) and in cholinergic arousal centers (Rainnie et al., 1994; Porkka-Heiskanen et al., 1997; Portas et al., 1997; Li et al., 2023). When MEF2c is derepressed by chronic loss of HDAC4 function, SWA is facilitated (Kim et al., 2022). It is plausible that loss of HDAC4 function contributes to the increased SWA by downscaling glutamate excitatory transmission (independently of sleep need), as expected from derepressed, MEF2c-mediated sleep-gene expression.

      Similarly, overexpression of constitutively active HDAC4 (cnHD4) can contribute to chronic upscaling of cortical glutamate synaptic strength and thereby depress SWA (again, independently of sleep need). Thus, facilitation or depression of SWA correlates with down- or up-scaling of cortical glutamate neurotransmission, respectively, even in the absence of a direct effect on sleep need (Figure 4D). Many agents that reduce the excitability of glutamatergic pyramidal cells by various mechanisms, including anesthetics such as isoflurane, barbiturates or benzodiazepines, as well as agents activating ADORA1, increase SWA. Finally, it is important to acknowledge that direct evidence for this proposed link between SWA and cortical glutamate transmission still requires further investigation. Thus, SWA may reflect generalized cortical glutamate synaptic activity, whether modulated by sleep function or by other agents.

      Other factors may also mediate some of the mismatch between cnHD4/5 DEGs and Mef2c-cKO DEGs, including the broader overexpression of AAV-cnHD4 compared with the CamKII-driven Cre knockout of Mef2c. Overexpression of cnHD4 can increase activity in the hypothalamus and other arousal areas and thereby interfere with SWA, but without excluding the repression of SD-DEGs that results from repression of MEF2c-mediated sleep gene expression.

      The critique by reviewer 1 raises a number of important technical issues with this study. A key, potentially critical issue raised by reviewer 1, is that of our method of experimental sleep deprivation (ESD). The reviewer suggests that “…neuronal activity/induction of plasticity”, peculiar to the ESD methodology employed in this study, “…rather than sleep/wake states are responsible for the observed results…”.  

      In this study, ESD was induced with a slow-moving treadmill (SMTM; 0.1 km/hour, as stated in the Methods), which requires locomotion to avoid bumping into the back wall of a false-bottomed Plexiglas cage. In its home cage, a mouse typically moves much faster than 0.1 km/hour, and the mouse is able to eat and drink freely while in the cage (see file: video 1). Furthermore, our observations using a beam-break cage indicate that, over 6 hours, mice spontaneously travel distances comparable to or longer than the 0.6 km (0.1 km/hour × 6 hours) covered by the treadmill during a 6-hour ESD. Finally, our EEG recordings of mice on the active treadmill show 100% waking while it is on (Bjorness et al., 2009), whereas the extent to which the “gentle handling” (GH) technique prevents NREM sleep (including transition time) depends on the diligence of the experimenter.

      Accommodation (one week before ESD) included 30-minute exposures to the running treadmill at approximately ZT2 and ZT14 (now spelled out in the Materials & Methods section). Thus, the likelihood of motor learning seems vanishingly small.

      As with all ESD methods, there must be some associated increase in sensory and motor neuronal activity to drive arousal and prevent the transition to sleep. For example, the more widely employed GH method of ESD involves sensory stimulation (tactile and/or auditory) of sufficient intensity to induce a postural change from that associated with sleep to that associated with wake (often involving some locomotion). As with the SMTM, both sensory and motor systems are likely to be engaged. Unlike the SMTM method, however, the stimulation used in GH is variably intermittent from mouse to mouse and from experimenter to experimenter, as it is applied only when the experimenter judges the mouse to be falling asleep. It can even be argued that the varied and unpredictable ways in which these interactions happen are more likely to cause plastic changes than the constant slow motion of a treadmill; the mice know how to walk, after all. In other protocols, novel objects are introduced to the animals. These will certainly trigger plastic processes, something that is avoided by using, for sleep deprivation, a slow-running treadmill to which the mouse has been accommodated.

      The changes induced by the SMTM technique are reproducible, and the technique induces arousal through somatic stimulation of sufficient intensity to elicit natural motor activity, as with GH. All ESD methods induce motor activity, and it is reasonable to speculate that induced motor activity is essential for effective ESD over the prolonged durations (>4 hours in mice) that elicit high sleep need. The SD-evoked increases in mEPSC amplitude and frequency measured with GH-ESD (Liu et al., 2010) are similar in all respects to those we observed in response to SMTM-ESD (Bjorness et al., 2020). Further studies might directly compare SMTM-ESD with GH-ESD, as suggested by reviewer 1, but these are regrettably outside the scope and resources of our study.

      The model presented in Figure 4C is consistent with the experimental findings with respect to the observed electrophysiological changes (including loss of silent synapses and increased AMPA/NMDA ratio after ESD of 6 hours) and altered gene expression that includes enrichment of SSC genes, many of which (7 candidates are listed) can affect both AMPA/NMDA ratio and silent synapses. No claim of mechanism linking the changed expression to altered AMPAR or NMDAR activity can be made at this point, even as to polarity of gene expression, related to electrophysiological outcome. Furthermore, some transcripts may involve receptor trafficking while others more directly affect activated receptor function. To help illustrate the complexity of interpreting gene up-regulation, consider the following hypothetical scenario. If a gene like upregulated Grin3a acts rapidly, it may facilitate reduction of NMDAR function (decreasing plasticity) during ESD, whereas upregulation of a gene like Kif17, if acting in a more delayed manner, might enhance NMDAR surface expression and activity (increasing silent synapses) in response to ESD, during recovery sleep. Relevant references, consistent with these various outcomes are supplied in the manuscript but further investigation is clearly needed, or as reviewer 2 so aptly commented, this work “…provides a framework to stimulate further research and advances on the molecular basis of sleep function”.  

      Several issues are raised by reviewer 1 concerning the electrophysiological methodology and statistical assessment. Regarding the former, we closely followed established protocols employed in the frontal neocortex (Myme et al., 2003). We had not included the details of series resistance monitoring. Series resistance values ranged between 8 and 15 MΩ, and experiments in which series resistance changed by more than 25% were not used for further analyses. Thank you for bringing this oversight to our attention. This essential information, which we gather for all our whole-cell recordings without exception, has now been added to the version of record.
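The exclusion rule above can be expressed as a small check. This sketch assumes that the 25% change is measured relative to the series resistance at the start of the recording; the function name and the inclusive boundary (a change of exactly 25% is kept, matching "changes larger than 25%" being excluded) are illustrative choices.

```python
def passes_rs_criterion(rs_start_mohm, rs_end_mohm, max_change=0.25):
    """Keep a recording only if series resistance changed by at most 25% of its initial value."""
    if rs_start_mohm <= 0:
        raise ValueError("initial series resistance must be positive")
    relative_change = abs(rs_end_mohm - rs_start_mohm) / rs_start_mohm
    return relative_change <= max_change
```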

      The -90 mV holding potential was chosen according to precedent (Myme et al., 2003). It increases driving force and permits lower stimulus strength for the same response size – reducing the likelihood for polysynaptic responses. Experiments with multiple response peaks at -90 mV were not included in the analysis. The -90 mV holding potential also increases NMDA receptor Mg++ block resulting in a minimally contaminated AMPA response. This information is now added to our submitted version of record.

      The statistical assessments shown in Table 1 refer to two sets of data measured from 3 × 2 = 6 different cohorts for each sleep condition (CS, SD, RS): 1) AMPA & NMDA EPSCs and 2) AMPA/NMDA FR ratios (FRR; now bolded in row 1, second tab, Table S1). As stated in the results section, “A two-way ANOVA analysis showed a significant interaction between AMPA matched to NMDA EPSC response for each neuron, and sleep condition (F (2, 21) = 7.268, p<0.004; Figure 1 A, C, E). When considered independently, neither the effect of sleep condition nor of EPSC subtype reached significance at p<0.05 (Figure 1 C)”.

      As noted by reviewer 1, we inadvertently dropped one of the data points from the RS FR and FR ratio (FRR) statistical analysis (raw data in the third tab of Table S1, statistical data in fourth and fifth tab and illustrated in figure 1 F). Thanks to this appreciated, rigorous review, we can correct the oversight (using raw data unchanged in Table S1, third tab). The Table S1 and figure 1 F are now corrected for the version of record. For better clarity, we now use two tabs, the fourth and fifth tabs, respectively of Table S1, for separate stat analyses of FR and FRR data.

      The significance of the AMPA/NMDA FRR across sleep conditions was assessed with the Kruskal-Wallis test, a non-parametric method, and the two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli (BKY) was used to control the FDR across the multiple sleep-condition comparisons. However, the Kruskal-Wallis test is usually less powerful than tests that presume normal distributions, such as one-way ANOVA followed by Holm-Sidak’s test. We have therefore re-analyzed FRR across the CS, SD and RS conditions using a standard one-way ANOVA (Table S1, tab 5). The results now read, “The difference between sleep conditions and FRR is significant (F (2, 19) = 11.3, Table S1, tab5). Multiple comparisons (Holm-Sidak, Table S1, tab5) indicate the near absence of silent synapses was reversed by either CS or RS (SD/CS; p<0.0011 and SD/RS: p<0.0006; Table S1, tab 5; Figure 1 F).” These analyses compare well with the non-parametric assessment using the Kruskal-Wallis test (significant at p = 0.0006) with BKY correction for multiple comparisons, which gives p <= 0.0262 for CS-SD and p <= 0.0006 for RS-SD (statistics also shown in Table S1, tab 5). [Also shown in tab 5 is the “standard approach of correcting for family wise error rate”, namely, Dunn’s test. It is more conservative but less powerful than the BKY correction. In general, the tradeoff of greater power for being less conservative is better tolerated when many comparisons are made; however, it can be argued that in the present analysis type 2 errors are also potentially misleading and thus not well tolerated.] The modifications of our statistical analyses, inspired by reviewer 1, did not affect the interpretation of the data or the conclusions.
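The BKY procedure referred to above can be sketched as follows: stage 1 runs the Benjamini-Hochberg step-up at the reduced level q' = q/(1+q); its number of rejections r1 gives an estimate m0 = m - r1 of the number of true null hypotheses; stage 2 reruns the step-up at q' × m/m0. This is a from-scratch illustration following Benjamini, Krieger and Yekutieli (2006), not the software used for Table S1, and the p-values in the tests are made up. statsmodels also offers a two-stage option (`method='fdr_tsbky'` in `multipletests`), whose exact conventions should be checked in its documentation.

```python
def bh_rejections(pvalues, q):
    """Benjamini-Hochberg step-up: number of hypotheses rejected at level q."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * q / m:
            k = i  # step-up: keep the largest rank that satisfies the bound
    return k


def bky_two_stage(pvalues, q=0.05):
    """Two-stage linear step-up (BKY). Returns booleans (True = reject) in input order."""
    m = len(pvalues)
    q_prime = q / (1 + q)
    r1 = bh_rejections(pvalues, q_prime)       # stage 1 at the reduced level
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    m0_hat = m - r1                            # estimated number of true nulls
    r2 = bh_rejections(pvalues, q_prime * m / m0_hat)  # stage 2 at the adapted level
    threshold = sorted(pvalues)[r2 - 1]        # r2 >= r1 > 0 in this branch
    return [p <= threshold for p in pvalues]
```

The adaptive second stage is what makes BKY more powerful than plain Benjamini-Hochberg when many null hypotheses are false, at the cost of an extra estimation step.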

      Bjorness TE, Kelly CL, Gao T, Poffenberger V, Greene RW (2009) Control and function of the homeostatic sleep response by adenosine A1 receptors. The Journal of neuroscience : the official journal of the Society for Neuroscience 29:1267-1276.

      Bjorness TE, Dale N, Mettlach G, Sonneborn A, Sahin B, Fienberg AA, Yanagisawa M, Bibb JA, Greene RW (2016) An Adenosine-Mediated Glial-Neuronal Circuit for Homeostatic Sleep. The Journal of neuroscience : the official journal of the Society for Neuroscience 36:3709-3721.

      Bjorness TE, Kulkarni A, Rybalchenko V, Suzuki A, Bridges C, Harrington AJ, Cowan CW, Takahashi JS, Konopka G, Greene RW (2020) An essential role for MEF2C in the cortical response to loss of sleep in mice. Elife 9.

      Kim SJ et al. (2022) Kinase signalling in excitatory neurons regulates sleep quantity and depth. Nature 612:512-518.

      Li B, Ma C, Huang YA, Ding X, Silverman D, Chen C, Darmohray D, Lu L, Liu S, Montaldo G, Urban A, Dan Y (2023) Circuit mechanism for suppression of frontal cortical ignition during NREM sleep. Cell 186:5739-5750 e5717.

      Liu ZW, Faraguna U, Cirelli C, Tononi G, Gao XB (2010) Direct evidence for wake-related increases and sleep-related decreases in synaptic strength in rodent cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience 30:8671-8675.

      Myme CI, Sugino K, Turrigiano GG, Nelson SB (2003) The NMDA-to-AMPA ratio at synapses onto layer 2/3 pyramidal neurons is conserved across prefrontal and visual cortices. Journal of neurophysiology 90:771-779.

      Porkka-Heiskanen T, Strecker RE, Thakkar M, Bjorkum AA, Greene RW, McCarley RW (1997) Adenosine: a mediator of the sleep-inducing effects of prolonged wakefulness. Science 276:1265-1268.

      Portas CM, Thakkar M, Rainnie DG, Greene RW, McCarley RW (1997) Role of adenosine in behavioral state modulation: a microdialysis study in the freely moving cat. Neuroscience 79:225-235.

      Rainnie DG, Grunze HC, McCarley RW, Greene RW (1994) Adenosine inhibition of mesopontine cholinergic neurons: implications for EEG arousal. Science 263:689-692.

    1. Author response

      We appreciate the positive comments and constructive suggestions from the editors and reviewers, which will help us improve our manuscript. We will implement the changes as requested by the reviewers, focusing primarily on revising and clarifying the following aspects:

      First, we will clarify the use of biological and technical replicates in each experiment and provide more details about the statistical analyses conducted. Additionally, we plan to include a schematic representation of the experimental design.

      Second, we will explain the experiment conducted to rule out hormonal effects or differences in the oocyte maturation method used. We will also indicate the concentration of OVGP1 in the oviduct and explain why we selected OVGP1 as the probable cause of species specificity.

      Third, by addressing all of the reviewers' suggestions, we aim to resolve any concerns, inconsistencies, or minor errors identified by the reviewers.

      We are committed to addressing all the issues raised by the reviewers and believe that the manuscript will greatly benefit from the insightful suggestions and invaluable contributions of the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects.

      Strengths and Weaknesses:

      Overall, I find the experiments rigorously done and the interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. A minor point is that I cannot find how many DGRP lines were used.

      Thank you for the suggestions. We screened 193 lines and we will add that information to the methods. Additionally, we will add the heritability estimate of the post-diapause fecundity trait.
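      As an illustration of the requested heritability estimate, the sketch below computes broad-sense heritability from a one-way random-effects ANOVA with DGRP line as the factor, H² = σ²_L / (σ²_L + σ²_E). It assumes a balanced design (the same number of females per line) and no block or sex effects; the authors' actual model may differ, and the line IDs and numbers are hypothetical toy values, not DGRP data.

```python
from statistics import mean

def broad_sense_heritability(line_values):
    """Estimate H^2 = var_L / (var_L + var_E) from a one-way random-effects ANOVA.

    line_values maps each line ID (hypothetical) to per-female fecundity values.
    Assumes a balanced design: every line has the same number n of replicates.
    """
    groups = list(line_values.values())
    k = len(groups)                      # number of lines
    n = len(groups[0])                   # replicate females per line
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = mean(v for g in groups for v in g)
    ss_line = n * sum((mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_line = ss_line / (k - 1)
    ms_error = ss_error / (k * (n - 1))
    # E[MS_line] = var_E + n * var_L, so var_L = (MS_line - MS_error) / n (floored at 0)
    var_line = max((ms_line - ms_error) / n, 0.0)
    return var_line / (var_line + ms_error)

# Toy values for illustration only
h2 = broad_sense_heritability({"line_A": [1, 2, 3], "line_B": [4, 5, 6]})
print(round(h2, 3))  # 0.806
```

      Unbalanced designs would need a different estimator (for example REML in a mixed model); the balanced one-way form is used here only because it keeps the variance-component arithmetic explicit.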

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP, and they performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used the whole-genome-sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variation across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Weaknesses:

      (1) I suspect that there may be variation in survivorship after long-term exposure to cold conditions (10 °C, 35 days), which could also be quantified and mapped using genome-wide association studies (GWAS). Since blocking Ir21a neuronal transmission prevented flies from exiting diapause, it is possible that natural genetic variation could have a similar effect, influencing the success rate of exiting diapause and post-diapause mortality. If there is variation in this trait, could it affect post-diapause fecundity? I am concerned that this could be a confounding factor in the analysis of post-diapause fecundity. However, I also believe that understanding phenotypic variation in this trait itself could be significant in regulating adult diapause.

      We agree that it is possible that the ability to endure cool temperatures per se may influence post-diapause fecundity. However, cool temperature is the essential diapause-inducing condition in Drosophila, so it is not obvious how to separate those effects experimentally, and we agree that phenotypic variation in the cool-sensitivity trait itself could be significant in regulating diapause.

      (2) On p.10, the authors conclude that "Dip-𝛾 and sbb are required in neurons for successful diapause, consistent with the enrichment of this gene class in the diapause GWAS." While I acknowledge that the results support their neuronal functions, I remain unconvinced that these genes are required for "successful diapause". According to the RNAi scheme (Figure 4I), Dip-γ and sbb are downregulated only during the post-diapause period, but still show a significant effect, comparable to that seen in the nSyb Gal4 RNAi lines (Figure 4K).

      Our definition of successful diapause is the ability to produce viable adult progeny post-diapause, which requires that the flies enter, maintain, and exit diapause, alive and fertile. We will restate our conclusion to say that Dip-γ and sbb are required for post-diapause fecundity.

      In addition, two other RNAi lines (SH330386, 80461) that did not show lethality did not affect post-diapause fecundity.

      We interpret those results to mean that those RNAi lines were not effective since Dip-γ and sbb are known to be essential.

      Notably, RNAi (27049, KK104056) substantially reduced non-diapause fecundity, suggesting impairment of these genes affects fecundity in general regardless of diapause experience. Therefore, the reduced post-diapause fecundity observed may be a result of this broader effect on fecundity, particularly in a more "sensitized" state during the post-diapause period, rather than a direct regulation of adult diapause by these genes.

      Ubiquitous expression of RNAi lines #27049 or #KK104056 was lethal, so we included the tubGAL80ts repressor to prevent RNAi from taking effect during development. Flies had to be shifted to 30 °C to inactivate the repressor and thereby activate the RNAi. At 30 °C, the fecundity of the controls (GFP RNAi lines #9331 and KK60102) was also lower (average non-diapause fecundity = 12 and 19, respectively) and similar to that of #27049 or #KK104056. We also assessed the knockdown using Repo GAL4 and nSyb GAL4 and did not find a significant decline in non-diapause fecundity for #27049 and #KK104056 compared with a nonspecific RNAi control (#54037).

      (3) The authors characterized 546 genetic variants and 291 genes associated with phenotypic variation across DGRP lines but did not prioritize them by significance. They did prioritize candidate genes with multiple associated variants (p.9 "Genes with multiple SNPs are good candidates for influencing diapause traits."), but this is not a valid argument, likely due to a misunderstanding of LD among variants in the same gene. A gene with one highly significantly associated variant may be more likely to be the causal gene in a QTL than a gene with many weakly associated variants in LD. I recommend taking significance into account in the analysis.

      We agree with the reviewer, and in Supplemental Table S3 we list the top-associated SNPs in order from the lowest (most significant) p-value. Most of the top-associated genes from this analysis were uncharacterized CG numbers for which insufficient tools were available for validation. Nevertheless, there is overlap between the genes that are most significant by p-value and those with multiple SNPs. Among the top 15 genes with multiple associated SNPs, CG18636 and CR15280 ranked 3rd by p-value, CG7759 ranked 4th, CG42732 ranked 10th, and Drip ranked 30th (all above the conservative Bonferroni threshold of 4.8e-8), while three sbb-associated SNPs also appear in Table 3 above the standard 1e-5 threshold.

      Reviewer #3 (Public Review):

      Summary:

      Drosophila melanogaster of North America overwinters in a state of reproductive diapause. The authors aimed to measure 'successful' D. melanogaster reproductive diapause and reveal loci that impact this quantitative trait. In practice, the authors quantified the number of eggs produced by a female after she exited 35 days of diapause. The authors claim that genes involved with olfaction in part contribute to some of the variation in this trait.

      Strengths:

      The work used the powerful platform of the fly DGRP/GWAS. The work tried to verify some of the candidate loci with targeted gene manipulations.

      Weaknesses:

      Some context is needed. Previous work from 2001 established that D. melanogaster reproductive diapause in the laboratory suspends adult aging but reduces post-diapause fecundity. The work from 2001 showed that the extent to which fecundity is reduced is proportional to diapause duration. As well, the 2001 data showed that the short diapause periods used in the current submission reduce fecundity only in the first days following diapause termination; after this time, fecundity is greater in the post-diapause females than in the non-diapause controls.

      The 2001 paper by Tatar et al. reports the number of eggs laid after 3, 6, or 9 weeks in diapause conditions. Thus the diapause conditions used in this study (35 days or 5 weeks) are neither short nor long, rather intermediate. Does the reviewer have a specific concern?

      In this context, the submission fails to offer a meaningful concept for what constitutes 'successful diapause'. There is no biological rationale or relationship to the known patterns of post-diapause fecundity. The phenotype is biologically ambiguous.

      We have unambiguously defined successful diapause as the ability to produce viable adult progeny post-diapause. Other groups have measured the % of flies that arrest ovarian development, the % of post-diapause flies with mature eggs in the ovary, or the number of eggs laid post-diapause; however, we suggest that the number of viable adult progeny produced post-diapause is more meaningful than the other measurements from the point of view of perpetuating the species.

      I have a serious concern about the antenna-removal design. These flies were placed on cool/short days two weeks after surgery. Adults at this time will not enter diapause, which must be induced soon after eclosion. Two-week-old adults will respond to cool temperatures by 'slowing down', but they will continue to age on a time scale of day-degrees. This is why the control group shows age-dependent mortality, which would not be seen in truly diapaused adults. Loss of antennae increases the age-dependent mortality of these cold adults, but this result does not reflect an impact on diapause.

      We carried out the lifespan study under two different conditions. We either removed the antenna and moved the flies directly to 10 °C, or we removed the antenna and allowed a “wound healing” period prior to moving the flies to 10 °C (out of concern that the flies might die quickly because wound healing may be impaired at 10 °C). In both cases, antenna removal shortened lifespan. Furthermore, the lifespan extension at 10 °C was similar regardless of whether flies had experienced two weeks at 25 °C or not.

      • Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The work falls well short of its aim because the concept of 'successful diapause' is not biologically established. The paper studies post-diapause fecundity, and we don't know what that means. The loci identified in this analysis segregate for a minimally constructed phenotype. The results and conclusions are orthogonal.

      It is unclear to us why the reviewer has such a negative opinion of measuring post-diapause fecundity, specifically the ability to produce viable progeny post-diapause. The value of this measurement seems obvious from the point of view of perpetuating the species.

      • The likely impact of the work on the field, and the utility of the methods and data to the community.

      The work will likely have little impact. Its phenotype and operational methods are weakly developed. It lacks insight based on the primary literature on post-diapause. The community of insect diapause investigators is not likely to use the data or conclusions to understand beneficial or pest insects, or the impact of a changing climate on how they overwinter.

      The reviewer has not explained why his/her opinion is so negative.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Perform an ANOVA to estimate heritability.

      We will do this.

      (2) List the number of DGRP lines tested.

      193

      Reviewer #2 (Recommendations For The Authors):

      [Minor suggestions]

      (1) Check Drosophila italics

      We will do this.

      (2) It would be informative to include the number of DGRP lines used in this study in the Results and Methods section.

      We will include the information that we assessed 193 DGRP lines.

      (3) Figure 1C - several dots are missing at the top of the line.

      We will correct.

      (4) Figures 1E, F - Why use a discontinuous histogram for continuous distribution? Consider using a continuous histogram (e.g. Lafuente et al. (2018) Figure 1C).

      We will do this.

      (5) Figure 1F - Why have fewer bins than panel E?

      Figure 1F shows normalized post-diapause fecundity. Individual post-diapause fecundity was normalized to the mean non-diapause fecundity, and the normalized individual values were then averaged to obtain the mean normalized post-diapause fecundity for each DGRP line. Therefore, the bins differ from those in panel E. Please refer to Supplemental Table S1.
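      The per-line normalization described above can be sketched as follows. This assumes that "normalized to" means dividing each female's post-diapause fecundity by the line's mean non-diapause fecundity; the function name and the toy counts are hypothetical.

```python
from statistics import mean

def normalized_post_diapause_fecundity(post_diapause, non_diapause):
    """Mean normalized post-diapause fecundity for one DGRP line.

    post_diapause: fecundity of individual post-diapause females of the line.
    non_diapause: fecundity of individual non-diapause females of the same line.
    """
    baseline = mean(non_diapause)                    # mean non-diapause fecundity
    if baseline == 0:
        raise ValueError("non-diapause baseline is zero; ratio undefined")
    normalized = [x / baseline for x in post_diapause]  # normalize each female
    return mean(normalized)                          # average within the line

print(normalized_post_diapause_fecundity([15, 30], [10, 20]))  # 1.5
```

      Because each line is rescaled by its own baseline, the values in panel F are ratios rather than counts, which is why the histogram bins do not line up with those in panel E.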

      (6) Figure 2D - It would be informative to have fold enrichment stats.

      The following will be added in the methods section: The Gene Ontology (GO) categories and Q-values from the false discovery rate (FDR)-corrected hypergeometric test for enrichment are reported. Additionally, coverage ratios for the number of annotated genes in the displayed network versus the number of genes with that annotation in the genome are provided. GeneMANIA estimates Q-values using the Benjamini-Hochberg procedure.
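      For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, the sketch below shows the standard step-up adjustment that turns raw enrichment p-values into FDR-adjusted Q-values. It is a generic illustration, not GeneMANIA's own code, and the input p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (Q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so Q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

print([round(x, 4) for x in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```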

      (7) Supplementary table (Table S5) or supplemental table (other supplementary tables)? Need consistency (to Supplementary?)

      We will change ‘Supplementary Table S5’ to ‘Supplemental Table S5’.

      (8) Figure 5D,E - unused ticks on the x-axis.

      The unused ticks on the x-axis will be removed from Figures 5D and E.

      Reviewer #3 (Recommendations For The Authors):

      • Suggestions for improved or additional experiments, data or analyses.

      The authors cannot redo the GWAS with an alternative trait that might better reflect 'successful diapause', and I am not even sure what such a trait would involve or mean. Given this limitation, the authors should consider how they can conduct additional experiments to better define, justify, and elaborate how post-diapause reproduction relates to the mechanisms, processes, depth, and 'success' of diapause.

      We agree that it is entirely unclear what trait would be a better measure of successful diapause. Other investigators might have chosen to measure something different, but there is no reason why a different choice would be a better choice. We do not believe that this is a “limitation.” We believe that we have unambiguously defined and justified post-diapause reproduction as a measurement of successful diapause with respect to perpetuating the species through a stressful period.

      • Recommendations for improving the writing and presentation.

      The mechanics of the writing are fine, aside from some typos/grammar issues. But, the paper is conceptually superficial and tautological. It claims to provide a 'stringent criterion' for 'successful diapause', then measures an unjustified trait, then claims this demonstrates variation for 'successful diapause'.

      We respectfully disagree with this opinion.

      This story is conducted without reference to prior, primary literature or on the mechanisms of reproductive diapause. The presentation may be improved by considering the literature and precedence for what and how reproductive diapause is induced, maintained, and terminated ... in many insects as well as Drosophila

      We will revisit our citations of the literature and apologize for any inadvertent omissions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In our initial submission, reviewers highlighted that the major limitations of our study were related to both the number of minibinders tested as well as the number of optimizations we evaluated for improving minibinder function. In this revision, we have focused on expanding the minibinders tested. To do so, we selected two previously published minibinders against the epidermal growth factor receptor (EGFR). Selection of EGFR as a target enabled us to evaluate two minibinders that bind at different sites, unlike the previously evaluated binders LCB1 and LCB3 which both bind the same interface on SARS-CoV-2 Spike. Further, using EGFR as a target enabled us to qualitatively compare the efficacy of minibinder-coupled chimeric antigen receptors against an existing anti-EGFR CAR. We believe the results here demonstrate broader generalizability of our approach across binding sites, targets, and minibinders. We hope this addition is sufficient to convince future would-be users of these tools to attempt synthetic receptor engineering using minibinders against their protein of choice.

      Reviewers commented on the presentation of flow data and the use of statistics throughout the manuscript. We did not modify how flow data are presented, as the density plots we used are common throughout the field. We have opted not to include statistics; we believe that for most of the experiments we show, our findings are obvious. In cases where statistics would help discern whether subtle effects are real (for example, comparing the linker-based optimizations or comparing the anti-EGFR CARs), we believe that other experimental factors, such as construct expression, are sufficiently strong confounds that, even in the presence of statistically significant effects, we would be leading readers astray by making such claims about our data. As such, we have sought to limit the claims we make and hope that reviewers and readers agree that we do not overinterpret our data without statistical support.

      On more minor points, both reviewers commented on the differences between Figures 5A and 5C. As explained in our figure legend and in the previous response to reviews, these differences arise because the data originate from different time points of the same assay. Reviewer #2 believed we should be more measured in our comments about linker optimality, which we have addressed by changing the referenced line in the Discussion. Otherwise, we have made no modifications to figures or text beyond the addition of new data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We addressed the issue of “tolerability” in our answers to Reviewer 2 and in the revised manuscript, where we added data concerning tolerability; see the paragraph in the Results section, page 11:

      "Finally, tolerability studies were performed with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We have slightly modified the paragraph above to emphasize that the tolerability studies were performed in “naïve mice”. 

      "Finally, tolerability studies were performed in naïve mice with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We propose to add a sentence in the Results section, page 11, relative to the fact that we can also induce severe hypothermia in rats using conjugates similar to VH-N412.

      We also added in the Discussion section (page 38) that we could induce hypothermia with different conjugates in mice, rats and pigs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Some of the figures are of rather poor quality. For example, the H&E and Sirius Red stainings in Figures 3 and 4 are quite poor so it is difficult to see what is going on in the muscles. The authors should take note of another publication on dy3K/dy3K mice of similar age (PMID: 31586140) where such images are of much higher quality. Similarly, the Western blot for laminin-alpha2 (Figure 4B) of the wild-type mouse needs improvement. If the single laminin-alpha2 protein is not detected, there is an issue with the denaturation buffer used to load the protein.

      Thank you for the valuable suggestions. We have read the study on dy3K/dy3K mice of similar age (PMID: 31586140), which showed dystrophic changes in dy3K/dy3K muscle throughout the disease course using images of the whole muscle and representative muscle areas. We have generated new, higher-quality H&E and Sirius Red images showing both the whole muscle and representative muscle areas. Because of their large size, these images have been added as the new Figure supplement 2 and Figure supplement 3. We have also changed the denaturation buffer used to load the protein and repeated the Western blot for laminin α2; the laminin α2 protein in wild-type mice (n = 3) and dyH/dyH mice (n = 3) detected by Western blot is now shown in Figure 4B.

      (2) My biggest concern is, however, the many overstatements in the manuscript and the over-interpretation of the data. This already starts with the first sentence in the abstract where the authors write: "Understanding the underlying pathogenesis of LAMA2-related muscular dystrophy (LAMA2-MD) have been hampered by lack of genuine mouse model." This is not correct as the dy3K/dy3K, generated in 1997 (PMID: 9326364), are also Lama2 knockout mice; there are also other strains (dyW/dyW mice) that are severely affected and there are the dy2J/dy2J mice that represent a milder form of LAMA2-MD. Similarly, the last two sentences of the abstract "This is the first reported genuine model simulating human LAMA2-MD. We can use it to study the molecular pathogenesis and develop effective therapies." are a clear overstatement. The mechanisms of the disease are well studied and the above-listed mouse models have been amply used to develop possible treatment options. The overinterpretation concerns the results from transcriptomics. The fact that Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier as the authors state. If there are no functional data, this cannot be stated. Indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494) and this needs to be written like this.

      Thank you for your comment, and we apologize for the overstatements in the manuscript. We have carefully reconsidered our previous statements and corrected them accordingly. We have changed the first sentence in the abstract to "Our understanding of the molecular pathogenesis of LAMA2-related muscular dystrophy (LAMA2-MD) requires improving". We have also replaced the last two sentences in the abstract with "In summary, this study provided useful information for understanding the molecular pathogenesis of LAMA2-MD".

      We also agree that "Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier", and the indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494). Therefore, we have corrected the overstatement according to the suggestion with "It was reported that the deficiency of laminin α2 in astrocytes and pericytes was associated with a defective blood-brain barrier (BBB) in the dy3K/dy3K mice (Menezes et al., 2014). The defective BBB presented with altered integrity and composition of the endothelial basal lamina, reduced pericyte coverage, and hypertrophic astrocytic endfeet lacking appropriately polarized aquaporin4 channels."

      (3) Finally, the bulk RNA-seq data also needs to be presented in a disease context. The authors, again, mix up changes in expression with functional impairment. All gene expression changes are interpreted as direct evidence of an involvement of the cytoskeleton. In fact, changes in the cytoskeleton are more likely a consequence of the severe muscle phenotype and the delay in muscle development. This is particularly possible as muscle samples from 14-day-old mice are compared; a stage at which muscle still develops and grows tremendously. Thus, all the data need to be interpreted with caution.

      Thank you for your comment. We have revised the over-interpretation of the bulk RNA-seq data and have corrected the last sentence in the Results to "These observations provide important data on the impaired muscle cytoskeleton and abnormal muscle development, which were associated with the muscle pathology resulting from severe dystrophic changes in the dyH/dyH mice."

      (4) In summary, the authors need to improve data presentation and, most importantly, they need to tone down the interpretation and they must be fully aware that their work is not as novel as they present it.

      Thank you for your comments and valuable suggestions; we have revised the previous overstatements and interpretation of the results. We are sorry that we failed to clearly present our rationale for making this mouse model. Indeed, there are many existing mouse models, all of which have been important to research in the field. One of the reasons we created dyH/dyH was to make a mouse model without any trace of engineering (e.g., inserted bacterial elements for knockout). In doing so, we hoped to provide a novel model suited to the development of gene-editing-based gene therapies. To this end, dyH/dyH was created to reflect the mutation hotspot region in the Chinese population. We hope you will agree with our points and see that we were not trying to belittle previous models but simply to provide a different option. The overstatements were largely rooted in language barriers, and we have tried to make our statements more cautious and acceptable to readers.

      Reviewer #2 (Public Review):

      (1) The major weakness is the manuscript reads like this was the first-ever knockout mouse model generated for LAMA2-CMD. There are in fact many Lama2 knockout mice (dy, dy2J, dy3k, dyW, and more) which have all been extensively studied with publications. It is important for the authors to comment on these other published studies that have generated these well-studied mouse lines. Therefore, there is a lack of background information on these other Lama2 null mice.

      Thank you for your comment. We have added background information on these other Lama2 null mice with the following sentences: "The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995). Among them, the dy/dy, dy3k/dy3k and dyw/dyw mice present severe muscular dystrophy, and the dy2J/dy2J mice show mild muscular dystrophy and peripheral neuropathy (Gawlik and Durbeej, 2020). The mutation in the dy/dy mice remains unclear (Xu et al., 1994; Michelson et al., 1995). The dy3k/dy3k mice were generated by inserting a reverse Neo element at the 3' end of exon 4 of the Lama2 gene in 1997 (Miyagoe et al., 1997), and the dyw/dyw mice were created with an insertion of lacZ-neo in exon 1 of the Lama2 gene in 1998 (Kuang et al., 1998). The dy2J/dy2J mice arose in 1970 from a spontaneous splice donor site mutation that results in a predominant transcript with a 171-base in-frame deletion, leading to the expression of a truncated laminin α2 with a 57-amino-acid deletion (residues 34-90) and a Gln91Glu substitution (Sunada et al., 1995). These models were established in the pre-gene-therapy era, leaving traces of engineering, such as bacterial elements in the Lama2 gene locus, which make them unsuitable for testing various gene therapy strategies. Moreover, insufficient transcriptomic data on the muscle and brain of LAMA2-CMD mouse models limit the understanding of disease hallmarks. Therefore, there is a need to create new, appropriate mouse models of LAMA2-CMD, based on a frequently mutated region in humans, using the latest gene editing technology such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9."

      (2) The phenotypes of dyH/dyH are similar to, if not identical to dy/dy, dy2J/dy2J, dy3k/dy3k, dyW/dyW including muscle wasting, muscle weakness, compromised blood-brain barrier, and reduced life expectancy. This should be addressed, and a comparison made with Lama2 deficient mice in published literature.

      Thank you for your comment. We have added Table supplement 3 to compare dyH/dyH mice with other Lama2-deficient mice. We have also added the following statement to the Discussion: "Compared with other Lama2 deficient mice including dy/dy, dy2J/dy2J, dy3k/dy3k and dyW/dyW, the phenotype of the dyH/dyH mice presented with a very severe muscular dystrophy, which was similar to that of the dy3k/dy3k mice (Table supplement 3)."

      (3) Recent published studies (Chen et al., Development (2023), PMID 36960827) show loss of Itga7 causes disruption of the brain-vascular basal lamina leading to defects in the blood-brain barrier. This should be referenced in the manuscript since this integrin is a major Laminin-211/221 receptor in the brain and the mouse model appears to phenocopy the dyH/dyH mouse model.

      Thank you for your great suggestion. We have cited this study (Chen et al., Development (2023), PMID 36960827) and added the following statements to the Discussion: "As reported, aberrant BBB function was also associated with the adhesion defect of the alpha7 integrin subunit in astrocytes to laminins in Itga7-/- mice (Chen et al., 2023). In this study, loss of communication involving the laminin pathway between laminin α2 and integrins was predicted between vascular and leptomeningeal fibroblasts and astrocytes in the dyH/dyH brain, providing more evidence for the impaired BBB due to laminin α2 deficiency."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Improve the data presentation (as mentioned above). Make a new picture of the histology; repeat the Western blots. Discuss the RNA-seq data with more caution and present it in a more attractive way. Tone down the wording.

      Thank you for your recommendations. We have revised the overstatements and improved the interpretation of the RNA-seq data as suggested. We have also made new histology images and repeated the Western blots.

      Reviewer #2 (Recommendations For The Authors)

      (1) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

      (2) Figure 2: The animal numbers used in this analysis were not indicated. Please include this number in the figure legend.

      Thank you for your recommendations. We have added animal numbers in the figure legends wherever applicable.

      (3) Figure 2: The forelimb grip strength is informative but has limitations. Ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength.

      Thank you for your recommendations. We agree that ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength, and we would like to perform this experiment. However, we regret that we have not been able to complete it, for the following reasons: (1) Forelimb grip strength is a classic and still commonly used method for measuring mouse muscle strength in studies of different muscular dystrophies, such as LAMA2-MD (Amelioration of muscle and nerve pathology of Lama2-related dystrophy by AAV9-laminin-αLN linker protein. JCI Insight. 2022;7(13):e158397. PMID: 35639486), Duchenne muscular dystrophy (Investigating the role of dystrophin isoform deficiency in motor function in Duchenne muscular dystrophy. J Cachexia Sarcopenia Muscle. 2022;13(2):1360-1372. PMID: 35083887), and facioscapulohumeral muscular dystrophy (Systemic delivery of a DUX4-targeting antisense oligonucleotide to treat facioscapulohumeral muscular dystrophy. Mol Ther Nucleic Acids. 2021;26:813-827. PMID: 34729250), among others. (2) Grip strength is also used to measure muscle strength in human studies (PMID: 32366821; PMID: 29313844; PMID: 34499663, etc.). For these reasons, we measured muscle strength using forelimb grip strength and have not performed the additional ex vivo or in vivo muscle contractility experiment.

      (4) Figure 3: Muscle fibrosis should be measured with a hydroxyproline assay.

      Thank you for your recommendations. We agree that the hydroxyproline assay is one of the most classic methods for quantifying collagen content as a measure of muscle fibrosis. However, we used Sirius Red staining for the following reasons: (1) Sirius Red staining shows fibrosis directly in tissue sections, so other pathological features can be observed and compared in the same muscle sections. (2) Sirius Red staining is also a classic and commonly used method for measuring muscle fibrosis and has been used in previous mouse studies of muscle disorders, such as PMID: 22522482 (Losartan, a therapeutic candidate in congenital muscular dystrophy: studies in the dy(2J) /dy(2J) mouse. Ann Neurol. 2012;71(5):699-708.), PMID: 34337906 (Aging-related hyperphosphatemia impairs myogenic differentiation and enhances fibrosis in skeletal muscle. J Cachexia Sarcopenia Muscle. 2021;12(5):1266-1279.), and PMID: 28798156 (Phosphodiesterase 4 inhibitor and phosphodiesterase 5 inhibitor combination therapy has antifibrotic and anti-inflammatory effects in mdx mice with Duchenne muscular dystrophy. FASEB J. 2017;31(12):5307-5320.), among others. Therefore, we used Sirius Red staining to measure muscle fibrosis in this study.

      (5) Figure 8: The N=3 is very low which could result in type I or II statistical errors. A larger sample size will reduce the chance of statistical errors.

      Thank you for your recommendations. To reduce the chance of statistical errors, we performed a supplementary experiment in which the number of animals in each group was increased to 6 (3 males and 3 females). The results were consistent with the data previously shown in Figure 8.

      (6) Power analysis to estimate experimental animal numbers should be reported in the manuscript.

      Thank you for your recommendations. We based our estimate on a previous study (Power and sample size. Nature Methods. 2013;10:1139–1140), which states: “The distributions show effect sizes d = 1, 1.5 and 2 for n = 3 and α = 0.05. Right, power as function of d at four different α values for n = 3” and “If we average seven measurements (n = 7), we are able to detect a 10% increase in expression levels (μ_A = 11, d = 1) 84% of the time with α = 0.05.” On this basis, we estimated that 3 to 7 experimental animals were required. Moreover, when additional experimental animals were available, we retained their data.
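      The 84% figure in the quoted passage can be reproduced with the standard power formula for a one-sided z-test with known standard deviation, power = Φ(d·√n − z₁₋α), where d is the effect size in units of σ. The sketch below assumes that test (our reading of the quoted example, not a statement of how the cited article computed it); `one_sided_power` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided z-test (known SD) to detect a standardized
    effect size d with n measurements at significance level alpha."""
    std_normal = NormalDist()                   # mean 0, SD 1
    z_crit = std_normal.inv_cdf(1 - alpha)      # about 1.645 for alpha = 0.05
    # Averaging n measurements shifts the test statistic by d * sqrt(n)
    return std_normal.cdf(d * sqrt(n) - z_crit)


if __name__ == "__main__":
    # Example quoted above: n = 7, d = 1, alpha = 0.05
    print(f"n=7, d=1.0: power = {one_sided_power(1.0, 7):.2f}")  # power ≈ 0.84
    # Effect sizes mentioned in the quoted figure caption, with n = 3
    for d in (1.0, 1.5, 2.0):
        print(f"n=3, d={d}: power = {one_sided_power(d, 3):.2f}")
```

      Solving the same expression for n at a target power (for example 80%) is the usual way to choose group sizes before an experiment; a two-sided test requires a somewhat larger n for the same power.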

      (7) It is unclear if the studies were performed with adequate rigor. Were those scoring outcome measures blinded to the treatment groups?

      Thank you for your recommendations. The investigators scoring outcome measures were not blinded to the experimental groups, which were defined by genotype. In practice, dyH/dyH mice are easily distinguished from WT/Het mice by their small body size.

      (8) Authors should appropriately cite previous studies that have generated Lama2 null mice.

      Thank you for your recommendations. We have cited previous studies that have generated Lama2 null mice with the sentence “The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995)”.

      (9) The number of animals should be increased to reduce the chance of statistical error.

      Thank you for your recommendations. We performed a supplementary experiment in which the number of animals in each group was increased to reduce the chance of statistical error.

      (10) A power analysis should be performed to determine the number of experimental animals.

      Thank you for your recommendations. We have performed a power analysis to determine the number of experimental animals, as described in our response to point (6).

      (11) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. They build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Characterization of the behavioral effects of manipulations of these PPN input circuits could be further parsed, for a better understanding of the functional consequences of the connections demonstrated in the ephys analyses.

      We will further analyze our behavioral data to reveal more nuanced functional effects.

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for how the locomotion and valence effects are demonstrated here.

      This is a really interesting future direction and we will expand on these points in the discussion.

      Reviewer #2 (Public review):

      Summary:

      Fallah et al. carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN.

      Strengths:

      The slice electrophysiology work is technically well done and provides useful information for further studies of PPN. The optogenetics and behavioral studies are thought-provoking, showing that SNr and GPe projections to PPN play distinct roles in behavior.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      Although the optogenetics and behavioral studies are intriguing, they are somewhat difficult to fit together into a specific model of circuit function. Perhaps the authors can work to solidify the connection between these two arms of the work.

      We will expand on these topics in the discussion.

      (1) Male and female mice are used, but the authors do not discuss any analysis of sex differences. If there are no sex differences, it is still useful to report data disaggregated by sex in addition to pooled data.

      While we do not have sufficient n for a well-powered analysis of sex differences in behavior, we find that both male and female mice increase movement in response to SNr axon stimulation and decrease movement in response to GPe axon stimulation. We will expand on this further in the revised manuscript.

      (2) There is some lack of clarity in the current manuscript on the ages used - 2-5 months vs "at least 7 weeks." Is 7 weeks the time of virus injection surgery, then recordings 3 weeks later (at least 10 weeks)? Please clarify if these ages apply equally to electrophysiological and behavioral studies. If the age range used for the test is large, it may be useful to analyze and report if there are age-related effects.

      Seven weeks was the youngest age at which mice used for electrophysiology were injected, and all of these mice were recorded between 2 and 5 months of age. For behavior, the youngest mice were 11 weeks old at the time of testing (8 weeks old at injection). Mice in the GPe-stimulated condition were 110 ± 7.4 days old and mice in the SNr-stimulated condition 132 ± 23.4 days old (mean ± SEM). We will add these details to the revised manuscript.

      In addition, we have correlated distance traveled at baseline and during stimulation with age for both the SNr and GPe stimulation conditions. Baseline distance traveled did not correlate with age, but in the SNr axon stimulation group there was a trend toward more movement during stimulation in older mice. We will discuss this in the revised manuscript.

      (3) Were any exclusion criteria applied, e.g. to account for missed injections?

      All injection sites and implant sites were within our range of acceptability, so we did not exclude any mice for missed injections.

      (4) 28-34 °C is a fairly wide range of temperatures for electrophysiological recording, which could affect kinetics.

      This is an important consideration. We have checked our main measurement of current amplitude in the condition where we found significant differences between rostral and caudal PPN (SNr to Vglut2 PPN neurons) against temperature and found no correlation (Pearson’s r value = -0.0076). Similarly, we found no correlation between baseline (pre-opto) firing frequency and temperature (r = -0.068).

      (5) It would be good to report the number of mice used for each condition in addition to n=cells. Statistically, it would be preferable not to assume that each cell from the same mouse is an independent measurement and to use a nested ANOVA.

      For electrophysiology, 6 mice (3 male, 3 female) were used in each experiment. In the manuscript, ‘N’ denotes the number of mice and ‘n’ the number of cells. Because the number of healthy cells that can be recorded from one mouse is unpredictable, our experiments were designed with cells as the unit of analysis (n = cells), and the data are underpowered for a nested ANOVA. However, rostral and caudal data were collected from the same mice. Although we do not have sufficient paired data for every parameter, a paired comparison of one of our main findings, with mice as biological replicates, shows a statistically significant difference in the inhibitory effect of SNr axon stimulation on firing rate between rostral and caudal glutamatergic neurons (p=0.031, Wilcoxon signed-rank test).
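      Collapsing each mouse's cells to a single value and testing the paired rostral/caudal differences makes the mouse, rather than the cell, the unit of analysis. With six paired mice, the smallest exact two-sided p-value the Wilcoxon signed-rank test can return is 2/2^6 = 0.03125, reached only when all six differences share the same sign, so it is worth stating how many mice entered the paired test. The stdlib-only sketch below illustrates the procedure on hypothetical values; the function names are illustrative, and in practice `scipy.stats.wilcoxon` would be the usual choice.

```python
from itertools import product
from statistics import mean


def per_mouse_differences(rostral: dict, caudal: dict) -> list[float]:
    """Collapse cells to one mean per mouse and return caudal-minus-rostral
    differences for mice recorded in both regions (mouse = unit of analysis)."""
    shared = sorted(set(rostral) & set(caudal))
    return [mean(caudal[m]) - mean(rostral[m]) for m in shared]


def wilcoxon_exact_two_sided(diffs: list[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value by enumerating all sign
    patterns. Assumes no zero differences and no tied |differences|."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = n * (n + 1) // 2
    stat = min(w_plus, total - w_plus)
    # Under H0 each of the 2**n sign patterns on ranks 1..n is equally likely
    null_sums = [sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
                 for signs in product((False, True), repeat=n)]
    extreme = sum(1 for s in null_sums if s <= stat or s >= total - stat)
    return extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical per-cell values keyed by mouse ID, for illustration only
    rostral = {"m1": [10.0, 14.0], "m2": [8.0], "m3": [12.0, 9.0],
               "m4": [11.0], "m5": [7.0, 9.0], "m6": [13.0]}
    caudal = {"m1": [20.0], "m2": [19.0, 21.0], "m3": [15.0],
              "m4": [24.0], "m5": [14.0, 16.0], "m6": [18.0]}
    diffs = per_mouse_differences(rostral, caudal)
    print(f"p = {wilcoxon_exact_two_sided(diffs)}")  # all six positive: 2/64
```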

      Reviewer #3 (Public review):

      Summary:

      The study by Fallah et al. provides a thorough characterization of the effects of two basal ganglia output pathways on cholinergic, glutamatergic, and GABAergic neurons of the PPN. The authors first found that SNr projections spread over the entire PPN, whereas GPe projections are mostly concentrated in the caudal portion of the nucleus. Then the authors characterized the postsynaptic effects of optogenetically activating these basal ganglia inputs and identified the PPN's cell subtypes using genetically encoded fluorescent reporters. Activation of inputs from the SNr inhibited virtually all PPN neurons. Activation of inputs from the GPe predominantly inhibited glutamatergic neurons in the caudal PPN, and to a lesser extent GABAergic neurons. Finally, the authors tested the effects of activating these inputs on locomotor activity and place preference. SNr activation was found to increase locomotor activity and elicit avoidance of the optogenetic stimulation zone in a real-time place preference task. In contrast, GPe activation reduced locomotion and increased the time in the RTPP stimulation zone.

      Strengths:

      The evidence of functional connectivity of SNr and GPe neurons with cholinergic, glutamatergic, and GABAergic PPN neurons is solid and reveals a prominent influence of the SNr over the entire PPN output. In addition, the evidence of a GPe projection that preferentially innervates the caudal glutamatergic PPN is unexpected and highly relevant for basal ganglia function.

      The study reveals opposing effects of two basal ganglia outputs on locomotion and valence through their connectivity with the PPN.

      Overall, these results provide an unprecedented cell-type-specific characterization of the effects of basal ganglia inputs in the PPN and support the well-established notion of a close relationship between the PPN and the basal ganglia.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The behavioral experiments require further analysis as some motor effects could have been averaged out by analyzing long segments.

      We will further analyze our motor effects in the revised manuscript.

      Additional controls are needed to rule out a motor effect in the real-time place preference task.

      This is an important point. Our use of unilateral stimulation in the RTPP task reduces potential motor effects, and our supplemental videos show that the mice can easily escape and enter the stimulated zone. However, we can't completely rule out a motor component. To delve into this further, we analyzed mouse speed in the RTPP task. We find that in both SNr and GPe stimulation conditions, the maximum speed of the mouse is not different in the stimulated vs unstimulated zone. We will further analyze mouse speed at the transition into and out of the stimulated zone to identify any acute motor effects in this experiment.
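      Once per-frame positions are available, both analyses mentioned above reduce to short computations: the maximum speed within each zone, and the frames at which the animal crosses the zone boundary, around which speed can then be examined. A minimal sketch, assuming a two-chamber arena split at a vertical line x = boundary_x with the stimulated side at larger x; the names and layout are hypothetical and do not describe the tracking pipeline used in the study.

```python
from math import hypot


def max_speed_by_zone(xs, ys, fps, boundary_x):
    """Maximum frame-to-frame speed (position units per second) in each zone.
    Assumes the stimulated zone is x >= boundary_x; each step is assigned to
    the zone of the frame in which it ends."""
    best = {"stimulated": 0.0, "unstimulated": 0.0}
    for i in range(1, len(xs)):
        speed = hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) * fps
        zone = "stimulated" if xs[i] >= boundary_x else "unstimulated"
        best[zone] = max(best[zone], speed)
    return best


def zone_transitions(xs, boundary_x):
    """Frame indices of entries (+1) into and exits (-1) from the stimulated zone."""
    inside = [x >= boundary_x for x in xs]
    return [(i, 1 if inside[i] else -1)
            for i in range(1, len(xs)) if inside[i] != inside[i - 1]]


if __name__ == "__main__":
    # Hypothetical tracked positions (cm) at 10 frames/s, for illustration only
    xs = [0.0, 1.0, 3.0, 6.0, 1.0]
    ys = [0.0, 0.0, 0.0, 0.0, 0.0]
    print(max_speed_by_zone(xs, ys, fps=10, boundary_x=2.0))
    print(zone_transitions(xs, boundary_x=2.0))
```

      Frame-to-frame speed is sensitive to tracking jitter, so positions are usually smoothed first; a high percentile of speed is a common, more robust alternative to the maximum.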

      Importantly, the location of the stimulation is not reported even though this is critical to interpret the behavioral effects.

      The implant locations were generally over the middle-to-rostral PPN, and we will clarify this in the revised manuscript. These locations are shown in Figure 7B.

      There are some concerns about the possible recruitment of dopamine neurons in the SNr experiments.

      We are very interested in this possibility and plan to discuss this with more clarity in a revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      This is not a recommendation, but while reading old literature I found some interesting facts. The shape of the neurocranium in monotremes, birds, and mammals, at least in early stages, resembles the phenotype of dact1/2, wnt11f2, or syu mutants. For more details, see de Beer, 'The Development of the Vertebrate Skull' (1937), Plate 137.

      Thank you for pointing this out. It is indeed interesting.

      Minor Comments: 

      • Lines 64, 66, and 69: same citation without interruption: Heisenberg, Brand et al. 1996

      Revised line 76. 

      • Lines 101 and 102: same citation without interruption: Li, Florez et al. 2013 

      Revised line 118.

      • Lines 144, 515, 527, and 1147: should be wnt11f2 instead of wntllf2 - if not, then explain 

      Revised lines 185, 625, 640, 1300.

      • Lines 169 and 171: incorrect figure citation: Fig 1D - correct to Fig 1F 

      Revised lines 217, 219.

      • Line 173: delete (Fig. S1) 

      Revised line 221.

      • Line 207: indicate that both dact1 and dact2 mRNA levels increased, noting a 40% higher level of dact2 mRNA after deletion of 7 bp in the dact2 gene 

      Revised line 265.

      • Line 215: Fig 1F instead of Fig 1D 

      Revised line 217.

      • Line 248: unify naming of compound mutants to either dact1/2 or dact1/dact2 compound mutants 

      Revised to dact1/2 throughout.

      • Line 259: incorrect figure citation: Fig S1 - correct to Fig S2D/E 

      Revised line 324.

      • Line 302: correct abbreviation position: neural crest (NCC) cell - change to neural crest cell (NCC) population 

      Revised line 380.

      • Line 349: repeating kny mut definition from line 70 may be unnecessary 

      Revised line 434.

      • Line 351: clarify distinction between Fig S1 and Fig S2 in the supplementary section 

      Revised line 324.

      • Line 436: refer to the correct figure for pathways associated with proteolysis (Fig 7B) 

      Revised line 530.

      • Line 446-447: complete the sentence and clarify the relevance of smad1 expression, and correct the use of "also" in relation to capn8 

      Revised line 567.

      • Line 462: clarify that this phenotype was never observed in wildtype larvae, and correct figure reference to exclude dact1+/- dact2+/- 

      Revised line 563, 568.

      • Line 463: explain the injection procedure into embryos from dact1/2+/- interbreeding 

      Revised line 565.

      • Lines 488 and 491: same citation without interruption: Waxman, Hocking et al. 2004 

      Revised line 591.

      • Line 502: maintain consistency in referring to TGF-beta signaling throughout the article 

      Revised throughout.

      • Line 523: define CNCC; previously used only NCC 

      Revised to cranial NCC throughout.

      • Line 1105: reconsider citing another work in the figure legend 

      Revised line 1249.

      • Line 1143: consider using "mutant" instead of "mu" 

      Revised line 1295.

      • Fig 2A/B: indicate the number of animals used ("n") 

      N is noted on line 1274.

      • Fig 2C, D, E: ensure uniform terminology for control groups ("wt" vs. "wildtype") 

      Revised in figure.

      • Fig 7C: clarify analysis of dact1/2-/- mutant in lateral plate mesoderm vs. ectoderm 

      Revised line 1356.

      • Fig 8A: label the figure to indicate it shows capn8, not just in the legend 

      Revised.

      • Fig 8D: explain the black/white portions and simplify to highlight important data 

      Revised.

      • Fig S2: add the title "Figure S2" 

      Revised.

      • Consider omitting the sentence: "As with most studies, this work has contributed some new knowledge but generated more questions than answers." 

      Revised line 720.

      Reviewer #2 (Recommendations For The Authors): 

      Major comments: 

      (1) The authors have addressed many of the questions I had, including making the biological sample numbers more transparent. It might be more informative to use n = n/n, e.g. n = 3/3, rather than just n = 3. Alternatively, that information can be given in the figure legend or in the form of penetrance %. 

      The compound heterozygote breeding and phenotyping analyses were not carried out in a way that allows us to report the precise penetrance (%) of the ANC phenotype, because we did not dissect every ANC and genotype every individual resulting from the triple heterozygote incrosses. We collected phenotype/genotype data until we had obtained at least three replicates.

      We did genotype every individual resulting from dact1/2 dHet crosses to correlate genotype with the embryonic convergent extension phenotype and the narrowed ethmoid plate (Fig. 2A, Fig. 3), which showed full penetrance.

      (2) The description of the expression of dact1/2 and wnt11f2 is not consistent with what the images show. In the revised Figure 1 legend, the authors say "dact2 and wnt11f2 transcripts are detected in the anterior neural plate" (line 1099), but it is hard to see wnt11f2 expression in the anterior neural plate in 1B. The authors then say "wnt11f2 is also expressed in these cells", referring to the anterior neural plate and polster (P), notochord (N), paraxial and presomitic mesoderm (PM) and tailbud (TB). However, apart from the notochord, the expression patterns of dact2 and wnt11f2 in 1C are actually quite dissimilar. The authors should describe their expression more accurately and take that into account when considering their function in the same pathway.

      We have revised these sections to more carefully describe the expression patterns. We have added references to previous descriptions of wnt11 expression domains.

      (3) Similar to (2), while Daniocell was useful in demonstrating that the expression of dact1 and dact2 is more similar to that of gpc4 and wnt11f2, the text description of the data is quite confusing. The authors stated "dact2 was more highly expressed in anterior structures including cephalic mesoderm and neural ectoderm while dact1 was more highly expressed in mesenchyme and muscle" (lines 174-176). However, Daniocell seems to show more dact1 expression in the neural tissues than dact2, which would also contradict the in situ data. I think the problem is partly because the dataset contains cells from many different stages, and it might be helpful to include a plot of the cells at different stages, as well as the cell types, both of which are available from the Daniocell website.

      We have revised the text to focus the Daniocell analysis on the overall and general expression patterns. Line 220.

      (4) The authors used the term "morphological movements" (line 337) to describe the cause of dact1/2 phenotypes. Please clarify what this means. Is it cell movement? Or is it the shape of the tissues? What does "morphological movements" really mean and how does that affect the formation of the EP by the second stream of NCCs? 

      We have revised this sentence to improve clarity. Line 416.

      (5) In the first submission, only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants and that was a major concern regarding the functional significance of calpain 8 in this context. In the revised manuscript, the authors demonstrated that more embryos developed the phenotype when they are heterozygous for both dact1/2. While this is encouraging, it is interesting that the same phenomenon was not observed in the dact1-/-; dact2+/- embryos (Fig. 6D). The authors did not discuss this and should provide some explanation. The authors should also discuss sufficiency vs requirement tested in this experiment. However, given that this is the most novel aspect of the paper, performing experiments to demonstrate requirements would be important. 

      We have added a statement regarding the non-effect in dact1-/-;dact2+/- embryos. Line 568-570. We have also added discussion of sufficiency vs necessity/requirement testing. Line 676-679.

      (6) Related to (5), the authors cited Figure 8C when mentioning that 0/192 gfp-injected embryos developed EP phenotypes. However, Figure 8C shows dact1/2 +/- embryos, and the numbers do not match those in Figure 8D either. Please add the relevant/correct figures.

      The text has been revised to distinguish between our overexpression experiment in wildtype embryos (data not shown) versus overexpression in dact1/2 double het in cross embryos (Fig 8).

      Minor comments: 

      (1) Fig 1 legend line 1106 "the midbrain (MP)" should be MB 

      Revised line 1250.

      (2) Wntllf2, instead of wnt11f2, (i.e. the letter "l" rather than the number "1") was used in 4 instances, line 144, 515, 527, 1147 

      Revised lines 185, 625, 640, 1300.

      (3) The authors replaced ANC with EP in many instances, but ANC is left unchanged in some places and it's not defined in the text. It's first mentioned in line 170.

      Revised line 218.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript gives a broad overview of how to write NeuroML, and a brief description of how to use it with different simulators and for different purposes - cells to networks, simulation, optimization, and analysis. From this perspective, it can be an extremely useful document to introduce new users to NeuroML.

      We are glad the reviewer found our manuscript useful.

      However, the manuscript itself seems to lose sight of this goal in many places, and instead, the description at times seems to target software developers. For example, there is a long paragraph on the board and user community. The discussion on simulator tools seems more for developers, not users. All the information presented at the level of a developer is likely to be distracting to eLife readership.

      To make the paper less developer focussed and more accessible to the end user we have shortened the long paragraphs on the board and user community (and moved some of this text to the Methods section; lines: 524-572 in the document with highlighted changes). We have also made the discussion on simulator tools more focussed on the user (lines 334-406). However, we believe some information on the development and oversight of NeuroML and its community base are relevant to the end user, so we have not removed these completely from the main text.

      Strengths:

      The modularity of NeuroML is indeed a great advantage. For example, the ability to specify the channel file allows different channels to be used with different morphologies without redundancy. The hierarchical nature of NeuroML also is commendable, and well illustrated in Figures 2a through c.

      The number of tools available to work with NeuroML is impressive.

      The abstract, beginning, and end of the manuscript present and discuss incorporating NeuroML into research workflows to support FAIR principles.

      Having a Python API and providing examples using this API is fantastic. Exporting to NeuroML from Python is also a great feature.

      We are glad the reviewer appreciated the design of NeuroML and its support for FAIR principles.

      Weaknesses:

      Though modularity is a strength, it is unclear to me why the cell morphology isn't also treated similarly, i.e., specify the morphology of a multi-compartmental model in a separate file, and then allow the cell file to specify not only the files containing channels, but also the file containing the multi-compartmental morphology, and then specify the conductance for different segment groups. Also, after pynml_write_neuroml2_file, you would not have a super long neuroML file for each variation of conductances, since there would be no need to rewrite the multi-compartmental morphology for each conductance variation.

      We thank the reviewer for highlighting this shortcoming in NeuroML2. We have now added the ability to reference externally defined (e.g. in another file) <morphology> and <biophysicalProperties> elements from <cells>. This has enabled the morphologies and/or specification of ionic conductances to be separated out and enables more streamlined analysis of cells with different properties, as requested. Simulators NEURON, NetPyNE and EDEN already support this new form. Information on this feature has been added to https://docs.neuroml.org/Userdocs/ImportingMorphologyFiles.html#neuroml2 and also mentioned in the text (lines 188-190).

      This would be especially important for optimizations, if each trial optimization wrote out the neuroML file, then including the full morphology of a realistic cell would take up excessive disk space, as opposed to just writing out the conductance densities. As long as cell morphology must be included in every cell file, then NeuroML is not sufficiently modular, and the authors should moderate their claim of modularity (line 419) and building blocks (551).

      We believe the new functionality outlined above addresses this issue, as a single file containing the <morphology> element could be referenced, while a much smaller file, containing the channel distributions in a <biophysicalProperties> element would be generated and saved on each iteration of the optimisation.
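      The workflow described above can be sketched as follows: the large morphology is written once to its own file, and each optimisation trial writes only a small document that includes that file and defines a <biophysicalProperties> element, with the <cell> referring to both by id. This is a schematic sketch using Python's xml.etree.ElementTree; it assumes the `morphology` and `biophysicalProperties` attributes on <cell> and the <include href="..."> element described in the NeuroML documentation, uses hypothetical ids and values, omits elements a complete cell requires (for example specific capacitance and intracellular properties), and has not been validated against the NeuroML schema. In practice, libNeuroML/PyNeuroML would be used to build and validate such files.

```python
import xml.etree.ElementTree as ET

NML_NS = "http://www.neuroml.org/schema/neuroml2"


def trial_document(trial, densities, morph_file="morphology.nml", morph_id="morph0"):
    """Return XML text for one optimisation trial (schematic, not schema-validated).

    densities maps an ion channel id to (conductance density in S_per_m2,
    reversal potential in mV, ion). The morphology itself is NOT written here:
    it lives in morph_file and is referenced by id.
    """
    root = ET.Element("neuroml", {"xmlns": NML_NS, "id": f"trial{trial}"})
    # Shared file assumed to contain <morphology id="morph0"> and the channel definitions
    ET.SubElement(root, "include", {"href": morph_file})
    bp_id = f"bp_trial{trial}"
    bp = ET.SubElement(root, "biophysicalProperties", {"id": bp_id})
    membrane = ET.SubElement(bp, "membraneProperties")
    for chan, (g, erev, ion) in densities.items():
        ET.SubElement(membrane, "channelDensity", {
            "id": f"{chan}_all", "ionChannel": chan, "segmentGroup": "all",
            "condDensity": f"{g} S_per_m2", "erev": f"{erev} mV", "ion": ion,
        })
    # The cell points at both definitions by id instead of embedding them
    ET.SubElement(root, "cell", {"id": f"cell_trial{trial}",
                                 "morphology": morph_id,
                                 "biophysicalProperties": bp_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical channel id and densities for two trials
    for trial, g_na in enumerate((100.0, 120.0), start=1):
        xml_text = trial_document(trial, {"naChan": (g_na, 50.0, "na")})
        print(len(xml_text), "characters for trial", trial)
```

      Because each trial file carries only conductance densities, its size stays constant however detailed the shared morphology is.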

      In addition, this is very important for downloading NeuroML-compliant reconstructions from NeuroMorpho.org. If the cell morphology cannot be imported, then the user has to edit the file downloaded from NeuroMorpho.org, and provenance can be lost.

      The NeuroMorpho.Org website does support converting reconstructed morphologies in SWC format to NeuroML, but this export feature no longer works in most modern browsers because it is based on Java Applet technology. However, a desktop version of this application, CVApp, is actively maintained (https://github.com/NeuroML/Cvapp-NeuroMorpho.org), and we have updated it to support export of SWC to the standalone <morphology> element form of NeuroML discussed above. Additionally, a new Python application for converting SWC to NeuroML is in development and will be incorporated into PyNeuroML (Google Summer of Code 2024). Our documentation has been updated with the recommended use of SWC in NeuroML-based modelling here: https://docs.neuroml.org/Userdocs/Software/Tools/SWC.html

      We have also included URLs to the tool and the documentation in the paper (lines: 473-474).

      SWC files, however, cannot be used “as is” for modelling, since they only include information on the points that make up the cell (often incomplete; for example, a single point may represent the soma), but not on the sections/segments/cables that these points form. Therefore, NeuroML and other simulation tools, including NEURON, must convert them into formats suitable for simulation. The suggested pipeline for using NeuroMorpho SWC files is therefore to convert them to NeuroML, check that they represent the intended compartmentalisation of the neuron, and then use them in models.
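      The gap between SWC points and simulation-ready segments can be illustrated with a short parser. SWC rows have seven columns (point id, structure type, x, y, z, radius, parent id; a parent of -1 marks a root), and the most direct reading turns every non-root point into a segment running from its parent (proximal) to itself (distal). This is a minimal sketch for illustration only, not the CVApp or PyNeuroML converter; in particular, a single-point soma yields no segment on its own, which is exactly the case a real converter has to resolve, and validation (for example, that every parent id exists) is omitted.

```python
from dataclasses import dataclass


@dataclass
class SwcPoint:
    type_id: int   # 1 = soma, 2 = axon, 3 = basal dendrite, 4 = apical dendrite
    x: float
    y: float
    z: float
    radius: float
    parent: int    # -1 for a root point


def parse_swc(text: str) -> dict[int, SwcPoint]:
    """Parse SWC text into {point id: SwcPoint}, skipping comments and blank lines."""
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        n, t, x, y, z, r, p = line.split()[:7]
        points[int(n)] = SwcPoint(int(t), float(x), float(y), float(z), float(r), int(p))
    return points


def points_to_segments(points: dict[int, SwcPoint]) -> list[tuple[int, int, int]]:
    """Naive conversion: each non-root point becomes a (proximal id, distal id, type)
    segment. Root points, such as a one-point soma, yield no segment of their own."""
    return [(pt.parent, pid, pt.type_id)
            for pid, pt in points.items() if pt.parent != -1]


if __name__ == "__main__":
    swc = "\n".join([
        "# id type x y z radius parent",
        "1 1 0 0 0 5.0 -1",    # one-point soma
        "2 3 5 0 0 1.0 1",
        "3 3 15 0 0 0.8 2",
    ])
    print(points_to_segments(parse_swc(swc)))  # [(1, 2, 3), (2, 3, 3)]
```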

      To ensure that provenance is maintained in all NeuroML models (including conversions from other formats), NeuroML supports the addition of RDF annotations in model files using the COMBINE annotation specifications: https://docs.neuroml.org/Userdocs/Provenance.html. We have added this information to the paper (lines: 464-465).

      Also, Figure 2d loses the hierarchical nature by showing ion channels, synapses, and networks as separate main branches of NeuroML.

      While an instance of an ion channel is on a segment, in a cell, in a population (and hence there is a hierarchy between them), in terms of layout in a NeuroML file the ion channel is defined at the “top level” so that it can be referenced and used by multiple cells, the cell definitions are also defined top level, and used in multiple populations, etc. There are multiple ways to depict these relationships between entities, and we believe Fig 2d complements Fig 2a-c (which is more hierarchical), by emphasising the different categories of entities present in NeuroML files. We have modified the caption of Figure 2d to clarify that it shows the main categories of elements included in the NeuroML standard in their respective hierarchies.

      In Figure 5, the difference between the core and native simulator is unclear.

      We have modified the figure and text (line 341) to clarify this. We now say “reference” simulators instead of “core”. This emphasises that jNeuroML and pyLEMS are intended as reference implementations, in each of their languages, of how to interpret NeuroML models, as opposed to high-performance simulators for research use. We have also updated the categorization of the backends in the text accordingly.

      What is involved in helper scripts?

      Simulators such as NetPyNE can import NeuroML into their own internal format, but require some boilerplate code to do this (e.g. the NetPyNE script calls the importNeuroML2SimulateAnalyze() method with appropriate parameters). The NeuroML tools generate short scripts containing this boilerplate code. We have renamed “helper scripts” to “import scripts” for clarity (Figure 5 and its caption).

      I thought NEURON could read NeuroML? If so, why do you need the export simulator-specific scripts?

      The NEURON simulator does have some NeuroML functionality (it can export cells, though not the full network, to NeuroML 2 through its ModelView menu), but its current version does not natively support reading/importing NeuroML. This is not a problem, however, as jNeuroML/PyNeuroML translates the NeuroML model description into NEURON’s formats (Python scripts, HOC, and NMODL files), which NEURON then executes.

      As NEURON is the simulator that supports the widest range of NeuroML elements, we have (in agreement with the NEURON developers) concentrated on incorporating the best support for NeuroML import/export in the latest (easy to install/update) releases of PyNeuroML, rather than adding this to the NEURON source code. NEURON’s core features have been very stable for years and many versions of the simulator are used by modellers; installing the latest PyNeuroML gives them the latest NEURON support without having to reinstall NEURON itself.

      In addition, it seems strange to call something the "core" simulation engine, when it cannot support multi-compartmental models. It is unclear why "other simulators" that natively support NeuroML cannot be called the core.

      We agree that this terminology was confusing. As mentioned above, we have changed “core simulator” to “reference simulator”, to emphasise the roles of these simulation engine options.

      It might be more helpful to replace this sort of classification with a user-targeted description. The authors already state which simulators support NeuroML and which ones need code to be exported. In contrast, lines 369-370 mention that not all NeuroML models are supported by each simulator. I recommend expanding this to explain which features are supported in each simulator. Then, the unhelpful separation between core and native could be eliminated.

      As suggested, we have grouped the simulators in terms of function and removed the core/non-core distinction. We have also added a table (Table 3) in the appendices that lists the features each simulation engine supports, and updated the text to be more user-focussed (lines: 348-394).

      The body of the manuscript has so much other detail that I lose sight of how NeuroML supports FAIR. It is also unclear who is the intended audience. When I get to lines 336-344, it seems that this description is too much detail for the eLife audience. The paragraph beginning on line 691 is a great example of being unclear about who is the audience. Does someone wanting to develop NeuroML models need to understand XSD schema? If so, the explanation is not clear. XSD schema is not defined and instead explains NeuroML-specific aspects of XSD. Lines 734-735 are another example of explaining to code developers (not model developers).

      We have modified these sentences to be more suitable for the general eLife audience: we have moved the explanation of how the different simulator backends are supported to the more technically detailed Methods section (lines 882-942).

      While the results sections focus on documenting what users can do with NeuroML, the Methods sections include information on “how” the NeuroML and software ecosystem function. While the information in the methods sections may not be required by users who want to use the standard NeuroML model elements, those users looking to extend NeuroML with their own model entities and/or contribute these for inclusion in the NeuroML standard will require some understanding of how the schema and component types work.

      We have tried to limit this information to the bare minimum, pointing to online documentation where appropriate. XSD schemas are, for example, briefly introduced at the beginning of the section “The NeuroML XML Schema”. We have also included a link to the W3C documentation on XSD schemas as a footnote (line 724).

      Reviewer #2 (Public Review):

      Summary:

      Developing neuronal models that are shareable, reproducible, and interoperable allows the neuroscience community to make better use of published models and to collaborate more effectively. In this manuscript, the authors present a consolidated overview of the NeuroML model description system along with its associated tools and workflows. They describe where different components of this ecosystem lie along the model development pathway and highlight resources, including documentation and tutorials, to help users employ this system.

      Strengths:

      The manuscript is well-organized and clearly written. It effectively uses the delineated model development life cycle steps, presented in Figure 1, to organize its descriptions of the different components and tools relating to NeuroML. It uses this framework to cover the breadth of the software ecosystem and categorize its various elements. The NeuroML format is clearly described, and the authors outline the different benefits of its particular construction. As primarily a means of describing models, NeuroML also depends on many other software components to be of high utility to computational neuroscientists; these include simulators (ones that both pre-date NeuroML and those developed afterwards), visualization tools, and model databases.

      Overall, the rationale for the approach NeuroML has taken is convincing and well-described. The pointers to existing documentation, guides, and the example usages presented within the manuscript are useful starting points for potential new users. This manuscript can also serve to inform potential users of features or aspects of the ecosystem that they may have been unaware of, which could lower obstacles to adoption. While much of what is presented is not new to this manuscript, it still serves as a useful resource for the community looking for information about an established, but perhaps daunting, set of computational tools.

      We are glad the reviewer appreciated the utility of the manuscript.

      Weaknesses:

      The manuscript in large part catalogs the different tools and functionalities that have been produced through the long development cycle of NeuroML. As discussed above, this is quite useful, but it can still be somewhat overwhelming for a potential new user of these tools. There are new user guides (e.g., Table 1) and example code (e.g. Box 1), but it is not clear if those resources employ elements of the ecosystem chosen primarily for their didactic advantages, rather than general-purpose utility. I feel like the manuscript would be strengthened by the addition of clearer recommendations for users (or a range of recommendations for users in different scenarios).

      To make Table 1 more accessible to users and provide recommendations, we have added the following new categories: Introductory guides aimed at teaching the fundamental NeuroML concepts; Advanced guides illustrating specific modelling workflows; and Walkthrough guides discussing the steps required for converting models to NeuroML. Box 1 has also been improved to clearly mark API and command line examples.

      For example, is the intention that most users should primarily use the core NeuroML tools and expand into the wider ecosystem only under particular circumstances? What are the criteria to keep in mind when making that decision to use alternative tools (scale/complexity of model, prior familiarity with other tools, etc.)? The place where it seems most ambiguous is in the choice of simulator (in part because there seem to be the most options there) - are there particular scenarios where the authors may recommend using simulators other than the core jNeuroML software?

      The interoperability of NeuroML is a major strength, but it does increase the complexity of choices facing users entering into the ecosystem. Some clearer guidance in this manuscript could enable computational neuroscientists with particular goals in mind to make better strategic decisions about which tools to employ at the outset of their work.

      As mentioned in the response to Reviewer 1, the term “core simulator” for jNeuroML was confusing, as it suggested that this is a recommended simulation tool. We have changed the description of jNeuroML to a “reference simulator” to clarify this (Figure 5 and lines 341, 353).

      In terms of giving specific guidance on which simulator to use, we have focussed on their functionality and limitations rather than recommending a specific tool (as developers of simulator-independent standards, we are not in a position to favour particular simulators). While NEURON is currently the most widely used simulator, other simulation options (e.g. EDEN) have emerged recently which provide quite comprehensive NeuroML support and similar performance. Our approach is to document and promote all supported tools, while encouraging innovation and new developments. The new Table 3 in the Appendix is intended to help users choose the simulator that best suits their needs, and we have updated the text to include a brief description (lines 348-394).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not understand what the $comments mean in Box 1. It isn't until I get further in the text that I realize that those are command line equivalents to the Python commands.

      We thank the reviewer for highlighting this confusion. We have now explicitly marked the API usage and command line usage example columns to make this clearer. We have also used “>” instead of “$” to indicate the command line.

      In Figure 9 Caption "Examples of analysis functions ..", the word analysis seems a misnomer, as these graphs all illustrate the simulation output and graphing of existing variables. I think analysis typically refers to the transformation of variables, such as spike counts and widths.

      To clarify this we have changed the caption to “Examples of visualizing biophysical properties of a NeuroML model neuron”.

      Figure 10: Why is the pulse generator part of a model? Isn't that the input to a model?

      Whether the input to the model is described separately from the NeuroML biophysical description or combined with it is a choice for the researcher. This is possible because in NeuroML any entity which has time varying states can be a NeuroML element, including the current pulse generator. In this simple example the input is contained within the same file (and therefore <neuroml> element) as the cell. However, this does not need to be the case. The cell could be fully specified in its own NeuroML file and then this can be included in other files which add different inputs to facilitate different simulation scenarios. The Python scripting interface facilitates these types of workflows.

      In the interest of modularity, can stim information be stored in a separate file and "included"?

      Yes, as mentioned above, the stimulus could be stored in a separate file.

      I find it strange to use a cell with mostly dimensionless numbers as an example. I think it would be more helpful to use a model that was more physiological.

      In choosing an example model type to use to illustrate the use of LEMS (Fig 12), NeuroML (Fig 10), XML Schema (Fig 11), the Python API (Fig 13) and online documentation (Fig 15), we needed an example which showed a sufficiently broad range of concepts (dimensional parameters, state variables, time derivatives), but which is sufficiently compact to allow a concise depiction of the key elements in figures, that fit in a single page (e.g. Fig 12). We felt that the Hindmarsh Rose model, while not very physiological, was well suited for this purpose (explaining the underlying technologies behind the NeuroML specification). The simplicity of the Hindmarsh Rose model is counterbalanced in the manuscript by the detailed models of neurons and circuits in Figures 7 & 9. The latter shows a morphologically and biophysically detailed cortical L5b pyramidal cell model.

      In lines 710-714, it is unclear what is being validated. That all parameters are defined? Using the units (or lack thereof) defined in the schema?

      Validation against the schema is “level 1” validation where the model structure, parameters, parameter values and their units, cardinality, and element positioning in the model hierarchy are checked. We have updated the paragraph to include this information and to also point to Figure 6 where different levels of validation are explained.

      Lines 740 to 746 are confusing. If 1-1 between XSD and LEMS (1st sentence) then how can component types be defined in LEMS and NOT added to the standard? Which is it? 1-1 or not 1-1?

      For the curated model elements included in the NeuroML standard, there will be a 1-1 correspondence between their component type definitions in LEMS and type definitions in the XSD schema. New user defined component types (e.g. a new abstract cell model) can be specified in LEMS as required, and these do not need to be included in the XSD schema to be loaded/simulated. However, since they are not present in the schema definition of the core/curated elements, they cannot be validated against it (level 1 validation). We have modified the text to make this clearer (line: 778).

      Nonetheless, if the new type is useful for the wider community, it can be accepted by the Editorial Board, and at that stage it will be incorporated into the core types, and added to the Schema, to be part of “valid NeuroML”.

      Figure 12. select="synapses[*]/i" is not explained. Does /i mean that iSyn is divided by i, which is current (according to the sentence 3 lines after 766) or perhaps synapse number?

      We thank the reviewer for highlighting this confusion. We have now explained the construct in the text (lines 810-812). It denotes “select the i (current) values from all Attachments which have the id ‘synapses’”. These multiple values are then reduced to a single value by addition, as specified by the attribute reduce="add".
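      To illustrate the semantics only, the following minimal Python sketch mimics what select="synapses[*]/i" combined with reduce="add" computes: the current i of every attachment in the collection with id "synapses" is gathered and the values are summed. The Attachment class and the select_and_reduce_add function are hypothetical stand-ins for illustration; they are not part of the LEMS or jNeuroML/PyNeuroML APIs.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    # Hypothetical stand-in for an attached component (e.g. a synapse):
    # the id of the Attachments collection it belongs to and its current i.
    collection_id: str
    i: float


def select_and_reduce_add(attachments, collection_id="synapses"):
    """Mimic select="synapses[*]/i" with reduce="add".

    Gathers the i value of every attachment in the named collection
    and sums them; in this sketch an empty selection sums to 0.
    """
    selected = [a.i for a in attachments if a.collection_id == collection_id]
    return sum(selected)


# Two synaptic currents are summed; the attachment in another collection is ignored.
iSyn = select_and_reduce_add([
    Attachment("synapses", 1.5),
    Attachment("synapses", -0.5),
    Attachment("other", 10.0),
])
```

In this sketch iSyn is 1.0, i.e. the total synaptic current that the derived variable would expose to the cell.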

      The line after 766 says that "DerivedVariables, variables whose values depend on other variables". You should add "and that are not derivatives, which are handled separately" because by your definition derivatives are derived variables.

      Thank you. We have updated the text with your suggestion.

      Reviewer #2 (Recommendations For The Authors):

      - Figure 9: I found it somewhat confusing to have the header from the screenshot at the top ("Layer 5 Burst Accommodating Double Bouquet Cell (5)") not match the morphology shown at the bottom. It's not visually clear that the different panels in Figure 9 may refer to unrelated cells/models.

      Thank you for pointing this out. We have replaced the NeuroML-DB screenshot with one showing the same Layer 5b pyramidal cell that appears in the panels below it.

      Additional change:

      Figure 7c (showing the NetPyNE-UI interface) has been replaced. Previously, this displayed a 3D model which had been created in NetPyNE itself, but now shows a model which has been created in NeuroML and imported for display/simulation in NetPyNE-UI, and therefore better illustrates NeuroML functionality.

    1. Author response:

      To Reviewer #1:

      Thank you for your kind words regarding the novelty, study design, and evidence presented. We will clarify our language when describing fuzzy local-linear regression discontinuity analysis. We thank you for this feedback as our goals are to introduce these methods to a neuroscientific audience. Lastly, we will respond and clarify the methodological points, including post-selection inference, bandwidths, and Bayesian analysis in version 2.

      To Reviewers #2 and #3:

      We thank you both for your constructive feedback, specifically in highlighting 1) the scope of the intervention and 2) the UKB-neuro healthy volunteer bias. In the next manuscript version, we will expand our discussion of plausible reasons for not finding an effect, weighing up the strengths and limitations of our study in three respects: statistical (RD power), design-based (lack of representativeness vs. large sample), and mechanistic (the impact, or lack thereof, of one year of education on neural plasticity decades later). As we believe the approach of natural experiments with RD designs has considerable promise for the field of population cognitive neuroscience beyond this particular study, we will address each of these points within a broader section on how to optimize the insight, power, and inferences gained in future work within and beyond Biobank. Moreover, we will situate our discussion of the magnitude of the educational intervention within a broader discussion of cognitive training versus education, and short- versus long-term effects. We believe revising the manuscript will improve interpretation for the reader and thank you for your in-depth feedback. Lastly, we will provide a point-by-point response in the next version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The conserved AAA-ATPase PCH-2 has been shown in several organisms including C. elegans to remodel classes of HORMAD proteins that act in meiotic pairing and recombination. In some organisms the impact of PCH-2 mutations is subtle but becomes more apparent when other aspects of recombination are perturbed. Patel et al. performed a set of elegant experiments in C. elegans aimed at identifying conserved functions of PCH-2. Their work provides such an opportunity because in C. elegans meiotically expressed HORMADs localize to meiotic chromosomes independently of PCH-2. Work in C. elegans also allows the authors to focus on nuclear PCH-2 functions as opposed to cytoplasmic functions also seen for PCH-2 in other organisms. 

      The authors performed the following experiments: 

      (1) They constructed C. elegans animals with SNPs that enabled them to measure crossing over in intervals that cover most of four of the six chromosomes. They then showed that double crossovers, which were common on most of the four chromosomes in wild-type, were absent in pch-2. They also noted shifts in crossover distribution on the four chromosomes. 

      (2) Based on the crossover analysis and previous studies, they hypothesized that PCH-2 plays a role at an early stage in meiotic prophase to regulate how SPO-11-induced double-strand breaks are utilized to form crossovers. They tested their hypothesis by performing ionizing irradiation and depleting SPO-11 at different stages in meiotic prophase in wild-type and pch-2 mutant animals. The authors observed that irradiation of meiotic nuclei in zygotene resulted in a larger number of pch-2 nuclei with 6 or more crossovers (as measured by COSA-1 foci) compared to wild-type. Consistent with this observation, SPO-11 depletion, starting roughly in zygotene, also resulted in more pch-2 nuclei with 6 or more COSA-1 foci compared to wild-type. The increased number at this time point appeared beneficial because a significant decrease in univalents was observed. 

      (3) They then asked if the above phenotypes correlated with the localization of MSH-5, a factor that stabilizes crossover-specific DNA recombination intermediates. They observed that pch-2 mutants displayed an increase in MSH-5 foci at early times in meiotic prophase and an unexpectedly higher number at later times. They conclude, based on the differences in early MSH-5 localization and the SPO-11 and irradiation studies, that PCH-2 prevents early DSBs from becoming crossovers and prevents early loading of MSH-5. By analyzing different HORMAD proteins that are defective in forming the closed conformation acted upon by PCH-2, they present evidence that MSH-5 loading was regulated by the HIM-3 HORMAD. 

      (4) They performed a crossover homeostasis experiment in which DSB levels were reduced. The goal of this experiment was to test if PCH-2 acts in crossover assurance. Interestingly, in this background PCH-2 negative nuclei displayed higher levels of COSA-1 foci compared to PCH-2 positive nuclei. This observation and a further test of the model suggested that "PCH-2's presence on the SC prevents crossover designation." 

      (5) Based on their observations indicating that early DSBs are prevented from becoming crossovers by PCH-2, the authors hypothesized that the DNA damage kinase CHK-2 and PCH-2 act to control how DSBs enter the crossover pathway. This hypothesis was developed based on their finding that PCH-2 prevents early DSBs from becoming crossovers and previous work showing that CHK-2 activity is modulated during meiotic recombination progression. They tested their hypothesis using a mutant synaptonemal complex component that maintains high CHK-2 activity that cannot be turned off to enable crossover designation. Their finding that the pch-2 mutation suppressed the crossover defect (as measured by COSA-1 foci) supports their hypothesis. 

      Based on these studies the authors provide convincing evidence that PCH-2 prevents early DSBs from becoming crossovers and controls the number and distribution of crossovers to promote a regulated mechanism that ensures the formation of obligate crossovers and crossover homeostasis. As the authors note, such a mechanism is consistent with earlier studies suggesting that early DSBs could serve as "scouts" to facilitate homolog pairing or to coordinate the DNA damage response with repair events that lead to crossing over. The detailed mechanistic insights provided in this work will certainly be used to better understand functions for PCH-2 in meiosis in other organisms. My comments below are aimed at improving the clarity of the manuscript. 

      We thank the reviewer for their concise summary of our manuscript and their assessment of our work as “convincing” and providing “detailed mechanistic insight.”

      Comments 

      (1) It appears from reading the Materials and Methods that the SNPs used to measure crossing over were obtained by mating Hawaiian and Bristol strains. It is not clear to this reviewer how the SNPs were introduced into the animals. Was crossing over measured in a single animal line? Were the wild-type and pch-2 mutations made in backgrounds that were isogenic with respect to each other? This is a concern because it is not clear, at least to this reviewer, how much of an impact crossing different ecotypes will have on the frequency and distribution of recombination events (and possibly the recombination intermediates that were studied). 

      We will clarify these issues in the Materials and Methods of an updated preprint. The control and pch-2 mutants were isogenic in either the Bristol or Hawaiian backgrounds. Control lines were the original Bristol and Hawaiian lines and pch-2 mutants were originally made in the Bristol line and backcrossed at least 3 times before analysis. Hawaiian pch-2 mutants were made by backcrossing pch-2 mutants at least 7 times to the Hawaiian background and verifying the presence of Hawaiian SNPs on all chromosomes tested in the recombination assay. To perform the recombination assays, these isogenic lines were crossed to generate the relevant F1s.

      (2) The authors state that in pch-2 mutants there was a striking shift of crossovers (line 135) to the PC end for all of the four chromosomes that were tested. I looked at Figure 1 for some time and felt that the results were more ambiguous. Map distances seemed similar at the PC end for wildtype and pch-2 on Chrom. I. While the decrease in crossing over in pch-2 appeared significant for Chrom. I and III, the results for Chrom. IV, and Chrom. X. seemed less clear. Were map distances compared statistically? At least for this reviewer the effects on specific intervals appear less clear and without a bit more detail on how the animals were constructed it's hard for me to follow these conclusions. 

      We hope that the added details above make the results of these assays clearer. Map distances were compared and did not reach statistical significance, except where indicated. While we agree that the comparisons between control animals and pch-2 mutants may seem less clear for individual chromosomes, we argue that more general patterns become clear when analyzing multiple chromosomes. Indeed, this is why we expanded our recombination analysis beyond Chromosome III and the X Chromosome, as reported in Deshong, 2014. 

      (3) Figure 2. I'm curious why non-irradiated controls were not tested side-by-side for COSA-1 staining. It just seems like a nice control that would strengthen the authors' arguments. 

      We will add these controls in the updated preprint.

      (4) Figure 3. It took me a while to follow the connection between the COSA-1 staining and DAPI staining panels (12 hrs later). Perhaps an arrow that connects each set of time points between the panels or just a single title on the X-axis that links the two would make things clearer. 

      We will make changes in the updated preprint to make this figure more clear.

      Reviewer #2 (Public review): 

      Summary: 

      This paper has some intriguing data regarding the different potential roles of Pch-2 in ensuring crossing over. In particular, the alterations in crossover distribution and Msh-5 foci are compelling. My main issue is that some of the models are confusingly presented and would benefit from some reframing. The role of Pch-2 across organisms has been difficult to determine; the ability to separate pairing and synapsis roles in worms provides a great advantage for this paper. 

      Strengths: 

      Beautiful genetic data, clearly made figures. Great system for studying the role of Pch-2 in crossing over. 

      We thank the reviewer for their constructive and useful summary of our manuscript and the analysis of its strengths. 

      Weaknesses: 

      (1) For a general audience, definitions of crossover assurance, crossover eligible intermediates, and crossover designation would be helpful. This applies to both the proposed molecular model and the cytological manifestation that is being scored specifically in C. elegans. 

      We will make these changes in an updated preprint.

      (2) Line 62: Is there evidence that DSBs are introduced gradually throughout the early prophase? Please provide references. 

      We will reference Woglar and Villeneuve 2018 and Joshi et al. 2015 to support this statement in the updated preprint.

      (3) Do double crossovers show strong interference in worms? Given that the PC is at the ends of chromosomes don't you expect double crossovers to be near the chromosome ends and thus the PC? 

      Despite their rarity, double crossovers do show interference in worms. However, the PC is limited to one end of the chromosome. Therefore, even if interference ensures the spacing of these double crossovers, the preponderance of one of these crossovers toward one end (and not both ends) suggests something functionally unique about the PC end.

      (4) Line 155 - if the previous data in Deshong et al is helpful it would be useful to briefly describe it and how the experimental caveats led to misinterpretation (or state that further investigation suggests a different model etc.). Many readers are unlikely to look up the paper to find out what this means. 

      We will add this to the updated preprint.

      (5) Line 248: I am confused by the meaning of crossover assurance here - you see no difference in the average number of COSA-1 foci in Pch-2 vs. wt at any time point. Is it the increase in cells with >6 COSA-1 foci that shows a loss of crossover assurance? That is the only thing that shows a significant difference (at the one time point) in COSA-1 foci. The number of dapi bodies shows the loss of Pch-2 increases crossover assurance (fewer cells with unattached homologs). So this part is confusing to me. How does reliably detecting foci vs. DAPI bodies explain this? 

      We apologize for the confusion and will make this clearer in an updated preprint. The reviewer is correct that we do not see a difference in the average number of GFP::COSA-1 foci at any time point in this experiment, even though we do see a difference in the number of DAPI-stained bodies (an increase in crossover assurance in pch-2 mutants). What we meant to convey is that because of PCH-2’s dual role in regulating crossover formation (inhibiting it in early prophase, guaranteeing assurance later), the average number of GFP::COSA-1 foci at all time points also reflects this later role, resulting in this average being lower than if PCH-2 only inhibited crossovers early in meiotic prophase. We have shown that this later role does not significantly affect the average number of DAPI-stained bodies, allowing us to see the role of PCH-2 in early meiotic prophase on crossover formation more clearly.

      (6) Line 384: I am confused. I understand that in the dsb-2/pch-2 mutant there are fewer COSA-1 foci. So fewer crossovers are designated when DSBs are reduced in the absence of PCH-2. How then does this suggest that PCH-2's presence on the SC prevents crossover designation? Its absence is preventing crossover designation, at least in the dsb-2 mutant. 

      We will also make this more clear in an updated preprint, as well as provide additional evidence to support this claim. In this experiment, we had identified three possible explanations for why PCH-2 persists on some nuclei that do not have GFP::COSA-1 foci: 1) PCH-2 removal is coincident with crossover designation; 2) PCH-2 removal depends on crossover designation; and 3) PCH-2 removal facilitates crossover designation. The decrease in the number of GFP::COSA-1 foci in dsb-2::AID;pch-2 mutants argues against the first two possibilities, suggesting that the third might be correct. We have additional evidence that we will include in an updated preprint that should provide stronger support and make this more clear.

      (7) Discussion Line 535: How do you know that the crossovers that form near the PCs are Class II and not the other way around? Perhaps early forming Class I crossovers give time for a second Class II crossover to form. In budding yeast, it is thought that synapsis initiation sites are likely sites of crossover designation and class I crossing over. Also, the precursors that form class I and II crossovers may be the same or highly similar to each other, such that Pch-2's actions could equally affect both pathways. 

      We do not know that the crossovers that form near the PC are Class II but hypothesize that they are, based on the close, functional relationship that exists between Class I crossovers and synapsis and the apparent antagonistic relationship that exists between Class II crossovers and synapsis. We agree that Class I and Class II crossover precursors are likely to be the same or highly similar and to exhibit extensive crosstalk that may complicate straightforward analysis, and that PCH-2 is likely to affect both, as strongly suggested by our GFP::MSH-5 analysis. We present this hypothesis based on the apparent relationship between PCH-2 and synapsis in several systems but agree that it needs to be formally tested. We will make this argument clearer in an updated preprint.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes an in-depth analysis of the effect of the AAA+ ATPase PCH-2 on meiotic crossover formation in C. elegans. The authors reach several conclusions, and attempt to synthesize a 'universal' framework for the role of this factor in eukaryotic meiosis. 

      Strengths: 

      The manuscript makes use of the advantages of the 'conveyor belt' system within the C. elegans reproductive tract to enable a series of elegant genetic experiments. 

      We thank this reviewer for the useful assessment of our manuscript and the articulation of its strengths.

      Weaknesses: 

      A weakness of this manuscript is that it heavily relies on certain genetic/cell biological assays that can report on distinct crossover outcomes, without clear and directed control over other aspects and variables that might also impact the final repair outcome. Such assays are currently out of reach in this model system. 

      In general, this manuscript could be made more accessible to non-C. elegans readers. Currently, the manuscript is hard to digest for non-experts (even for meiosis researchers). In addition, the authors should be careful to consider alternative explanations for certain results. At several steps in the manuscript, results could ostensibly be caused by underlying defects that are currently unknown (for example, can we know for sure that pch-2 mutants do not suffer from altered DSB patterning, and how can we know what the exact functional and genetic interactions between pch-2 and HORMAD mutants tell us?). Alternative explanations are possible and it would serve the reader well to explicitly name and explain these options throughout the manuscript. 

      We will make the manuscript more accessible to non-C. elegans readers and discuss alternate explanations for specific results in an updated preprint.

    1. Author response:

      The following is the authors’ response to the original reviews.

      A summary of changes

      (1) Line 93: “positive effect” to “positive contribution”, as suggested by reviewer 2.

      (2) Lines 147-148: the null hypothesis to test “equal interspecific and intraspecific interactions”, as indicated by reviewers 2 and 4.

      (3) Lines 155-162: removed to reduce duplication with the additive partitioning, as suggested by reviewer 2.

      (4) Lines 186-188: added “the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates”, as suggested by reviewer 3.  

      (5) Lines 219-222: added “The community positive effect can be further partitioned by mechanisms of positive interactions (resource partitioning and facilitation), and facilitative effect can be classified as mutualism (+/+), commensalism (+/0), or parasitic (+/–) based on species specific assessments”.  

      (6) Lines 377-386: added options for determining maximum competitive growth response in some extreme scenarios of species mixtures.

      (7) Figure 1: modified to show the variations of competitive growth response with relative competitive ability from minimum (null expectation) to maximum (competitive exclusion).    

      A summary of four reviewers’ questions and authors’ response

      (1) A summary of authors’ responses. Reviewers did not seem to understand our work. They indicated that our model is inadequate for hypothesis testing. The fact is, as we note below, that our model allows for more hypothesis testing than the additive partitioning model. They suggested that one of our model components, the competitive growth response, needs to be further partitioned. However, this term represents only the competition effect and cannot be split any further. Reviewers criticized us for misunderstanding the additive components while themselves using the same logic to test some intuitive ideas. They did not seem to know that the effects of competitive interactions vary with assessment methods, which differ between competition and biodiversity research. Our work seeks to harmonise definitions between these two fields and bridge the gap. The reviewers acknowledged that the additive components (i.e., the selection effect and complementarity effect) do not have clear biological meanings; however, they did not acknowledge that the additive components are used extensively for determining mechanisms of species interactions in biodiversity research. There is hardly any research that uses the additive partitioning model without linking the additive components to specific mechanisms of species interactions (i.e., positive SE to competition and positive CE to positive interactions).

      (2) Additive partitioning and underlying mechanisms. Some reviewers acknowledged that additive partitioning is not meant for determining mechanisms of species interactions and therefore argued that additive partitioning should not be criticized for the lack of biological meaning of its additive components. However, they insisted that additive partitioning is useful for quantifying net biodiversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions, or for testing the idea that “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. Are these views not contradictory? How can additive partitioning, which is not designed for determining mechanisms of species interactions, provide meaningful explanations for the outcomes of species interactions, e.g., “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”?

      Reviewers did not seem to realize that these ideas are equivalent to the suggestions that CE represents the effects of positive interactions and SE the effects of competitive interactions, that the quantification of net biodiversity effects does not require the two additive components, and that the null hypothesis existed long before additive partitioning (see de Wit, 1960; de Wit et al., 1966). It is generally agreed that CE and SE result from mathematical calculations and do not have clear biological meanings in terms of linkages to specific mechanisms of species interactions responsible for observed net biodiversity effects or changes in ecosystem function (Loreau and Hector, 2012; Bourrat et al., 2023). Calling some mixed effects of species interactions mechanisms (e.g., CE and SE) is misleading.

      Model structure: incomplete or inadequate for hypothesis testing. Other than positive, negative, and competition interactions, two reviewers wanted to have more specific interactions such as microclimate amelioration and negative feedback from species-specific pests and pathogens. The determination of these specific mechanisms requires more investigations and cannot be simply made through partitioning growth and yield data. However, the effects of these interactions will be captured in our definition of species interactions.  Reviewers did not seem to know that the additive partitioning would also not allow identifying these specific positive species interactions.

      Inspired by the mathematical form of additive partitioning, two reviewers suggested that our model (presumably equation 4) is incomplete and the second term, i.e., competitive growth response needs to be further explored or partitioned. The second term represents deviations from the null expectation, due to species differences in growth and competitive ability or competition effect. We do not know why and how this term can be further partitioned and what any subcomponents would mean.   

      Our competitive partitioning model is based on two hypotheses: first, the null hypothesis to test the equivalence of interspecific and intraspecific interactions. This hypothesis is the same as the additive partitioning model. Second, the competitive hypothesis, which tests the dominance of positive or negative species interactions in a community. Thus, our model allows for more hypothesis testing than the current additive partitioning model.     

      (3) Types of species interactions. We follow the definition of species interactions generally used in biodiversity research (see Loreau and Hector, 2001), i.e., positive interactions (or complementarity) include resource partitioning and facilitation, negative interactions include interference competition, and competitive interactions include resource competition. One reviewer suggested that resource partitioning is a byproduct of competition and should not be part of positive species interactions, which may be true for the long-term evolution of species coexistence but not for biodiversity experiments lasting a decade at most. Two reviewers suggested that positive interactions should also include microclimate amelioration or negative feedback from species-specific pests and pathogens. We agree, and these are included in our definition. 

      (4) Significance of partial density monocultures. We used partial and full density monocultures and species competitive ability to determine what species can possibly achieve in mixture under the competitive hypothesis that constituent species share an identical niche but differ in growth and competitive ability. We did not use partial monocultures to test the effects of density on biodiversity effects. As with additive partitioning, the competitive partitioning model is not designed for comparing yields across different densities. We added text at lines 186-188 to indicate that the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.  

      Similarly, we do not use the partial density monoculture to supplant the replacement series design. Partial density monocultures only supplement the “replacement series” design, which does not provide estimates of the facilitative effects and competitive growth responses that would occur in mixtures. It is crucial to recognize that a single experimental approach is simply not enough for determining the underlying mechanisms of species interactions responsible for changes in ecosystem function.  

      (5) Competition effect in competition and biodiversity research. Because different methods are used, the competition effect in competition research has a different ecological meaning from that in biodiversity research. In competition research, species performance in mixture is compared with that in partial density monocultures, and therefore the competition effect is generally negative, as suggested by reviewer 4. In biodiversity research, the comparison is between mixtures and full density monocultures. The resulting competition effect can be positive or negative for both individual species and community productivity, as defined by species composition and full density monoculture yields.     

      Therefore, we cannot use the results of competition research based on the additive series design to describe the effects of competitive interactions on ecosystem productivity based on the replacement series design.
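      To make the difference in reference points concrete, here is a minimal Python sketch for a single species. The helper name `competition_effect` and all yields are hypothetical, chosen only for illustration; they are not taken from the manuscript. The same mixture yield reads as a negative effect against the partial-density monoculture (competition research) and as a positive effect against the null expectation p × M built from the full-density monoculture (biodiversity research).

```python
def competition_effect(y_mix, y_partial, m_full, p):
    """Hypothetical helper: one species' mixture yield against two reference points.

    y_mix     -- observed yield of the species in the mixture
    y_partial -- yield of its partial-density monoculture (same density as in the mixture)
    m_full    -- yield of its full-density monoculture
    p         -- planting proportion of the species in the mixture (0 < p <= 1)
    """
    return {
        # competition research (additive series): compare with the partial-density monoculture
        "vs_partial_density": y_mix - y_partial,
        # biodiversity research (replacement series): compare with the null expectation p * M
        "vs_full_density": y_mix - p * m_full,
    }


# Hypothetical numbers for illustration only
effects = competition_effect(y_mix=55.0, y_partial=60.0, m_full=100.0, p=0.5)
print(effects)  # {'vs_partial_density': -5.0, 'vs_full_density': 5.0}
```

      The sign flip comes entirely from the choice of reference yield, not from any change in the species' behaviour in the mixture.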

      Reviewer #1 (Public Review):

      [Editors' note: this is an overall synthesis from the Reviewing Editor in consultation with the reviewers.]

      The three reviews expand our critique of this manuscript in some depth and complementary directions. These can be synthesized in the following main points (we point out that there is quite a bit more that could be written about the flaws with this study; however, time constraints prevented us from further elaborating on the issues we see):

      (1) It is unclear what the authors want to do.

      As indicated by the title, our objective is to “partition changes in ecosystem productivity by effects of species interactions”, i.e., partitioning net biodiversity effects estimated from the null expectation into components associated with positive, negative, or competition interspecific interactions.

      It seems their main point is that the large BEF literature and especially biodiversity experiments overstate the occurrence of positive biodiversity effects because some of these can result from competition.

      We demonstrated through ecological theories and simulation/experiment data that competition is a major source of the net biodiversity effects estimated with additive partitioning model. We know that competition effect varies with mixture attributes. Future research will determine average effect of competitive interactions on biodiversity effects in large BEF literature.   

      Because reduced interspecific relative to intraspecific competition in mixture is sufficient to produce positive effects in mixtures (if interspecific competition = 0 then RYT = S, where S is species richness in mixture -- this according to the reciprocal yield law = law of constant final yield), they have a problem accepting NE > 0 as true biodiversity effect (see additive partitioning method of Loreau & Hector 2001 cited in manuscript).

      We have no problem accepting NE>0 as a true positive biodiversity effect. However, NE>0 can also result from competitive interactions when assessed against the null expectation, and it therefore needs to be partitioned by effects of species interactions.
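      The relative yield total (RYT) argument in the reviewer's point (1) can be written out with the standard definitions below, assuming equal planting proportions 1/S and relative yield RY_i = Y_i / M_i for species i with mixture yield Y_i and full-density monoculture yield M_i. This is a sketch of textbook relations, not an equation taken from the manuscript.

```latex
\begin{aligned}
\mathrm{RY}_i &= \frac{Y_i}{M_i}, \qquad \mathrm{RYT} = \sum_{i=1}^{S} \mathrm{RY}_i \\
\text{null expectation } \left(Y_i = M_i / S\right) &\;\Rightarrow\; \mathrm{RY}_i = \tfrac{1}{S}, \quad \mathrm{RYT} = 1 \\
\text{no interspecific competition, constant final yield } \left(Y_i \approx M_i\right) &\;\Rightarrow\; \mathrm{RY}_i \approx 1, \quad \mathrm{RYT} \approx S
\end{aligned}
```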

      (2) The authors next claim, without justification, that additive partitioning of NE is flawed and theoretically and biologically meaningless.

      The additive partitioning model is based on the covariance equation (or Price equation), which has nothing to do with biodiversity partitioning (Bourrat et al., 2023). Biological meaning was arbitrarily assigned to CE and SE. We made clear that the additive partitioning model is mathematically sound but does not carry the biological meanings it has been used for.   

      They misinterpret the CE component as biological niche partitioning and the SE component as biological dominance.

      We did not. Loreau and Hector (2001) clearly indicated positive CE for positive interactions and positive SE for competitive interactions, which is generally what has been used for in the last twenty years.

      They do not seem to accept that the additive partitioning is a logically and mathematically sound derivation from basic principles that cannot be contested.

      We do not have a problem with the mathematical form of additive partitioning but only oppose the ecological meanings assigned to CE and SE, simply because CE and SE both result from all species interactions (see Loreau and Hector, 2001; Bourrat et al., 2023). The reviewer seems to hold the contradictory view that the additive components are biologically meaningless yet derived from basic biological principles.       

      (3) The authors go on to introduce a method to calculate species-level overyielding (RY > 1/S in replacement series experiments) as a competitive growth response and multiply this with the species monoculture biomass relative to the maximum to obtain competitive expectation. This method is based on resource competition and the idea that resource uptake is fully converted into biomass (instead of e.g. investing it in allelopathic chemical production).

      Correct, but we did not assume “resource uptake is fully converted into biomass”.

      (4) It is unclear which experiments should be done, i.e. are partial-density monocultures planted or simply calculated from full-density monocultures? At what time are monocultures evaluated? The framework suggests that monocultures must have the full potential to develop, but in experiments, they are often performing very poorly, at least after some time. I assume in such cases the monocultures could not be used.

      Both partial and full density monocultures are needed, along with mixtures to separate NE by species interactions. Calculating competitive growth responses from density-size relationships can be an alternative, given the lack of partial density monocultures in current biodiversity experiments, but is not preferred.

      Similar to additive partitioning, our model can (and should) be applied to all developmental stages of an experiment to examine how interactions evolve through time.   

      (5) There are many reasons why the ideal case of only resource competition playing a role is unrealistic. This excludes enemies but also differential conversion factors of resources into biomass and antagonistic or facilitative effects. Because there are so many potential reasons for deviations from the null model of only resource competition, a deviation from the null model does not allow conclusions about underlying mechanisms.

      The competitive expectation is only a hypothesis, just as the null expectation. The difference between competitive and null expectations represents a competitive effect resulting from species differences in growth and competitive ability, while the deviation of observed yields from the competitive expectation indicates positive or negative effect (see lines 201-219).
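      The two-step comparison described in this reply can be sketched in Python. The competitive expectation itself comes from the authors' equations (built from MG and RC), which are not reproduced in this response, so the sketch simply takes it as an input. The function name and yields are hypothetical. By construction, the competitive effect and the residual sum to the net biodiversity effect.

```python
def partition_net_effect(observed, null_exp, competitive_exp):
    """Split the net biodiversity effect into two parts (hypothetical helper).

    observed        -- observed yield (species or community)
    null_exp        -- null expectation (equal inter- and intraspecific interactions)
    competitive_exp -- competitive expectation, taken here as a given input
    """
    net = observed - null_exp                  # net biodiversity effect (NE)
    competition = competitive_exp - null_exp   # competitive effect of differences in growth and competitive ability
    residual = observed - competitive_exp      # > 0: positive interactions dominate; < 0: negative ones dominate
    return {"net": net, "competition": competition, "residual": residual}


# Hypothetical yields for illustration only
parts = partition_net_effect(observed=120.0, null_exp=100.0, competitive_exp=110.0)
print(parts)  # {'net': 20.0, 'competition': 10.0, 'residual': 10.0}
```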

      Furthermore, this is not a systematically developed partitioning, but some rather empirical ad hoc formulation of a first term that is thought to approximate competitive effects as understood by the authors (but again, there already are problems here). The second residual term is not investigated. For a proper partitioning approach, one would have to decompose overyielding into two (or more) terms and demonstrate (algebraically) that under some reasonable definitions of competitive and non-competitive interactions, these end up driving the respective terms.

      The first term represents the null expectation assuming equal interspecific and intraspecific interactions, i.e., the absence of positive, negative, and competition effects. The second, residual term represents the competition effect due to species differences in growth and competitive ability. The meaning of the second residual term is clear, and it does not need to be further partitioned or investigated.

      In fact, our competitive partitioning also has several components including null expectation, competitive growth response, and observed yield, plus partial density monocultures for species assessment, or null expectations, competitive expectations, and observed yields for community level assessment, although different from the additive partitioning.

      (6) Using a simplistic simulation to test the method is insufficient. For example, I do not see how the simulation includes a mechanism that could create CE in additive partitioning if all species would have the same monoculture yield. Similarly, they do not include mechanisms of enemies or antagonistic interactions (e.g. allelopathy).

      The simulation model we used was developed from real-world data and can only do what is available in the model in terms of species and their growth under different conditions. We cannot go beyond the limitations of the data. The model is empirical and has been shown to accurately estimate yield in aspen-spruce forest conditions. We would also note that we do also use experimental data (Table 2).  

      (7) The authors do not cite relevant literature regarding density x biodiversity experiments, competition experiments, replacement-series experiments, density-yield experiments, additive partitioning, facilitation, and so on.

      We cited literature relevant to biodiversity partitioning since we are not aiming to cover everything. The reviewer may not be aware that most of the research areas listed are actually included in our work, such as additive and replacement-series experiment designs, additive partitioning, facilitation, competition studies, and density-yield relationships. Our competitive model partitioning is based on biological principles, while the additive partitioning model is based only on a mathematical equation.   

      Overall, this manuscript does not lead further from what we have already elaborated in the broad field of BEF and competition studies and rather blurs our understanding of the topic.

      The results of competition studies based on additive series design are not really used in the broad field of BEF based on replacement series design. The effects of competitive interactions on BEF are never clearly defined using the results of competition studies. Our work is filling that gap.  

      Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE measures whether the "relative advantage" species have in the mixture is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).

      The reviewer needs to know that these ideas are based on the same logic that positive CE represents the effects of positive interactions and positive SE represents the effects of competitive interactions. CE>0 or SE>0 can result from many different scenarios of species interactions, not necessarily “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. CE>0 and SE>0 can occur alone or together. We simply cannot infer the underlying mechanisms of overyielding from mathematical calculations (CE and SE), as this reviewer later suggests.

      The reviewer criticizes us while using the same logic themselves.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      The reviewer actually supports our point. However, CE and SE have been largely used as biological mechanisms, positive CE as the results of complementary interactions and positive SE as the results of competitive interactions (see Loreau and Hector, 2001).  

      We do not have problem with the "statistical structure" of AP; it is simply a covariance equation. It is important to know that CE and SE do not provide additional information on overyielding than NE in terms of underlying mechanisms of species interactions. Any attempt to investigate mechanism of overyielding with CE or SE can easily go wrong.

      Our competitive partitioning model incorporates effects of competitive interactions into the conventional null expectation and allows for separating different effects of species interactions. In comparison, the additive partitioning model does not have this capacity, not even designed for this purpose, as suggested by this and other reviewers.         
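      For readers less familiar with additive partitioning (AP), the minimal Python sketch below implements the Loreau & Hector (2001) decomposition that Reviewer #2 summarizes: ΔRY_i is the deviation of observed from expected relative yield, CE = N · mean(ΔRY) · mean(M), and SE = N · cov(ΔRY, M) using the population covariance. The function name and the two-species yields are hypothetical. The sketch only shows that NE = CE + SE is an algebraic identity; it says nothing about which biological interactions produced the yields, consistent with the statement above that CE and SE capture statistical structure rather than mechanisms.

```python
from statistics import fmean


def additive_partition(mix_yields, mono_yields, proportions):
    """Loreau & Hector (2001) additive partitioning: NE = CE + SE.

    mix_yields  -- observed yield of each species in the mixture
    mono_yields -- full-density monoculture yield of each species (M_i)
    proportions -- planted proportion of each species (expected relative yield)
    """
    n = len(mono_yields)
    # Deviation of observed from expected relative yield for each species
    d_ry = [y / m - p for y, m, p in zip(mix_yields, mono_yields, proportions)]
    mean_d_ry = fmean(d_ry)
    mean_m = fmean(mono_yields)
    # Population covariance (divide by N) so that the identity NE = CE + SE is exact
    cov = fmean([(d - mean_d_ry) * (m - mean_m) for d, m in zip(d_ry, mono_yields)])
    ce = n * mean_d_ry * mean_m
    se = n * cov
    # Net effect: observed mixture yield minus the null expectation sum(p_i * M_i)
    ne = sum(mix_yields) - sum(p * m for p, m in zip(proportions, mono_yields))
    return ne, ce, se


# Hypothetical two-species mixture planted 50:50
ne, ce, se = additive_partition([70.0, 30.0], [100.0, 50.0], [0.5, 0.5])
print(round(ne, 6), round(ce, 6), round(se, 6))  # 25.0 22.5 2.5
```

      The population covariance is used deliberately: a sample covariance (dividing by N − 1) would break the exact identity.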

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      Correct.

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction; we only want to separate the effect of competition from those of other species interactions.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Correct.

      We added text at lines 377-386 to discuss options for determining MG in some uncommon scenarios of species mixtures.
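      Taking Reviewer #2's description of MG literally, and assuming that "planting density" means the species' planting proportion p in the mixture, the ratio can be computed as below. This is a sketch under that assumption only; the authors' own equation may differ in detail (for example, in whether the "extra growth" is reported as the ratio itself or as its excess over 1). The helper name and numbers are hypothetical. Read this way, MG is per-plant yield at partial density relative to per-plant yield at full density.

```python
def competitive_growth_response(y_partial, m_full, p):
    """Hypothetical helper following the reviewer's verbal description of MG.

    (partial-density monoculture yield / full-density monoculture yield) / planting proportion,
    i.e. per-plant yield at partial density relative to per-plant yield at full density.
    1.0 means no per-plant gain from reduced conspecific density; > 1.0 means a gain.
    """
    if not 0 < p <= 1:
        raise ValueError("planting proportion p must be in (0, 1]")
    return (y_partial / m_full) / p


# Hypothetical: a half-density monoculture yields 60 where the full-density one yields 100
mg = competitive_growth_response(y_partial=60.0, m_full=100.0, p=0.5)
print(round(mg, 6))  # 1.2
```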

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      First, the "competitive effect" focusses on resource competition and other forms of competition (presumably interference competition) are included in the negative interactions.

      Second, competitive growth response varies over time and with density, and so do NE, CE, SE, and interspecific interactions.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      First, growth conditions are controlled in biodiversity experiments, i.e., both monocultures and mixtures are the same in resource space. Species do not have the opportunity to exploit resources outside the experimental area. For example, if less productive species on normal soils outperform more competitive species on saline/alkaline soil, these “less productive species” are considered “more productive”.    

      Second, as discussed in our paper (lines 367-376; Figure 1), more research is needed to determine relationships between species traits (biomass or height) and relative competitive ability. By then, scaling by the maximum would not be needed. There has been quite a lot of research on such relationships; we should leave it to subject experts to determine what would be most appropriate for the species studied.
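      Reviewer #2's description of RC (a species' monoculture yield minus the mean monoculture yield of the other species, divided by the largest monoculture yield in the mixture) can be sketched as follows. The helper name and yields are hypothetical, and this follows the reviewer's wording rather than the manuscript's equation.

```python
def relative_competitive_ability(mono_yields):
    """Hypothetical helper following the reviewer's description of RC.

    For each species: (its monoculture yield - mean monoculture yield of the other
    species) / maximum monoculture yield across all species.
    """
    if len(mono_yields) < 2:
        raise ValueError("need at least two species")
    top = max(mono_yields)
    rc = []
    for i, m in enumerate(mono_yields):
        others = mono_yields[:i] + mono_yields[i + 1:]
        rc.append((m - sum(others) / len(others)) / top)
    return rc


# Hypothetical three-species monoculture yields
print([round(x, 6) for x in relative_competitive_ability([100.0, 50.0, 30.0])])  # [0.6, -0.15, -0.45]
```

      Because every value is divided by the single largest monoculture yield, adding or removing the most productive species rescales all RC values, which is the scaling sensitivity the reviewer raises.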

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Correct, if species competitive ability differs substantially, the more competitive species in the mixture would grow like partial density monoculture. This extra growth should not be treated as sources of positive biodiversity effects, simply because it does not result from positive species interactions.   

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather "ad hoc". It also is much less a partitioning method than the AP method because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties. When does complementarity occur? Would it capture complementarity and facilitation?

      AP is, by no means, systematic. Remember, AP is based on the covariance equation (or Price equation), which, apart from its nice-looking mathematical form, has nothing to do with species interactions (Bourrat et al., 2023). Ecological meanings are subjectively given to CE and SE. Therefore, CE and SE reflect what we call them, not what they really mean.    

      The remainder measures deviations from the null expectation due only to the competition effect and cannot be partitioned any further. The remainder would be positive for more competitive species and negative for less competitive species in mixture relative to their full density monoculture. The deviation of observed yields from competitive expectations indicates dominance of positive or negative species interactions. All these are clearly outlined at lines 201-221.   

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      We do not see why not.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      To help understand the variations of competitive growth response with relative competitive ability, the x axis of Figure 1 is labelled with null expectation, competitive expectation, and competitive exclusion from minimum to maximum deviation of competitive ability from community average.

      We have followed terms used in biodiversity partitioning and changing terms can be confusing.  

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      There are some differences in meaning, but that is what CE and SE have been generally used for. Using different terms can be confusing and does not help understanding the problems with AP.

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but it is clear that an interaction can have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning" and "facilitation", and "species interference" as the authors do. Also, these categories of mechanisms are still simple. For example, how would biotic interactions with enemies be classified (see above)?

      We are explaining the effects of competitive interactions on species yield, and ultimately on community yield, which can be linked to "resource partitioning", "facilitation", and "species interference".

      Identifying more specific species interactions requires detailed biological investigation and cannot be achieved by partitioning biomass production.

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      As suggested, “absence of interspecific interactions” was replaced with “equal interspecific and intraspecific interactions”.

      We have removed lines 155-162 to reduce duplication. However, our method builds on the null expectation, which needs to be introduced even though it is part of AP.

      Other points:

      - line 66: community productivity, not ecosystem productivity.

      Both community productivity and ecosystem productivity are used in biodiversity research, although their meanings can differ slightly; ecosystem productivity is the more common of the two.

      - line 68: community average responses are with respect to relative yields - this is important!

      - line 64: what are "species effects of species interactions"?

      We searched the manuscript and did not find the phrase “species effects of species interactions”.

      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.

      It in fact refers to yield changes. For example, less productive species in an active growth phase are more responsive to changes in competition, whereas more productive species in an inactive growth phase (i.e., aging) are less responsive.

      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

      The phrase was modified to “positive contribution of competitive dominance to ecosystem productivity based on the null expectation”.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      Strengths:

      I can find a lot of value in endeavouring to improve our understanding of how biodiversity-ecosystem functioning relationships arise. I agree with the authors that competition is not well integrated into the complementarity and selection effect and interrogating this is important.

      Weaknesses:

      (1) The authors start the introduction very narrowly and do not make clear why it is so important to understand the underlying mechanisms driving biodiversity-ecosystem functioning relationships until the end of the discussion.

      There are different ways to start an introduction; we believe that starting with the problems of the current approach is the most effective way to outline the study’s objective.

      (2) The authors criticize the existing framework for only incorporating positive interactions but this is an oversimplification of the existing framework in several ways:

      We did not criticize the existing framework for only incorporating positive interactions. We criticize the existing framework because it is not based on mechanisms of species interactions, yet it is extensively used to determine the underlying mechanisms driving biodiversity-ecosystem functioning relationships.

      a. The existing partitioning scheme incorporates resource partitioning which is an effect of competition.

      Resource partitioning means that species use resources differently, while competition means that species use the same resources. The statement that “resource partitioning is an effect of competition” does not hold in biodiversity experiments, which are often short in duration and conducted under controlled conditions.

      b. The authors neglect the potential that negative feedback from species-specific pests and pathogens can also drive positive BEF and complementarity effects but is not a positive interaction, necessarily. This is discussed in Schnitzer et al. 2011, Maron et al. 2011, Hendriks et al. 2013, Barry et al. 2019, etc.

      We did not neglect it. The feedback effect will be reflected in the differences between observed yields and competitive expectations if species in mixtures have different pests and pathogens than in monocultures. The additive partitioning does not identify these feedback effects either.

      c. Hector and Loreau (and many of the other citations listed) do not limit competition to SE because resource partitioning is a byproduct of competition.

      Positive SE has largely been interpreted as the result of competition, including by Loreau and Hector (2001) and many others. It should be clear that neither of the additive components can be linked to specific mechanisms of species interactions.

      Does “resource partitioning is a byproduct of competition” mean that species shift their niches to avoid competition? If this is what the reviewer means, such shifts may occur through long-term evolution, but not within short-term biodiversity experiments. Loreau and Hector (2001) clearly stated that their complementarity effect includes both resource partitioning and facilitation.

      (3) It is unclear how this new measure relates to the selection effect, in particular. I would suggest that the authors add a conceptual figure that shows some scenarios in which this metric would give a different answer than the traditional additive partition. The example that the authors use where a dominant species increases in biomass and the amount that it increases in biomass is greater than the amount of loss from it outcompeting a subdominant species is a general example often used for a selection effect. When exactly would you see a difference between the two?

      a. Just a note - I do think you should see a difference between the two if the species suffers from strong intraspecific competition and has therefore low monoculture biomass but this would tend to also be a very low-density monoculture in practice so there would potentially be little difference between a low density and high-density monoculture because the individuals in a high-density monoculture would die anyway. So I am not sure that in practice you would really see this difference even if partial density plots were incorporated.

      Linking the new measure to SE or CE would be difficult (see the many comparisons in the Tables and Figures of our manuscript), because SE and CE are derived from a mathematical equation and do not represent specific mechanisms of species interactions (Hector and Loreau 2012; Bourrat et al., 2023).
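      As a point of reference for this exchange, the standard additive partition of Loreau and Hector (2001) splits the net biodiversity effect ΔY into CE = N·mean(ΔRY)·mean(M) and SE = N·cov(ΔRY, M), where M_i is the monoculture yield of species i and ΔRY_i is its observed minus expected relative yield in mixture. The Python sketch below is an editorial illustration of that textbook partition only, not the authors' new method or code; the function name and the toy yields are hypothetical, and the covariance uses the population (divide-by-N) form so that CE + SE equals ΔY exactly.

```python
from statistics import mean


def additive_partition(mono, mixed, ry_expected):
    """Illustrative sketch of the Loreau & Hector (2001) additive partition.

    mono        -- monoculture yields M_i
    mixed       -- observed yields of each species in the mixture, Y_O,i
    ry_expected -- expected relative yields RY_E,i (e.g. sown proportions)
    Returns (CE, SE, net_effect).
    """
    n = len(mono)
    # Deviation of observed from expected relative yield, per species
    d_ry = [y / m - rye for y, m, rye in zip(mixed, mono, ry_expected)]
    mean_dry, mean_m = mean(d_ry), mean(mono)
    ce = n * mean_dry * mean_m
    # Population covariance (divide by n), so that CE + SE equals the net effect
    cov = sum((d - mean_dry) * (m - mean_m) for d, m in zip(d_ry, mono)) / n
    se = n * cov
    expected_total = sum(rye * m for rye, m in zip(ry_expected, mono))
    net = sum(mixed) - expected_total
    return ce, se, net


# Hypothetical two-species mixture sown 50:50: the higher-yielding species
# overyields and the lower-yielding one underyields relative to expectation.
ce, se, net = additive_partition(mono=[100, 50], mixed=[70, 20], ry_expected=[0.5, 0.5])
print(round(ce, 2), round(se, 2), round(net, 2))
```

      In this toy case CE and SE each come to 7.5 and sum to ΔY = 15, which illustrates the point made above: both terms are arithmetic components of the same equation, and neither is tied by construction to a particular biological mechanism.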

      (4) One of the tricky things about these endeavors is that they often pull on theory from two different subfields and use similar terminology to refer to different things. For example - in competition theory, facilitation often refers to a positive relative interaction index (this seems to be how the authors are interpreting this) while in the BEF world facilitation often refers to a set of concrete physical mechanisms like microclimate amelioration. The truth is that both of these subfields use net effects. The relative interaction index is also a net outcome as is the complementarity effect even if it is only a piece of the net biodiversity effect. Trying to combine these two subfields to come up with a new partitioning mechanism requires interrogating the underlying assumptions of both subfields which I do not see in this paper.

      We agree. Microclimate amelioration is also part of the positive effect and will be reflected in the difference between observed yield and competitive expectation. We cannot separate the two mechanisms of positive species interactions without investigating the influence of microclimate on growth and yield.

      (5) The partial density treatment does not isolate competition in the way that the authors indicate. All of the interactions that the authors discuss are density-dependent including the mechanism that is not discussed (negative feedback from species-specific pests and pathogens). These partial density treatment effects therefore cannot simply be equated to competition as the authors indicate.:

      We use partial density monocultures to determine the maximum competitive growth response, i.e., the effect of density-dependent intraspecific interactions, and we use species competitive ability to determine how much of this maximum competitive growth response a species can achieve in mixtures. There may be changes in species-specific pests and pathogens from partial to full density monocultures, which will be captured in the competitive growth responses of individuals. We added text at lines 186-188 indicating that the estimated maximum competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.

      a. Additionally - the authors use mixture biomass as a stand-in for competitive ability in some cases but mixture biomass could also be determined by the degree to which a plant is facilitated in the mixture (for example).

      We used monoculture biomass, not mixture biomass, to assess competitive ability.

      (6) I found the literature citation to be a bit loose. For example, the authors state that the additive partition is used to separate positive interactions from competition (lines 70-76) and cite many papers but several of these (e.g. Barry et al. 2019) explicitly do not say this.

      Barry et al. (2019) defined CE as overproduction relative to monocultures, an effect of positive interactions.

      (7) The natural take-home message from this study is that it would be valuable for biodiversity experiments to include partial density treatments but I have a hard time seeing this as a valuable addition to the field for two reasons:

      a. In practice - adding in partial density treatments would not be feasible for the vast majority of experiments which are already often unfeasibly large to maintain.

      The reviewer appears to suggest that quantity is more important than quality. Without partial density monocultures, no one can separate the different effects of species interactions; Loreau and Hector, the reviewers, and many others have noted that these effects cannot be clearly differentiated with a replacement series design. Unreliable scientific findings are not valuable.

      b. The density effect would likely only be valuable during the establishment phase of the experiment because species that are strongly limited by intraspecific competition will die in the full-density plots resulting in low-density monocultures. You can see this in many biodiversity experiments after the first years. Even though they are seeded (or rarely planted) at a certain density, the density after several years in many monocultures is quite low.

      True. Whether density is high or low also depends on individual size: if individuals do not get enough resources, density is effectively high. Therefore, the density effect can remain strong even when density drops substantially from initial levels.

      Reviewer #4 (Public Review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript’s null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      To be clear, we use two hypotheses: the null hypothesis, which is currently used with AP, and the competitive hypothesis, which is new in this manuscript. The null hypothesis helps determine changes in ecosystem productivity resulting from all species interactions, while the competitive hypothesis helps partition these changes by mechanism of species interaction, i.e., positive, negative, or competitive interactions.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning. The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them. Finally, it is unclear to me whether rejecting the ‘new’ null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. I will elaborate on each of these points below.

      First, although there are many biodiversity experiments, those with partial density monocultures are rare; we found only one, a greenhouse experiment. We therefore used simulations to illustrate different scenarios of species interactions, to demonstrate how our approach works, and to show how it differs from AP.

      Because the methods differ, results from the long history of competition research (generally based on additive series designs) cannot be used to define the effects of competitive interactions in biodiversity research (generally based on replacement series designs). This may be why few competition researchers were cited in Loreau and Hector (2001).

      Our approach requires two hypotheses, null and competitive, and the meaning of deviations from these hypotheses is outlined at lines 201-221 for both individual species and community level assessments. Distinguishing changes in ecosystem productivity by species interactions would be of great interest to “ecologists, agronomists, conservationists, or others”.

      The critiques of biodiversity experiments and existing additive partitioning methods are overstated, as is the extent to which this new approach addresses its limitations. For example, the critique that current biodiversity experiments cannot reveal the effects of species interactions (e.g., lines 37-39) isn't generally true, but it could be true if stated more specifically. That is, this statement is incorrect as written because comparisons of mixtures, where there are interspecific and intraspecific interactions, with monocultures, where there are only intraspecific interactions, certainly provide information about the effects of species interactions (interspecific interactions). These biodiversity experiments and existing additive partitioning approaches have limits, of course, for identifying the specific types of interactions (e.g., whether mediated by exploitative resource competition, apparent competition, or other types of interactions). However, the approach proposed in this manuscript gets no closer to identifying these specific mechanisms of species interactions. It has no ability to distinguish between resource and apparent competition, for example. Thus, the motivation and framing of the manuscript do not match what it provides. I believe the entire Introduction would need to be rewritten to clarify what gap in knowledge this proposed approach is addressing and what would be gained by filling this knowledge gap.

      Our approach helps determine the underlying mechanisms of species interactions, i.e., positive (resource partitioning or facilitation), negative, or competitive interactions. We are not sure how much further we need to go in identifying more specific mechanisms. If resource and apparent competition refer to resource and interference competition, our approach can tease them apart.

      I recommend that the Introduction instead clarify how this study builds on and goes beyond many decades of literature considering how competition and biodiversity effects depend on density. This large literature is insufficiently addressed in this manuscript. This fails to give credit to previous studies considering these ideas and makes it unclear how this manuscript goes beyond the many previous related studies. For example, see papers and books written by de Wit, Harper, Vandermeer, Connolly, Schmid, and many others. Also, note that many biodiversity experiments have crossed diversity treatments with a density treatment and found no significant effects of density or interactions between density and diversity (e.g., Finn et al. 2013 Journal of Applied Ecology). Thus, claiming that these considerations of density are novel, without giving credit to the enormous number of previous studies considering this, is insufficient.

      This is a misunderstanding. Our approach is not designed to test density effects. The same density is maintained across full density monocultures and mixtures. We use partial density monocultures to determine what species may achieve competitively in full density mixtures in the absence of positive or negative interspecific interactions.

      Replacement series designs emerged as a consensus for biodiversity experiments because they directly test a relevant null hypothesis. This is not to say that there are no other interesting null hypotheses or study designs, but one must acknowledge that many designs and analyses of biodiversity experiments have already been considered. For example, Schmid et al. reviewed these designs and analyses two decades ago (2002, chapter 6 in Loreau et al. 2002 OUP book) and the overwhelming consensus in recent decades has been to use a replacement series and test the corresponding null hypothesis.

      This reflects some wrong impressions. We are not trying to supplant the “replacement series” with the “additive series”; we use “additive series” designs to supplement the “replacement series” design when partitioning changes in ecosystem productivity by mechanisms of species interactions, which would not be possible with a “replacement series” design alone, as many, including the reviewers, have suggested.

      It is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. Most biodiversity experiments and additive partitions have tested and quantified diversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions. If there was no less competition and no more facilitation in mixtures than in monocultures, then there would be no positive diversity effects. Rejecting this null hypothesis is relevant when considering coexistence in ecology, overyielding in agronomy, and the consequences of biodiversity loss in conservation (e.g., Vandermeer 1981 Bioscience, Loreau 2010 Princeton Monograph). This manuscript proposes a different null hypothesis and it is not yet clear to me how it would be relevant to any of these ongoing discussions of changes in biodiversity.

      Our method begins with the null expectation: that intraspecific and interspecific interactions are equivalent. We then propose the competitive hypothesis as a second, non-exclusive hypothesis, which tests whether positive or negative interspecific interactions dominate. As its name indicates, the additive partitioning model has been advocated for partitioning biodiversity effects by ecological mechanisms (CE and SE). The ecological meaning of deviations from the two hypotheses is outlined at lines 201-221 for both individual species and community level assessments.

      The claim that all previous methods 'are not capable of quantifying changes in ecosystem productivity by species interactions and species or community level' is incorrect. As noted above, all approaches that compare mixtures, where there are interspecific interactions, to monocultures, where there are no species interactions, do this to some extent. By overstating the limitations of previous approaches, the manuscript fails to clearly identify what unique contribution it is offering, and how this builds on and goes beyond previous work.

      The reviewer implies that a partial truth equals the whole truth. The same argument could be applied to the additive partitioning, since the relative yield total or response ratio already provides a kind of comparison between mixtures and monocultures. Our statement is correct in the sense that previous approaches are not designed to separate changes in ecosystem productivity by species interactions, as other reviewers have indicated. The additive partitioning is built on the Price equation (a covariance equation), whose relevance to biodiversity partitioning has never been biologically demonstrated (Bourrat et al., 2023).

      We have made clear that our work builds on, and goes beyond, the null expectation by adding the competitive expectation.

      The manuscript relies on simulations because it claims that current experiments are unable to test this, given that they have replacement series designs (lines 128-131). There are, however, dozens of experiments where the replacement series was repeated at multiple densities, which would allow a direct test of these ideas. In fact, these ideas have already been tested in these experiments and density effects were found to be nonsignificant (e.g., Finn et al. 2013).

      This point does not apply to our approach. Again, we are not testing density effects. Partial density monocultures are used to determine the competitive growth responses that species may achieve in mixtures based on their relative competitive ability. We used simulations because partial density monocultures have been used in only one experimental study, which is included in our analysis.

      It seems that the authors are primarily interested in trees planted at a fixed density, with no opportunity for changes in density, and thus only changes in the size of individuals (e.g., Fig. 1). In natural and experimental systems, realized density differs from the initial planted density, and survivorship of seedlings can depend on both intraspecific and interspecific interactions. Thus, the constrained conditions under which these ideas are explored in this manuscript seem narrow and far from the more complex reality where density is not fixed.

      We use fixed density only for convenience. In biodiversity experiments, density can increase or decrease over time from initial levels. However, initial density is generally used when evaluating species interactions. If the interest is in community productivity, density changes do not need to be considered. Again, we are not testing density effects.

      Additional detailed comments:

      It is unclear to me which 'effects' are referred to on line 36. For example, are these diversity effects or just effects of competition? What is the response variable?

      It refers to the effect of competitive interactions on productivity, which should be clear from the preceding sentences.

      The usefulness of the approach is overstated on line 52. All partitioning approaches, including the new one proposed here, give the net result of many types of species interactions and thus cannot 'disentangle underlying mechanisms of species interactions.'

      We are not sure how many types of species interactions the reviewer is referring to. If mechanisms of species interactions are grouped into three categories (positive, negative, and competitive), as is common in biodiversity research, our approach can tease them apart.

      The weaknesses of previous approaches are overstated throughout the manuscript, including in lines 60-61. All approaches provide some, but not all insights. Sweeping statements that previous approaches are not effective, without clarifying what they can and can't do, is unhelpful and incorrect. Also, these statements imply that the approach proposed here addresses the limitations of these previous approaches. I don't yet see how it does so.

      The weaknesses of previous approaches are not overstated in terms of separating changes in ecosystem productivity by species interactions. As pointed out by other reviewers, none of the previous approaches is designed to quantify changes in ecosystem productivity by species interactions.

      The definitions given for the CE and SE on line 71 are incorrect. Competition affects both terms and CE can be negative or have nothing to do with positive interactions, as noted in many of the papers cited.

      We are not trying to define CE and SE, but only to point out how CE and SE have generally been used in biodiversity research (see the recent publication by Feng et al., 2022).

      The proposed approach does not address the limitations noted on lines 73 and 74.

      It does, in terms of identifying the sources of the net biodiversity effect, whether from positive, negative, or competitive interactions.

      The definition of positive interactions in lines 77 and 78 seems inconsistent with much of the literature, which instead focuses on facilitation or mutualism, rather than competition when describing positive interactions.

      Much of the literature supports our definition (see Loreau and Hector, 2001). In biodiversity research, positive interactions include resource partitioning and facilitation. What we are trying to point out is that competition affects species and community level assessments based on the null expectation and needs to be separated.

      Throughout the manuscript, competition is often used interchangeably with resource competition (e.g., line 82) and complementarity is often attributed to resource partitioning (e.g., line 77). This ignores apparent competition and partitioning enemy-free niche space, which has been found to contribute to biodiversity effects in many studies.

      If apparent competition refers to interference competition, it is included in negative interactions. Changes in species-specific pests and pathogens in mixtures will be captured as positive or negative effects through facilitation or interference.

      In what sense are competitive interactions positive for competitive species (lines 82-83)? By definition, competition is an interaction that has a negative effect. Do you mean that interspecific competition is less than intraspecific competition? I am having a very difficult time following the logic.

      We are glad the reviewer raised this question, which may confuse many others and has never been clearly discussed. It all depends on how the comparison is made. If species performance in mixtures is compared with that in partial density monocultures, as in competition research, the competition effect is negative for all species. If the comparison is made between mixtures and full density monocultures, as in biodiversity research, the competition effect should be positive for more competitive species and negative for less competitive species, because resources flow from less to more competitive species in mixtures relative to full density monocultures.

      Therefore, the definitions of competitive interactions based on the additive series design in competition research cannot be used to describe competitive interactions based on the replacement series design in biodiversity research. In biodiversity research, the effects of competitive interactions have never been clearly defined at the species or community level and are mixed up with those of other species interactions.

      Results are asserted on lines 93-95, but I cannot find the methods that produced these results. I am unable to evaluate the work without a repeatable description of the methods.

      We have added references on sources of these data.

      The description of the null hypothesis in the common additive partitioning approach on lines 145-146 is incorrect. In the null case, it does not assume that there are no interspecific interactions, but rather that interspecific and intraspecific interactions are equivalent.

      Correct; changes have been made as suggested.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I recommend to:

      - re-organize the presentation of the material (see my concerns in the public review section). The manuscript is very difficult to read.

      We have made changes to help readers understand our approach. Figure 1 was modified to show how the competitive growth response varies with relative competitive ability, from minimum (null expectation) to maximum (competitive exclusion).

      - explore the mathematical form of the remainder term. It seems important to understand that the remainder captures terms unrelated to competition as defined in the present scope.

      The remainder measures deviations from the null expectation due to species differences in growth and competitive ability, i.e., the competition effect. The term has a clear meaning, positive for more competitive species and negative for less competitive species (lines 202-204), and does not need to be further explored or partitioned. The deviations of observed yields from competitive expectations are outlined in lines 205-221.

      Reviewer #4 (Recommendations For The Authors):

      The authors should be sure to include reproducible methods and share any data and code.

      Both simulation and experimental data are shared through supplementary tables. Calculations are included in Excel spreadsheets and do not require programming.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research offers an in-depth exploration and quantification of social vocalization within three families of Mongolian gerbils. In an enlarged, semi-natural environment, the study continuously monitored two parent gerbils and their four pups from P14 to P34. Through dimensionality reduction and clustering, a diverse range of gerbil call types was identified. Interestingly, distinct sets of vocalizations were used by different families in their daily interactions, with unique transition structures exhibited across these families. The primary results of this study are compelling, although some elements could benefit from clarification.

      Strengths:

      Three elements of this study warrant emphasis. Firstly, it bridges the gap between laboratory and natural environments. This approach offers the opportunity to examine natural social behavior within a controlled setting (such as specified family composition, diet, and life stages), maintaining the social relevance of the behavior. Secondly, it seeks to understand short-timescale behaviors, like vocalizations, within the broader context of daily and life-stage timescales. Lastly, the use of unsupervised learning precludes the injection of human bias, such as pre-defined call categories, allowing the discovery of the diversity of vocal outputs.

      Weaknesses:

      (1) While the notable differences in vocal clusters across families are convincing, the drivers of these differences remain unclear. Are they attributable to "dialect," call usage, or specific vocalizing individuals (e.g., adults vs. pups)? Further investigation, via a literature review or additional observation, into acoustic differences between adult and pup calls is recommended. Moreover, a consistent post-weaning decrease in the bottom-left cluster (Fig. S3) invites interpretation: could this reflect drops in pup vocalization?

      Thank you for bringing up this point of clarification. Without knowledge of individual vocalizers, we are unable to rigorously assess pronunciation differences between individuals; however, we can obtain a clear proxy for dialect by observing usage differences between families. We’ve added the following text (blue) in the Discussion to help clarify:

      “To address whether gerbils also exhibit family-specific vocal features, we compared GMM-labeled vocal cluster usages across the three recorded families and showed differences in vocal type usage (Figure 3). The differences in this study align with the definition of human vocal dialect, which is a regional or social variety of language that can differ in pronunciation, grammar, semantics, and/or language use (Henry et al., 2015). This definition of dialect is inclusive of both pronunciation differences (e.g. a Bostonian’s characteristic pronunciation of “car” as “cah”) and usage differences (e.g. a Bostonian’s preferential usage of the words “Go Red Sox” vs. a New Yorker’s preferential usage of the words “Go Yankees”). In our case, vocal clusters can be rarely observed in some families yet highly over-expressed in others (analogous to language usage differences in humans), or highly expressed in both families but contain subtle spectrotemporal variations (Figure 3D, Family 1 cluster 11 vs. Family 3 clusters 2, 18, 30; analogous to pronunciation differences in humans).”

      Indeed, our recordings obtained after pup removal could suggest that adults may use fewer low frequency calls (bottom left cluster in UMAP). However, this dataset does not permit a proper assessment of post-weaning pup calls. In fact, our results and the literature show that adults are likely to use low frequency calls, but only during social interactions with pups or other adults. For example, Furuyama et al. 2022 describe a number of low frequency call types used by adults in agonistic social interactions, which look similar to a low frequency call type used by pups described in Silberstein et al. 2023. Similarly, Ter-Mikaelian et al. 2012 (their Figure 6) recorded several types of sonic vocalizations during adult social interaction. To our knowledge, it has not been shown whether gerbil pups and adults produce distinct call types. This is a challenging problem, as animals placed in isolation (i.e. an experimental condition in which the identity of the vocalizer is known) vocalize infrequently, and the few calls they do emit do not span the full range of vocalizations described in the literature (RP personal observations). To properly address this question, one would need to elicit full use of the vocal repertoire through free social interaction, then attribute calls to individual vocalizers via sound source localization and/or head-mounted microphones. We are currently pursuing both of these technical approaches, but this is outside the scope of this manuscript.

      Although the literature reflects the limitations discussed above, we have added a brief paragraph to the Discussion (limitations section) that addresses the reviewer’s question about the development of vocalizations:

      “Although we were not able to attribute vocalizations to individual family members, we did seek to determine the importance of family structure by comparing audio recordings before and after removal of the pups at P30. The results show a clear effect of family integrity, and the sudden reduction of sonic calls following pup removal (Figure S3) could suggest that these vocalizations are produced selectively by pups.

      However, there is ample evidence that adult gerbils also produce sonic vocalizations. For example, a number of low frequency call types are used by adults during a range of social interactions (Ter-Mikaelian et al., 2012; Furuyama et al., 2022), some of which are similar to a low frequency call type used by pups (Silberstein et al., 2023). Vocalization patterns of developing gerbils depend on whether they are recorded in isolation or during staged interactions. When gerbil pups are recorded during isolation, ultrasonic vocalization rate declines and sonic vocalizations increase for animals that are in a high arousal state (De Ghett 1974, Silberstein et al., 2023). As gerbils progress from juvenile to adolescent development (P17-55), a significant increase in ultrasonic vocalization rate is observed during dyadic social encounters, with a distinct change in usage pattern that depends upon the sex of each animal (Holman & Seale 1991, Holman et al. 1995). The development of vocalization types has been assessed in another member of the Gerbillinae subfamily, the fat-tailed gerbil (Pachyuromys duprasi), during isolation and handling. Here, the number of ultrasonic vocalization syllable types increases from neonatal to adult animals (Zaytseva et al. 2019), while some very low frequency sonic call types were rarely observed after P20 (Zaytseva et al. 2020). By comparison, mouse syllable usage changes during development, but pups produced 10 of the 11 syllable types produced by adults (Grimsley et al. 2011). In summary, our understanding of the maturation of vocalization usage remains limited by our inability to obtain longitudinal data from individual animals within their natural social setting. For example, when recorded in their natural environment, chimpanzees display a prolonged maturation of vocalization complexity, such as the probability of a unique utterance in a sequence, with the greatest changes occurring when animals begin to experience non-kin social interactions (Bortolato et al. 2023).”

      (2) Developmental progression, particularly during pre-weaning periods when pup vocal output remains unstable, might be another factor influencing cross-family vocal differences. Representing data from this non-stationary process as an overall density map could result in the loss of time-dependent information. For instance, were dominating call types consistently present throughout the recording period, or were they prominent only at specific times? Displaying the evolution of the density map would enhance understanding of this aspect.

      This is a great suggestion. Thank you for bringing it up. To address this, we have added an additional figure (Figure 4) to the main text (Note that the former Figure 4 is now Figure 5). New text associated with this new figure was added to the Results and Discussion sections:

      Results

      “Vocal usage differences remain stable across days of development. It is possible that the observed vocal usage differences could result from varying developmental progression of vocal behavior or overexpression of certain vocal types during specific periods within the recording. To assess the potential effect of daily variation on family-specific vocal usage, we visualized density maps of vocal usage across days for each of the families (Figure 4A). There are two noteworthy trends: 1.) the density map remains coarsely stable across days (rows) and 2.) the maps look distinct across families on any given day (columns). This is a qualitative approximation of the repertoire’s stability, but does not take into account variation of call type usage (as defined by GMM clustering of the latent space). Figure 4B shows the normalized usage of each cluster type over development for each family. Cluster usages during the period of “full family, shared recording days” (postnatal days beneath the purple bars) are stable across days within families, as is apparent from the horizontal striations in the plot, though each family maintains this stability by using a unique set of call types. This is addressed empirically in Figure 4C, which shows clearly separable PCA projections of the cluster usages shown in Figure 4B (purple days). Finally, we computed the pairwise Maximum Mean Discrepancy (MMD) between latent distributions of vocalizations from individual recording days for each of the families (Figure 4D). This shows that across-family repertoire differences are substantially larger than within-family differences. This is visualized in a multidimensional scaling projection of the MMD matrix in Figure 4E.”
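      To illustrate what a pairwise MMD matrix between recording days computes, here is a minimal sketch. It assumes a Gaussian (RBF) kernel with a hand-picked bandwidth `sigma` and the biased (V-statistic) estimator of squared MMD; the kernel, bandwidth, and estimator actually used in the manuscript are not stated here, and the function names are hypothetical. Each day is represented as a list of VAE latent vectors.

```python
import math


def gaussian_kernel(a, b, sigma):
    # RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); the bandwidth is an assumption.
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(A, B, sigma):
    # Average kernel value over all pairs (a, b) with a in A and b in B.
    total = sum(gaussian_kernel(a, b, sigma) for a in A for b in B)
    return total / (len(A) * len(B))


def mmd2(X, Y, sigma=1.0):
    # Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return mean_kernel(X, X, sigma) + mean_kernel(Y, Y, sigma) - 2.0 * mean_kernel(X, Y, sigma)


def pairwise_mmd(days, sigma=1.0):
    # days: dict mapping a (hypothetical) day/family key to a list of latent vectors.
    keys = sorted(days)
    return {(a, b): mmd2(days[a], days[b], sigma) for a in keys for b in keys}
```

      Low values on the within-family blocks and high values across families would correspond to the pattern described for Figure 4D; in practice the bandwidth is often set from the median pairwise distance, which is another assumption rather than the authors' stated choice.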

      Discussion

      “The described family differences collapse data from multiple days into a single comparison; however, it is possible that factors such as vocal development and/or high usage of particular vocal types during specific periods of the recording could explain family differences. Therefore, we took advantage of the longitudinal nature of our dataset to assess whether repertoire differences remain stable across time. First, we visualized vocal repertoire usage across days as either UMAP probability density maps (Figure 4A) or daily GMM cluster usages (Figure 4B). Though qualitative, one can appreciate that family repertoire usage remains stable across days and appears to differ on a consistent daily basis across families. To formally quantify this, we first projected GMM cluster usages from Figure 4B into PC space and show that family GMM cluster usage patterns are highly separable, regardless of postnatal day (Figure 4C). If families had used a more overlapping set of call types, then the projections would have appeared intermixed. Next, we performed a cluster-free analysis by computing the pairwise MMD distance between VAE latent distributions of vocalizations from each family and day (Figure 4D). This analysis shows very low MMD values across days within a family (i.e. the repertoire is highly consistent with itself), and high MMD values across families/days (greater than would be expected by chance; see shuffle control in Figure S2D). The relative differences in this matrix are made clear in Figure 4E, which provides additional evidence that family vocal repertoires remain stable across days and are consistently different from other families. Taken together, we believe that this is compelling evidence that differences in vocal repertoires between families are not driven by dominating call types during specific phases in the recording period; rather, families consistently emit characteristic sets of call types across days. This opens up the possibility to assess repertoire differences over much shorter time periods (e.g. 24 hours) in future studies.”

      (3) Family-specific vocalizations were credited to the transition structure, a finding that may seem obvious if the 1-gram (i.e., the proportion of call types) already differs. This result lacks depth unless it can be demonstrated that, firstly, the transition matrix provides a robust description of the data, and secondly, different families arrange the same set of syllables into unique sequences.

      Thank you for these important suggestions. We agree that the 2-gram transition structure must vary based on the 1-gram structure. To determine whether this influences the interpretation of the finding, we have added Figure S5 and the following text in the Results section:

      “To determine whether differences in 1-gram structure contribute to differences in the transition (2-gram) structure, we performed a number of controls. Although subtle, vertical streaks corresponding to 1-gram usages are clearly present in shuffled transition matrices (Figure S5A-B). Given this structure in the shuffled data, we sought to determine whether the observed transition probabilities differed significantly from chance levels. We randomly shuffled label sequences 1000 times independently for each family to generate a null transition matrix distribution. Using these null distributions and the observed transition probabilities, we computed a p-value for each transition using a one-sample t-test and created a binary transition matrix indicating which transitions happen above chance levels (Figure S5C, black pixels, p <= 0.05 after post hoc Benjamini-Hochberg multiple comparisons correction). As is made clear in Figure S5C, most transitions for each family occur significantly above chance levels, despite the inherent 1-gram structure. Moreover, by looking at transitions from a highly used cluster type that is used in roughly the same proportion across families (cluster 12), we show that families arrange the same sets of vocal clusters into unique sequences (Figure S5D). We believe that this provides compelling evidence that the 1-gram structure does not change the interpretation of the main claim that transition structure varies by family.”
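      The shuffle control can be sketched as follows, under stated assumptions: labels are integer cluster IDs in emission order, and each shuffle permutes one family's label sequence, which preserves 1-gram usage while destroying order. In place of the one-sample t-test described above, this sketch substitutes an empirical permutation p-value (the fraction of shuffles whose transition probability is at least the observed one), followed by Benjamini-Hochberg correction across all transitions. All function names are hypothetical.

```python
import random


def transition_matrix(labels, n_states):
    # Row-normalized bigram counts: P(next = j | current = i).
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def shuffle_null_pvalues(labels, n_states, n_shuffles=1000, seed=0):
    # One-sided empirical p-value per transition: is the observed probability above chance?
    rng = random.Random(seed)
    observed = transition_matrix(labels, n_states)
    exceed = [[0] * n_states for _ in range(n_states)]
    shuffled = list(labels)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps 1-gram usage, destroys sequence order
        null = transition_matrix(shuffled, n_states)
        for i in range(n_states):
            for j in range(n_states):
                if null[i][j] >= observed[i][j]:
                    exceed[i][j] += 1
    # Add-one correction so a p-value is never exactly zero.
    return [[(e + 1) / (n_shuffles + 1) for e in row] for row in exceed]


def benjamini_hochberg(pvals, alpha=0.05):
    # Reject the k smallest p-values, where k is the largest rank with p_(k) <= k/m * alpha.
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject
```

      Flattening the p-value matrix row by row and passing it to `benjamini_hochberg` yields a binary mask analogous to the black pixels in Figure S5C. For a strictly cyclic toy sequence such as `[0, 1, 2] * 50`, only the transitions 0→1, 1→2, and 2→0 survive correction.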

      To address your second point, we inspected frequent transitions from individual syllables to all other syllables using bigram transition probability graphs. This revealed a common trend: across all families, many shared and unshared transitions existed, suggesting that families use the same sets of syllables to make unique transition patterns. Figure S5D shows a single syllable example of the phenomenon, with red lines indicating the shared transition types between families and black lines showing transition patterns not shared between families (i.e. unique family-specific transitions, or lack thereof).

      Reviewer #2 (Public Review):

      Peterson et al. perform a series of behavioral experiments to study the repertoire and variance of Mongolian gerbil vocalizations across social groups (families). A key strength of the study is the use of a behavioral paradigm which allows for long-term audio recordings under naturalistic conditions. This experimental set-up results in the identification of additional vocalization types. In combination with state-of-the-art methods for vocalization analysis, the authors demonstrate that the distribution of sound types and the transitions between these sound types across three gerbil families is different. This is a highly compelling finding which suggests that individual families may develop distinct vocal repertoires. One potential limitation of the study lies in the cluster analysis used for identifying distinct vocalization types. The authors use a Gaussian Mixture Model (GMM) trained on variational autoencoder (VAE)-derived latent representations of vocalizations to classify recorded sounds into clusters. Through the analysis the authors identify 70 distinct clusters and demonstrate a differential usage of these sound clusters across families. While the authors acknowledge the inherent challenges in cluster analysis and provide additional analyses (i.e. maximum mean discrepancy, MMD), additional analysis would increase the strength of the conclusions. In particular, analysis with different cluster sizes would be valuable. An additional limitation of the study is that, due to the methodology that is used, the authors cannot provide any information about the bioacoustic features that contribute to differences in sound types across families, which limits interpretations about how the animals may perceive and react to these sounds in an ethologically relevant manner.

      The conclusions of this paper are well supported by data, but certain parts of the data analysis should be expanded and more fully explained.

      • Can the authors comment on the potential biological significance of the 70 sound clusters? Does each cluster represent a single sound type? How many vocal clusters can be attributed to a single individual? Similarly, can the authors comment on the intra-individual and inter-individual variability of the sound types within and across families?

      Previous work documenting the Mongolian gerbil repertoire (Ter-Mikaelian 2012, Kobayasi 2012) has revealed ~12 vocalization types that vary with social context. Our thinking is that we are capturing these ~12 (plus a few more, as illustrated in Figure 2C) as well as individual or family-specific variations of some call types. Although the number of discrete call types is likely less than 70, it’s plausible that variation due to vocalizer identity pushes some calls into unique clusters. This idea is supported by the fact that both naked mole rats and Mongolian gerbils have been shown to exhibit individual-specific variation in vocalizations, though only in single call types (Barker 2021, Figure 1; Nishiyama 2011, Table I). The current study is not ideal to test this prediction, as we cannot attribute each vocalization to individual family members. Using our 4-mic array, we attempted to apply established sound source localization techniques to assign vocalizations to individuals (Neunuebel 2015), but the technique failed, presumably due to high amounts of reverberation in the arena. We are currently developing a custom deep learning based sound localization algorithm, and had hoped to extract individual animal vocalizations from our data set (part of the reason why this manuscript has taken longer than expected to return!), but the performance is not yet satisfactory for large groups of animals. We have added text to the Methods sections with the context outlined above to further justify the use of ~70 clusters.

      • As a main conclusion of the paper rests on the different distribution of sound clusters across families, it is important to validate the robustness of these differences across different cluster parameters. Specifically, the authors state that "we selected 70 clusters as the most parsimonious fit". Could the authors provide more details about how this was fit? Specifically, could the authors expand upon what is meant by "prior domain knowledge about the number of vocal types...". If the authors chose a range of cluster values (i.e. 10, 30, 50, 90) does the significance of the results still hold?

      Thank you for the suggestion, this is an important point that we have addressed with new analyses in the revision (see GMM clustering methods and new Figure S4). The prior domain knowledge referenced is with respect to the information known about the Mongolian gerbil vocal types provided in the response above. We have made this more clear in the discussion.

      We mainly based our selection of the number of clusters on the elbow method applied to GMM held-out log likelihood (Figure S2C). The likelihood begins to plateau at around 70 clusters, though it’s clear that there are a number of reasonable cluster sizes. To assess whether cluster size has an effect on interpretation of the family differences result, we added Figure S4, where we varied the number of GMM clusters used and compared cluster usage differences across families (Figure S4A). We quantified pairwise family differences in cluster usage by computing the sum of the absolute value of differential cluster usages, for each GMM cluster value (Figure S4B). We find that relative usage differences remain unchanged across the range of cluster values used, indicating that GMM cluster size does not bias the finding.
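      The pairwise usage-difference metric described above (sum of absolute differences in normalized cluster usage) can be written out as a short sketch. It assumes each family's vocalizations have already been assigned integer GMM cluster labels for a given number of clusters; the function names and toy data are hypothetical, and the same computation would simply be repeated for each cluster count tested.

```python
from collections import Counter
from itertools import combinations


def cluster_usage(labels, n_clusters):
    # Fraction of vocalizations assigned to each cluster 0..n_clusters-1.
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(k, 0) / n for k in range(n_clusters)]


def pairwise_usage_difference(family_labels, n_clusters):
    # L1 distance between normalized usage vectors for every pair of families.
    # Ranges from 0 (identical usage) to 2 (no clusters in common).
    usages = {fam: cluster_usage(lbls, n_clusters) for fam, lbls in family_labels.items()}
    out = {}
    for a, b in combinations(sorted(usages), 2):
        out[(a, b)] = sum(abs(x - y) for x, y in zip(usages[a], usages[b]))
    return out
```

      Because each usage vector sums to 1, the metric is bounded in [0, 2], which keeps values comparable across different numbers of clusters.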

      • While VAEs are powerful tools for analyzing complex datasets in this case they are restricted to analysis of spectrogram images. Have the authors identified any acoustic differences (i.e. in pitch, frequency, and other sound components) across families?

      Though it’s true that this VAE is limited to spectrograms, the VAE latent space has been shown to correspond to real acoustic features such as frequency and duration, and contain a higher representational capacity than traditional acoustic features (Goffinet 2021, Figure 2). Therefore, clustering of the latent space necessarily means that vocalizations with similar acoustic features are clustered together regardless of their family identity.

      Despite this, your point is well taken that there could be systematic differences in certain acoustic features for specific call types. We are not able to ascertain this with the current dataset. This is addressed in Barker 2021 by recording a single call type (soft chirp) from individuals within and across families. Mongolian gerbils have been shown to exhibit individual differences in the initial, terminal, minimum, and maximum frequency of the ultrasonic up-frequency modulated call type (Figure 2, top right green; Nishiyama 2011, Figure 1A). Therefore, it’s possible that family-specific differences exist for that particular call type. To assess whether other call types show family or individual differences, it’s necessary to either 1.) elicit all call types from an animal in isolation or 2.) determine vocalizer identity in social-vocal interactions. The problem with the former idea is that gerbils only produce up-frequency modulated USVs in isolation and there is no known way to elicit the full vocal repertoire in single animals. The latter idea would allow for full use of the vocal repertoire, but requires invasive techniques (e.g., skull-implanted microphones, or awake-behaving laryngeal nerve recordings) that permit assignment of vocalizations to individuals during a natural social interaction. We are actively exploring solutions to both problems.

      It’s likely that future studies will look deeper into acoustic differences between individuals and families. Therefore, we have added acoustic feature quantification of vocalizations in each of the GMM clusters as a reference (Figure S6).

      Reviewer #3 (Public Review):

      Summary:

      In this study, Peterson et al. longitudinally record and document the vocal repertoires of three Mongolian gerbil families. Using unsupervised learning techniques, they map the variability across these groups, finding that while overall statistics of, e.g., vocal emission rates and bout lengths are similar, families differed markedly in their distributions of syllable types and the transitions between these types within bouts. In addition, the large and rich data are likely to be valuable to others in the field.

      Strengths:

      - Extensive data collection across multiple days in multiple family groups.

      - Thoughtful application of modern analysis techniques for analyzing vocal repertoires.

      - Careful examination of the statistical structure of vocal behavior, with indications that these gerbils, like naked mole rats, may differ in repertoire across families.

      Weaknesses:

      - The work is largely descriptive, documenting behavior rather than testing a specific hypothesis.

      - The number of families (N=3) is somewhat limited.

      We agree that the number of families is relatively small. However, our new analysis of vocal repertoire by postnatal day (Figure 4) demonstrates that the finding is quite robust. A high sample-size study was outside the scope of this initial observational study given the difficulty of obtaining and processing longitudinal data of this scale. In light of new analyses in Figure 4, we are confident that future studies will not need so much data to characterize family-specific differences. A single 24-hour recording should be sufficient, making comparison of many more families relatively straightforward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Several minor concerns:

      (1) The three thresholds used for vocalization segmentation lack explanation.

      Figure 1C's first vocal event appears to define the first gap via the gray threshold (th_2, as the trace does not cross the black line) and the second gap via the black threshold (th_1 or th_3). And this is not addressed in the Methods section.

      Thank you for bringing this to our attention. We agree, this is presented in an unnecessarily complicated way. We have updated the methods section describing the thresholding procedure.

      “Sound onsets are detected when the amplitude exceeds 'th_3' (black dashed line, Figure 1C), and sound offset occurs at a subsequent local minimum, e.g., when the amplitude falls below 'th_2' (gray dashed line, Figure 1C) or 'th_1' (black dashed line, Figure 1C), whichever comes first. In this specific use case, th_2 (5) will always come before th_1 (2); therefore, the gray dashed line will always be the offset. A subsequent onset will be marked if the sound amplitude crosses th_2 or th_3, whichever comes first. For example, the first sound event detected in Figure 1C shows the sound amplitude rising above the black dashed line (th_3) and marks an onset. Subsequently, the amplitude trace falls below the gray dashed line (th_2) and an offset is marked. Finally, the amplitude rises above th_2 without dipping below th_3 and an onset for a new sound event is marked. Had the amplitude dipped below th_3, a new sound event onset would be marked when the amplitude trace subsequently exceeded th_3 (e.g. between sound events 2 and 3, Figure 1C). The maximum and minimum syllable durations were selected based on published duration ranges of gerbil vocalizations (Ter-Mikaelian et al. 2012, Kobayasi & Riquimaroux, 2012).”
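      The core idea of threshold-based segmentation with duration limits can be approximated by a simplified two-threshold (hysteresis) segmenter. This is a sketch, not the three-threshold rule quoted above: it omits the `th_2` local-minimum split between adjacent syllables, and the threshold, frame duration, and duration bounds in the example are hypothetical values rather than the manuscript's settings.

```python
def segment(amplitude, frame_dt, th_onset, th_offset, min_dur, max_dur):
    """Return (onset_s, offset_s) pairs from a normalized amplitude trace.

    Simplified hysteresis rule (an assumption, not the published pipeline):
    an event starts when amplitude rises above th_onset and ends when it
    falls below th_offset (th_offset < th_onset). Events outside
    [min_dur, max_dur] seconds are discarded.
    """
    events = []
    onset = None
    for i, a in enumerate(amplitude):
        if onset is None and a > th_onset:
            onset = i  # amplitude crossed the onset threshold
        elif onset is not None and a < th_offset:
            events.append((onset, i))  # amplitude fell below the offset threshold
            onset = None
    if onset is not None:
        events.append((onset, len(amplitude)))  # close an event still open at the end

    kept = []
    for on, off in events:
        duration = (off - on) * frame_dt
        if min_dur <= duration <= max_dur:
            kept.append((on * frame_dt, off * frame_dt))
    return kept
```

      With a trace such as `[0, 0, 6, 7, 6, 1, 0, 0] + [6] * 8 + [0]`, `frame_dt=0.01`, thresholds 5 and 2, and bounds of 0.02 to 0.05 s, the first burst (30 ms) is kept and the second (80 ms) is rejected as too long, mirroring how the published duration ranges prune candidate syllables.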

      (2) The determination of multi-syllabic calls could be explained further. In Figure 1C, for instance, do syllables separated by short gaps (e.g., the first syllable and the rest of the first group, and the third group in this example) belong to the same call or different calls?

      We have added an operational definition of mono vs. multisyllabic calls in the Results section:

      “Vocalizations occur as either single syllables bounded by silence (monosyllabic) or consist of combinations of single syllables without a silent interval (multisyllabic).”

      Under this definition, the examples you mentioned in Figure 1C are considered monosyllabic. One could reasonably expand the definition to include calls separated by less than X ms of silence, for example; however, we chose not to do that in this study. A deeper understanding of the phonation mechanisms for different gerbil vocalization types would be helpful to more rigorously determine the distinction between mono- vs. multisyllabic vocalizations.

      (3) Labeling the calls shown in Fig. 3D in the latent feature space would help highlight within-family diversity and between-family similarities.

      Great suggestion. We have updated Figure 3 to include where in UMAP space each family’s preferred clusters are.

      (4) In the introduction, the statement, "Therefore, our study considers the possibility that there is a diversity of vocalizations within the gerbil family social group" doesn't naturally follow from the previous example. This could be rephrased.

      Agreed, thank you. We revised this section of the introduction to flow better.

      Reviewer #2 (Recommendations For The Authors):

      While outside the scope of the current study the authors may consider the following experiments and analysis for future studies:

      • Do vocal repertories retain their family signatures across subsequent generations of pups? (i.e. if vocalizations are continually monitored during second or third litters of the same parents).

      • Do the authors observe any long-term changes in family repertoires related to the developmental trajectory of the pups? Are there changes in individual pup vocal features or sound type usage throughout development?

      Thank you for these great suggestions. Given that naked mole rats learn vocalizations through cultural transmission, it would be interesting to see whether other subterranean species with complex social structures (gerbils, voles, rats) have similar abilities. A straightforward way to assess this possibility could be as you suggest — are latent distributions of vocalizations from multi-generational families closer together than cross-family differences? If true, this would provide compelling evidence to investigate further.

      We partially address your second suggestion in our response to Reviewer 1 and in Figure 4, which shows that the family repertoire remains stable throughout this particular period of development. This doesn’t rule out the possibility that there could be other phases of development that undergo more vocal change. Your final suggestion is an area that we are actively researching and eager to know the answer to. A follow-up question: could differences in pup vocal features contribute to differential care by parents?

      Reviewer #3 (Recommendations For The Authors):

      In all, I found the paper clearly written and the figures easy to follow. One small suggestion:

      Figure 1: I can't see the black and gray thresholds described in the caption very well. Perhaps a zoom-in to the first 0.15s or so of the normalized amplitude plot would better display these.

      Agreed, thank you. We added a zoom-in to Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Unckless and colleagues address the issue of the maintenance of genetic diversity of the gene diptericin A, which encodes an antimicrobial peptide in the model organism Drosophila melanogaster.

      Strengths:

      The data indicate that flies homozygous for the dptA S69 allele are better protected against some bacteria. By contrast, male flies homozygous for the R69 allele better resist starvation than flies homozygous for the S69 allele.

      Weaknesses:

      - I am surprised by the inconsistency between the data presented in Fig. 1A and Fig. S2A for the survival of male flies after infection with P. rettgeri. I am not convinced that the data presented support the claim that females have lower survival rates than males when infected with P. rettgeri (lines 176-182).

      The two figures are pasted above (1A left, S2A right). The reviewer is correct that the two experiments look different in terms of overall outcomes for males, though qualitatively similar. These two experiments were performed by different researchers, and as much as we attempt to infect consistently from researcher to researcher, some have heavier hands than others. It is true that the genotype that has the largest sex effect is the arginine line (blue) where females (in this experiment) are as bad as the null allele, and males are more intermediate. Also note that the experiments in S2A (male and female) were done in the same block so they are the better comparison. We’ve reflected this in the manuscript.

      - The data in Fig. 2 do not seem to support the claim that female flies with either the dptA S69 or the R69 alleles have a longer lifespan than males (lines 211-215). A comment on the Δdpt line, which is one of the CRISPR edited lines, would be welcome.

      We’ve reworded this section based on these comments.

      - The data in Fig. 2B show that male flies with the dptA S69 or R69 alleles have the same lifespan when poly-associated with L. plantarum and A. tropicalis, which contradicts the claim of the authors (lines 256-260).

      This is correct – the effect is only in females. It has been corrected.

      Reviewer #2 (Public Review):

      Summary: In this study, the authors delve into the mechanisms responsible for the maintenance of two diptericin alleles within Drosophila populations. Diptericin is a significant antimicrobial peptide that plays a dual role in fly defense against systemic bacterial infections and in shaping the gut bacterial community, contributing to gut homeostasis.

      Strengths: The study unquestionably demonstrates the distinct functions of these two diptericin alleles in responding to systemic infections caused by specific bacteria and in regulating gut homeostasis and fly physiology. Notably, these effects vary between male and female flies.

      Weaknesses: Although the findings are highly intriguing and shed light on crucial mechanisms contributing to the preservation of both diptericin alleles in fly populations, a more comprehensive investigation is warranted to dissect the selection mechanisms at play, particularly concerning diptericin's roles in systemic infection and gut homeostasis. Unfortunately, the results from the association study conducted on wild-caught flies lack conclusive evidence.

      It is true that the wild fly association study is mostly a negative result. We’ve backed off the claim about the Morganella association.

      Major Concerns:

      Lines 120-134: The second hypothesis is not adequately defined or articulated. Please revise it to provide more clarity. Additionally, it should be explicitly stated that the first part of the first hypothesis (pathogen specificity), i.e., the superior survival of the S allele in Providencia infections compared to the R allele, has been previously investigated and supported by the results in the Unckless et al. 2016 paper. The current study aims to additionally investigate the opposite scenario: whether the R allele exhibits better survival in a different infection. Please consider revising to emphasize this point.

      We’ve reworded this section and added references to both the Unckless et al. 2016 and Hanson et al. 2023 papers.

      Figures and statistical analyses: It is essential to present the results of significant differences from the statistical analyses within Figures 1B, 2B, and 3. Additionally, please include detailed descriptions of the statistical analysis methods in the figure legends. Specify whether the error bars represent standard error or standard deviation, particularly in Figure 3, where assays were conducted with as few as 3 flies.

      We have added statistical details as requested.

      Lines 317-318 (as well as 320-328): The data related to P. rettgeri appear somewhat incomplete, and the authors acknowledge that bacterial load varies significantly, and this bacterium establishes poorly in the gut. These data may introduce more noise than clarity to the study. Please consider revising these sections by either providing more data, refining the presentation, or possibly removing them altogether.

      The fact that P. rettgeri establishes poorly in the gut in wildtype flies is the result of several unpublished experiments in the Lazzaro and Unckless labs. We don’t have this as a figure because it was not directly tested in these experiments. We’ve added a note that it is personal observation and we’ve reworked the discussion in the second section.

      Lines 335-387 and Figure 4: Although these results are intriguing and suggest interactions between functional diptericin and fly physiology, some mediated by the gut microbiome, they remain descriptive and do not significantly contribute to our understanding of the mechanism that maintains the diptericin alleles.

      While the reviewer is correct that these experiments do not elucidate mechanism, they do strongly suggest (based on the controlled nature of the experiments) that the physiological tradeoffs are due to Diptericin genotype. The disagreement lies in the level of “mechanism”. At the evolutionary level, the demonstration of a physiological cost of a protective immune allele is sufficient to explain the maintenance of alleles. However, we have not determined (and did not attempt to determine) why Diptericin genotype influences these traits. That will have to wait for future experiments.

      Lines 399-400: The contrast between this result and statement and the highly reproducible data presented in Figures 2-4 should be discussed.

      We’ve added some discussion to this section including a reference to the “inconstancy” of the Drosophila gut microbiome.

      Lines 422-429 and Figure 5D: The conclusion regarding an association between diptericin alleles and Morganellaceae bacteria is not clearly supported by Figure 5D and lacks statistical evidence.

      We’ve changed this to just be suggestive.

      Reviewer #3 (Public Review):

      Summary:

      This paper investigates the evolutionary aspects around a single amino acid polymorphism in an immune peptide (the antimicrobial peptide Diptericin A) of Drosophila melanogaster. This polymorphism was shown in an earlier population genetic study to be under long-term balancing selection. Using flies with different AA at this immune peptide it was found that one allelic form provides better survival of systemic infections by a bacterial pathogen, but that the alternative allele provides its carriers a longer lifespan under certain conditions (depending on the microbiota). It is suggested that these contrasting fitness effects of the two alleles contribute to balance their long-term evolutionary fate.

      Strengths:

      The approach taken and the results presented are interesting and show the way forward for studying such polymorphisms experimentally.

      Weaknesses:

      (1) A clear demonstration (in one experiment) of the antagonistic effect of the two isolated selection pressures is not provided.

      The study is overwhelming with many experiments and countless statistical tests. The overall conclusion of the many experiments and tests suggests that "dptS69 flies survive systemic infection better, while dptS69R flies survive some opportunistic gut infections better." (line 444-446). Given the number of results, different experiments, and hundreds of tests conducted, how can we make sure that the result is not just one of many possible combinations? I suggest experimentally testing this conclusion in one experiment (one may call this the "killer-experiment") with the relevant treatments being conducted at the same time, side by side, and the appropriate statistical test being conducted by a statistical test for a treatment x genotype interaction effect.

      This is a nice idea but would not work in practice since the fly lines used are different (gnotobiotic vs conventional) and gnotobiotics have to be derived from axenic lines that need a few generations to recover from the bleaching treatment.

      (2) The implication that the two forms of selection acting on the immune peptide are maintained by balancing selection is not supported.

      The picture presented of how balancing selection works is rather simplistic and not convincing. In particular, no distinction is drawn between fluctuating selection (FL) and balancing selection (BL). BL is the result of negative frequency-dependent selection. It may act within populations (e.g. Red Queen type processes, mating types) or between populations (local adaptation). FL is a process that is sometimes suggested to produce BL, but this is only the case when selection is negative frequency dependent. In most cases, FL does not lead to BL.

      The presented study is introduced with a framework of BL, but the aspects investigated are all better described as FL (as the title says: "A suite of selective pressures ..."). The two models presented in the introduction (lines 62 to 69; two pathogens, cost of resistance) are both examples for FL, not for BL.

      We’ve added a discussion of how fluctuating selection and balancing selection relate at the end of the discussion.

      Finally, no evidence is presented that the different selection pressures suggested to select on the different allelic forms of the immune peptide are acting to produce a pattern of negative frequency dependence.

      We are not arguing for negative frequency-dependent selection. We assume throughout that the Dpt allele does not drive the overall frequency of P. rettgeri in populations, since it is a ubiquitous microbe. Evolution within D. melanogaster therefore has little to no effect on the density of the pathogen.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments:

      Line 31: Rewrite the sentence mentioning "homozygous serine" for improved clarity, especially since the S/R polymorphism of Diptericin has not been introduced yet.

      This has been changed to be vague in terms of specific alleles and just refers to “one allele” vs the other.

      Lines 87-94: Consider reorganizing this paragraph to maintain a logical flow of the discussion on the Drosophila immune system and the IMD pathway.

      We explored other orders, but we think that as is (IMD to AMPs in general to AMPs in Drosophila) makes the most sense here.

      Line 99: Provide an explanation of balancing selection for a broader readership, differentiating it from other modes of selection.

      We added a brief discussion but note that the intro has significant discussion of balancing selection.

      Lines 105-106: Please provide a proper reference. Additionally, ensure that the Unckless et al. 2016 paper is correctly referenced, both in lines 111 and 138-141.

      This has been added.

      Lines 138-141: It would be beneficial to state that the previous study by Unckless et al. 2016 did not control for genetic background, which is why the assay was redone with gene editing.

      This has been added.

      Lines 296-303: Clarify the source of the survival observations and consider incorporating this data into Figure 2 for improved visualization.

      We’ve clarified that this is Figure 2.

      Lines 390-394: Explain the distinctions between vials and cages, particularly in terms of food consumption, exposure to bacteria, etc., which can be relevant to gut homeostasis.

      We’ve added a discussion of why these two approaches are complementary.

      Reviewer #3 (Recommendations For The Authors):

      Statistics

      Statistical results are limited to the presentation of p-values (several hundred of them!). For a proper assessment of the statistical analyses, one would also want to see the models used and the test statistics obtained.

      The statistical tests done are often unclear. For example, in several experiments, pools of 3 trials (blocks) of multiple animals were tested. The blocks need to be included in the model. Likewise, it seems that multiple delta-dpt fly genotypes were produced. Apparently, they were not distinguished later. Were they considered in the statistical analyses? By contrast, two lines of dptS69R flies were reported to show differences. What concept was applied to test for line difference in some cases and not in others?

      In the same dataset (i.e. data resulting from one experiment), it seems that mostly multiple tests were done. For example, in one case each treatment was contrasted to the dptS69 flies. It is generally not acceptable to break down one dataset in multiple subsets and conduct tests with each subtest. One single model for each experiment should be done. This may then be followed by post-hoc tests to see which treatments differ from each other.

      We’ve attempted to clarify these statistical approaches throughout.

      Minor points

      In the legend of Figure 3 it says: "A) monoassociations where each plot represents a different experiment,". This is unclear to me. First, how many plots are there: 3 or 12? Second, what means "experiment"? Are these treatments, or entirely different experiments? How was this statistically taken into account?

      We’ve changed this to “different condition” which is clearer. We performed statistical analysis independently for each condition and we’ve now discussed that.

      Fig. 5D. It is suggested in the text ("Most intriguing", line 426) and the figure legend that the abundance of Morganellaceae in wild-caught flies differs among genotypes. This is not visible in the figure and not convincingly shown in the text. No stats are given.

      We’ve now added that these differences are not significant.

      Line 458-461: This sentence is unclear.

      We’ve attempted to clarify.

      What is a "a traditional adaptive immune system"?

      We’ve reworded to “an adaptive immune system”.

      There are several typos in the manuscript. Please correct.

      We’ve attempted to fix typos throughout.

      Bold statements are often without references.

      We’ve attempted to add appropriate references throughout.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript, the authors explore the mechanism by which Taenia solium larvae may contribute to human epilepsy. This is an extremely important question to address because T. solium is a significant cause of epilepsy and is extremely understudied. Advances in determining how T. solium may contribute to epilepsy could have a significant impact on this form of epilepsy. Excitingly, the authors convincingly show that Taenia larvae contain and release glutamate sufficient to depolarize neurons and induce recurrent excitation reminiscent of seizures. They use a combination of cutting-edge tools including electrophysiology, calcium and glutamate imaging, and biochemical approaches to demonstrate this important advance. They also show that this occurs in neurons from both mice and humans. This is relevant for the pathophysiology of chronic epilepsy development. This study does not rule out other aspects of T. solium that may also contribute to epilepsy, including immunological aspects, but demonstrates a clear potential role for glutamate.

      Strengths:

      - The authors examine not only T. solium homogenate, but also excretory/secretory products which suggests glutamate may play a role in multiple aspects of disease progression.

      - The authors confirm that the human relevant pathogen also causes neuronal depolarization in human brain tissue

      - There is very high clinical relevance. Preventing epileptogenesis/seizures possibly with Glu-R antagonists or by more actively removing glutamate as a second possible treatment approach in addition to/replacing post-infection immune response.

      - Effects are consistent across multiple species (rat, mouse, human) and methodological assays (GluSnFR AND current clamp recordings AND Ca imaging)

      - High K content (comparable levels to high-K seizure models) of larvae could have also caused depolarization. Adequate experiments to exclude K and other suspected larvae contents (i.e. Substance P).

      Weaknesses:

      - The acute study is limited to studying depolarization in slices, and it is unclear what is necessary/sufficient for in vivo seizure generation or epileptogenesis for chronic epilepsy.

      - There is likely a significant role of the immune system that is not explored here. This issue is adequately addressed in the discussion, however, and the glutamate data is considered in this context.

      Discuss impact:

      - Interfering with peri-larval glutamate signaling may hold promise to prevent ictogenesis and chronic epileptogenesis as this is a very understudied cause of epilepsy with unknown mechanistic etiology.

      Additional context for interpreting significance:

      - High medical need as most common adult onset epilepsy in many parts of the world

      We thank Reviewer 1 for their positive and thorough assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments/analysis:

      -   Fig 4a-c: Larva on a slice and not next to it? Negative results maybe because its E/S products are just washed away (assuming submerged recording chamber/conditions)? Experiments and negative results described here do not seem conclusive. Should be discussed at least?

      We agree with the reviewer and have added the following sentence to the relevant section of the Results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      Writing & presentation:

      - Data is not always reported consistently in text and figures, examples:

      - Results in text are reported varyingly without explanation:

      - Mean and/or median? SEM or SD and/or IQR? Stat info included in text or not? i.e. lines 130/131 vs. 160/161

      Results and data are now presented in a more uniform fashion. We report medians and IQRs, sample size, statistical test result, statistical test used in that order.

      - Larval release data interrupts reading flow, lines 246-252 double up results presented in Fig 5F.

      This section has now been significantly abbreviated and reads as follows: ‘T. crassiceps larvae released a relatively constant median daily amount of glutamate, ranging from 41.59 – 60.15 µg/20 larvae, which showed no statistically significant difference across days one to six. Similarly, T. crassiceps larvae released a relatively constant median daily amount of aspartate, ranging from 9.431 – 14.18 µg/20 larvae, which showed no statistically significant difference across days one to six.’

      - Results in figures are reported in different styles:

      Results have now been made uniform, reporting medians and IQRs and: sample size, p test result, statistical test used, figure # reported in that order.

      - Fig 6: E/S glu concentration seems to be significantly higher in solium vs crassiceps (about 6fold higher in solium). Should be discussed at least.

      Given the small sample size from T. solium (see response below), we do not draw attention to this difference and instead simply make the point that T. solium larvae contain and release glutamate.

      - In this context - N=1 may be sufficient for proof of principle (release) but seems too small of a cohort to describe non-constant release of glu over days (Fig 6D). Is initial release on day 1, no release and recovery in the following days reproducible? Is very high glu content of E/S content (15-fold higher in comparison to solium homogenate AND 6-fold higher in comparison to crassiceps homogenate and E/S content). Not sure if Fig 6D is adding relevant information, especially since it is based on n = 1

      We agree that N = 1 is only sufficient for proof of principle. However, it is worth noting that the measurements still reflect the cumulative release from 20 larvae. Nonetheless, the statement in the text has been simplified to say: ‘These results demonstrate that T. solium larvae continually release glutamate and aspartate into their immediate surroundings.’ This wording focuses on the point that the larvae release glutamate and aspartate continuously, as we cannot draw conclusions about the variability over days.

      Methods:

      - Human slices, mention cortex - what part, patient data would be interesting. I.e. etiology of epilepsy, epilepsy duration 

      In the Materials and Methods section “Brain slice preparation” we have now added a table with the requested information.

      - For Taenia solium: How were they acquired and used in these experiments?

      In the Materials and Methods section “Taenia maintenance and preparation of whole cyst homogenates and E/S products” we describe how Taenia solium larvae were acquired and used.

      - Was access resistance monitored? Add exclusion criteria for patch experiments

      Figure supplement tables containing the basic properties for each cell recording have been added for each figure and the following statements were added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (supplementary files 1, 2, 3, 4, 6).’ and ‘Cells were excluded from analyses if the Ra was greater than 80 MΩ or if the resting membrane potential was above –40 mV.’

      - Cannot see any reference to mouse slices in methods? Also, mouse organotypic cultures (for AAV?)? Or only acute slices from mice and organotypic hip cultures from rats? Seems to have been mouse and rat organotypic cultures? But not clear with further clarification in methods.

      We have now added the following clarification to the methods: ‘For experiments using calcium and glutamate imaging, mouse hippocampal organotypic brain slices were used. For all other experiments, rat hippocampal organotypic brain slices were used. A subset of experiments used acute human cortical brain slices and are specified.’

      - How long after the wash-in phase was the wash-out phase data collected?

      For wash-in recordings drugs were washed in for 8 mins before recordings were made. Drugs were washed out for at least 8 mins before wash-out recordings were made. This information has been added to the Materials and Methods section.

      - In general, the M&M section seems to have been written hastily - author's internal remarks "supplier?" are still present.

      The M&M section has been thoroughly proofread for errors and internal remarks removed or corrected.

      - A little more information on the clinical subjects would be appreciated. I.e. duration of epilepsy? Localization? What cortex? Usual temporal lobe or other regions?

      We have now added a table with this information to the Materials and Methods section “Brain slice preparation”.

      Minor corrections text/figures:

      - e.g. 3D,F,H,J show individual data points, that's great, but maybe add a mean/median marker (as results are reported like this in the text), like in Fig 4G,I and others

      Figures 3D,F,H & J have been revised to include median and IQR.

      - Only one patient mentioned in acknowledgements, but 2 in methods and text

      We apologize for this oversight and now acknowledge both patients in the acknowledgements.

      - Fig 1 B-F individual puffs are described as increasing - consistent with cellular effects (1st puff depolarizes, 2nd puff elicits 1 AP, 3rd puff elicits AP burst)  However, dilution ratio of homogenate or puff concentrations are not mentioned (or potentially longer than 20 ms puffs for 2nd and 3rd stimulus?) in text or figures. Seems to be enough space to indicate in figure as well (i.e. multiple or thicker arrows for subsequent puffs or label with homogenate dilution/concentration in figure).

      We state in the results section associated with Fig. 1 that increasing the amount of homogenate delivered was achieved by increasing the pressure applied to the ejection system. We now include this information in the figure legend.

      - Figure legend describes 30 ms puff for Ca imaging whereas ephys data (from text) is 20 ms puff. Was Ca imaging performed in acute mouse hippocampal slices (as figure text suggests) or were those organotypic hippocampal cultures from mice?

      Ca2+ imaging was performed in mouse hippocampal organotypic brain slice cultures. The figure text for Fig. 1 E) states “widefield fluorescence image of neurons in the dentate gyrus of a mouse hippocampal organotypic brain slice culture expressing the genetically encoded Ca2+ reporter GCaMP6s...”

      - 11.4 mM K is reported for homogenate in text only. How variable is that? How many n? No SD reported in text and no individual data points reported since this experiment is not represented as a figure.

      This has been clarified in the text by adding (N = 1, homogenate prepared from >100 larvae).

      - Same results (effect of 11.4 mM K on Vm) described twice in one paragraph, compare lines 126-131 with 131-136.

      The repetition has been removed.

      - Line 182 - example for consistency: decide IQR or SD/SEM

      To improve consistency, we have changed to median and IQR throughout.

      - Neuronal recordings are reported as hippocampal pyramidal neurons (i.e. line 222) but some recordings were made from dentate granule cells - please clarify which neurons were recorded in ephys, ca imaging, GluSnFr imaging

      For each experiment we describe which type of neurons were recorded from. For rodent recordings these were hippocampal pyramidal neurons except in the case of the Ca2+ imaging example where the widefield recording was over the dentate gyrus subfield.

      - Line 309: "should" seems to be an extra word

      We have removed the word ‘should’ and made the sentence shorter and clearer. It now reads: ‘Given our finding that cestode larvae contain and release significant quantities of glutamate, it is possible that homeostatic mechanisms for taking up and metabolizing glutamate fail to compensate for larval-derived glutamate in the extracellular space. Therefore, similar glutamate-dependent excitotoxic and epileptogenic processes that occur in stroke, traumatic brain injury and CNS tumors are likely to also occur in NCC.’

      Reviewer #2 (Public Review):

      Since neurocysticercosis is associated with epilepsy, the authors wish to establish how cestode larvae affect neurons. The underlying hypothesis is that the larvae may directly excite neurons and thus favor seizure genesis.

      To test this hypothesis, the authors collected biological materials from larvae (from either homogenates or excretory/secretory products), and applied them to hippocampal neurons (rats and mice) and human cortical neurons.

      This constitutes a major strength of the paper, providing a direct reading of larvae's biological effects. Another strength is the combination of methods, including patch clamp, Ca, and glutamate imaging.

      We thank the Reviewer 2 for their review of the strength and weaknesses of our manuscript. We respond to the identified weaknesses below.

      There are some weaknesses:

      (1) The main one relates to the statement: "Together, these results indicate that T. crassiceps larvae homogenate results not just in a transient depolarization of cells in the immediate vicinity of application, but can also trigger a wave of excitation that propagates through the brain slice in both space and time. This demonstrates that T. crassiceps homogenate can initiate seizure-like activity under suitable conditions."

      The only "evidence" of propagation is an image at two time points. It is one experiment, and there is no quantification. Either increase n's and perform a quantification, or remove such a statement.

      We acknowledge that the data is from one experiment, with the intention of demonstrating that it is plausible for intense depolarization of a subset of neurons to result in the initiation and propagation of seizure-like activity to nearby neurons under suitable conditions. However, we agree that it is prudent to remove this statement and have done so.

      Likewise, there is no evidence of seizure genesis. A single cell recording is shown. The presence of a seizure-like event should be evaluated with field recordings.

      In this experiment, the Ca2+ imaging demonstrates activity spreading from the site of the restricted homogenate puff to all surrounding neurons. Furthermore, the whole-cell recording is typical of a slice-wide seizure-like event.

      (2) Control puff experiments are lacking for Fig 1. Would puffing ACSF also produce a depolarization, and even firing, as suggested in Fig. 2D? This is needed for at least one species.

      We agree and have added this data for the rat and mouse neuron in a new Figure 1-figure supplement 1.

      (3) What is the rationale to use a Cs-based solution? Even in the presence of TTX and with blocking K channels, the depolarization may be sufficient to activate Ca channels (LVGs), which would further contribute to the depolarization. Why not perform voltage clamp recordings to directly measure the current?

      The intention of the Cs-based solution was to block K+ channels and reduce the effect of moderately raised K+ in the homogenate to isolate the contribution of other causative agents of depolarization (i.e. glutamate / aspartate). We agree that performing voltage clamp recordings would have been useful for directly recording the currents responsible for depolarization. 

      (4) Why did you use organotypic slices? Since you wish to model adult epilepsy, it would have been more relevant to use fresh slices from adult rats/mice. At least, discuss the caveat of using a network still in development in vitro.

      Recordings were performed 6–14 days post culture, which is equivalent to postnatal Days (P) 12 to 22. Previous work has shown that neurons in the organotypic hippocampal brain slice are relatively mature (Gähwiler et al., 1997). For example they possess mature Cl- homeostasis mechanisms at this point, as evidenced by their hyperpolarizing EGABA (Raimondo et al., 2012).  

      (5) Please include both the number of slices and number of cells recorded in each condition. This is the standard (the number of cells is not enough).

      This has now been added to all relevant sections of the results text.  

      (6) Please provide a table with the basic properties of cells (Rin, Rs, etc.). This is standard to assess the quality of the recordings.

      Tables containing the basic properties for each cell recording have been created for each figure (as Figure supplements) and the following statement was added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (see Figure supplements).’

      (7) Please provide a table on patient's profile. This is standard when using human material. Were these TLE cases (and "control" cortex) or epileptogenic cortex?

      We have now added a basic table on the patients’ profiles to the Materials and Methods section.

      Globally, the authors achieved their aims. They show convincingly that larvae material can depolarize neurons, with glutamate (and aspartate) as the most likely candidates.

      This is important not only because it provides mechanistic insight but also potential therapeutic targets. The result is impactful, as the authors use quasi-naturalistic conditions, to assess what might happen in the human brain. The experimental design is appropriate to address the question. It can be replicated by any interested person.

      We thank the Reviewer 2 for their enthusiastic and constructive assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #2 (Recommendations For The Authors):

      lines 132 and following are a repetition of those above

      These have been removed.

      line 151 Fig "2" missing

      This has been added.

      187, 190 should be E, F not C, D

      This has been changed in the text.  

      481, 482 supplier?

      This has been corrected and the correct suppliers described.

      Reviewer #3 (Public Review):

      This paper has high significance because it addresses a prevalent parasitic infection of the nervous system, Neurocysticercosis (NCC). The infection is caused by larvae of the parasitic cestode Taenia solium. It is a leading cause of epilepsy in adults worldwide.

      To address the effects of cestode larvae, homogenates and excretory/secretory products of larvae were added to organotypic brain slice cultures of rodents or layer 2/3 of human cortical brain slices from patients with refractory epilepsy.

      We thank Reviewer 3 for their helpful comments and suggestions for improvement which we address below.

      A self-made pressure ejection system was used to puff larvae homogenate (20 ms puff) onto the soma of patched neurons. The mechanical force could have caused depolarization so a vehicle control is critical. On line 150 they appear to have used saline in this regard, and clarification would be good. Were the controls here (and aCSF elsewhere) done with the low Mg2+o aCSF like the larvae homogenates?

      We agree and have added examples where aCSF alone was pressure ejected onto the same rat and mouse neurons in a new Figure 1-figure supplement 1. In Figure 1, the same aCSF that was used to bathe the slices was used. In Figure 2D-G, either PBS (which larval homogenates were prepared in) or growth medium (which contain larval E/S products) were used as comparative controls.

      They found that neurons depolarized after exposure to larval homogenate and that the effect was mediated by glutamate but not by nicotinic acetylcholine receptors (nAChRs), acid-sensing ion channels (ASICs) or substance P. To address nAChRs, they used 10 µM mecamylamine, and for ASICs 2 mM amiloride, which seems like a high concentration. Could the authors confirm that these concentrations are selective?

      We did not independently verify the selectivity of the antagonist concentrations used in our study. However, the persistence of depolarizations despite the use of high concentrations of mecamylamine (10 μM) and amiloride (2 mM) provides strong evidence that neither nAChRs nor ASICs are primarily responsible for mediating these responses. The high concentrations used, while potentially raising concerns about specificity, actually strengthen our conclusion that these receptor types are not involved in the observed effect.

      Glutamate receptor antagonists, used in combination, were 10 µM CNQX, 50 µM DAP5, and 2 mM kynurenic acid. These concentrations are about twice those most commonly used. Please discuss.

      We intentionally used higher-than-typical concentrations of glutamate receptor antagonists in our experimental design. Our rationale for this approach was to ensure maximal blockade of glutamate receptors, thereby minimizing the possibility of residual receptor activity confounding our results.

      Also, it would be very interesting to know whether the glutamate receptor involved is the AMPA, kainate, or NMDA type. Were metabotropic antagonists ever tested? That would be logical because CNQX/DAP5/kynurenic acid did not block all of the depolarization.

      We appreciate the reviewer's interest in the specific glutamate receptor subtypes involved in our study. Our research primarily focused on ionotropic glutamate receptors as a group, without differentiating the individual contributions of AMPA, kainate, and NMDA receptors. This approach, while broad, allowed us to establish the involvement of glutamatergic signalling in the observed effects. We acknowledge that we did not investigate metabotropic glutamate receptors in this study. Importantly, we demonstrate later in our manuscript that the larval products contain both glutamate and aspartate. Therefore, the precise nature of the glutamate-dependent depolarization observed in a particular experimental preparation would depend on the types of neurons exposed to the homogenate and on the glutamate receptor subtypes these neurons express.

      They also showed that the elevated K+ in the homogenate (~11 mM) could not account for the depolarization. However, was the K+ experiment done in low-Mg2+o buffer? Please clarify.

      The experiment in which 11.39 mM K+ was applied, as well as the experiment with T. crass. homogenate using a cesium-based internal solution and added TTX, were both done in standard aCSF containing 2 mM Mg2+.

      They also confirmed, by filtering out very large molecules, that small molecules alone caused the depolarization. That supports the conclusion that glutamate, which is quite small, could be responsible. It is logical to test substance P because the Introduction points out that prior work links the larvae and seizures through inflammation and implicates substance P. However, why focus on nAChRs and ASICs?

      These were chosen because they are ionotropic receptors that mediate depolarization and hence could conceivably underlie the homogenate-induced depolarization we observed.

      The depolarizations caused seizure-like events in slices. However, the slices were exposed to a proconvulsant (low-Mg2+o) buffer. This buffer can cause spontaneous seizure-like events, so it is important to know what the buffer did alone.

      We agree that a low-Mg2+ buffer solution can elicit seizure-like events in organotypic slices on its own. However, the timing of the onset of the seizure-like event in the example presented in Figure 1 strongly suggests that it was triggered by the T. crass. homogenate puff. Nonetheless, on the suggestion of the other reviewers, we have placed less emphasis on our evidence that T. crass. homogenate can elicit seizure-like events.

      They suggest the effects could underlie seizure generation in NCC. However, there is only one seizure-like event in the paper, and it is just an inset. Were others similar? How frequent were they? How long did they last?

      Please see the response above as well as our response to Reviewer 1 who raised a similar concern.

      Using glutamate-sensing fluorescent reporters, they found that the larvae contain glutamate and can release it, a strength of the paper.

      Fig. 4. Could an inset be added to show that the effects are very fast? That would support an effect of glutamate.

      We have not added an inset. However, the scale bar (500 ms) on the trace shown indicates that the response is very fast.

      Why is aspartate relatively weak and glutamate relatively effective as an agonist?

      Glutamate generally has a higher affinity for glutamate receptors than aspartate does. This is particularly true for AMPA and kainate receptors, for which glutamate is the primary endogenous agonist. Similarly, iGluSnFR is more sensitive to glutamate than to aspartate (Marvin et al., 2013).

      Could some of the variability in Fig 4G be due to the choice of different cell types? That would be consistent with Fig 5B, where only a fraction of cells in the culture responded to the nearby larvae.

      Whilst differences in cell types could contribute to the variability in Fig 4G, all the responses were recorded from hippocampal pyramidal neurons, so the variability more likely reflects other sources of variation, including differences in iGluSnFR expression, the depth of the imaged cell, and the proximity of the puffer pipette. In Fig. 5B, we think the lack of response may be because any glutamate released by the live larvae did not reach the iGluSnFR-expressing neurons at sufficient concentrations, owing to the nature of our submerged recording setup. We have added the following sentence to the results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      On what basis was the ROI drawn in Fig. 5B?

      The ROI drawn in Fig. 5B was selected to include all iGluSnFR-expressing neurons in the brain slice that were captured in the field of view.

      Also in 5B, I don't see anything in the transmitted image. What should be seen exactly?

      We agree that it is difficult to resolve much in the transmitted image. However, both the brain slice (left) and a T. crass. larva (right) are visible, outlined with green and orange dashed lines, respectively.

      Human brain slices were from the temporal cortex of patients with refractory epilepsy. Was the temporal cortex devoid of pathology and EEG abnormalities? This area may be heavily involved in the epilepsy, because refractory epilepsy that proceeds to surgery is often temporal lobe epilepsy. Please discuss the limitations of studying the temporal cortex of humans with epilepsy, since it may be more susceptible to depolarizations of many kinds, not just those caused by larvae.

      We acknowledge the important limitations of using temporal cortex tissue from patients with refractory epilepsy. While we aimed to use visually normal tissue, we recognize that the tissue may have underlying pathology or functional abnormalities not visible to the naked eye. It may also be more susceptible to induced depolarizations due to epilepsy-related changes in neuronal excitability. Despite these limitations, we believe our human tissue data still provide valuable evidence that the larval homogenates can induce depolarization in human as well as rodent neurons.

      Please discuss the limitations of the cultures: they are derived from very young animals and cultured for 6-14 days.

      We acknowledge the potential limitations of our experimental model using organotypic hippocampal slice cultures from young animals. The use of relatively immature tissue may not fully represent the adult nervous system due to developmental differences in receptor expression, synaptic connections, and network properties. The 6-14 day culture period, while allowing some maturation, may induce changes that differ from the in vivo environment, including alterations in cellular physiology and network reorganization. Despite these limitations, this model provides a valuable balance between preserved local circuitry and experimental accessibility. Future studies comparing results with acute adult slices and in vivo models would be beneficial to validate and extend our findings.

      References:

      Gähwiler, B.H. et al. (1997) ‘Organotypic slice cultures: a technique has come of age.’, Trends in Neurosciences, 20(10), pp. 471–7.

      Marvin, J.S. et al. (2013) ‘An optimized fluorescent probe for visualizing glutamate neurotransmission.’, Nature Methods, 10(2), pp. 162–70. Available at: https://doi.org/10.1038/nmeth.2333.

      Raimondo, J.V. et al. (2012) ‘Optogenetic silencing strategies differ in their effects on inhibitory synaptic transmission.’, Nature Neuroscience, 15(8), pp. 1102–4. Available at: https://doi.org/10.1038/nn.3143.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells and the 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It uses DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, likely because of limited appropriate controls and the absence of validation experiments. I think the study requires better proteomic controls and some validation experiments for the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would help those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable for non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using mtDNA as a comparator is not a good choice: there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex, so this comparison will yield a large number of false positives. The centromere/telomere comparison is acceptable if one is interested in what differs between those two repetitive elements. But the more realistic use case of this method would be "what is at a specific genomic element?" A purely nuclear-localized control would be needed for that, or a genomic element that has nothing interesting at it (I do not know of one). You can see this in the label-free work: non-specific nuclear GO terms are enriched, likely due to the random plus non-random labeling in the nucleus. What would a Telo vs. general nucleus GSEA look like? (GSEA, not GO, should be used for quantitative data.) That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or CUT&RUN/CUT&Tag is used.

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being examined? Perhaps compared to mitochondria, but it is hard to believe there are not many false positives in those blue clouds. I believe the authors are seeing mito vs. nucleus + Telo rather than the stated comparison. For example, if there is no labeling in the nucleus in the control (Figures 1C and 2C), you cannot separate background labeling from specific labeling. The same applies to mito vs. nuc + Telo: it is not the proper control for determining what is specifically at the Telo.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      (2) A second major drawback is the lack of validation experiments. References to the literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of the newly described proximal proteins need to be validated by an orthogonal method, such as ChIP-qPCR, other genomic methods, or gel shifts if they are likely to bind DNA directly. It is acceptable to have false positives in a challenging assay like this, but the false-positive rate needs to be carefully estimated and clearly communicated.

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      (5) Comments like "Compared to other repetitive elements in the human genome..." appear to sidestep the fact that this method is still (apparently) largely limited to repetitive elements. Other than GLoPro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So this is not quite the advancement you are implying. In addition, the overlap with telomeric proteins identified in other studies should be addressed, although that will be challenging given the controls used here, discussed above.

      We thank the Reviewer for their careful reading of our manuscript and constructive suggestions. We plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). In principle, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. It also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage, still to be tested, is that no cell line engineering is required, which would allow its application to primary cell lines or tissue.

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      The authors first tested their method's specificity at repetitive telomeric regions and, as with other approaches, expected low-abundance telomere-specific proteins (for example, all subunits of the telomerase holoenzyme complex) were absent. Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      We thank the Reviewer for their constructive feedback on our work. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the findings appear preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci. Furthermore, comparisons with previous techniques are unfounded, since the authors have not provided direct comparisons using the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We thank the Reviewer for providing detailed critiques of our manuscript. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a novel contribution that significantly advances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides insights into the intermediate steps of CMG formation. The study builds upon previously known structures of Sld3 and Cdc45 and offers new perspectives into how Cdc45 is loaded onto MCM DH through Sld3-Sld7. The most notable finding is the structural difference in Sld3CBD when bound to Cdc45, particularly the arrangement of the α8-helix, which is essential for Cdc45 binding and may also pertain to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a possible mechanism for its binding to MCM2NTD.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of experimental validation for the proposed Sld3-Sld7-Cdc45 model. Specifically, the claim that Sld3 binding to Cdc45-MCM does not inhibit GINS binding, a finding that contradicts previous research, is not sufficiently substantiated with experimental evidence. To strengthen their model, the authors must provide additional experimental data to support this mechanism. Also, the authors have not compared the recently published cryo-EM structures of metazoan CMG helicases with their predicted models to check whether Sld3/Treslin clashes with GINS when bound to the CMG. The work holds great potential in its current form but requires further experiments to confirm the authors' conclusions.

      We appreciate the reviewer’s careful reading and comments.

      The structure of Sld3CBD-Cdc45 showed that the site on Cdc45 that binds Sld3CBD is distinct from the regions of Cdc45 that bind GINS and MCM, indicating that Sld3CBD, MCM, and GINS bind to separate sites of Cdc45 in the CMG complex. The SCMG-DNA model confirmed this binding arrangement but did not show whether the binding of Sld3 to Cdc45 affects the recruitment of GINS (by GINS-Dpb11-Sld2) for CMG formation. We will modify our manuscript to discuss this point. We will also compare the recently published cryo-EM structures of metazoan CMG helicases with our predicted models to confirm our conclusions, and we will try to conduct the experiments as suggested.

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. However, the work remains incomplete as several main claims are only partially supported by experimental data, particularly the proposed model for Sld3 interaction with GINS on the CMG. Additionally, the single-stranded DNA binding data from different species do not convincingly advance the manuscript's central arguments.

      Strengths

      (1) The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      (2) The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      (3) The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      (4) The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      (5) The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      Weaknesses

      (1) The proposed model for Sld3 interacting with GINS on the CMG needs more experimental validation and conflicts with published findings. These discrepancies need more detailed discussion and exploration.

      (2) The section on the binding of Sld3 complexes to origin single-stranded DNA needs significant improvement. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      (3) The authors' model proposing the release of Sld3 from CMG based on its binding to single-stranded DNA is unclear and needs more elaboration.

      We appreciate your positive comments. As suggested, we will try to improve the experiments and the manuscript. We will discuss in more detail the interaction between Sld3 and GINS on the CMG and the ssDNA-binding results, explain why we used complexes from different species for comparison, and elaborate further on the Sld3-release proposal.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of the Sld3 Cdc45-binding domain (CBD) with Cdc45, and a model of a dimer of the Sld3-binding protein Sld7 bound to two Sld3CBD-Cdc45 complexes for tethering. In addition, the authors present genetic analysis of amino acid substitutions at Sld3 residues in the interface with Cdc45, biochemical analysis of the protein interaction between Sld3 and Cdc45, and analysis of Sld3 binding to single-stranded DNA of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, mediated by the Sld3-Sld7 complex. The dimer of Sld3-Sld7 complexes tethers two MCM hexamers together to recruit GINS-Pol epsilon to the replication origin.

      Weaknesses:

      The biochemical analysis should be evaluated more carefully and quantitatively to strengthen the authors' conclusions.

      We thank the reviewer for the positive assessment. We will provide more quantitative information and try to quantify the experiments as suggested.

    1. Author response:

      Reviewer 1:

      (1) I think the article is a little too immature in its current form. I'd recommend that the authors work on their writing. For example, the objectives of the article are not completely clear to me after reading the manuscript, which is composed of parts where the authors seem to focus on SGCs and others where they study "engram" neurons without differentiating the neuronal type (Figure 5). The next version of the manuscript should clearly establish the objectives and sub-aims.

      Our overarching focus was to identify whether intrinsic physiology and circuit connectivity of SGCs contribute to their unique overrepresentation in neurons labeled as part of a behaviorally relevant dentate engram. Since our systematic analysis of “engram SGCs” did not support the proposal that engram SGCs drive robust feedforward excitation of engram GCs or feedback inhibition of non-engram GCs, we examined an alternative hypothesis that inputs drive recruitment of neurons, regardless of subtype (in figure 5). These are sparsely labeled neurons, with mixed populations of GCs and SGCs undergoing paired recordings. Since the focus of the experiment was input correlation between two simultaneously recorded neurons, we did not report the individual cell types. We regret that this caused confusion and will clarify this issue in the revised manuscript.

      (2) In addition, some results are not entirely novel (e.g., the disproportionate recruitment as well as the distinctive physiological properties of SGCs), and/or based on correlations that do not fully support the conclusions of the article. In addition to re-writing, I believe that the article would benefit from being enriched with further analyses or even additional experiments before being resubmitted in a more definitive form.

      We would like to note that while we and others have previously reported the distinctive SGC physiology, this study is the first to compare the physiological properties of SGCs labeled as part of an engram with those of unlabeled SGCs. This was the thrust of the data presented, which may have been missed, and it will be emphasized in the revision. Similarly, while others have shown higher SGC recruitment in dentate engrams, we had to validate this in the dentate-dependent behaviors that we adopted in this study. We also note that the proportional SGC recruitment in our study, based on morphometric classification, differs from what was reported previously. These aspects of the study, which were considered confirmatory, represent the validation needed to proceed with the novel cell-type-specific paired recordings and optogenetic analyses of engram neurons presented in subsequent sections of the manuscript. We will emphasize these considerations in the revised manuscript.

      Reviewer 2:

      (1) The authors conclude that SGCs are disproportionately recruited into cfos assemblies during the enriched environment and Barnes maze task given that their classifier identifies about 30% of labelled cells as SGCs in both cases and that another study using a different method (Save et al., 2019) identified less than 5% of an unbiased sample of granule cells as SGCs. To make matters worse, the classifier deployed here was itself established on a biased sample of GCs patched in the molecular layer and granule cell layer, respectively, at even numbers (Gupta et al., 2020). The first thing the authors would need to show to make the claim that SGCs are disproportionately recruited into memory ensembles is that the fraction of GCs identified as SGCs with their own classifier is significantly lower than 30% using their own method on a random sample of GCs (e.g. through sparse viral labelling). As the authors correctly state in their discussion, morphological samples from patch-clamp studies are problematic for this purpose because of inherent technical issues (i.e. easier access to scattered GCs in the molecular layer).

      We regret that there seems to be some confusion about the use of a classifier. We did NOT use any automated classifier in this study. All cell type classifications in the study were conducted by experienced investigators examining cell morphology and classifying cells based on established morphometric criteria. In our prior study (Gupta et al., 2020), we conducted an automated cluster analysis that was able to classify GCs and SGCs as different cell types. The principal components underlying the automated clustering in Gupta et al., 2020 were consistent with the major criteria identified in prior morphology-based analyses by us and others (including Williams et al., 2010 and Save et al., 2019). To date, in the absence of a validated molecular marker, morphometry from recorded and filled cells or sparsely labeled neurons is the only established method to classify SGCs. This was the approach we adopted, and this will be further clarified in the revision.

      (2) The authors claim that recurrent excitation from SGCs onto GCs or other SGCs is irrelevant because they did not find any connections in 32 simultaneous recordings (plus 63 in the next experiment). Without a demonstration that other connections from SGCs (e.g. onto mossy cells or interneurons) are preserved in their preparation and if so at what rates, it is unclear whether this experiment is indicative of the underlying biology or the quality of the preparation. The argument that spontaneous EPSCs are observed is not very convincing as these could equally well arise from severed axons (in fact we would expect that the vast majority of inputs are not from local excitatory cells). The argument on line 418 that SGCs have compact axons isn't particularly convincing either given that the morphologies from which they were derived were also obtained in slice preparations and would be subject to the same likelihood of severing the axon. Finally, even in paired slice recordings from CA3 pyramidal cells the experimentally detected connectivity rates are only around 1% (Guzman et al., 2016). The authors would need to record from a lot more than 32 pairs (and show convincing positive controls regarding other connections) to make the claim that connectivity is too low to be relevant.

      As noted in our discussion, we are fully cognizant that potential SGC to GC connections may have been missed because of the nature of slice physiology experiments, and we made every effort to limit this possibility. As noted in the manuscript, we only analyzed GC/SGC pairs in which hilar axon collaterals of the neurons were recovered. We do not claim that SGC to GC/SGC connections are irrelevant; rather, we indicate that these connections, if present, are sparse and unlikely to drive engram refinement. Interestingly, wide-field optical stimulation, designed to activate multiple labeled engram neurons and axon terminals, including those of SGCs whose somata were outside the slice, did not lead to EPSCs in other unlabeled GCs or SGCs, suggesting a lack of robust SGC to GC/SGC synaptic connectivity. While we have previously published paired recordings from interneurons to GCs (Proddutur et al., 2023), we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses would serve as an added control in the revised manuscript.
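      To put the sample sizes in this exchange into numbers, the short Python sketch below computes the chance of detecting no connection at all in n independently tested pairs when each pair is connected with probability p, i.e. (1 - p)^n. It assumes, purely for illustration, the ~1% connectivity rate that the reviewer cites for CA3 pyramidal cells (Guzman et al., 2016); the true SGC-to-GC/SGC rate is unknown, pairs are treated as independent, and the function names are hypothetical.

```python
import math


def prob_no_connection_detected(p_connect: float, n_pairs: int) -> float:
    """Probability of observing zero connected pairs among n independent pairs."""
    return (1.0 - p_connect) ** n_pairs


def pairs_needed(p_connect: float, power: float = 0.95) -> int:
    """Smallest n such that P(at least one connection is detected) >= power."""
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - p_connect))


# Assumed 1% connection probability; 32 pairs, then 32 + 63 = 95 pairs.
for n in (32, 95):
    print(n, round(prob_no_connection_detected(0.01, n), 3))
print(pairs_needed(0.01))
```

      Under these assumptions, 32 pairs leave about a 72.5% chance of seeing no connection, 95 pairs still about 38.5%, and roughly 299 pairs would be needed for a 95% chance of detecting at least one.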

      (3) Another troubling sign is the fact that optogenetic GC stimulation rarely ever evokes feedback inhibition onto other cells, which contrasts with other in vitro (e.g. Braganza et al., 2020) and in vivo (Stefanelli et al., 2016) studies. Without a convincing demonstration that monosynaptic connections between SGCs/GCs and interneurons in both directions are preserved at least at the rates previously described in other slice studies (e.g. Geiger et al., 1997, Neuron; Hainmueller et al., 2014, PNAS; Savanthrapadian et al., 2014, J. Neurosci), the notion that this setting could be closer to naturalistic memory processing than the in vivo experiments in Stefanelli et al. (e.g. lines 443-444) strikes me as odd. In any case, the discussion should clearly state that compromised connectivity in the slice preparation is likely a significant confound when comparing these results.

      We would like to note that our data are consistent with the Braganza 2020 study, as we explain below. Moreover, we would like to point out that the demonstration of “feedback inhibition” in the Stefanelli study was NOT in engram or behaviorally labeled neurons, nor was it in vivo. As we explain below, the physiological assay in Stefanelli was performed in slices and in a cohort of GCs with virally driven ChR2 expression. Thus, we are fully confident that our experimental paradigm better reflects a behavioral engram. As noted in response (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al., 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses or recruitment of feedback inhibition by focal activation of GCs would serve to allay concerns regarding the slice preparation. We also submit that we already discuss the potential concerns regarding compromised connectivity in slice preparations.

      Regarding the lack of optically evoked feedback inhibition, we would like to point out that the Braganza 2020 study examined focal optogenetic activation of GCs, where a high density of GCs was labeled using a Prox-cre line. They reported that about 2-4% of these densely labeled cells need to be recruited to evoke feedback IPSCs. Our experimental condition, where ChR2 was expressed in behaviorally labeled neurons, leads to sparse labeling, well below the 4% of focal GCs needed to evoke IPSCs in the Braganza study. We do not claim that feedback inhibition cannot be activated by focal activation of a cohort of GCs, and we even show an example of a paired recording with feedback GC inhibition of an SGC. Our conclusion is that the few sparsely labeled neurons during a behavioral episode do not support the robust feedback inhibition proposed to mediate engram refinement. We submit that our findings are fully consistent with the sparse GC-driven feedback inhibition, and the need to activate a cohort of focal GCs to recruit feedback inhibition, reported in Braganza 2020.

      Regarding the Stefanelli study, we maintain that our behaviorally relevant in vivo labeling approach is more naturalistic than the DREADD- and channelrhodopsin-driven artificial “engrams” generated in the Stefanelli study. Of note, we used cFOS-driven TRAP mice to label, in vivo, neurons active during a behavior and then undertook slice physiology studies in these mice a week later. In contrast, the slice physiology data demonstrating putative feedback inhibition in the Stefanelli study (Fig 5) used wildtype mice injected with AAV CAMKII-cre and AAV-DIO-ChR2. Thus, unlike in our study, the physiological data demonstrating feedback inhibition in the Stefanelli study were not obtained in a behaviorally labeled engram. Apart from the one set of histological experiments using AAV-SARE-GFP to demonstrate increased GFP labeling of SST neurons in behavior, all other data presented in the Stefanelli study were generated using artificial engrams, in which optogenetic activation or silencing of granule cells was used to manipulate the number of neurons active during a task, followed by histological analysis of cFOS staining or behavior. Thus, the physiological experiments in Stefanelli et al. (2016), generated by wide-field activation of a large cohort of GCs labeled by focal, virally driven ChR2 expression, were similar to the wide-field optical stimulation studies in Braganza 2020 and were NOT conducted in a behavioral engram. The strength of our study lies in the use of behaviorally tagged engram neurons for analysis, and our findings in sparsely labeled neurons are consistent with the reports in Braganza 2020. We will further clarify in our discussion that the data presented in the Stefanelli study do NOT represent a natural behavior-generated engram.

      (4) Probably the most convincing finding in this study is the higher zero-time-lag correlation of spontaneous EPSCs in labelled vs. unlabeled pairs. Unfortunately, the fact that the authors use spontaneous EPSCs to begin with, which likely represent a mixture of spontaneous release from severed axons, minis, and coordinated discharge from intact axon segments or entire neurons, makes it very hard to determine the meaning and relevance of this finding. At the bare minimum, the authors need to show if and how strongly differences in baseline spontaneous EPSC rates between different cells and slices contribute to this phenomenon. I would encourage the authors to use low-intensity extracellular stimulation at multiple foci to determine whether labelled pairs really share higher numbers of inputs from common presynaptic axons or cells compared to unlabeled pairs, as they claim. I would also suggest that the authors use conventional cross-correlograms (CCGs; see e.g. English et al., 2017, Neuron; Senzai and Buzsaki, 2017, Neuron) instead of their somewhat convoluted interval-selective correlation analysis to illustrate co-dependencies between the event time series. The references above also illustrate a more robust approach to determining whether peaks in the CCGs exceed chance levels.

      We appreciate the comment and can provide additional data on the EPSC frequency in individual labeled and unlabeled cells in the revised manuscript. As indicated in the manuscript, we constrained our analysis to cell pairs with comparable EPSC frequency in order to avoid additional confounds in the analysis. We have additional experiments showing that over 50% of the sEPSCs represent action-potential-driven events, which we will include in the revised manuscript. We thank the reviewer for the suggestion to explore alternative methods of analysis, including CCGs, to further strengthen our findings.
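      For readers unfamiliar with the cross-correlogram (CCG) approach the reviewer suggests, the Python/NumPy sketch below bins all pairwise lags between two event trains (e.g. sEPSC onset times from a simultaneously recorded pair) and compares the zero-lag bin with surrogates in which one train is interval-jittered. Jittering keeps each cell's slow rate fluctuations, which bears on the reviewer's concern about baseline rate differences. The 1 ms bins, ±20 ms window, 10 ms jitter, synthetic data, and function names are illustrative assumptions; this is one common way to set chance levels, not necessarily the procedure of the cited references or of the authors.

```python
import numpy as np


def cross_correlogram(ref, target, bin_ms=1.0, window_ms=20.0):
    """Count target-minus-reference event time differences in lag bins.

    ref, target: 1-D arrays of event times in ms. Returns (counts, lag_centres_ms);
    bins are centred on lags -window_ms .. +window_ms in steps of bin_ms.
    """
    n = int(round(window_ms / bin_ms))
    edges = (np.arange(-n, n + 2) - 0.5) * bin_ms      # 2n + 1 bins centred on each lag
    diffs = (target[None, :] - ref[:, None]).ravel()   # all pairwise lags
    counts, _ = np.histogram(diffs, bins=edges)        # lags outside the window are ignored
    return counts, edges[:-1] + bin_ms / 2


def jitter_bands(ref, target, n_surrogates=200, jitter_ms=10.0, alpha=0.05,
                 bin_ms=1.0, window_ms=20.0, seed=0):
    """Upper chance bands for the CCG from interval-jittered copies of `target`.

    Each target event is redrawn uniformly within its own fixed jitter window, which
    keeps rate changes slower than jitter_ms but destroys finer-timescale synchrony.
    Returns (pointwise band per lag bin, global band across all bins).
    """
    rng = np.random.default_rng(seed)
    window_start = np.floor(target / jitter_ms) * jitter_ms
    n_bins = 2 * int(round(window_ms / bin_ms)) + 1
    surrogate = np.empty((n_surrogates, n_bins))
    for i in range(n_surrogates):
        jittered = window_start + rng.uniform(0.0, jitter_ms, size=target.size)
        surrogate[i], _ = cross_correlogram(ref, jittered, bin_ms, window_ms)
    pointwise = np.percentile(surrogate, 100 * (1 - alpha), axis=0)
    global_band = np.percentile(surrogate.max(axis=1), 100 * (1 - alpha))
    return pointwise, global_band


# Synthetic example: half of the target events coincide exactly with reference events.
rng = np.random.default_rng(1)
ref = np.sort(rng.uniform(0, 60_000, 300))                   # 300 events over 60 s
target = np.sort(np.concatenate([ref[::2], rng.uniform(0, 60_000, 150)]))
counts, lags = cross_correlogram(ref, target)
pointwise, global_band = jitter_bands(ref, target)
zero = len(lags) // 2                                        # index of the 0 ms lag bin
print(counts[zero], global_band)
```

      The global band (maximum across all lag bins) guards against picking out one bin post hoc; the pointwise band is less conservative.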

      (5) Finally, one of the biggest caveats of the study is that the ensemble is labelled a full week before the slice experiment and thereby represents a latent state of a memory rather than encoding, consolidation, or recall processes. The authors acknowledge this in the discussion, but they should also be mindful of it when discussing other (especially in vivo) studies and comparing their results to these. For instance, Pignatelli et al. 2018 show drastic changes in GC engram activity and features driven by behavioral memory recall, so the results of the current study may be very different if slices were cut immediately after memory acquisition (if that were possible with a different labelling strategy), or if animals were re-exposed to the enriched environment right before sacrificing the animal.

      As noted by the reviewer, we fully acknowledge and are cognizant of the concern that slices prepared a week after labeling may not reflect ongoing encoding. Although our data show that labeled cells are reactivated in higher proportion during recall, we have discussed this caveat and will include alternative experimental strategies in the discussion.

      Reviewer 3:

      (1) Engram cells are (i) activated by a learning experience, (ii) physically or chemically modified by the learning experience, and (iii) reactivated by subsequent presentation of the stimuli present at the learning experience (or some portion thereof), resulting in memory retrieval. The authors show that exposure to the Barnes maze and the enriched environment activated semilunar granule cells and granule cells preferentially in the superior blade of the dentate gyrus, and that a significant fraction were reactivated on re-exposure. However, physical or chemical modification by experience was not tested. Experience modifies engram cells, and a common modification is Hebbian, i.e., potentiation of excitatory synapses. The authors recorded EPSCs from labeled and unlabeled GCs and SGCs. Was there a difference in the amplitude or frequency of EPSCs recorded from labeled and unlabeled cells?

      We agree that we did not examine the physical or chemical modifications by experience. Although we constrained our sEPSC analysis to cell pairs with comparable sEPSC frequency, we will include data on sEPSC parameters in labeled and unlabeled cells in the revised manuscript.

      (2) The authors studied five sequential sections, each 250 μm apart across the septotemporal axis, which were immunostained for c-Fos and analyzed for quantification. Is this an adequate sample? Also, it would help to report the dorso-ventral gradient since more engram cells are in the dorsal hippocampus. Slices shown in the figures appear to be from the dorsal hippocampus.

      We thank the reviewer for the comment. We analyzed sections along the dorso-ventral gradient. As explained in the methods, there is considerable animal-to-animal variability in the number of labeled cells, which is why we had to use matched littermate pairs in our experiments. This variability could make it difficult to tease apart dorsoventral differences.

      (3) The authors investigated the role of surround inhibition in establishing memory engram SGCs and GCs. Surprisingly, they found no evidence of lateral inhibition in the slice preparation. Interneurons, e.g., PV interneurons, have large axonal arbors that may be cut during slicing. Similarly, the authors point out that some excitatory connections may be lost in slices. This is a limitation of slice electrophysiology.

      We agree that slice physiology has limitations and discuss this caveat. As noted in response (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al., 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses or recruitment of feedback inhibition by focal activation of GCs would serve to allay concerns regarding the slice preparation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study by Chikermane and colleagues investigates the functional, structural, and dopaminergic network substrates of cortical beta oscillations (13-30 Hz). The major strength of the work lies in the methodology taken by the authors, namely a multimodal lesion network mapping. First, using invasive electrophysiological recordings from healthy cortical territories of epileptic patients they identify regions with the highest beta power. Next, they leverage open-access MRI data and PET atlases and use the identified high-beta regions as seeds to find (1) the whole-brain functional and structural maps of regions that form the putative underlying network of high-beta regions and (2) the spatial distribution of dopaminergic receptors that show correlation with nodal connectivity of the identified networks. These steps are achieved by generating aggregate functional, structural, and dopaminergic network maps using lead-DBS toolbox, and by contrasting the results with those obtained from high-alpha regions.

      The main findings are:

      (1) Beta power is strongest across frontal, cingulate, and insular regions in invasive electrophysiological data, and these regions map onto a shared functional and structural network.

      (2) The shared functional and structural networks show significant positive correlations with dopamine receptors across the cortex and basal ganglia (which is not the case for alpha, where correlations are found with GABA).

      Nevertheless, a few clarifications regarding the choice of high-power electrodes and distributions of functional connectivity maps (i.e., strength and sign across cortex and sub-cortex) can help with understanding the results.

      We thank the reviewer for this critical expert assessment. 

      Reviewer #1 (Recommendations For The Authors):

      To potentially enhance the quality of the manuscript in the current version, I kindly ask the authors to address the following points:

      Major:

      (A) Power analysis of electrophysiological data

      (1) How were significant peaks identified exactly? I understand that the authors used FOOOF methodology to estimate periodic components of brain activity.

      Thank you for pointing us to this lack of clarity. The application of FOOOF consists of the fitting of a one-over-f curve that delineates the aperiodic component followed by the definition of gaussians to fit periodic activity. This allows for extraction of periodic peak power estimates that are corrected for offset and exponent of the one-over-f or non-oscillatory aperiodic component in the spectrum (further information can be found here https://fooof-tools.github.io/fooof/auto_tutorials/plot_02-FOOOF.html). We included all peaks that could be fitted using the process.

      How about aperiodic components (Figure 1, PSD plots)? 

      We share the interest in aperiodic activity with the reviewer. However, given that the primary aim of this study was the description of beta oscillations and the methodology and results presentation is already very complex, we did not include the analysis of aperiodic activity in this manuscript. This could be done in the future and it would surely be interesting to visualize the whole brain connectomic fingerprints of aperiodic exponent and offset. With regard to the purely anatomical description of nonoscillatory aperiodic activity we would like to refer to Figure 8 in Frauscher et al. Brain 2018 (https://doi.org/10.1093/brain/awy035) where this is described. We have decided not to include additional information on this matter, because a) we felt that this would further convolute the results and discussion without directly addressing any of the hypotheses and aims that we set out to tackle and b) the interpretation of aperiodic activity is still a matter of intense research with conflicting results, which warrants very careful considerations of many aspects that again would go beyond the scope of this paper. 

      In addition, to what degree would the results change if one identified the peaks relative to sites with no peak, similar to Frauscher et al.?

      Beta activity, the oscillation of interest in our analysis is ubiquitous in the brain. In fact, of 1772 channels, only 21 channels did not exhibit a beta peak detectable with FOOOF. Thus, a comparison of 1751 against 21 would not yield meaningful results. We have therefore decided to focus on the channels in which beta activity is the strongest and dominant observable oscillation. 

      If the FOOOF approach has some advantages, these should be pointed out or discussed.

      FOOOF indeed has the advantage that it provides an objective and reproducible estimation of peak oscillatory activity that accounts for differences in aperiodic activity. To the best of our knowledge, there is no other approach that is nearly as well documented, validated and computationally reproducible. 

      Changes in manuscript: We have now further clarified the definition of peak amplitudes in the results and methods section and have discussed the use of alternative measures in the limitations section of our manuscript.

      Results: “The frequency band with the highest peak amplitude was identified using the extracted peak parameter (pw) for each channel and depicted as the dominant rhythm for the respective localisation (Figure 1).”

      Methods: “Peak height was extracted using the pw parameter, which depicts peak amplitude after subtraction of any aperiodic activity.”

      Discussion: “Alternative approaches could yield different results, e.g. reusing channels for each peak that is observable and contrasting them to channels where such peak was not present. However, in our study the majority of channels exhibited beta activity, even if peaks were of low amplitude, which we believe would have led to less interpretable results.”
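      The sketch below is not FOOOF. It is a deliberately simplified NumPy illustration of the idea behind the pw parameter described above: fit a straight line to the spectrum in log-log space (an aperiodic component without a knee), refit after dropping points that lie above the first fit, and read off the peak as the largest log10-power excess over that line within a band. FOOOF itself additionally fits Gaussians iteratively and applies its own robust-fitting rules; the synthetic spectrum and all function names here are hypothetical.

```python
import numpy as np


def fit_aperiodic(freqs, power):
    """Fit log10(P) = offset - exponent * log10(f), i.e. a 1/f^exponent line with no knee.

    A first least-squares fit is pulled upwards by oscillatory peaks, so the fit is
    repeated using only points at or below the first line (a simple robust step).
    """
    logf, logp = np.log10(freqs), np.log10(power)
    slope, offset = np.polyfit(logf, logp, 1)
    keep = logp - (offset + slope * logf) <= 0
    slope, offset = np.polyfit(logf[keep], logp[keep], 1)
    return offset, -slope


def peak_in_band(freqs, power, band, offset, exponent):
    """Return (centre frequency, height) of the largest log10-power excess over the fit in band."""
    flat = np.log10(power) - (offset - exponent * np.log10(freqs))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    i = np.argmax(flat[in_band])
    return freqs[in_band][i], flat[in_band][i]


# Synthetic spectrum: 1/f^1.5 background plus a beta peak at 20 Hz (log10 height 0.4).
freqs = np.arange(2.0, 41.0, 0.5)
log_power = 2.0 - 1.5 * np.log10(freqs) + 0.4 * np.exp(-((freqs - 20.0) ** 2) / (2 * 2.0 ** 2))
power = 10 ** log_power
offset, exponent = fit_aperiodic(freqs, power)
cf, height = peak_in_band(freqs, power, (13.0, 30.0), offset, exponent)
print(round(cf, 1), round(height, 2))
```

      Measuring the peak relative to the fitted aperiodic line, rather than as raw power, is what makes heights comparable across channels with different 1/f offsets and exponents.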

      (2) How exactly do the authors deal with channels with more than one peak? Some elaboration on this and how this could potentially impact the results would be appreciated. Sorry if I have missed it.

      Indeed, a description of this was lacking so we are very thankful that the reviewer pointed this out. The maximum peak amplitude method was a winner-takes-all approach where in the case of multiple peaks, the peak with the higher amplitude was chosen. This method of course has drawbacks in the form of lost or disregarded peaks and remains a limitation to this study. 

      Changes in manuscript: We have now clarified this in the methods and results sections, which now read: 

      Methods: “In case of multiple peaks within the same region, we used only the highest peak amplitude.”

      Results: “In case of multiple peaks within the same frequency band, we focused the analysis on the peak with the highest amplitude.”

      And added the following to the Limitations section of the discussion: 

      “Another limitation in our study is the fact that the statistical approach for the comparison of beta and alpha networks and even for multiple peaks within the same frequency band follows a winner takes all logic that is, by definition, a simplification, as most areas will contribute to more than one spatiospectrally distinct oscillatory network. Specifically, while multiple peaks within or across frequency bands could be present in each channel, we decided to allocate this channel to only the frequency band containing the highest peak amplitude.” 
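      The winner-takes-all allocation described above can be sketched in a few lines of Python: each channel's detected peaks (centre frequency, peak height) are scanned, and the channel is assigned to the band containing its single highest peak. The beta range (13-30 Hz) follows the summary of the study; the theta and alpha edges, the half-open band convention, the data layout, and the function name are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical band edges in Hz (lower bound inclusive, upper bound exclusive).
BANDS: Dict[str, Tuple[float, float]] = {
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def dominant_band(peaks: Iterable[Tuple[float, float]],
                  bands: Dict[str, Tuple[float, float]] = BANDS) -> Optional[str]:
    """Return the band holding the highest peak, or None if no peak falls in any band.

    peaks: (centre_frequency_hz, peak_height) pairs for one channel. All lower peaks,
    in the same or other bands, are discarded (winner takes all).
    """
    best_band, best_height = None, float("-inf")
    for cf, height in peaks:
        for name, (lo, hi) in bands.items():
            if lo <= cf < hi and height > best_height:
                best_band, best_height = name, height
    return best_band


print(dominant_band([(10.0, 0.30), (21.0, 0.50), (25.0, 0.20)]))  # beta
print(dominant_band([(9.5, 0.60), (20.0, 0.40)]))                 # alpha
```

      The second call shows the simplification acknowledged in the limitations: a channel with a clear but lower beta peak is counted only as alpha-dominant.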

      (B) Network mapping

      (1) Knowing that fMRI data are preprocessed by regressing the global signal, there are negative correlations across the functional networks. Unfortunately, the distribution, sign, and strength of the correlations are not quantitatively shown in any of the plots. Thus, it is unclear whether, e.g., corticocortical vs. subcortico-cortical correlations differ in strength and/or sign. I think this additional information is important for better understanding the up/down-regulation of beta, e.g., by DA signaling. Some discussion around this point in addition would be insightful, I think.

      The referee is touching upon a very important and difficult point, which we have considered very carefully. Global signal regression is a controversial topic, and the neurophysiological basis of negative correlations remains to be elucidated. We can justify our use of this approach based on an expert consensus described in Murphy & Fox 2017 (https://doi.org/10.1016%2Fj.neuroimage.2016.11.052), which highlights that global signal regression can improve the specificity of positive correlations and improve the correspondence to anatomical connectivity. In practice, however, we relied on it because it is the more commonly used and validated approach in lesion network and DBS connectivity mapping and is implemented in the Lead Mapper pipeline. All connectivity estimates are shown in Supplementary Figure 3. We remain hesitant to place more emphasis on these points because of the uncertain underlying neural correlates. However, when looking at the values, it is interesting to note that most key regions of interest exhibit positive connectivity values.

      Changes in manuscript: We now point to the supplement containing all connectivity values in the results section more prominently: “All connectivity values including their sign are shown in figures as brain region averages parcellated with the automatic anatomical labelling atlas in supplementary figures 2&3.”
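      Because the reviewer asks about negative correlations, it may help to see why global signal regression (GSR) produces them. The NumPy sketch below is a generic illustration, not the Lead Mapper preprocessing: it regresses an intercept and the mean time course out of every region by ordinary least squares. Since the same regressors are used for all regions and the regions sum to N times the global signal, the residuals sum to zero at every time point, so the off-diagonal covariances must sum to minus the total variance and some negative correlations are unavoidable, whatever the underlying physiology. The synthetic data and function names are assumptions.

```python
import numpy as np


def regress_global_signal(ts):
    """Remove the global (mean across regions) signal from each region's time series.

    ts: array of shape (timepoints, regions). Each column is regressed on an intercept
    and the global mean time course; the OLS residuals are returned.
    """
    g = ts.mean(axis=1, keepdims=True)
    design = np.hstack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ coef


def mean_offdiag_corr(ts):
    """Average correlation over all distinct region pairs."""
    c = np.corrcoef(ts, rowvar=False)
    return c[~np.eye(c.shape[0], dtype=bool)].mean()


# Synthetic data: 10 regions = independent noise + one shared fluctuation.
rng = np.random.default_rng(0)
shared = rng.standard_normal((500, 1))
ts = shared + rng.standard_normal((500, 10))
clean = regress_global_signal(ts)
print(round(mean_offdiag_corr(ts), 2), round(mean_offdiag_corr(clean), 2))
```

      With equal-variance regions, the mean correlation is expected to move from around 0.5 before GSR to roughly -1/(N-1) after it, which is why the sign of individual connectivity values should be interpreted with care.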

      (2) I assume no thresholding is applied to the functional connectivity maps (in a graph-theoretical sense). Please clarify (this is also related to the comment above, in particular the strength of correlations).

      Indeed, we demonstrate SPM maps using family wise error corrected stats in figure 2, but all further analyses were performed on unthresholded maps as correctly pointed out by the referee. 

      Changes in manuscript: 

      Results: “Specifically, we analysed to what degree the spatial uptake patterns of dopamine, as measurable with fluorodopa (FDOPA; cohort average of 12 healthy subjects) and other dopamine signalling related tracers that bind D1/D2 receptors (averages of N=17 and N=44 healthy subjects, respectively) or the dopamine transporter (DAT; cohort average of N=180 healthy subjects), were correlated with the unthresholded MRI connectivity maps.”

      Methods: “This parcellation was applied to both PET and unthresholded structural and functional connectivity maps using SPM and custom code.”
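      The parcel-wise comparison described in these changes can be sketched as follows: average each unthresholded voxel map within every atlas label, then correlate the resulting parcel vectors. This NumPy version works on flattened voxel arrays, treats label 0 as background, and uses a Pearson correlation; the correlation measure, the background convention, and the function names are assumptions, since the response does not specify them, and loading NIfTI images or the AAL atlas is left out.

```python
import numpy as np


def parcel_means(values, labels):
    """Mean of `values` within each non-zero label. Both arrays are flattened voxel grids."""
    ids = np.unique(labels[labels > 0])
    return ids, np.array([values[labels == i].mean() for i in ids])


def parcelwise_correlation(conn_map, pet_map, labels):
    """Pearson correlation between parcel-averaged connectivity and PET values."""
    _, conn = parcel_means(conn_map, labels)
    _, pet = parcel_means(pet_map, labels)
    return np.corrcoef(conn, pet)[0, 1]


# Toy example: 5 parcels of 20 voxels plus 10 background voxels (label 0).
labels = np.concatenate([np.repeat(np.arange(1, 6), 20), np.zeros(10, dtype=int)])
pet = np.where(labels > 0, labels * 1.0, 99.0)           # background values are ignored
conn = np.where(labels > 0, 0.5 * labels - 1.0, -99.0)
print(parcelwise_correlation(conn, pet, labels))
```

      Averaging within parcels first means that each region contributes one value regardless of its voxel count, so large parcels do not dominate the correlation.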

      Minor

      (1) Methods, Connectivity analysis: The description of (mass-univariate) GLM analysis is confusing. The maps underwent preprocessing? Which preprocessing steps are meant here? What is the dependent variable and what are the predictors exactly?

      We thank the reviewer for catching this error in our methods and apologise for the confusion. Indeed, we used t-tests without further preprocessing instead of a GLM.

      Changes in manuscript: The respective section has been removed from the methods section and intermediate steps have been clarified. The section now reads: “To investigate differences between beta dominant and alpha dominant functional connectivity networks, a two-sample t-test was calculated for the condition where beta was greater than alpha and vice versa using SPM. Here, the connectivity maps from each dominant channel (1005 beta functional connectivity maps and 397 alpha connectivity maps) were entered as the two samples. Estimation of model parameters yielded t-values for each voxel, indicating the strength and direction of differences between the two contrasts (beta > alpha, alpha > beta). To address the issue of multiple comparisons, we applied Family-Wise Error (FWE) correction, adjusting significance thresholds such that only voxels with p < 0.05 would be included.”
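      The voxel-wise contrast described here (beta-dominant vs alpha-dominant maps, one-sided in each direction, FWE-corrected at p < 0.05) can be sketched in NumPy. Note that SPM derives its FWE threshold from random field theory; the sketch below instead uses a permutation max-statistic threshold, a different but commonly used FWE method, chosen only because it is self-contained. Array shapes, the synthetic data, and the function names are assumptions.

```python
import numpy as np


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic per voxel; a: (n_a, voxels), b: (n_b, voxels)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))


def maxt_threshold(a, b, n_perm=1000, alpha=0.05, seed=0):
    """One-sided (a > b) FWE threshold: (1 - alpha) quantile of the permutation max-t."""
    rng = np.random.default_rng(seed)
    data, na = np.vstack([a, b]), len(a)
    max_t = np.empty(n_perm)
    for k in range(n_perm):
        idx = rng.permutation(len(data))          # shuffle group membership
        max_t[k] = two_sample_t(data[idx[:na]], data[idx[na:]]).max()
    return np.quantile(max_t, 1 - alpha)


# Synthetic example: 50 "voxels", the first 5 higher in group a (e.g. beta > alpha).
rng = np.random.default_rng(2)
a = rng.standard_normal((20, 50))
a[:, :5] += 3.0
b = rng.standard_normal((20, 50))
t = two_sample_t(a, b)
thr = maxt_threshold(a, b)
significant = t > thr
print(np.flatnonzero(significant))
```

      The reverse contrast (alpha > beta) is obtained by swapping the two arguments; taking the maximum over all voxels in each permutation is what controls the family-wise error rather than the per-voxel rate.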

      (2) I encourage the authors to find a better (visual) way of reporting Table 1, to make the main observations easier to grasp and compare (maybe a two-dimensional bar plot? Or color-coding the cells?)

      Reply: Thank you for your suggestion to improve the table. We have adjusted the table as recommended to make it more readable.

      Reviewer #2 (Public Review):

      Summary:

      This is a very interesting paper that leveraged several publicly available datasets: invasive cortical recordings in epilepsy patients, functional and structural connectomic data, and PET data related to dopaminergic and GABAergic synapses. These were combined to create a unified hypothesis of beta band oscillatory activity in the human brain. They show that beta frequency activity is ubiquitous, not just in sensorimotor areas, and that cortical regions where beta predominated had high connectivity to regions high in dopamine re-uptake.

      Strengths:

      The authors leverage and integrate three publicly available human brain datasets in a creative way. While these public datasets are powerful tools for human neuroscience, it is innovative to combine these three types of data into a common brain space to generate novel findings and hypotheses. Findings are nicely controlled by separately examining cortical regions where alpha predominates (which have a different connectivity pattern). GABA uptake from PET studies is used as a control for the specificity of the relationship between beta activity and dopamine uptake. There is much interest in synchronized oscillatory activity as a mechanism of brain function and dysfunction, but the field is short on unifying hypotheses of why particular rhythms predominate in particular regions. This paper contributes nicely to that gap. It is ambitious in generating hypotheses, particularly that modulation of beta activity may be used as a "proxy" for modulating phasic dopamine release.

      Weaknesses:

      As the authors point out, the use of normative data is excellent for exploring hypotheses but does not address or explore individual variations which could lead to other insights. It is also biased to resting state activity; maps of task-related activity (if they were available) might show different findings.

      The figures, results, introduction, and methods are admirably clear and succinct but the discussion could be both shorter and more convincing.

      Reviewer #2 (Recommendations For The Authors):

      The tone of the discussion is excessively lofty and abstract, and hard to follow in places. Specific examples in comments to authors below.

      We thank the reviewer for their positive assessment and their constructive feedback on the discussion. Also in light of the other reviewers' comments, we have made a sincere effort to shorten, restructure and improve the discussion. Additionally, we have addressed all the specific comments the reviewer had below. We appended each change to the manuscript where appropriate below and have addressed all comments in the main text. That said, we see this paper and its discussion as providing our most up-to-date and personal perspective on a correct concept of the interplay of beta oscillations and dopamine that is generalizable. Providing a concept that is so generalizable is very challenging, and so far very few authors have even attempted this. One notable exception is the “status quo” concept by Fries & Engel. While we will do our very best to address the comments, we have decided not to deviate from our initial ambition to provide a discussion on a generalizable concept. Naturally, such a concept must be very complex and will therefore be hard to understand in parts. Through the revision, we hope that readability and comprehensibility have improved, while the discussion still provides an in-depth perspective and hypothesis on how beta oscillations, dopamine and their brain circuits may facilitate brain function. Nevertheless, we want to express our honest gratitude for the thoroughness with which the reviewer has read and scrutinized our paper. The review clearly shows that the reviewer had the ambition to follow and understand what we were trying to convey, which can be rare nowadays. We are truly thankful for this.

      The first sentence is not quite true, as invasive neurophysiology was not, and cannot be, done in healthy humans. "The present study combined three openly available datasets of invasive neurophysiology, MRI connectomics, and molecular neuroimaging in healthy humans to characterise the spatial distribution of brain regions exhibiting resting beta activity, their shared circuit architecture, and its correlation with molecular markers of dopamine signaling in the human brain."

      Changes in manuscript: We have now removed the “healthy” from the respective sentence.

      "Our results motivate to conceptualise the capacity to generate..." This is not clear.

      Changes in manuscript: “Our results suggest that one common denominator of brain regions that generate beta activity, is their affiliation with beta oscillations as a feature that arises from a largescale global brain network that is modulated by dopamine.”

      "Similarly, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson's disease is long known" - the association between movement-related cortical beta desynchronization and Parkinson's motor signs is not well described - could the authors specify and reference this?

      We thank the reviewer for pointing out this lack of clarity. We meant that, independently, beta is known for “movement” and for “movement disorders”, not for “movement in movement disorders”. That said, there are some studies suggesting that beta ERD is altered in PD (e.g. https://doi.org/10.1093/cercor/bht121), but saying that this is “long known” would be an overstatement and was not our intention. We rephrased this sentence accordingly.

      Changes in manuscript: The sentence now reads: “Moreover, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson’s disease is long known.”

      "...first fast-cyclic voltammetry experiments that allowed for combined measurement of dopamine release with invasive neurophysiology have provided first evidence that beta band oscillations in healthy non-human primates can differentially link dopamine release, beta oscillations and reward and motor control, depending on the contextual information and striatal domain" - This is not very clear - not sure what "differentially link" signifies.

      I think the fact that this is not easy to understand signifies the complexity that we and the authors of the cited paper from Ann Graybiel’s lab aimed to communicate. In fact, we stayed very close to the phrasing used in their paper to try to avoid confusion (title: “Dopamine and beta-band oscillations differentially link to striatal value and motor control”, https://doi.org/10.1126/sciadv.abb9226). The specific results go beyond the scope of the discussion but are very interesting, so I would be happy if our paper inspired readers to look it up.

      Changes in manuscript: We have now adapted the sentence to “In line with this more complex picture, direct measurement of dopamine concentration in non-human primates revealed specific interactions between dopamine release, beta oscillations, reward value and motor control, depending on contextual information and striatal domain. This shows that the relationship of dopamine and beta activity is not solely associated with either reward or movement and depends on where in the striatum beta activity is recorded.”

      "In fact, one could argue that it can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories" - this is not clear - for example what is a neural trajectory? What is meant by "re-entrance and refinement"?

      A neural trajectory refers to the path that the activity of a neural population takes through a high-dimensional space over time. It can be obtained through multivariate analysis of population activity with dimensionality reduction techniques, such as PCA. The concept of low-dimensional representations of high-dimensional neural activity has gained a lot of attention in computational neuroscience ever since high-channel-count recordings of neural population activity have become available (an early and prominent example is Churchland et al., 2012 Nature https://doi.org/10.1038/nature11129, while a more recent example is Safaie et al., Nature 2023 https://doi.org/10.1038/s41586-023-06714-0). The review we refer to by Rui Costa and colleagues (Athalye, V. R., Carmena, J. M. & Costa, R. M. Neural reinforcement: re-entering and refining neural dynamics leading to desirable outcomes. Curr Opin Neurobiol 60, 145–154 (2020) https://doi.org/10.1016/j.conb.2019.11.023) suggests that dopamine may serve to modulate the likelihood of a specific pattern emerging and re-entering the cortex – basal ganglia loop, for the “reliable production of neural trajectories driving skillful behavior on-demand”. We believe that this concept could be revolutionary for our understanding of dopaminergic modulation and disorders, and together with our colleague Alessia Cavallo we have written an invited perspective on this topic (https://doi.org/10.1111/ejn.16222), which may help further clarify it.

      Changes in manuscript: We realize that this aspect may sound somewhat unclear or far removed from the data in this manuscript. However, given that we have spent more than a decade thinking about beta oscillations and how they can be conceptualized, we would prefer not to change our points entirely and would rather bet on the possibility that these concepts will become more widely accepted and known. Nevertheless, we have now adapted the text to make this clearer:

      “We hypothesise that this “status quo” hypothesis could be equally or maybe even more adequately posed on the neural level. Namely, it could provide insights into the degree to which a certain activity pattern or synaptic connection is to be strengthened or weakened, in light of neural learning. We propose that this putative function can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories.”

      "....after which it was quickly translated to first experimental studies using cortical or subcortical beta signals in human patients44." - reference 44 only deals with the use of subcortical beta, not cortical, in adaptive control.

      The reviewer is right: in fact, there is no study using motor cortex beta for adaptive DBS yet, although different studies have since used other markers (especially gamma).

      Changes in manuscript: We have rephrased and added citations accordingly: “This approach, also termed adaptive DBS, was first demonstrated based on cortical beta activity that was used to adapt pallidal DBS in the MPTP non-human primate model of PD43. It was quickly translated to first experimental studies using subcortical beta signals in human patients44, followed by further research using more complex cortical and subcortical sensing setups and biomarker combinations45,46.”

      The paragraph headed " Implications for neurotechnology" is quite long and should be condensed and focused. It doesn't seem to support the last sentence, "....targeted interventions that can increase and decrease beta activity, as recently shown through phase specific modulation45 could be utilised to mimic phasic dopamine release as a neuroprosthetic approach to alter neural reinforcement38." - I don't quite follow the logic. The authors have clearly shown that beta-related circuits tend to be those linked to dopamine modulation, and may subserve tasks for which reinforcement learning is an important mechanism. However the logic of how modulation of beta activity can "substitute" for modulation of dopamine isn't clear. That would seem to require that the mechanism by which dopamine produces reinforcement, is via an effect on beta oscillation properties (phase, amplitude, frequency). Is there evidence for this? If so it should be better spelled out.

      We realize that this is very speculative at this point. Indeed, we believe that subthalamic DBS can mimic dopaminergic control, and in the future there may be new treatment avenues, e.g. using neurochemical interfaces, for which beta could be informative to mimic dopamine release. However, explaining this fully would be very complex, so we have removed the sentence. With regard to the remaining text in the section, we considered shortening or condensing it but felt that this paragraph is highly relevant for the ongoing development of neurotechnology and therefore decided to remove only the first and last sentences.

      Changes in manuscript: We have removed the first and last sentences.

      "While the abovementioned prospects are promising we should cautiously consider the limitations of our study." - an unnecessary sentence to start a "limitations" section, its clearly a paragraph about limitations. In general, authors should go thru discussion and reduce verbosity; it is not nearly as well edited as the rest of the paper.

      Agreed. 

      Changes in manuscript: We removed the sentence. 

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Chikermane et al. leverage a large open dataset of intracranial recordings (sEEG or ECoG) to analyze resting state (eyes closed) oscillatory activity from a variety of human brain areas. The authors identify a dominant proportion of channels in which beta band activity (12-30Hz) is most prominent and subsequently seek to relate this to anatomical connectivity data by using the sEEG/ECoG electrodes as seeds in a large set of MRI data from the human connectome project. This reveals separate regions and white matter tracts for alpha (primarily occipital) and beta (prefrontal cortex and basal ganglia) oscillations. Finally, using a third available dataset of PET imaging, the authors relate the parcellated signals to dopamine signaling as estimated by spatial uptake patterns of dopamine, and reveal a significant correlation between the functional connectivity maps and the dopamine reuptake maps, suggesting a functional relationship between the two.

      Strengths:

      Overall, I found the paper well justified, focused on an important topic, and interesting. The authors' use of 3 different open datasets was creative and informative, and it significantly adds to our understanding of different oscillatory networks in the human brain, and their more elusive relation with neuromodulator signaling networks by adding to our knowledge of the association between beta oscillations and dopamine signaling. Even my main comments about the lack of a theta network analysis and discussion points are relatively minor, and I believe this paper is valuable and informative.

      Weaknesses:

      The analyses were adequate, and the authors cleverly leveraged these different datasets to build an interesting story. The main aspect I found missing (in addition to some discussion items, see below) was an examination of the theta network. Theta oscillations have been involved in a number of cognitive processes including spatial navigation and memory, and have been proposed to have different potential originating brain regions, and it would be informative to see what their anatomical networks (e.g. as in Figure 2) look like under the authors' analyses.

      The authors devote a significant portion of the discussion to relating their findings to a popular hypothesis for the function of beta oscillations, the maintenance of the "status quo", mostly in the context of motor control. As the authors acknowledge, given the static nature of the data and lack of behavior, this interpretation remains largely speculative and I found it a bit too far-reaching given the data shown in the paper. In contrast, I missed a more detailed discussion on the growing literature indicating a role for beta in mood (e.g. in Kirkby et al. 2018), especially given the apparent lack of hippocampal and amygdala involvement in the paper, which was surprising.

      We thank the reviewer for their insightful review of our manuscript. One of the aims of our paper was to lay the groundwork for a circuit-based conceptualization of beta activity that does not primarily relate to behavior. In practice, we aim to provide a generalizable concept that can be applied to all behavioral domains, including mood. We focus on the “status quo” hypothesis because it is one of the very few, if not the only, generalizable concepts of the function of beta oscillations. Through our paper and the discussion, we aim to redirect this concept towards a less cognitive/behavioral and more anatomical, network-based domain, while acknowledging principles that may overlap. We realize that this is very ambitious and that this endeavour is necessarily complex and not easy to communicate. In light of the reviewer’s comments, we have made an effort to improve the discussion as best we could without straying too far from our initial aim. We are thankful for the suggested reference, which we have now added to the discussion in the section where we previously discussed beta as a biomarker for mood, also noting the absence of beta-dominant channels in the amygdala and hippocampus. It should be clarified, however, that a) only three channels were located in the amygdala, of which one exhibited beta activity, so we should be cautious not to overinterpret this result, and b) most channels exhibited beta, and the fact that beta was not dominant does not mean that beta is absent or unimportant in these brain areas. Given the way we approached the analysis, absence of evidence is not evidence of absence. Notably, the suggested study used a complex network analysis, which we could not perform because we did not have parallel recordings from these areas in multiple patients. This is now noted in the limitations.

      Changes in manuscript: “For example, it was shown that beta is implicated in working memory28, utilisation of salient sensory cues29, language processing30, motivation31, sleep32, emotion recognition33, mood34 and may even serve as a biomarker for depressive symptom severity in the anterior cingulate cortex35” and “One impactful study reported that beta oscillatory sub-networks of amygdala and hippocampus could reflect human variations in mood34. This is interesting, but highlights another relevant limitation of our study, namely that recordings in different areas were stemming from different patients and thus, such sub-network analyses on the oscillatory level could not be conducted.”

      Major comment:

      • Although the proportion of electrodes with theta-dominant oscillations was lower (~15%) than alpha (~22%) or beta (~57%), it would be very valuable to also see the same analyses the authors carried out in these frequency bands extended to theta oscillations.

      We agree with the reviewer and appreciate the interest in other frequency bands: theta, alpha and gamma. Our primary interest was to provide a network concept of beta activity, but we anticipated that interest would go beyond that frequency band. However, we also had to limit ourselves to what is communicable and comprehensible. The key aim for us was to provide a data-driven circuit description of beta activity that can lay the groundwork for a generalizable concept of where beta oscillations emerge. Reproducing all analyses for every frequency band would clutter both the results and the discussion. Moreover, the honest truth is that funding and the individual career plans of the researchers currently do not allow us to allocate time for a reanalysis of all data, which would be a significant effort. Therefore, we have decided to add only the topography of theta and gamma channels as a supplement. In case the reviewer is interested in a collaboration on extending this project to other frequency bands and circuits, we would like to invite them to get in touch; perhaps this could become a new collaborative project. Until then, we have extended the limitations section to note that this would be important future work.

      Changes in manuscript: 

      We have added and cited the new supplementary figure for the results from theta in the results section, which now reads: 

      “Further information on the topography of theta channels is shown in supplementary figure 1.”

      We would like to add that a sensible interpretation of results from gamma-dominant channels is unlikely to be possible, given the low number of channels with prominent resting activity in this frequency band. We have added the following text to the limitations section: “The aim of this study was to elucidate the circuit architecture of beta oscillations, which is why insights from this study for other frequency bands are limited. Future research investigating the specific circuits of theta, alpha and gamma oscillations and their relationship with neurotransmitter uptake could yield new important insights on the networks underlying human brain rhythms.”

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      • Results: "we performed non-parametric Spearman's correlations between the structural and functional connectivity maps of beta networks with neurotransmitter uptake". This is a significantly complex analysis that requires more detail for the reader to evaluate. There is more detail in the Figure 3 legend but still insufficient. The Methods offer more detail, but I found the description of the parcellation to be vague and I would appreciate a more detailed description.

      We thank the reviewer for bringing to our attention the insufficient explanation of the methods used to calculate the correlations in this analysis. We have now made an effort to provide more detail in the relevant paragraphs.

      Changes in manuscript: We have now made changes to both the Results and Methods sections and added the following explanations respectively:

      Results: “Next, we resliced the beta network map and the PET images to allow for a meaningful comparison, using a combined parcellation with 476 brain regions that include cortex19, basal ganglia20, and cerebellum21. Here, each parcel – which was a collection of voxels belonging to a particular brain region – from the connectivity map was correlated with the same parcel containing average neurotransmitter uptake from the respective PET scan (see Figure 3A). In this way, nonparametric Spearman’s correlations between PET intensity and structural and functional connectivity maps of beta networks were obtained, which indicate to what degree the spatial distribution of connectivity is similar to the distribution of neurotransmitter uptake.”

      Methods: “A custom master parcellation in MNI space was created in Matlab using SPM functions by combining three existing parcellations to include cortical regions19, structures of the basal ganglia20 and cerebellar regions21. Regions that were (partially) overlapping between the atlases were only selected once. The final compound parcellation had 476 regions in total. This parcellation was applied to both PET and structural and functional connectivity maps using SPM and custom code. This allowed for the calculation of spatial correlations, providing a statistical measure of spatial similarity of the PET intensity and MRI connectivity distributions. For this, Spearman’s ranked correlations were used to calculate correlations between the PET images, such as the dopamine aggregate map, and both functional and structural beta connectivity networks (Figure 3). The analysis was repeated for individual tracers, showing similar results (Supplementary figure 2). Finally, to validate these results, a control analysis was performed using a GABA PET scan from the same open dataset of neurotransmitter uptake following the same pipeline (Figure 2A, 2B).”
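      The parcel-wise comparison described in the quoted Methods text can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab/SPM pipeline: it assumes both maps have already been resliced to the same voxel grid and flattened into lists, with one integer parcel label per voxel (0 marking background). All function names are hypothetical. Each map is averaged within parcels, and Spearman's rho is computed as the Pearson correlation of the parcel ranks, with tied values given their average rank.

```python
from statistics import mean

# Hypothetical helpers illustrating a parcel-wise Spearman correlation.
# Assumption: maps are flattened voxel lists on a shared grid; label 0 = background.


def parcel_means(values, labels):
    """Average voxel values within each parcel, skipping background (label 0)."""
    sums, counts = {}, {}
    for v, lab in zip(values, labels):
        if lab == 0:
            continue
        sums[lab] = sums.get(lab, 0.0) + v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}


def ranks(xs):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def parcelwise_spearman(conn, pet, labels):
    """Spearman's rho between parcel-averaged connectivity and PET maps."""
    c = parcel_means(conn, labels)
    p = parcel_means(pet, labels)
    shared = sorted(set(c) & set(p))
    return pearson(ranks([c[k] for k in shared]), ranks([p[k] for k in shared]))
```

      In practice, `scipy.stats.spearmanr` computes the same statistic on the parcel averages; the pure-Python version only makes each step explicit. The sketch does not guard against a constant map, for which the correlation is undefined.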

      • All of the recordings were taken in an eyes-closed condition. This is likely to affect the power of alpha oscillations; the authors should comment on this.

      We agree with the reviewer that this will likely have influenced the results. However, given that the key result of our paper is the abundance and circuit topography of beta oscillations, it is unlikely that increased alpha in some channels will have led to false positive results for beta. If anything, it may have increased the contrast, leading to a more conservative estimate of which channels truly show strong beta dominance. On the other hand, we should acknowledge that this limitation can affect the interpretation of the alpha result. This is another reason for us to focus primarily on beta in the discussion and the presentation of results.

      Changes in manuscript: We now comment on this in the results:

      “It should be noted that the recordings were performed with eyes closed, which is known to increase alpha power and may limit the generalizability of the alpha maps to an eyes-open condition. However, given that our primary use of alpha was to act as a control, we believe that this should not affect the interpretability of the key findings of our study.”

      • Although the relative proportion of theta and gamma channels is lower, it would be interesting to see the distribution of channels in a SOM figure.

      As described above, we have now added supplementary figure 1, which shows the topography but not the network analyses.

      • Figure legend - typo - "Neither, alpha nor beta" - no comma needed.

      Now fixed; thank you for pointing out this lapse!

      • Results: "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with current neurophysiology approaches" is not entirely accurate; suggest rephrasing it to "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches"

      Thank you for suggesting the alternative formulation. 

      Changes in manuscript: The text has been modified as per the suggestion and now reads “Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches”.

      • Results - typo - "cortical brain areas, that exhibit resting beta activity share a common brain network" - no comma needed.

      Thank you for the suggestion; the comma has been removed as suggested to improve the flow of the sentence.

    1. Author response:

      eLife Assessment

      This useful study presents the first detailed and comprehensive description of brain sulcus anatomy of a range of carnivoran species based on a robust manual labeling model allowing species comparisons. Although the database is recognized and the method for reconstructing cortical surfaces is convincing, the evidence supporting the conclusions is incomplete due to the lack of appropriate quantitative measurements and analyses. Considering additional specimens to assess intraspecies variations, as well as exploring the functional correlates of interspecies differences would increase the scope of the study. Setting an instructive foundation for comparative anatomy, this study will be of interest to neuroscientists and neuroimaging researchers interested in that field, as well as in brain morphology and sulcal patterns, their phylogeny, and ontogeny in relation to functional development and behaviour. 

      We are pleased that our primary objective of creating a comprehensive framework to navigate carnivoran brains is considered as successfully achieved and that our work is expected to be of broad interest to various disciplines, as it provides the foundation for future investigations into carnivoran brain organization.

      As we will set out below, a description of the major sulci is an appropriate measure for large-scale comparative anatomy: it is stable enough in the population of each species not to require a large N, provides suitable variability across species, and can be related to other aspects of between-species diversity. We will include a number of additional species to increase the scope of the study, as suggested. Although a quantitative assessment of functional correlates is, in principle, beyond the scope of this first foundational paper, we will provide a first step in this direction as well. We emphasize, however, that this was a secondary outcome, emerging after the first application of the framework.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Boch and colleagues, entitled Comparative Neuroimaging of the Carnivore Brain: Neocortical Sulcal Anatomy, compares and describes the cortical sulci of eighteen carnivore species, and sets a benchmark for future work on comparative brains. 

      Based on previous observations, electrophysiological, histological and neuroimaging studies and their own observations, the authors establish a correspondence between the cortical sulci and gyri of these species. The different folding patterns of all brain regions are detailed, put into perspective in relation to their phylogeny as well as their potential involvement in cortical area expansion and behavioral differences. 

      Strengths: 

      This is a pioneering article, very useful for comparative brain studies and conducted with great seriousness and based on many past studies. The article is well-written and very didactic. The different protocols for brain collection, perfusion, and scanning are very detailed. The images are self-explanatory and of high quality. The authors explain their choice of nomenclature and labels for sulci and gyri on all species, with many arguments. The opening on ecology and social behavior in the discussion is of great interest and helps to put into perspective the differences in folding found at the level of the different cortexes. In addition, the authors do not forget to put their results into the context of the laws of allometry. They explain, for example, that although the largest brains were the most folded and had the deepest folds in their dataset, they did not necessarily have unique sulci, unlike some of the smaller, smoother brains. 

      Weaknesses: 

      The article is aware of its limitations, not being able to take into account inter-individual variability within each species, inter-hemispheric asymmetries, or differences between males and females. However, this does not detract from their aim, which is to lay the foundations for a correspondence between the brains of carnivores so that navigation within the brains of these species can be simplified for future studies. This article does not include comparisons of morphometric data such as sulci depth, sulci wall surface, or thickness of the cortical ribbon around the sulci. 

      We thank the reviewer for their overwhelmingly positive evaluation of our work. As noted by the reviewer, our primary aim was to establish a framework for navigating carnivoran brains to lay the foundation for future research. We are pleased that this objective is deemed as successfully achieved.

      As the reviewer points out, we do not quantify within-species interindividual differences. This is a conscious choice; we aimed to emphasize breadth of species over individuals, as is standard in large-scale comparative anatomy (cf. Heuer et al., 2023, eLife; Suarez et al., 2022, eLife). Following the logic of phylogenetic relationships, the presence of a particular sulcus in related species is also a measure of reliability. We felt safe in this choice, as previous work in both primates and carnivorans has shown that differences in major sulci across individuals are a matter of degree rather than a case of presence or absence (Connolly, 1950, External morphology of the primate brain, C.C. Thomas; Hecht et al., 2019 J Neurosci; Kawamuro 1971 Acta Anat.; Kawamuro & Naito, 1977, Acta Anat.). In our revised manuscript, we aim to include some additional individuals of selected species as supplementary material, further illustrating this point.

      We feel that measures such as sulci depth, sulci wall surface, or thickness of the cortical ribbon are measures that vary more across individuals and we have therefore not included them in the study. In addition, these are measures that are not generally used as between-species comparative measures, whereas sulcal patterning is (cf. Amiez et al., 2019, Nat Comms; Connolly, 1950; Miller et al., 2021, Brain Behav Evol; Radinsky 1975, J Mammal; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J. Comp Neurol).

      Reviewer #2 (Public review): 

      Summary: 

      The authors have completed MRI-based descriptions of the sulcal anatomy of 18 carnivoran species that vary greatly in behaviour and ecology. In this descriptive study, different sulcal patterns are identified in relation to phylogeny and, to some extent, behaviour. The authors argue that the reported differences across families reflect behaviour and electrophysiology, but these correlations are not supported by any analyses. 

      Strengths: 

      A major strength of this paper is using very similar imaging methods across all specimens. Often papers like this rely on highly variable methods so that consistency reduces some of the variability that can arise due to methodology. 

      The descriptive anatomy was accurate and precise. I could readily follow exactly where on the cortical surface the authors were referring to. This is not always the case for descriptive anatomy papers, so I appreciated the efforts the authors took to make the results understandable for a broader audience.

      I also greatly appreciate the authors making the images open access through their website. 

      Weaknesses: 

      Although I enjoyed many aspects of this manuscript, it is lacking in any quantitative analyses that would provide more insights into what these variations in sulcal anatomy might mean. The authors do discuss inter-clade differences in relation to behaviour and older electrophysiology papers by Welker, Campos, Johnson, and others, but it would be more biologically relevant to try to calculate surface areas or volumes of cortical fields defined by some of these sulci. For example, something like the endocast surface area measurements used by Sakai and colleagues would allow the authors to test for differences among clades, in relation to brain/body size, or behaviour. Quantitative measurements would also aid significantly in supporting some of the potential correlations hinted at in the Discussion. 

      Although quantitative measurements would be helpful, there are also some significant concerns in relation to the specimens themselves. First, almost all of these are captive individuals. We know that environmental differences can alter neocortical development in humans and nonhuman animals, and that domestication affects neocortical volume and morphology. Whether captive breeding affects neocortical anatomy might not be known, but it can affect other brain regions and overall brain size and could affect sulcal patterns. Second, despite using similar imaging methods across specimens, fixation varied markedly across specimens. Fixation is unlikely to affect the ability to recognize deep sulci, but variations in shrinkage could nevertheless affect overall brain size and morphology, including the ability to recognize shallow sulci. Third, the sample size = 1 for every species examined. In humans and nonhuman animals, sulcal patterns can vary significantly among individuals. In domestic dogs, they can even vary greatly across breeds. It, therefore, remains unclear to what extent the pattern observed in one individual can be generalized for a species, let alone an entire genus or family. The lack of accounting for inter-individual variability makes it difficult to make any firm conclusions regarding the functional relevance of sulcal patterns.

      We thank the reviewer for their assessment of our work. The primary aim of this study was to establish a framework for navigating carnivoran brains by providing a comprehensive overview of all major neocortical sulci across eighteen different species. Given the inconsistent nomenclature in the literature and the lack of standardized criteria (“recipes”) for identifying the major sulci, we specifically focused on homogenizing the terminology and creating recipes for their identification. Moreover, we also generated digital surfaces of all brains and will also add sulcal masks to further facilitate future research building on our framework. We are pleased to hear that we succeeded in our primary objective.

      We respectfully disagree with the reviewer on two counts, where we believe the reviewer is not judging the work within its intended scope.

      The first is with respect to individual differences. To the best of our knowledge, differences between captive and wild animals, or indeed between individuals, do not affect the presence or absence of any major sulci. No differences in sulcal patterns were detected between captive and (semi-)wild macaques (cf. Sallet et al., 2011, Science; Testard et al., 2022, Sci Adv), different dog breeds (Hecht et al., 2019 J Neurosci) or foxes selectively bred to simulate domestication, compared to controls (Hecht et al., 2021 J. Neurosci). Indeed, we do not find major differences between wolf-like canid species, suggesting that a difference between individuals of the same species is even more unlikely. Nevertheless, we agree with the reviewer that building up a database like ours will benefit from providing as much information about the samples as possible to enable these issues to be tested. We, therefore, will update our table to include if the animals were from captive or wild populations. Moreover, we aim, where possible, to include both wild and captive animals of the same species if they are available in our revision.

      The second is in the quantification of structure/function relationships. We believe the sulci atlases themselves are the main deliverables of this project. We felt it prudent to include some qualitative descriptions of the relationship between sulci as we observed them and behaviours as known from the literature, as an illustration of the possibilities that this foundational work opens up. This approach also allowed us to confirm previous findings based on observations from a less diverse range of carnivoran species and families (Radinsky 1968 J Comp Neurol; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J Comp Neurol; Welker & Seidenstein, 1959 J Comp Neurol). However, a full statistical framework for analysis is beyond the scope of this paper. Our group has previously worked on methods to quantitatively compare brain organization across species; indeed, we have developed a full framework for doing so (Mars et al., 2021, Annu Rev Neurosci), based on the idea that brains that differ in size and morphology should be compared based on anatomical features in a common feature space. Previously, we have used white matter anatomy (Mars et al., 2018, eLife) and spatial transcriptomics (Beauchamp et al., 2021, eLife). The present work provides the foundation for this approach to be expanded to sulcal anatomy, but the full development of this approach will be the topic of future communications.

      Nevertheless, we aim to include in our manuscript a first quantitative analysis of the relationship between the presence or absence of particular sulci and the two behaviours of interest.

      We also would like to emphasize that we strongly believe that looking at measures of brain organization at a more detailed level than brain size or relative brain size is informative. Indeed, studies looking at correlations between brain size and particular behavioural variables, although very prominent in the literature, have found it very difficult to distinguish between competing behavioural hypotheses (Healy, 2021, Adaptation and the brain, OUP). In contrast, connectivity has a much more direct relationship to behavioural differences across species (Bryant et al., 2024, bioRxiv), as does sulcal anatomy (Amiez et al., 2019, Nat Comms; Miller et al., 2021, Brain Behav Evol). Moreover, such measures are less sensitive to the effects of fixation, which affects brain size but not the presence or absence of a sulcus.

      Following the reviewer’s recommendations, we will endeavour to include an even broader range of species in the revised version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality, and the evidence is strengthened through a combination of different genetic mouse models, RNA sequencing, and immunofluorescence analysis. 

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally, accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics, and cell position. Whether any of these changes is instructive for differentiation itself and whether consecutive changes in differentiation are required remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental time points, the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14), when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells), and E16, when the epidermis comprises three (living) compartments, with the spinous layer separating basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using bulk RNA sequencing, the authors developed useful new markers for suprabasal stages of differentiation, such as MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model. 

      The data indicate that early in development, at E14, the suprabasal intermediate cells resemble, in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation. 

      Previous studies by several groups found an increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT), thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of their suprabasal position in either the spinous or granular layer, indicating that increased contractility is key to inducing late differentiation of granular cells. A potential weakness of the K10-spastin model is that MT disruption is the primary effect, which only secondarily causes hypercontractility. However, the authors' previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al., Cell Stem Cell 2021). Moreover, the data are confirmed by the second model directly activating myosin through RhoA. These previous publications already indicated a role for contractility in differentiation but were focused on early differentiation. The data in this manuscript focus on the regulation of late differentiation in barrier-forming cells. These important data help to unravel the interdependencies of cell position, mechanical state, and differentiation in the epidermis, suggesting that an increase in cellular contractility in the most apical positions within the epidermis can induce terminal differentiation. 
Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer. 

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing results should be presented more transparently, providing the full lists of regulated genes instead of just the GO analysis and selected target genes, so that this analysis can serve as a useful repository. The authors themselves have profited from and used published datasets of gene expression of the granular cells. In addition, some of the previous data should be discussed better. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes, such as filaggrin and loricrin, are actually not regulated (Muroyama and Lechler eLife 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated. 

      We thank all the reviewers for their suggestions and comments.

      Thank you especially for the reminder to include gene lists. We had an Excel document with all these data but neglected to upload it with the initial manuscript submission. This includes all the gene signatures for the different cell compartments across development. We will also include a page that lists all EDC genes and whether they were up-regulated in intermediate cells and in cells in which contractility was induced. Further, we note that all the RNA-Seq datasets are available for use on GEO. 

      In our previous publication, we indeed included images showing a lack of change in loricrin and filaggrin in the embryos where spastin was expressed in the differentiated epidermis. Consistent with this, there is no change in Lor mRNA levels by RNA-Seq (it is one of the rare EDC genes that is unchanged). In contrast, Flg mRNA was up in the RNA-Seq, though we didn’t see a dramatic change in protein levels. We have not further pursued whether this reflects translational regulation. That said, our data clearly show that other genes associated with granular fate were increased in the contractile skin.  

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk RNA sequencing, they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB, is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and that overexpression of cytoskeletal regulators accelerates the differentiation of spinous layer cells into granular cells. 

      Overall, the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells, and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics. 

      Strengths: 

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing, and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow. 

      Weaknesses: 

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal, as spastin has multiple roles in regulating microtubule dynamics and membrane transport, which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent, but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II, as shown for the intermediate layer. It would also be logical to address whether further increasing contractility in the intermediate layer would enhance the differentiation of these cells. 

      We agree with the reviewer that the development of additional tools to precisely control myosin activity will be of great use to the field. That said, our series of publications has clearly demonstrated that ablating microtubules results in increased contractility and that this phenocopies the effects of Arhgef11-induced contractility (Ning et al., Cell Stem Cell 2021). Further, we showed that these phenotypes were rescued by myosin inhibition with blebbistatin. Our prior publications also showed a clear increase in junctional actomyosin through expression of either spastin or Arhgef11, as well as increased staining for the tension-sensitive epitope of alpha-catenin (alpha-18) (also in Ning et al., 2021). We are not aware of any currently available mouse models that allow direct manipulation of myosin activity.  

      The gene expression analyses are relatively superficial and rely heavily on GO term analyses, which are of course informative but do not give the reader a good sense of what kinds of genes and transcriptional programs are regulated. It would be useful to show volcano plots or heatmaps of actual gene expression changes, as well as to perform additional analyses, for example gene set enrichment and/or transcription factor enrichment analyses, to better describe the transcriptional programs.

      We will include an Excel document that lists all the gene signatures. Additionally, all of our data are deposited in GEO for others to perform their own analyses.  

      Claims of changes in cell division/proliferation are made exclusively by quantifying EdU incorporation. It would be useful to look at mitosis more directly. At minimum, Y-axis labels should be changed from "% Dividing cells" to "% EdU+ cells" to more accurately represent the findings.

      We will change the axis label to precisely match our analysis.  

      Despite these minor weaknesses the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis, and will likely be of interest to the skin research community. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and, unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA-seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor Yap. The paper also identifies new markers that distinguish IC and spinous layers and shows that the spinous signature gene, MafB, is sufficient to repress proliferation when prematurely expressed in ICs. 

      Strengths: 

      Overall, this is a well-executed study; the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much-understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding and adds to a growing list of examples where cell mechanics influence gene expression and differentiation. 

      Weaknesses: 

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility in differentiation. The inclusion of loss-of-function approaches would enable one to determine if, for example, contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. The inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation. 

      We agree that loss-of-function studies would be useful. For MafB, these have been performed in cultured human keratinocytes, where loss of MafB and its paralog cMaf results in a phenotype consistent with loss of spinous differentiation (Lopez-Pajares, Dev Cell 2015). Due to the complex genetics involved, generating these double mutant mice is beyond the scope of this study. Loss-of-function studies of myosin are also complicated by genetic redundancy of the non-muscle type II myosin genes, as well as the role for these myosins in actin cross-linking in addition to contractility. In addition, we have found that these myosins are quite stable in the embryonic intestine, with loss of protein delayed by several days from the induction of recombination. Therefore, elimination of myosins by embryonic day e14.5 with our current drivers is not likely possible. Thus, generation of inducible inhibitors of contractility is a valuable future goal. 

      A number of recent papers have used AFM of skin sections to probe tissue rigidity. We have not attempted these studies and are uncertain whether their spatial resolution would allow us to resolve differences within the very thin epidermis at these stages. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated a significant tissue-wide increase in contractility (Ning et al., Cell Stem Cell, 2021).  

      Finally, whether the expression of granular-associated genes in ICs provides them with some sort of barrier function in the embryo is not addressed, so the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.  

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells. We have attempted experiments to ablate intermediate cells with DTA expression; this resulted in inefficient and delayed cell death and thus did not yield strong conclusions. Our findings that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper aims to address the establishment and maintenance of neural circuitry in the case of a massive loss of neurons. The authors used genetic manipulations to ablate the principal projection neurons, the mitral/tufted cells, in the mouse olfactory bulb. Using diphtheria toxin (Tbx21-Cre::loxP-DTA line), the authors ablated progressively larger numbers of M/T cells postnatally. By injecting diphtheria toxin (DT) into Tbx21-Cre::loxP-iDTR mice, the authors were able to control the timing of the ablation in the adult stage. Both methods led to the successful elimination of a majority of M/TCs by 4 months of age. The authors made a few interesting observations. First, they found that the initial pruning of the remaining M/T cell primary dendrite was unaffected. However, in adulthood, a significant portion of these cells extended primary dendrites to innervate multiple glomeruli. Moreover, the incoming olfactory sensory neuron (OSN) axons, as examined for those expressing the M72 receptor, showed a divergent innervation pattern as well. The authors conclude that M/T cell density is required to maintain the dendritic structures and the olfactory map. To address the functional consequences of eliminating a large portion of principal neurons, the authors conducted a series of behavioral assays. They found that learned odor discrimination was largely intact. On the other hand, mating and aggression were reduced. The authors concluded that learned behaviors are more resilient than innate ones.

      The study is technically sound, and the results are clear-cut. The most striking result is the contrast between the normal dendritic pruning during early development and the expanded dendritic innervation in adulthood. It is a novel discovery that can lead to further investigation of how the single-glomerulus dendritic innervation is maintained. The authors conducted a few experiments to address potential mechanisms, but they are inconclusive, as detailed below. It is also interesting to see that the massive neuronal loss did not severely impact learned odor discrimination. This result, together with previous studies showing nearly normal odor discrimination in the absence of large portions of the olfactory bulb or with scrambled innervation patterns, attests to the redundancy and robustness of the sensory system. The discussion should take these other studies into account in a historical context.

      Main comments:

      (1) In previous studies, it has been concluded that dendritic pruning unfolds independently, regardless of the innervation pattern or activity of the OSNs. The new observation bolsters this conclusion by showing that a loss of neighboring M/T cells does not affect the developmental process. A more nuanced discussion comparing the results of these studies would strengthen the paper.

      We thank the reviewer for the suggestion. We now include an extended discussion citing relevant previous works in the manuscript (Lines 351-374).

      (2) The authors propose that a certain density of M/T cells is required to prevent the divergent innervation of primary dendrites, but the evidence is not sufficient to support this proposal. The experiment with low-dose DT injection to ablate a smaller portion of M/T cells did not change the percentage of cells innervating two or more glomeruli. The authors suggest that a threshold must be met, but this threshold is not determined.  

      In our experiments using high-dose DT, we hypothesized that there may be many empty glomeruli (glomeruli not innervated by M/T cells) and, as a result, that some of the remaining M/T cells could branch their apical dendrite tuft into multiple empty glomeruli. To test this hypothesis, we carried out another experiment using a lower dose of DT. In this experiment, the fraction of remaining M/T cells was 25% (~10,000 M/T cells), which was higher than with the high DT dose (5%, or around 2,000 M/T cells), but still significantly lower than in wild-type mice (~40,000 M/T cells). With around 2,000 glomeruli and 10,000 M/T cells per bulb, each glomerulus would be expected to be innervated by ~5 M/T cells on average. However, we found that the percentage of M/T cells projecting to multiple glomeruli (around 40%) was similar whether 10,000 or 2,000 M/T cells remained in the bulb. In addition, it is important to emphasize that even in wild-type animals with a full set of M/T cells, a small percentage of M/T cells still innervate more than one glomerulus (Lin et al., 2000). Together, these observations suggest that the innervation of multiple glomeruli by M/T cells is not simply due to the presence of empty glomeruli, and that our hypothesis was not correct.

      We have added a comment explaining this issue in the Results section (Lines 200-203).
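      The per-glomerulus arithmetic in the response to point (2) can be made explicit with a short illustrative calculation. This is a minimal sketch, not part of the authors' analysis: it uses only the approximate counts quoted there (about 2,000 glomeruli, and 40,000, 10,000 or 2,000 M/T cells per bulb). The estimate of empty glomeruli rests on a hypothetical simplification, namely that each surviving M/T cell picks a glomerulus uniformly at random, so that the number of cells per glomerulus is approximately Poisson-distributed; real M/T cells are not assigned this way.

```python
import math

# Approximate per-bulb counts quoted in the response to point (2).
GLOMERULI = 2_000
MT_WILD_TYPE = 40_000
MT_REMAINING = {"low-dose DT": 10_000, "high-dose DT": 2_000}


def summarize(n_cells: int, n_glomeruli: int = GLOMERULI) -> dict:
    """Simple per-glomerulus statistics for n_cells surviving M/T cells."""
    mean_per_glom = n_cells / n_glomeruli
    return {
        "fraction_of_wild_type": n_cells / MT_WILD_TYPE,
        "mean_cells_per_glomerulus": mean_per_glom,
        # Hypothetical assumption: uniform random assignment of cells to
        # glomeruli, so counts are ~Poisson(mean) and P(empty) = exp(-mean).
        "expected_fraction_empty": math.exp(-mean_per_glom),
    }


for label, n in MT_REMAINING.items():
    s = summarize(n)
    print(f"{label}: {s['fraction_of_wild_type']:.0%} of wild type, "
          f"{s['mean_cells_per_glomerulus']:.1f} cells/glomerulus, "
          f"~{s['expected_fraction_empty']:.1%} glomeruli empty")
```

      Under that assumption, the sketch gives about 0.7% empty glomeruli with 10,000 remaining cells and about 36.8% with 2,000, which illustrates why a similar rate of multi-glomerular innervation at both doses is hard to explain by empty glomeruli alone.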

      (3) The authors suggest that neural activity is not required for this plasticity. The evidence was derived primarily from naris occlusion and neuronal silencing using Kir2.1. While the results are consistent with the notion, it is a rather narrow interpretation of how neural activity affects circuit configuration. Perturbation of neural activity also entails an increase in firing. Inducing the activity of the neurons may alter this plasticity. Silencing per se may induce a homeostatic response that expands the neurite innervation pattern to increase synaptic input to compensate for the loss of activity. Thus, further silencing the cells may not reduce multiglomerular innervation, but an increased activity may.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other types of mechanisms (including increased synaptic excitation as suggested by the reviewer) that do not require the firing of action potentials in M/T cells. 

      We now have included a comment regarding this point (Lines 243-247).  

      (4) There is a discrepancy between this study and the one by Fujimoto et al. (Developmental Cell; 2023), which shows that not only glutamatergic inputs to the primary dendrite can facilitate pruning of remaining dendrites but also Kir2.1 overexpression can significantly perturb dendritic pruning. This discrepancy is not discussed by the authors.

      We agree that it would be useful to contrast these two works.

      In our experiments, performed in adult animals, we blocked sensory input by performing naris occlusion before we induced ablation of M/T cells. In a separate experiment, also in adult animals, we expressed the Kir2.1 channel to reduce the ability of neurons to fire action potentials. With both types of manipulations, we observed that the ablation of a large fraction of M/T cells still caused the remaining M/T cells to maintain a single apical dendrite that sprouts several new tufts towards multiple glomeruli. A recent paper (Fujimoto et al., 2023), in which Kir2.1 was expressed in a large percentage of M/T cells starting during embryonic development, showed that these “silent” M/T cells failed to prune their arbors to a single dendrite. In aggregate, these observations indicate that action potentials are necessary for the normal pruning that occurs during perinatal development (Fujimoto et al., 2023), but are not required for the expansion of dendritic trees caused by ablating a large fraction of M/T cells in adult animals (our current manuscript).

      We have now explained the differences between both studies in the manuscript (Lines 427-439).

      (5) An alternative interpretation of the discrepancy between the apparent normal pruning by p10 and expanded dendritic innervation in adulthood is that there are more cells before P10, when ~25% of M/T cells are present, but at a later date only 1-3% are present. 

      The relationship between the number of M/T cells and single glomerulus innervation has not been explored during postnatal development. It would be important to test this hypothesis.

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells, and it induces DTA expression starting around P0. The elimination of M/T cells starts at this time and continues until P10, by which point more than 75% of M/T cells have been eliminated. At P21, more than 90% of M/T cells have been eliminated, and their number remains stable thereafter.

      Pruning of the dendrites of M/T cells starts at P0 and it is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still over a threshold that does not interfere with the process of normal dendrite pruning. We agree that it would be very informative to perform additional experiments in the future where a large set of M/T cells could be ablated before pruning occurs (ideally before P0). 

      (6) The authors attribute the change in the olfactory map to the loss of M/T cells. Another obvious possibility is that the diffused projection is a response to the change in the olfactory bulb size. With less space to occupy, the axons may be forced to innervate neighboring glomeruli. It is not known how the total number of glomeruli is affected. This question could be addressed by tracking developmental changes in bulb volume and glomerular numbers.

      Certainly, this is a possibility, and we have now included a comment in this regard in the manuscript (Lines 473-480). 

      We believe that there are three likely scenarios that could account for these observations:

      (a) After ablating M/T cells, the tufts of the remaining M/T cells sprout into multiple glomeruli, and this causes the axons of OSNs to project into multiple glomeruli.

      (b) Ablating M/T cells may cause changes in other OB cells that make synapses in the glomeruli (ETCs, PGCs, sACs, etc.), and the misrouting of OSN axons that we observed in our experiments may be a secondary effect caused by the elimination of M/T cells.

      (c) After ablating the majority of M/T cells, the olfactory bulb gets reduced in size, and the axons of OSNs find it difficult to precisely converge on a target that now has become smaller. As a result, the axons of OSNs fail to converge on single glomeruli.

      (7) The retained ability to discriminate odors upon reinforced training is not surprising in light of a number of earlier studies. For example, Slotnick and colleagues have shown that rats losing ~90% of the OB can retain odor discrimination. Weiss et al have shown that humans without an olfactory bulb can perform normal olfactory tasks. Gronowitz et al have used theoretical prediction and experimental results to demonstrate that perturbing the olfactory map does not have a major impact on olfactory discrimination. Fleischmann et al have shown that mice with a monoclonal nose can discriminate odors. The authors should discuss their results in these contexts.

      We apologize for this important oversight; we now include a more elaborate discussion in the manuscript, including the relevant references as suggested (Lines 483-496).

      (8) It should be noted that odor discrimination resulting from reinforcement training does not mean normal olfactory function. It is a highly artificial situation as the animals are overtrained. It should not be used as a measure of the robustness of the olfactory sense. Natural odor discrimination (without training), detection threshold, and innate appetitive/aversive response to certain odors may be affected. These experiments were not conducted.

      We agree that the standard tests commonly used to measure olfactory function require substantial training, and thus, are quite artificial. However, these tests are used because they allow a more precise quantification of olfactory function than those relying on natural behaviors.  

      We have now included a few sentences to address this point in the results (Lines 321-322) and discussion sections (Lines 541-543).

      (9) The social behaviors were conducted using relatively coarse measures (vaginal plug and display of aggression). Moreover, these behaviors are most likely affected by the disruption of the AOB mitral cells and have little to do with the dendritic pruning process described in the paper. It is misleading to lump social behaviors with innate responses to odors.

      This point follows the same logic as the previous one. The olfactory tests that rely on natural behaviors are quite coarse and difficult to quantify. In contrast, the olfactory tests using apparatuses such as olfactometers can be quantified with precision, but they are artificial. We agree that some of the naturalistic behaviors that we studied, such as mating or aggression, may depend to a large extent on the AOB (although it is possible that the MOB may also be involved in these tasks to a degree). In our initial version of the manuscript, we commented on the anticipated relative involvement of the MOB and AOB in the studied tasks, but we have now added some additional sentences to make this point clearer. In addition, we now add a comment indicating that it is possible that the abnormal behaviors could simply be due to a reduction in the number of AOB M/T cells (~98.5% and ~85% elimination of M/T cells in the AOB in Tbx::DTA and Tbx::iDTR mice, respectively), regardless of the abnormal dendritic pruning of main OB M/T cells (Lines 530-534).

      See Figure 5E - M/T cells in AOB (Lines 1238-1239). 

      Reviewer #2 (Public Review):

      The authors make the interesting observation that the developmental refinement of apical M/T cell dendrites into individual glomeruli proceeds normally even when the majority of neighboring M/T cells are ablated. At later stages, the remaining neurons develop additional dendrites that invade multiple glomeruli ectopically, and similarly, OSN inputs to glomeruli lose projection specificity as well. The authors conclude that the normal density of M/T neurons is not required for developmental refinement, but rather for maintaining specific connectivity in adults.

      The observations are indeed quite striking; however, the authors' conclusions are not entirely supported by the data.

      (1) It is unclear whether the expression of diphtheria toxin that eventually leads to the ablation of the large majority of M/T neurons compromises the cell biology of the remaining ones.

      DT is an extremely potent toxin that kills cells by inhibiting protein translation, and it has been shown that a single DT molecule in a cell is sufficient to kill it, owing to its highly efficient catalytic activity. Accordingly, previous experiments have shown that DT kills cells within a few hours of its appearance in the cytoplasm (Yamaizumi et al., 1978). In other words, all the published evidence suggests that a cell exposed to the action of DT dies shortly afterward. There is no evidence that cells exposed to DT can survive and experience long-term effects. Finally, previous studies have not observed any long-term changes in neurons directly caused by the actions of DT (Johnson et al., 2017).

      (2) The authors interpret the growth of ectopic dendrites later in life as a lack of maintenance of dendrite structure; however, the observed changes may actually reflect adaptations that optimize wiring for extremely low numbers of M/T neurons. The finding that olfactory behavior was less affected than predicted supports this interpretation.

      We do not know the cellular or molecular mechanisms that explain why reducing the density of M/T cells is followed by the growth of ectopic dendrites from the remaining M/T cells. We agree that the functional outcome of growing ectopic dendrites may result in an optimization of wiring in the bulb and could explain why olfactory function is relatively preserved. We now include a comment regarding this possibility (Lines 513-525).   

      (3) The number of remaining M/T neurons is much higher at P10 than later. Can the relatively large number of remaining neurons (or their better health status) be the reason that dendrites refine normally at the early developmental stages rather than a (currently unknown) developmental capacity that preserves refinement?

      We thank the reviewer for the suggestion, which was also raised by reviewer 1. 

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells and induces DTA expression starting around P0. The elimination of M/T cells starts at this time and progresses so that by P10 more than 75% of M/T cells have been eliminated. By P21, more than 90% of M/T cells have been eliminated, and their number remains stable thereafter.

      Pruning of the dendrites of M/T cells starts at P0 and is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still above the threshold required for normal dendrite pruning. We agree that it would be very informative to perform additional experiments in the future in which a large set of M/T cells is ablated before pruning occurs (ideally before P0).

      (4) While the effect of reduced M/T neuron density on both M/T dendrites and OSN axons is described well, the relationship between both needs to be characterized better: Is one effect preceding the other or do they occur simultaneously? Can one be the consequence of the other?

      Previous studies have demonstrated that disrupting the topographic projection of OSN axons has no effect on the structure of the apical dendrite of M/T cells (Ma et al., 2014; Nishizumi et al., 2019). Our experiments ablating a large fraction of M/T cells suggest that M/T cells are necessary for the correct targeting of OSN axons into the bulb. However, our experiments do not allow us to distinguish between these two scenarios:

      (a) the ablation of a large fraction of M/T cells directly causes the sprouting of the apical dendrite of M/T cells, and this sprouting in turn causes the abnormal projection of OSN axons onto the bulb.

      (b) the ablation of a large fraction of M/T cells first causes the axons of OSN to project abnormally onto multiple glomeruli in the bulb, and this in turn causes the dendrite of remaining M/T cells to sprout onto multiple glomeruli. 

      We now include a comment in the manuscript explaining this point (Lines 473-492).

      (5) Page 7: the observation that not all neurons develop additional dendrites is not a sign of differences between cell types, it may be purely stochastic.

      This is correct, and we mention both scenarios in the Discussion (Lines 407-408).

      (6) Page 8: the fact that activity blockade did not affect the formation of ectopic dendrites does not suggest that the process is not activity-dependent: both manipulations have the same effect and may just mask each other.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other mechanisms (including increased synaptic excitation, as suggested by the reviewer) that do not require action potential firing in M/T cells.

      We have now included a comment regarding this point (Lines 243-247).

      (7) It remains unclear how the observed structural changes can explain the behavioral effects.

      We agree that the relationship between structural changes and behavior was not appropriately explained in our manuscript. Our manipulations cause one primary change in the olfactory system and several secondary ones. The primary change is a large reduction in the number of M/T cells in both the MOB and AOB. This reduction in M/T cell number triggers significant secondary changes in the connectivity of the bulb, including an abnormal projection of OSNs onto the OB and the growth of ectopic dendrites from the remaining M/T cells into multiple glomeruli.

      The behavioral abnormalities displayed by these mice are ultimately caused by the reduction in the number of M/T cells, but the secondary structural changes may well contribute to some of the behavioral phenomena that we observed. For example, it is possible in principle that ectopic dendrites innervating several glomeruli help the bulb perceive smells with a much reduced number of M/T cells. On the other hand, this promiscuous growth of dendrites into multiple glomeruli could make it more difficult for the animals to discriminate between smells. The same argument applies to the projection of OSN axons onto multiple glomeruli: we simply do not know whether this change helps or hinders the animal's ability to detect smells.

      We now include a comment regarding this issue (Lines 513-525).   

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments and a more thorough discussion of the results, as suggested in the public review, would significantly strengthen the paper. Below are some specific parts that need to be addressed.

      There is a lack of information on how M/T cell numbers are quantified. Without the information, it is difficult to evaluate the claim. Using the tdTomato signal may miss cells that are not labeled due to the transgenic effect. 

      Although we cannot conclude that we are identifying the complete set of M/T cells (because the transgenic lines may fail to label some M/T cells), the number of M/T cells that we observed is similar to that previously reported (Richard et al., 2010). This concern has been included in the Results section (Lines 121-124).

      A more detailed description of M/T cell quantification has been added to the Methods section (Lines 627-632).

      There is a lack of information on the timeline of treatment and how measurement of the olfactory bulb volume is conducted.

      We now include a more detailed description of how the volume of the OB was measured in the methods (Lines 621-623).

      The volume measurement is inconsistent with the pictures shown. In Figure 1, supplemental data 2, panels B and C, the bulbs in DTA and DTR mice appear to be about half the length of control bulbs in each dimension. This would translate into ~1/8 of the volume of the control mice.

      We measured the volume of the bulbs from the Neurolucida reconstructions and observed that in both DTA and iDTR mice the bulb volumes are roughly 50% of those in wild-type mice. The sections shown in Figure 1 - figure supplement 2 for wild-type, DTA and iDTR mice were not taken at the same position in the bulb, which gave the impression that the bulbs from DTA and iDTR mice were much smaller than they really are. We now show sections from these three animals at equivalent positions in the bulb.
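      As a rough check on the geometry, assuming (as a simplification) that the bulb scales uniformly in all three dimensions, halving every linear dimension would reduce the volume to one eighth, whereas a ~50% volume corresponds to each linear dimension shrinking only to about 79% of the wild-type value:

```latex
\[
V \propto L^{3} \;\Rightarrow\;
\frac{V_{\mathrm{mut}}}{V_{\mathrm{WT}}} = \left(\frac{L_{\mathrm{mut}}}{L_{\mathrm{WT}}}\right)^{3},
\qquad
\left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8},
\qquad
\frac{V_{\mathrm{mut}}}{V_{\mathrm{WT}}} = 0.5 \;\Rightarrow\; \frac{L_{\mathrm{mut}}}{L_{\mathrm{WT}}} = 0.5^{1/3} \approx 0.79 .
\]
```

      Under this isotropic-scaling assumption, sections in which the mutant bulbs look half as long in every dimension are not consistent with a measured ~50% volume, which is why sections taken at matched positions are needed for a fair visual comparison.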

      Figure 1 E and F have no legend.

      We apologize for this mistake - we have now added the legend for Figures 1E and F (Lines 1009-1013).

      Figure 3, supplemental data 2, it is not clear what the readers should be looking at. The data is confusing even for experts in the field. The authors should describe the figures more clearly, pointing out what they are supposed to show.

      We apologize for this, and we have now added a more detailed description of Figure 3 – figure supplement 2 (Lines 1153-1167).

      In several figures, it is not clearly stated which comparisons the indications of statistical significance above the bars refer to.

      We have now included a more detailed description of the statistics comparison in the figure legends.

      AAV serotype should be specified.

      The AAV serotype used to label M/T cells was AAV-PHP.eB. We have added this information to the Methods section of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      Page 5, para 2: "The decrease in neuronal plasticity with age": it is unclear what "the decrease" refers to.

      We have changed this sentence in the text to make it clear:

      “The decrease in structural plasticity of M/T cells after apical dendrite refinement (Mizrahi and Katz, 2003),….”

      Lines 146-148

      Is there a quantification of the effect of Kir2.1 overexpression alone (example shown in Figure 3D)?

      We performed an experiment in iDTR animals in which a fraction of M/T cells expressed Kir2.1, and we split these animals into two groups: (a) animals that received a DT injection, and (b) animals that did not receive DT. We quantified the effect of Kir2.1 on M/T cells from animals that received a DT injection (with ablation of around 90% of M/T cells), and we did not observe any statistically significant differences between cells expressing Kir2.1 and neurons that did not express Kir2.1 from other iDTR animals that also received DT injections. We did not quantify the possible effects of Kir2.1 in the group of animals that did not receive DT because, on initial inspection, we did not observe any clear differences between Kir2.1-expressing cells and neighboring wild-type cells.

      References

      Fujimoto S, Leiwe MN, Aihara S, Sakaguchi R, Muroyama Y, Kobayakawa R, Kobayakawa K, Saito T, Imai T. 2023. Activity-dependent local protection and lateral inhibition control synaptic competition in developing mitral cells in mice. Dev Cell S1534-5807(23)00237-X. doi:10.1016/j.devcel.2023.05.004

      Johnson RE, Tien N-W, Shen N, Pearson JT, Soto F, Kerschensteiner D. 2017. Homeostatic plasticity shapes the visual system’s first synapse. Nat Commun 8:1220. doi:10.1038/s41467-017-01332-7

      Lin DM, Wang F, Lowe G, Gold GH, Axel R, Ngai J, Brunet L. 2000. Formation of precise connections in the olfactory bulb occurs in the absence of odorant-evoked neuronal activity. Neuron 26:69–80. doi:10.1016/s0896-6273(00)81139-3

      Ma L, Wu Y, Qiu Q, Scheerer H, Moran A, Yu CR. 2014. A developmental switch of axon targeting in the continuously regenerating mouse olfactory system. Science 344:194–197. doi:10.1126/science.1248805

      Nishizumi H, Miyashita A, Inoue N, Inokuchi K, Aoki M, Sakano H. 2019. Primary dendrites of mitral cells synapse unto neighboring glomeruli independent of their odorant receptor identity. Commun Biol 2:1–12. doi:10.1038/s42003-018-0252-y

      Richard MB, Taylor SR, Greer CA. 2010. Age-induced disruption of selective olfactory bulb synaptic circuits. Proc Natl Acad Sci U S A 107:15613–15618. doi:10.1073/pnas.1007931107

      Yamaizumi M, Mekada E, Uchida T, Okada Y. 1978. One molecule of diphtheria toxin fragment A introduced into a cell can kill the cell. Cell 15:245–250. doi:10.1016/0092-8674(78)90099-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, "Nicotine enhances the stemness and tumorigenicity in intestinal stem cells via Hippo-YAP/TAZ and Notch signal pathway", the authors, Isotani et al., claim that this study identifies a NIC-triggered pathway regulating the stemness and tumorigenicity of ISCs and suggest the use of DBZ as a potential therapeutic strategy for treating intestinal tumors. However, the presented data do not support the primary claims.

      Weaknesses:

      My main reservation is that the quality of the results presented in the manuscript may not fully substantiate their conclusions. For instance, in Figure 2 A and B, it is challenging to discern a healthy organoid. This is significant, as the entirety of Figure 2 and several panels in Figures 3 - 5 are based on these organoid assays. Additionally, there seems to be a discrepancy in the quality of results from the western blot, as the lanes of actin do not align with other proteins (Figure 6B).

      We count organoids directly under the microscope, as described previously (Igarashi M et al., Cell 2016; Igarashi M et al., Aging Cell 2019). When counting organoids, we can clearly distinguish live from dead organoids under the microscope. Hence, in the revised version we describe the method in more detail and indicate live and dead organoids with arrows (Figure 2A and B).

      Moreover, as Reviewer 1 pointed out, the number of organoids derived from intestinal or colonic crypts can be affected by dead organoids, as in Figure 2A and 2B. However, almost all colonies from isolated intestinal stem cells (ISCs) (Figure 2C and D) are alive, so colony counts in experiments using isolated ISCs are much less affected by dead colonies. Since all organoid data in Figures 3-5 are based on the same method as Figure 2C and D, the data quality of Figures 3-5 is not compromised by dead colonies.

      Finally, to improve the data quality of Figure 6B, we repeated this experiment and replaced the panel with the new results.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Isotani et al. characterizes the hyperproliferation of intestinal stem cells (ISCs) induced by nicotine treatment in vivo. Employing a range of small molecule inhibitors, the authors systematically investigated potential receptors and downstream pathways associated with nicotine-induced phenotypes through in vitro organoid experiments. Notably, the study specifically highlights a signaling cascade involving α7-nAChR/PKC/YAP/TAZ/Notch as a key driver of nicotine-induced stem cell hyperproliferation. Utilizing an Lgr5CreER Apcfl/fl mouse model, the authors extend their findings to propose a potential role of nicotine in stem cell tumorigenesis. The study posits that Notch signaling is essential during this process.

      Strengths and Weaknesses:

      One noteworthy research highlight in this study is the indication, as shown in Figure 2 and S2, that the trophic effect of nicotine on ISC expansion is independent of Paneth cells. In the Discussion section, the authors propose that this independence may be attributed to distinct expression patterns of nAChRs in different cell types. To further substantiate these findings, it is suggested that the authors perform tissue staining of various nAChRs in the small intestine and colon. This additional analysis would provide more conclusive evidence regarding how stem cells uniquely respond to nicotine. It is also recommended to present the staining of α7-nAChR from different intestinal regions. This will provide insights into the primary target sites of nicotine in the gut tract. Additionally, it is recommended that the authors consider rephrasing the conclusion in this section (lines 123-124). The current statement implies that nicotine does not affect Paneth cells, which may be inaccurate based on the suggestion in line 275 that nicotine might influence Paneth cells through α2β4-nAChR. Providing a more nuanced conclusion would better reflect the complexity of nicotine's potential impact on Paneth cells.

      It was difficult to obtain nAChR antibodies suitable for immunostaining. Instead, we performed qPCR of nAChRs in ISCs and Paneth cells isolated from the whole small intestine (new Figure 3C), although this method cannot reveal differences in nAChR expression between intestinal regions. Although α7-nAChR and α8-nAChR showed comparatively high expression in both ISCs and Paneth cells, no significant difference between ISCs and Paneth cells was observed (Figure 3C).

      Interestingly, nicotine upregulated only α7-nAChR expression in ISCs, suggesting a specific response of α7-nAChR to nicotine (Figures 3C and D). We have rephrased the conclusion of this paragraph as the reviewer suggested.

      As shown in the same result section, the effect of nicotine on ISC organoid formation appears to be independent of CHIR99021, a Wnt activator. Despite this, the authors suggest a potential involvement of Wnt/β-catenin activation downstream of nicotine in Figure 4F. In the Lgr5CreER Apcfl/fl mouse model, it is known that APC loss results in a constitutive stabilization of β-catenin, thus the hyperproliferation of ISCs by nicotine treatment in this mouse model is likely beyond Wnt activation. Therefore, it is recommended that the authors reconsider the inclusion of Wnt/β-catenin as a crucial signaling pathway downstream of nicotine, given the experimental evidence provided in this study.

      We appreciate this important suggestion. Wnt/β-catenin signaling was indeed activated in nicotine-treated ISCs. However, as the reviewer points out, the hyperproliferation of ISCs by nicotine treatment is likely beyond Wnt activation. Following the reviewer’s suggestion, we removed Wnt/β-catenin from the proposed signaling pathway downstream of nicotine (Figure 5G).

      In Figure 4, the authors investigate ISC organoid formation with a pan-PKC inhibitor, revealing that PKC inhibition blocks nicotine-induced ISC expansion. It is noteworthy that PKC inhibitors have historically been used successfully to isolate and maintain stem cells by promoting self-renewal. Therefore, it is surprising to observe no effect, or a reversed effect, on ISCs in this context. A previous study demonstrated that the loss of PKCζ leads to increased ISC activity both in vivo and in vitro (DOI: 10.1016/j.celrep.2015.01.007). Additionally, to strengthen this aspect of the study, it would be beneficial for the authors to present more evidence, possibly using different PKC inhibitors, to reproduce the observed results with Gö 6983. This could help address potential concerns or discrepancies and contribute to a more comprehensive understanding of the role of PKC in nicotine-induced ISC expansion.

      Gö 6983 is a pan-PKC inhibitor of PKCα, PKCβ, PKCγ, PKCδ and PKCζ, with IC50 values of 7 nM, 7 nM, 6 nM, 10 nM and 60 nM, respectively. Because we used Gö 6983 at a concentration of 10 nM, we consider it unlikely that PKCζ is the PKC target involved in the response to nicotine. Additionally, we tested 5 nM sotrastaurin, another pan-PKC inhibitor, which is not expected to affect PKCζ. The result observed with Gö 6983 was reproduced with sotrastaurin (Supplemental Figure 3E).
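      As a rough illustration of the concentration argument, assume simple single-site inhibition with a Hill coefficient of 1, and assume that the published IC50 values carry over to the organoid setting (both are simplifications). The fraction of each isoform inhibited at 10 nM Gö 6983 would then be approximately:

```latex
\[
f_{\mathrm{inh}} = \frac{[I]}{[I] + \mathrm{IC}_{50}},
\qquad
f_{\mathrm{PKC}\zeta} = \frac{10}{10 + 60} \approx 0.14,
\qquad
f_{\mathrm{PKC}\alpha} = \frac{10}{10 + 7} \approx 0.59 .
\]
```

      Under these assumptions, PKCζ would be only modestly inhibited at this concentration compared with PKCα (and similarly PKCβ, PKCγ and PKCδ). This is the intuition behind treating PKCζ as an unlikely mediator of the Gö 6983 effect.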

      An additional avenue that could enhance the clinical relevance of the study is the exploration of human datasets. Specifically, leveraging scRNA-seq datasets of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1) could provide valuable insights. Analyzing the expression patterns of nAChRs across diverse regions and cell types in the human intestine may offer a potential clinical implication.

      We analyzed the distribution of nAChRs in the scRNA-seq dataset of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1). Consistent with the mouse data (Figure 3C), the expression of human α7-nAChR is higher than that of other nAChRs. As in mouse, there is no clear difference in expression between ISCs and Paneth cells (Supplemental Figure 4A and B). Based on the mouse and human data, we speculate that the essence of the ISC response to nicotine is the nicotine-induced upregulation of a specific nAChR, rather than the distribution of nAChRs.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could benefit from addressing a few minor points to enhance its quality before publication:

      (1) Ensure all images are presented in higher resolution to improve visual clarity.

      We replaced all images with higher-resolution versions.

      (2) Quantify Western blot results accurately for rigor and precision in data representation.

      We quantified all blots.

      (3) Include error bars in control groups where missing, particularly in Figures 3C and 4D, to enhance data interpretation.

      We included error bars in control groups in new Figure 3C and 4D.

      (4) The layout of Figure S3B, S4A and S4B should be corrected.

      We corrected the layout of those Figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Petty and Bruno investigate how response characteristics in the higher-order thalamic nuclei POm (typically somatosensory) and LP (typically visual) change when a stimulus (whisker air puff or visual drifting grating) of one or the other modality is conditioned to a reward. Using a two-step training procedure, they developed an elegant paradigm in which the distractor stimulus is completely uninformative about the reward, which is reflected in the licking behavior of trained mice. While the animals seem to take to the tactile stimulus more readily, they can also associate the reward with the visual stimulus while ignoring tactile stimuli. In trained mice, the authors recorded single-unit responses in both POm and LP while presenting the same stimuli. The authors first focused on POm recordings, finding that in animals with tactile conditioning, POm units specifically responded to the air puff stimulus but not to the visual grating. Unexpectedly, in visually conditioned animals, POm units also responded to the visual grating, suggesting that the responses are not modality-specific but more related to behavioral relevance. These effects do not seem to be homogeneously distributed across POm: lateral units maintain tactile specificity, whereas medial units respond more flexibly. The authors further ask whether the unexpected cross-modal responses might result from behavioral activity signatures. By regressing behavior-coupled activity out of the responses, they show that late activity can indeed be related to whisking, licking, and pupil size measures. However, cross-modal short-latency responses are not clearly related to animal behavior. Finally, LP neurons also seem to change their modality specificity depending on conditioning, with tactile responses attenuated in LP if the animal is conditioned to visual stimuli.

      The authors make a compelling case that POm neurons are less modality-specific than typically assumed. The training paradigm, employed methods, and analyses are mostly to the point, well supporting the conclusions. The findings importantly widen our understanding of higher-order thalamus processing features with the flexibility to encode multiple modalities and behavioral relevance. The results raise many important questions on the brain-wide representation of conditioned stimuli. E.g. how specific are the responses to the conditioned stimuli? Are thalamic cross-modal neurons recruited for the specific conditioned stimulus or do their responses reflect a more global shift of attention from one modality to another? 

      To elaborate on higher-order thalamic activity in relation to conditioned behavior, a trial-by-trial analysis would be very useful. Is neuronal activity predictive of licking, and at which relative timing?

      To elaborate on the relationship between neuronal activity and licking, we have created a new supplementary figure (Figure S1), where we present the lick latency of each mouse on the day of recording. We also perform more in-depth analysis of neural activity that occurs before lick onset, which is presented in a new main figure (new Figure 4). 

      Furthermore, I wonder why the (in my mind) major and from the data obvious take-away, "POm neurons respond more strongly to visual stimuli if visually conditioned", is not directly tested in the summary statistics in Figure 3h.

      We have added a summary statistic to Figure 3h and to the Results section (lines 156-157) comparing the drifting grating responses in visually and tactilely conditioned mice.  

      The remaining early visual responses in POm in visually conditioned mice after removing behavior-linked activity are very convincing (Figure 5d). It would help, however, to see a representation of this on a single-neuron basis side-by-side. Are individual neurons just coupled to behavior while others are independent, or is behaviorally coupled activity a homogeneous effect on all neurons on top of sensory activity?

      In lieu of a new figure, we have performed a new analysis of individual neurons to classify them as “stimulus tuned” and/or “movement tuned.” We find that nearly all POm cells encode movement and arousal regardless of whether they also respond to stimuli. This is presented in the Results under the heading “POm correlates with arousal and movement regardless of conditioning” (Lines 219-231).

      The conclusions on flexible response characteristics in LP in general are less strongly supported than those in POm. First, the differentiation between POm and LP relies heavily on the histological alignment of labeled probe depth and recording channel, possibly allowing for wrong assignment. 

      We appreciate the importance of differentiating between POm, LP, and surrounding regions in order to accurately assign a putative cell to a brain region. The method we employed (aligning an electrode track to a common reference atlas) is widely used in rodent neuroscience, especially in regions like POm and LP, which are difficult to differentiate molecularly (for example, see Sibille, Nature Communications, 2022; and Schröder, Neuron, 2020).

      Furthermore, it seems surprising, but is not discussed, that putative LP neurons have such strong responses to the air puff stimuli, in both conditioning cases. In tactile conditioning, LP air puff responses seem to be even faster and stronger than POm. In visual conditioning, drifting grating responses paradoxically seem to be later than in tactile conditioning (Fig S2e). These differences in response changes between POm and LP should be discussed in more detail and statements of "similar phenomena" in POm and LP (abstract) should be qualified.  

      We have further developed our analysis and discussion of LP activity. Our analysis of LP stimulus response latencies is now presented in greater detail in Figure S3, and we have expanded the Results section accordingly (lines 266-275). We have also expanded the Discussion to address these new analyses and to speculate on what might drive these surprising “tactile responses” in LP.

      Reviewer #2 (Public Review): 

      Summary  

      This manuscript by Petty and Bruno delves into the still poorly understood role of higher-order thalamic nuclei in the encoding of sensory information by examining the activity of POm and LP cells in mice performing an associative learning task. They developed an elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality (visual or tactile) and ignore a second stimulus of the other modality. They recorded simultaneously from POm and LP, using 64-channel electrode arrays, to reveal the context dependency of the firing activity of cells in higher-order thalamic nuclei. They concluded that behavioral training reshapes activity in these secondary thalamic nuclei. I have no major concerns with the manuscript's conclusions, but some important methodological details are lacking, and I feel the manuscript could be improved with the following revisions.

      Strengths 

      The authors developed an original and elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality, either visual or tactile, and ignore a second stimulus of the other modality. As a tactile stimulus, they applied gentle air puffs to the distal part of the vibrissae, ensuring that the stimulus was innocuous and therefore non-aversive, which is crucial in their study.

      It is commonly held that the first-order thalamus filters and re-encodes the sensory flow; in contrast, the computations taking place in higher-order nuclei are poorly understood. They may contribute to cognitive functions. By integrating top-down control, higher-order nuclei may participate in generating updated models of the environment based on sensory activity; how this can take place is a key question that Petty and Bruno addressed in the present study.

      Weaknesses  

      (1) Overall, the methods, results, and discussion involving sensory responses, especially for POm, are confusing. Throughout the manuscript, the authors seem to deal with both the sensory and non-sensory aspects of the modulation of firing activity in POm and LP, without a clear definition of what they examined. Subsections in the Results, or clearer naming of what is analyzed (e.g., baseline, stim-on, reward), could convey the authors' message more clearly.

      We thank Reviewer 2 for this suggestion. We have adjusted the language throughout the paper to more clearly state which portions of a given trial we analyzed. We now consistently refer to “baseline,” “stimulus onset,” and “stimulus offset” periods. 

      In line #502 in Methods, the authors defined "Sensory Responses. We examined each cell's putative sensory response by comparing its firing rate during a "stimulus period" to its baseline firing rate. We first excluded overlapping stimuli, defined as any stimulus occurring within 6 seconds of a stimulus of a different type. We then counted the number of spikes that occurred within 1 second prior to the onset of each stimulus (baseline period) and within one second of the stimulus onset (stimulus period). The period within +/-50ms of the stimulus was considered ambiguous and excluded from analysis." 

      Considering that the responses to whisker deflection, while weak and delayed, were shown to occur, when present, before 50 ms in the Pom (Diamond et al., 1992), it is not clear what the authors mean by, and consider as, "Sensory Responses". 

      We have addressed this important concern in three ways. First, we have reanalyzed our data to include the 50ms pre- and post-stimulus time windows that were previously excluded. This did not qualitatively change our results, but updated statistical measurements are reflected in the Results and the legends of figures 3 and 7. Second, we have created a new figure (new Figure 4) which provides a more detailed analysis of early POm stimulus responses at a finer time scale. Third, we have amended the language throughout the paper to refer to “stimulus responses” rather than “sensory responses” to reflect how we cannot disambiguate between bottom-up sensory input and top-down input into POm and LP with our experimental setup. We refer only to “putative sensory responses” when discussing low-latency (<100ms) stimulus responses.
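      A minimal sketch of the windowing described in the quoted Methods passage, written in Python for illustration only. It assumes spike and stimulus onset times are in seconds, spike times are sorted, and "within 6 seconds" means strictly less than 6 s; the function names are hypothetical. Setting `guard=0.05` reproduces the original ±50 ms exclusion, and `guard=0.0` corresponds to the reanalysis described in the response. The statistical test applied to the resulting baseline/stimulus pairs is not specified in the quoted passage, so it is left out.

```python
from bisect import bisect_left

def window_count(spikes, start, end):
    """Number of sorted spike times t with start <= t < end."""
    return bisect_left(spikes, end) - bisect_left(spikes, start)

def stimulus_counts(spikes, stimuli, overlap=6.0, win=1.0, guard=0.05):
    """Return (baseline, stimulus) spike counts for each non-overlapping stimulus.

    stimuli: list of (onset_time, stim_type). A stimulus is dropped if any
    stimulus of a different type starts within `overlap` seconds of it
    (assumed strict inequality). `guard` seconds on either side of onset
    are excluded from both windows (0.05 = original, 0.0 = reanalysis).
    """
    kept = []
    for t, kind in stimuli:
        if any(k != kind and abs(t - u) < overlap for u, k in stimuli):
            continue
        baseline = window_count(spikes, t - win, t - guard)
        response = window_count(spikes, t + guard, t + win)
        kept.append((baseline, response))
    return kept
```

With a toy spike train, the grating at 20 s and the air puff at 23 s exclude each other, so only the first air puff contributes a (baseline, stimulus) pair; narrowing the guard moves the spikes at 0.98 s and 1.02 s into the windows.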

      Precise wording may help to clarify the message. For instance, in line #134, "Of cells from tactilely conditioned mice, 175 (50.4%) significantly responded to the air puff, as defined by having a firing rate significantly different from baseline within one second from air puff onset (Figure 3d, bottom)", the phrase "significantly responded to the air puff" could be written as "significantly increased (or modified, if some decreased) their firing rate within one second after the air puff onset (baseline: ...)". This will avoid any confusion with the sensory responses per se.

      We have made this specific change suggested by the reviewer (lines 145-146) and made similar adjustments to the language throughout the manuscript to better communicate our analysis methods. 

      (2) To extend the previous concern, the latency of the modulation of the firing rate of the Pom cells for each modality and each conditioning may be an issue. This latency, given in Figure S2, is rather long (particularly late for the whisker system), which strongly favors non-sensory "responses" and supports the authors' hypothesis that sensory-, arousal-, and movement-evoked activity in Pom are shaped by associative learning. Latency is a key point in this study. 

      Therefore, 

      - latencies should be given in the main text, and Figure S2 could be promoted to a main figure; at least panels c, d, and e could be part of Figure 3. 

      - Figure S2b shows rather short-latency responses to the air puff, at least in some cells, in addition to late ones. The manuscript would benefit greatly from an analysis of both the early and late latency components of the "responses" to air puffs and drifting gratings in both conditions. This analysis would help to clarify the authors' message. Since the authors performed unit recordings, these data are accessible.

      - it would be highly instructive to examine the latency of the modulation of Pom cell firing rates in parallel with the onset of each behavior, i.e. changes in pupil radius, whisking amplitude, and lick rate (Figures 1e, g and 3a, b). Figure 1 does not provide the latency of the licks in conditioned mice.

      - the authors mention low-latency responses in the discussion, e.g., line #299: "In both tactilely and visually conditioned mice, movement could not explain the increased firing rate at air puff onset. These low-latency responses across conditioning groups is likely due in part to "true" sensory responses driven by S1 and SpVi."; line #306: "Like POm, LP displayed varied stimulus-evoked activity that was heavily dependent on conditioning. LP responded to the air puff robustly and with low latency, despite lacking direct somatosensory inputs." But which low-latency responses do the authors refer to? Again, this shows that a robust analysis of these latencies is missing from the manuscript; it would help support the conclusions.

      We have moved our analysis of stimulus response latency in POm to new Figure 4 in the main text and have expanded both the Results and Discussion sections accordingly. We have also analyzed the lick latency on the day of recording, included in a new supplemental Figure S1. 

      (3) Anatomical locations of recordings in the dorsal part of the thalamus. Line #122 "Our recordings covered most of the volume of POm but were clustered primarily in the anterior and medial portions of LP (Figure 2d-f). Cells that were within 50 µm of a region border were excluded from analysis." 

      How did the authors distinguish the anterior boundary of the LP from the LD nucleus, just anterior to the LP, another higher-order nucleus in which whisker-responsive cells have been isolated (Bezdudnaya and Keller, 2008)? 

      Cells within 50µm of any region boundary were excluded, including those at the border of LP and LD. We also reviewed our histology images by eye and believe that our recordings were all made posterior of LD. 

      (4) The mention in the Methods about the approval by an ethics committee is missing.  All the surgery (line #381), i.e., for the implant, the craniotomy, as well as the perfusion, are performed under isoflurane. But isoflurane induces narcosis only and not proper anesthesia. The mention of the use of analgesia is missing. 

      We thank Reviewer 2 for drawing our attention to this oversight. All experiments were conducted under the approval of the Columbia University IACUC. Mice were treated with the global analgesics buprenorphine and carprofen, the local analgesic bupivacaine, and anesthetized with isoflurane during all surgical procedures. We have amended the Methods section to include this information (Lines 458-470).

      Reviewer #3 (Public Review): 

      Petty and Bruno ask whether activity in secondary thalamic nuclei depends on the behavioral relevance of stimulus modality. They recorded from POm and LP, but the weight of the paper is skewed toward POm. They use two cohorts of mice (N=11 and 12), recorded in both nuclei using multi-electrode arrays, while being trained to lick to either a tactile stimulus (air puff against whiskers, first cohort) or a visual stimulus (drifting grating, second cohort), and ignore the respective other. They find that both nuclei, while primarily responsive to their 'home' modality, are more responsive to the relevant modality (i.e. the modality predicting reward). 

      Strengths: 

      The paper asks an important question, it is timely and is very well executed. The behavioral method using a delayed lick index (excluding impulsive responses) is well worked out. Electrophysiology methods are state-of-the-art with information about spike quality in Figure S1. The main result is novel and important, convincingly conveying the point that encoding of secondary thalamic nuclei is flexible and clearly includes aspects of the behavioral relevance of a stimulus. The paper explores the mapping of responses within POm, pointing to a complex functional structure, something that has been reported/suggested in earlier studies. 

      Weaknesses: 

      Coding: It does not become clear to which aspect of the task POm/LP is responding. There is a motor-related response (whisking, licking, pupil), which, however, after regressing it out leaves a remaining response that the authors speculate could be sensory.

      Learning: The paper talks a lot about 'learning', although it is only indirectly addressed. The authors use two differently (over-)trained mice cohorts rather than studying e.g. a rule switch in one and the same mouse, which would allow us to directly assess whether it is the same neurons that undergo rule-dependent encoding. 

      We disagree that our animals are “overtrained,” as every mouse was fully trained within 13 days. We agree that it would be interesting to study a rule-switch type experiment, but such an experiment is not necessary to reveal the profound effect that conditioning has on stimulus responses in POm and LP. 

      Mapping: The authors treat and interpret the two nuclei very much in the same vein, although there are clear differences. I would think these differences are mentioned in passing but could be discussed in more depth. Mapping using responses on electrode tracks is done in POm but not LP.

      The mapping of LP responses by anatomical location is presented in the supplemental Figure S4 (previously S3). We have expanded our discussion of LP and how it might differ from POm.

      Reviewer #1 (Recommendations For The Authors):  

      Minor writing issues: 

      122 ...67 >LP< cells?

      301 plural "are"

      We have fixed these typos.

      Figure issues

      *  3a,b time ticks are misaligned and the grey bar (bottom) seems not to align with the visual/tactile stimulus shadings.

      *  legend to Figure 3b refers to Figure 1c which is a scheme, but if 1g is meant, this mouse does not seem to have a session 12? 

      *  3c,e time ticks slightly misaligned. 

      *  5e misses shading for the relevant box plots, assuming it should be like Figure 3h.  

      We thank Reviewer 1 for pointing out these errors. We have adjusted Figures 1, 3, and 5 accordingly.

      Analyses 

      I am missing summary statistics for LP similar to those in Figure 3h. 

      We have added a summary box chart of LP stimulus responses (Figure 7g), similar to that of POm in Figure 3. We have also performed similar statistical analyses, the results of which are presented in the legend for Figure 7. 

      Reviewer #2 (Recommendations For The Authors): 

      More detail is required on the following points: 

      (1) The mention of the use of analgesia is missing, and this is not a minor concern. Even if the recordings are performed 24 hours after the surgery for the craniotomy and screw insertion and several days after the main surgery for the implant, taking into account the pain of the animals during surgeries is crucial, first for ethical reasons, and second because it may affect the data, especially in Pom cells: pain during surgery may induce allodynia and/or hyperalgesia, and Pom responses to sensory stimuli were shown to be more robust in behavioral hyperalgesia (Masri et al., 2009).  

      We neglected to include details on the analgesics used during surgery and post-operative recovery in our original manuscript. Mice were administered buprenorphine, carprofen, and bupivacaine immediately prior to the head plate surgery and were treated with additional carprofen during recovery. Mice were similarly treated with analgesics for the craniotomy procedure. Mice were carefully observed after craniotomy, and we saw no evidence of pain or discomfort. Furthermore, mice performed the behavior at the same level pre- and post-craniotomy (now presented in Figure 1j), which also indicates that they were not in any pain. 

      (2) The head-fixed preparation is only poorly described.

      Line #414: "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes." 

      And line #425 "Mice were trained for one session per day, with each session consisting of an equal number of visual stimuli and air puffs. Sessions ranged from 20-60 minutes and about 40-120 of each stimulus. " 

      More details should be given about the head-fixation training protocol. Are 15-25 minutes the session time duration, 60 minutes, or other time duration? How long does it take to get mice well trained to the head fixation, and on which criteria?  

      Line #389: "Mice were then allowed to recover for 24 hours, after which the sealant was removed and recordings were performed. At the end of experiments,"

      The timeline is not clear: is there one day or several days of recordings? 

      We have expanded on our description of the head fixation protocol in the Methods. We describe in more detail how mice were habituated to head fixation, the timing of water restriction, and the start of conditioning/training (Habituation and Conditioning, lines 492-500).

      (4) Line #411: "Mice were deprived of water 3 days prior to the start of conditioning" followed by line #414 "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes".

      If I understood correctly, the mice were then not fully water-deprived for 3 days since they received water while head-fixed. This point may be clarified. 

      We addressed these concerns in the changes to the Methods section mentioned in the preceding point (3).

      (5) Line #157: "Modality selectivity varies with anatomical location in Pom" while the end of the previous paragraph is "This suggests that POm encoding of reward and/or licking is insensitive to task type, an observation we examine further below."

      The authors then come to anatomical concerns before coming back to what the Pom may encode in the following section. This makes the story quite confusing and hard to follow even though pretty interesting.  

      We have reordered our Figures and Results to improve the flow of the paper and remove this point of confusion. We now present results on the encoding of movement before analyzing the relationship between POm stimulus responses and anatomical location. What was old Figure 5 now precedes what was old Figure 4.

      (6) Licks Analysis. Line #99 "However, this mouse also learned that the air puff predicted a lack of reward in the shaping task, as evidenced by withholding licking upon the onset of the air puff. The mouse thus displayed a positive visual lick index and a negative tactile lick index, suggesting that it attended to both the tactile and visual stimuli (Figure 1f, middle arrow)."

      Line #105 "All visually conditioned mice exhibited a similar learning trajectory (Figure 1i left, 1j left)". 

      Interestingly, the authors revealed that mice withheld licking upon the onset of the air puff in the visual conditioning, which they did not do at the onset of the drifting grating in the tactile conditioning. This withholding was extinguished after the 8th session, which the authors interpret as the mice finally ignoring the air puff. Is this effect significant, i.e., is there significant withholding of licking upon the onset of the air puff across the 12 tested mice? 

      The withholding of licking was significant (assessed with a sign-rank test) in visually conditioned mice prior to switching to the full version of the task. Indeed, it was the abolishment of this effect after conditioning with the full version of the task that was our criterion for when a mouse was fully trained. We have elaborated on this in the Habituation and Conditioning section in the Methods.

      (1) Throughout the manuscript, "touch" is used instead of "passive whisker deflection", which may be confused with "active touch" by readers from the whisker community. I recommend using "passive whisker deflection" rather than "touch".

      We appreciate that “touch” can be an ambiguous term in some contexts. However, we have limited our use of the word to refer to the percept of whisker deflection; we do not describe the air puff stimulus as a “touch.” We respectfully would like to retain the use of the word, as it is useful for comparing somatosensory stimuli to visual stimuli.

      (2) Line #395: "Air puffs (0.5-1 PSI) were delivered through a nozzle (cut p1000 pipet tip, approximately 3.5mm diameter aperture)".

      Are air puffs of <1 PSI applied, not <1 bar?  

      We thank Reviewer 3 for pointing out this inaccuracy. The air puffs were indeed between 0.5 and 1 bar, not PSI. We have addressed this in the Methods.

      (3) Line #441: "In the full task, the stimuli and reward were identical, but stimuli were presented at uncorrelated and less predictable intervals."  Do the authors mean that all stimuli are rewarded?  

      The stimuli and reward were identical between the shaping and full versions of the task. In the full version of the task, the unrewarded stimulus was truly uncorrelated with reward, rather than anticorrelated. 

      (4) Line #445 "for a mean ISI of 20 msec." ISI is not defined, I guess that it means interstimulus interval. Even if pretty obvious, to avoid any confusion for future readers, I would recommend using another acronym, especially in a manuscript about electrophysiology, since ISI is a dedicated acronym for inter-spike interval. 

      We have defined the acronym ISI as “inter-stimulus interval” when first introduced in the results (Line 82) and in the Methods (Line 511).

      (5) Line #416 "In the first phase of conditioning ("shaping"), mice were separated into two cohorts: a "tactile" cohort and a "visual" cohort. Mice were presented with tactile stimuli (a two-second air puff delivered to the distal whisker field) and visual stimuli (vertical drifting grating on a monitor). Throughout conditioning, mice were monitored via webcam to ensure that the air puff only contacted the whiskers and did not disturb the facial fur nor cause the mouse to blink, flinch, or otherwise react - ensuring the stimulus was innocuous. The stimulus types were randomly ordered. In the visual conditioning cohort, the visual stimulus was paired with a water reward (8-16µL) delivered at the time of stimulus offset. In the tactile conditioning cohort, the reward was instead paired with the offset of the air puff. Regardless of the type of conditioning, stimulus type was a balanced 50:50 with an inter-stimulus interval of 8-12 seconds (uniform distribution)." 

      A mention of the "full version of the task" would be welcome in this paragraph to clarify, in the Methods, what the task is for the mouse.

      We have more clearly defined the full version of the task in a later paragraph (line 506). We believe this addresses the potential confusion caused by the original description of the conditioning paradigm. 

      (6) Line #467: "Units were assigned to the array channel on which its mean waveform was largest". 

      Should it read mean waveform "amplitude"? 

      This is correct; we have adjusted the statement accordingly. 

      (7) Line #482: "The eye camera was positioned on the right side of the face and recorded at 60 fps." Then line #487: "The trace of pupil radius over time was smoothed over 5 frames (8.3 msec)." At 60 fps, 5 frames span about 83 ms, not 8.3 ms.

      We have corrected this error.  
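      The corrected arithmetic is simply 5 frames / 60 fps = 1/12 s ≈ 83.3 ms. The Python sketch below checks it and illustrates a 5-frame smoothing of a pupil-radius trace; it assumes the smoothing was a trailing boxcar (moving) average, since the quoted Methods do not specify the kernel, and the function names are hypothetical.

```python
def window_duration_ms(n_frames, fps):
    """Duration in milliseconds spanned by n_frames at the given frame rate."""
    return 1000.0 * n_frames / fps

def moving_average(trace, n=5):
    """Trailing n-frame boxcar average (assumed kernel); returns len(trace) - n + 1 values."""
    if n < 1 or n > len(trace):
        raise ValueError("window must be between 1 and len(trace)")
    total = sum(trace[:n])
    out = [total / n]
    for i in range(n, len(trace)):
        # Slide the window by one frame: add the new frame, drop the oldest.
        total += trace[i] - trace[i - n]
        out.append(total / n)
    return out

print(round(window_duration_ms(5, 60), 1))  # 83.3
```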

      (8) Line #121: "257 POm cells and 67 cells from 12 visually conditioned mice" 

      67 LP cells, LP is missing 

      We have corrected this error. 

      (9) Line #354: "A consistent result of attention studies in humans and nonhuman primates is the enhancement of cortical and thalamic sensory responses to an attended visual stimuli. Here, we show not just enhancement of sensory responses to stimuli within a single modality, but also across modalities. It is worth investigating further how secondary thalamus and high-order sensory cortex encode attention to stimuli outside of their respective modalities. Our surprising conclusion that the nuclei are equivalently activated by behaviorally relevant stimuli is nevertheless compatible with these previous studies." Since higher-order thalamic nuclei are integrative centers of many cortical and subcortical inputs, they cannot be viewed simply as relay nuclei, and there is therefore no "surprising" conclusion in these results. Not surprising, but still an elegant demonstration of the context-dependent activity/responses of the Pom/LP cells. 

      We disagree. That visual stimuli activate strong POm responses and tactile stimuli activate strong LP responses, however they do so, is a surprising result. We agree that higher-order thalamic nuclei are integrative centers, but exactly what they integrate and what the integrated output means is still poorly understood.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The models described are not fundamentally novel, essentially a random intercept model (with a warping function), and some flexible covariate effects using splines (i.e., additive models).

      We respectfully but strongly disagree with the reviewer’s assessment of the novelty of our work. The models referred to by the reviewer as “random intercept models … and some flexible covariate effects” seem to relate to the estimation of normative models derived cross-sectionally, as developed in and adopted from previous work, not to the work presented here. To be clear, the contributions of this work are: (i) a principled methodology to make statistical predictions for individual subjects in longitudinal studies based on a novel z-diff score, (ii) an approach to transfer information from large-scale normative models estimated on cross-sectional data to longitudinal studies, (iii) an extensive theoretical analysis of the properties of this approach, and (iv) an empirical evaluation on an unpublished psychosis dataset. Put simply, we provide the ability to estimate within-subject change in normative models, which until now could only show a subject's position in the normative range at a given timepoint. With the exception of reference [13] cited in the main text, we are not aware of any available methods that can achieve this. Based on this feedback, combined with the feedback of Reviewer 2, we have improved our introduction and now clearly state our contribution from the outset of the manuscript, while also shortening the introduction to make it more concise. In this work, we have tried to be fully transparent in showing the reader that our method builds on a previously peer-reviewed model.

      The assumption of constant quantiles is very strong, and limits the utility of the model to very short term data.

      We now provide an extensive theoretical analysis of our approach (section 2.1.3), where we show that this assumption is actually not strictly necessary and that our approach yields valid inferences even under much milder assumptions. More specifically, we first provide a mathematical grounding for the assumption we made in the initial submission, then generalise our method to a wider class of residual processes and show that our original assumption of constant quantiles is not too restrictive. We also provide a simulation study to show how the practitioner can evaluate the validity and implications of this assumption on a case-by-case basis. This generalisation is described in depth in section 2.1.3.

      The schizophrenia example leads to a counter-intuitive normalization of trajectories, which leads to suspicions that this is driven by some artifact of the data modeling/imaging pipelines.

      We understand that the observed normalisation effects might appear surprising. As we outlined in our provisional response, we would like to emphasise that there is increasing evidence that the old neurodegenerative view of psychosis is an oversimplification and that trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode. More specifically, we have shown in an independent sample and with different methodology that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v2, now accepted in Schizophrenia Bulletin). These results are well aligned with the results we show in this manuscript. We have now added remarks on this topic to the Discussion. We would also like to re-emphasise that the data were processed with the utmost rigour using state-of-the-art processing pipelines, including quality control, which we have reported as transparently as possible. Our confidence that the results are not ‘driven by some artifact of the data modeling/imaging pipelines’ is also supported by the fact that analysis of a group of healthy controls did not show any significant z-diffs (see Discussion section), either frontally or elsewhere. If the reviewer believes there are additional quality control checks that would further increase confidence in our findings, we would welcome specific details.

      The method also assumes that the cross-sectional data is from a "healthy population" without describing what this population is (there is certainly every chance of ascertainment bias in large scale studies as well as small scale studies). This issue is completely elided over in the manuscript.

      Indeed, we do not describe the cross-sectional population used for training the models, as these models were already trained and published with an in-depth description of the datasets used for training (https://elifesciences.org/articles/72904). We now make this more explicit in section 2.1.1 of the manuscript (page 7), and also more explicitly acknowledge the possibility of ascertainment bias in the simulation section 2.1.4. However, we would like to emphasise that such ascertainment bias is not in any way specific to the analyses we report. In fact, it is present in all studies that use large-scale cohorts such as UK Biobank. Indeed, we are currently working on another manuscript to address this question in detail, but given the complexity of this problem and the fact that many publicly available legacy studies simply do not record sufficient demographic information (e.g. to assess racial bias properly), we believe that this is beyond the scope of the current work.

      Reviewer #2 (Public Review):

      The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.

      As noted above in our response to Reviewer 1, we significantly pruned the introduction, stating our objective in the first paragraph and elaborating on the topic later in the text. We hope that it is now less repetitive and easier to follow.

      There are no simulation studies to evaluate whether the adjustment of the cross-sectional normative model to longitudinal data can make accurate estimations and inferences regarding the longitudinal changes. Also, there are some assumptions involved in the modeling procedure, for example, that the deviation of a healthy control from the population over time is caused purely by noise, and that the variability of error/noise is constant across x_n; these seem to be quite strong assumptions. The presentation of this work's method development would be strengthened if the authors conducted a formal simulation study to evaluate the method's performance when such assumptions are violated and, ideally, proposed some methods to check these assumptions before performing the analyses.

      This comment encouraged us to zoom out from our original assumption and generalise our method to a wider class of residual processes (stationary Gaussian processes) in section 2.1.3. We now present a theoretical analysis of our model to show that our original assumption (of stable quantiles plus noise) is actually not necessary for valid inference in our method, which broadens the applicability of our method. Of course, we also discuss in what way the original assumption is restrictive and how it aligns with the more general dynamics. We also include a simulation study to evaluate the method's performance and elucidate the role of the more general dynamics in section 2.1.4.

      The proposed "z-diff score" still falls in the common form of z-score to describe the individual deviation from the population/reference level, but now is just specifically used to quantify the deviation of individual temporal change from the population level. The authors need to further highlight the difference between the "z-score" and "z-diff score", ideally at its first mention, in case readers get confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a measure of "standardized difference" which kind of collides with what "z-diff" implies by its name.

      We have added a description of the difference between the z-score and the z-diff score to the last paragraph of the introduction.

      Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.

      We have now added an interpretation of the z-score in the original model below equation 7.

      It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.

      This was a very useful observation, we unified the notation and now only use variance.

      The functions psi were never explicitly described. This would be helpful to have in the supplement with a reference to that in the paper.

      Indeed, while describing the original model we had to make choices about how to condense the necessary information from the original model so that we could build upon it. As the phi function is only used for data transformation in the original model, we did not elaborate on it further; however, we now refer to the specific section of the original paper of Fraza et al. 2021 where it is described in more detail (https://www.sciencedirect.com/science/article/pii/S1053811921009873).

      What is the goal of equations (13) and (14)? The authors should clarify what the point of writing these equations is prior to showing the math. It seems like it is to obtain an estimate of \sigma_{\xi}^2, which the reader only learns at the end.

      We corrected the formatting.

      What is the definition of "adaption" as used to describe equation (15)? In this equation, I think norm on subsample was not defined.

      We added a more detailed description of the adaptation after equation 15.

      "(the sandwich part with A)" - maybe call this an inner product so that it is not confused with a sandwich variance estimator. This is a bit unclear. Equation (8) does have the inner product involving A and \beta^{-1} does include variability of \eta. It seems like you mean that equation (8) incorrectly includes variability of \eta and does not have the right term vector component of the inner product involving A, but this needs clarifying.

      We now changed the formulation to be less confusing and also explicitly clarified the caveat regarding the difference of z-scores.

      One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.

      We agree with the outlined limitation in interpreting overall trends when subjects differ in their position at the first visit. However, this is a much broader challenge and is not specific to our approach. This effect is generally independent of the lifespan, but may further interact with the typical course of a disease across the lifespan. When the z-scores are taken in the context of the cross-sectional normative models, it becomes possible to identify the overall trend of an illness across the lifespan, and individual patients' z-diffs that deviate from what this typical group trajectory predicts may, e.g., correspond to early or late onset of their individual atrophy. We now make these considerations explicit in the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      Other minor suggestions to help improve the text:...

      We thank Reviewer #2 for the list of minor suggestions to improve the text, all of which we implemented in the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated whether the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it had never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants, placing a linear polarizer, rotated 45 degrees relative to the moon's natural polarization pattern, in the homing path of freely navigating ants. They recorded the homing direction of each ant before entering the polarizer, under the polarizer, and again after leaving the area covered by the polarizer. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees relative to their homing direction under the natural polarization pattern, and change it back after leaving the area covered by the polarizer. These results hold throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction depends on the length of their home vector, just as it does for the solar polarization pattern. 

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually are.

      We rewrote these sections and added a panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants, while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions, that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night, though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt to using lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      We added a paragraph discussing this distinction between mechanisms and the limits of the current data set in disentangling them. This is certainly an interesting topic for a follow-up study.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for our response to this and the next reviewer comment.

      N.B. The Watson-Williams tests for difference in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2-1)(n1-R1)] / [(n1-1)(n2-R2)] or its inverse, whichever is >1. 

      We examined the variance around the mean heading direction for both the shifts and the reorientations and found no significant difference in variance between any of the relevant conditions. It is possible (perhaps likely) that with a larger n we would find such differences, but with the current data set we cannot make statistical statements about degradation of navigational precision.  

      As an additional analysis addressing the Watson-Williams test’s sensitivity to changes in variance, we have added var test comparisons for each comparison; this is a well-established test for comparing variances. None of these were significantly different, suggesting that the differences observed in the WW tests are due to changes in the mean vector rather than in the dispersion of the distribution. We have added this test to the text.
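
      For readers unfamiliar with the check the reviewer describes, the unequal-variance F statistic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: it assumes R1 and R2 are the resultant lengths of the two samples (the length of the sum of unit vectors, i.e. n times the mean vector length), the function names and headings are hypothetical, and it returns only the statistic, not a p-value.

```python
import math


def resultant_length(angles_deg):
    """Length R of the vector sum of unit vectors at the given angles (0 <= R <= n)."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)


def ww_variance_f(sample1, sample2):
    """F = ((n2 - 1)(n1 - R1)) / ((n1 - 1)(n2 - R2)), or its inverse if that is larger.

    Assumes each sample has at least two angles and is not perfectly
    concentrated (n - R > 0); otherwise the ratio is undefined.
    """
    n1, n2 = len(sample1), len(sample2)
    r1, r2 = resultant_length(sample1), resultant_length(sample2)
    f = ((n2 - 1) * (n1 - r1)) / ((n1 - 1) * (n2 - r2))
    # Report the ratio so that it is >= 1, as the reviewer describes.
    return max(f, 1 / f)


# Hypothetical headings (degrees): a tight sample versus a more dispersed one.
tight = [0, 5, -5, 2]
loose = [0, 60, -60, 30]
print(round(ww_variance_f(tight, loose), 2))
```

      Because the function always returns the larger of F and 1/F, swapping the two samples gives the same value.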

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut at several instances. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these, apart from the bookending mentions (Abstract and Discussion).

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text for clarity. We no longer say the ambient pattern was changed.

      (3) Line 377: I do not see the link between the sentence and Figure 7 

      Changed where in the discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought it might be easier for some readers to see the shifts as a dial representation, with the shift magnitude converted to 0-100%, rather than as the shifts in Figure 5. This makes it somewhat like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story, but a reader with little background in directional statistics might find Figure 7 more intuitive. This was the intent, at least. 

      If it becomes a sticking point, then we can remove the upper portion.  

      Reviewer #2 (Recommendations For The Authors): 

      MINOR CORRECTIONS AND QUERIES 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added the references that unpack the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the Methods to clearly state the rationale for the experiment and to describe the methodology more clearly.

      We have also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have altered for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2°± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      Replaced Phi with b° and c° for both figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation was erroneously copied from the shift data (the value is identical for the +45° shift and the reorientation in Figure 4A). This has now been corrected.

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this is incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      Changed this to make clear the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to make clear that these are ants tested with different vector lengths (6 m vs. 2 m) at the same location. Hopefully this is much clearer, but if these portions remain confusing, I think a full renaming of the conditions is in order. Names such as long vector and short vector would help, but they would not truly describe the purpose of the test, which is to control for location; hence the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make more changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See the current change to the Results as an example: ‘Both foragers with a long ~6 m remaining vector (Halfway Release) and those with a short ~2 m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection thresholds for absolute light intensity, its polarization sensitivity and its spectral sensitivity. Moonlight is less UV-rich than direct sunlight, and the spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      Response: We have rewritten the polarised light section to be clearer using optics and light physics for background. 

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce and define the term e-vector. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun's or moon's position, with the maximum degree of polarisation located 90° from the source. Hence, when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple, with a uniform direction of polarisation approximately parallel to the north-south axis (Dacke et al., 1999, 2003; Reid et al., 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Testing diurnal bull ants is also a very good suggestion.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers as least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum, as a single researcher walked to and from the field site given the lack of public transport between 1 and 4 am. But yes, for future work a camera-based confirmation system would be easier. 

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration-derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the description below of the mirroring applied to counterclockwise shifts.

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we calculated the mirror of each shift in each −45° condition, allowing shift magnitude comparisons within each condition. Each −45° shift was mirrored across the 0° to 180° plane and then compared to the corresponding unaltered +45° condition.’
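
      The mirroring step quoted above can be illustrated with a short Python sketch. It assumes shifts are expressed in degrees with 0° meaning no change of heading, so that reflecting across the 0° to 180° axis maps an angle θ to −θ (mod 360°); the helper names and shift values are hypothetical. A circular mean is used afterwards because an arithmetic mean of angles fails near the 0°/360° wrap-around.

```python
import math


def mirror_across_0_180(angle_deg):
    """Reflect an angle across the 0-180 degree axis (counterclockwise becomes clockwise)."""
    return (-angle_deg) % 360


def circular_mean(angles_deg):
    """Mean direction, in degrees within [0, 360), of a sample of angles."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360


# Hypothetical shifts from a -45 degree condition (360 == 0 means no shift).
minus45_shifts = [320, 330, 310]
mirrored = [mirror_across_0_180(a) for a in minus45_shifts]
print(mirrored)                 # [40, 30, 50]
print(circular_mean(mirrored))  # approximately 40
```

      After mirroring, the shifts can be compared directly with those of the corresponding +45° condition.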

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol used here (U+00BA), a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and to return to their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when the lunar phase is low and is potentially harder to detect.

      I have reworked this line in the text, as I do not think there is much evidence for misalignment, and it might be more precise to say that overnight periods when the moon is not visible may adversely affect the path integrator estimate, though the full impact of this gap in celestial cues, and whether other cues might also play a role, is currently unknown.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study is a companion to a paper introducing a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs). While the evidence that recurrent SNVs or CDNs are common in true cancer driver genes is solid, the evidence that many more undiscovered cancer driver mutations will have CDNs, and that this approach could identify these undiscovered driver genes with about 100,000 samples, is limited. 

      Same criticism as in the eLife assessment of eLife-RP-RA-2024-99340 (https://elifesciences.org/reviewed-preprints/99340). Hence, please refer to the responses to the companion paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study investigates Cancer Driving Nucleotides (CDNs) using the TCGA database, finding that these recurring point mutations could greatly enhance our understanding of cancer genomics and improve personalized treatment strategies. Despite identifying 50-150 CDNs per cancer type, the research reveals that a significant number remain undiscovered, limiting current therapeutic applications, and underscoring the need for further larger-scale research.

      Strengths:

      The study provides a detailed examination of cancer-driving mutations at the nucleotide level, offering a more precise understanding than traditional gene-level analyses. The authors found a significant number of CDNs remain undiscovered, with only 0-2 identified per patient out of an expected 5-8, indicating that many important mutations are still missing. The study indicated that identifying more CDNs could potentially significantly impact the development of personalized cancer therapies, improving patient outcomes.

      Weaknesses:

      The study is constrained by relatively small sample sizes for each cancer type, which reduces the statistical power and robustness of the findings. ICGC and other large-scale WGS datasets are publicly available but were not included in this study.

      Thanks. We have in fact used all public data, including GENIE (Figure 7 of the companion paper), ICGC and other integrated resources such as COSMIC. The main study is based on TCGA because it is unbiased for estimating the probability of CDN occurrences. In many datasets, the numerators are given but the denominators are not (the number of patients with the mutation / the total number of patients surveyed). In GENIE, we observed that the E(u) values estimated from the given sequencing panels are much smaller than those in TCGA. This might be due to the selective reporting of nonsynonymous mutations, as synonymous mutations are generally considered irrelevant in tumorigenesis.

      To be able to identify rare driver mutations, more samples are needed to improve the statistical power, which is well-known in cancer research. The challenges in direct functional testing of CDNs due to the complexity of tumor evolution and unknown mutation combinations limit the practical applicability of the findings.

      We fully agree. We now add a few sentences making clear that the theory allows us to see how much more can be gained by each stepwise increase in sample size. For example, when the sample size reaches 10^6, further increases will yield almost no gain in confidence in the CDNs identified (see the figures of eLife-RP-RA-2024-99340). As pointed out in our provisional responses, an important strength of this pair of studies is that the results are testable. The complexity lies in the combination of mutations required for tumorigenesis, and identifying such combinations is the main goal and strength of this pair of studies. We add a few sentences to this effect.

      While the importance of large sample sizes in identifying cancer drivers is well-recognized, the analytical framework presented in the companion paper (https://elifesciences.org/reviewed-preprints/99340) goes a step further by quantitatively elucidating the relationship between sample size and the resolution of CDN detection.

      The question is very general as it is about multigene interactions, or epistasis. The challenges are true in all aspects of evolutionary biology, for example, the genetics of reproductive isolation (Wu and Ting 2004). The issue of epistasis is difficult because most, if not all, of the underlying mutations have to be identified in order to carry out functional tests. While the full identification is rarely feasible, it is precisely the objective of the CDN project. When the sample size increases to 100,000 for a cancer type, all point mutations for that cancer type should be identifiable.

      The QC of the TCGA data was not very strict, i.e., "patients with more than 3000 coding region point mutations were filtered out as potential hypermutator phenotypes"; it would be better to remove patients beyond +/- 3*S.D. from the mean number of mutations for each cancer type. In addition, some point mutations with >3 hits in the TCGA dataset were simply false-positive mutation calls, particularly in large repeat regions of the human genome.

      Thanks. The GDC data portal offers mutation calls from multiple pipelines, enabling us to select mutations detected by at least two pipelines. While including patients with hypermutator phenotypes could introduce noise, as shown in Eq. 10 of the main text, our method for defining the upper limit of i* is relatively robust to fluctuations in the E(u) of the corresponding cancer population. Since readers may often ask about this, we have expanded the Methods section somewhat to emphasize this point.
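
      The two quality-control ideas in this exchange can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: `consensus_calls` keeps calls reported by at least two callers (the pipeline names and calls are hypothetical), and `sd_cutoff` implements the reviewer's per-cancer-type mean ± 3 SD rule next to the fixed 3000-mutation cutoff quoted above. One caveat of the SD rule is that an extreme outlier inflates the SD itself; with ten or fewer patients no value can lie more than 3 sample SDs from the mean, so the rule cannot exclude anyone.

```python
from collections import Counter
from statistics import mean, stdev


def consensus_calls(calls_by_pipeline, min_support=2):
    """Return calls reported by at least `min_support` pipelines.

    Each call is a hashable key, e.g. (patient, chrom, pos, ref, alt).
    """
    support = Counter()
    for calls in calls_by_pipeline.values():
        support.update(set(calls))  # each pipeline counts at most once per call
    return {call for call, k in support.items() if k >= min_support}


def fixed_cutoff(counts, max_mutations=3000):
    """Keep patients with at most `max_mutations` coding point mutations."""
    return {pid: n for pid, n in counts.items() if n <= max_mutations}


def sd_cutoff(counts, k=3.0):
    """Keep patients whose mutation count lies within mean +/- k*SD of their cancer type."""
    values = list(counts.values())
    mu, sd = mean(values), stdev(values)
    return {pid: n for pid, n in counts.items() if abs(n - mu) <= k * sd}


# Hypothetical calls from three hypothetical pipelines.
calls = {
    "caller_a": [("P1", "chr1", 1000, "G", "A"), ("P1", "chr2", 2000, "C", "T")],
    "caller_b": [("P1", "chr1", 1000, "G", "A")],
    "caller_c": [("P2", "chr3", 3000, "T", "G")],
}
print(consensus_calls(calls))  # {('P1', 'chr1', 1000, 'G', 'A')}

# Hypothetical cancer type: 19 patients with 50 mutations each and one with 2000.
counts = {f"patient_{i}": 50 for i in range(19)}
counts["patient_19"] = 2000
print("patient_19" in fixed_cutoff(counts))  # True: 2000 is below 3000
print("patient_19" in sd_cutoff(counts))     # False: more than 3 SD above the mean
```

      The example shows that the two rules can disagree: a patient far above the rest of a low-mutation cancer type passes the fixed cutoff but fails the SD rule.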

      The codes for the statistical calculation (i.e., calculation of Ai_e, et al) are not publicly available, which makes the findings hard to be replicated.

      We have now updated the section of “Data Availability” in both papers. The key scripts for generating the major results are available at: https://gitlab.com/ultramicroevo/cdn_v1.

      Reviewer #2 (Public Review):

      Summary:

      The study proposes that many cancer driver mutations are not yet identified but could be identified if they harbor recurrent SNVs. The paper leverages the analysis from Paper #1 that used quantitative analysis to demonstrate that SNVs or CDNs seen 3 or more times are more likely to occur due to selection (ie a driver mutation) than they are to occur by chance or random mutation.
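
      The recurrence criterion summarized here (an SNV seen 3 or more times) amounts to counting, per nucleotide site, the number of distinct patients carrying the same mutation. The Python sketch below is only an illustration: the site labels are hypothetical, and the fixed `min_hits=3` is a placeholder for the threshold discussed by the authors, which need not be 3 for every cancer type.

```python
from collections import Counter


def recurrent_sites(mutations, min_hits=3):
    """Return {site: number of patients} for sites mutated in >= min_hits patients.

    `mutations` is an iterable of (patient_id, site) pairs. Duplicate pairs are
    collapsed first, so each patient contributes at most one hit per site.
    """
    hits = Counter(site for _patient, site in set(mutations))
    return {site: k for site, k in hits.items() if k >= min_hits}


# Hypothetical (patient, site) observations; P3 is listed twice for site_1.
observations = [
    ("P1", "site_1"), ("P2", "site_1"), ("P3", "site_1"), ("P3", "site_1"),
    ("P1", "site_2"), ("P2", "site_2"),
    ("P4", "site_3"),
]
print(recurrent_sites(observations))  # {'site_1': 3}
```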

      Strengths:

      Empirically, mutation frequency is an excellent marker of a driver gene because canonical driver mutations typically have recurrent SNVs. Using the TCGA database, the paper illustrates that CDNs can identify canonical driver mutations (Figure 3) and that most CDNs are likely to disrupt protein function (Figure 2). In addition, CDNs can be shared between cancer types (Figure 4).

      Weaknesses:

      Driver alteration validation is difficult, with disagreements on what defines a driver mutation, and how many driver mutations are present in a cancer. The value proposed by the authors is that the identification of all driver genes can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes. There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene). Other alterations (epigenetic, indels, translocations, CNVs) would be missed by this type of analysis.

      The above paragraph has three distinct points. We shall respond one by one.

      First, …  can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes…

      In the Discussion, we state the following, which shows that only a few of the best-known driving mutations have been targeted. It is accurate to say that < 5% of the CDNs we have identified are on the current targeting list. Furthermore, the list we have compiled is < 10% of what we expect to find.

      A direct functional test of CDNs would be to introduce putative cancer-driving mutations and observe the evolution of tumors. Such a task of introducing multiple mutations that are collectively needed to drive tumorigenesis has been done only recently, and only for the best-known cancer-driving mutations (Ortmann et al. 2015; Takeda et al. 2015; Hodis et al. 2022). In most tumors, the correct combination of mutations needed is not known. Clearly, CDNs, with their strong tumorigenic strength, are suitable candidates.

      Second, “There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene).”

      We sincerely thank the reviewer for this insightful comment. Below are two new paragraphs in the Discussion pertaining to the point:

      In this context, we should comment on the feasibility of targeting CDNs that may occur in either oncogenes (ONCs) or tumor suppressor genes (TSGs). It is generally accepted that ONCs drive tumorigenesis through gain-of-function (GOF) mutations, whereas TSGs derive their tumorigenic power from loss-of-function (LOF) mutations. It is worth pointing out that, since LOF mutations are likely to be more widespread along a gene, CDNs are biased toward GOF mutations. The often even distribution of non-sense mutations along the length of TSGs provides such evidence. As gene targeting aims to diminish gene functions, GOF mutations are perceived to be targetable whereas LOF mutations are not. By extension, ONCs should be targetable but TSGs should not. This last assertion is not true because mutations on TSGs may often be of the GOF kind as well.

      The data often suggest that missense mutations on TSGs are of the GOF kind. If missense mutations are far more prevalent than nonsense mutations in tumors, the missense mutations cannot simply be LOF mutations (after all, a missense mutation cannot lose more function than a nonsense mutation). For example, AAA to CAA (K to Q) is a missense mutation, whereas AAA to TAA (K to stop) is a nonsense mutation. In a separate study (referred to as the escape-route analysis), we found many cases where missense mutations on TSGs are more than 10-fold more prevalent than nonsense mutations. Another well-known example is the distribution of nonsense mutations on TSGs. On APC, a prominent TSG, nonsense mutations are far more common in the middle 20% of the gene than in the rest (Zhang and Shay 2017; Erazo-Oliveras et al. 2023). This pattern suggests that even these nonsense mutations could have GOF properties.
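      To make the missense/nonsense distinction concrete, the short Python sketch below classifies a single-codon substitution using the standard genetic code (DNA alphabet, '*' for stop). The function names are illustrative only and are not part of the CDN pipeline.

```python
# Standard genetic code: codons enumerated in TCAG order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codon: str) -> str:
    """Return the one-letter amino acid for a DNA codon ('*' means stop)."""
    return CODON_TABLE[codon.upper()]


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, missense, nonsense or stop-loss."""
    ref, alt = translate(ref_codon), translate(alt_codon)
    if ref == alt:
        return "synonymous"
    if alt == "*":
        return "nonsense"
    if ref == "*":
        return "stop-loss"
    return "missense"


print(classify_substitution("AAA", "CAA"))  # missense (K -> Q)
print(classify_substitution("AAA", "TAA"))  # nonsense (K -> stop)
```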

      The following response concerns the clinical implications of our CDN analysis. Canonical targeted therapy often relies on tyrosine kinase inhibitors (TKIs) (Dang et al. 2017; Danesi et al. 2021; Waarts et al. 2022). In principle, any intervention that suppresses the expression of gain-of-function (GOF) CDNs could have therapeutic value in cancer treatment. This leads us to a discussion of oncogenes versus TSGs in the context of GOF and loss-of-function (LOF) mutations. First, not all mutations on oncogenes have an oncogenic effect; moreover, truncating mutations in oncogenes are often subject to negative selection (Bányai et al. 2021). Identifying CDNs within oncogenes is therefore crucial for developing effective guidelines for cancer treatment. Second, while TSGs are generally believed to promote cancer development through LOF mutations, research suggests that certain mutations within TSGs can have GOF-like effects, such as the dominant-negative effect of truncating TP53 mutations (Marutani et al. 1999; de Vries et al. 2002; Gerasimavicius et al. 2022). Characterizing driver mutations as GOF or LOF could expand the scope of targeted cancer therapy. We will address this issue in a third study that is in preparation.

      The method could be more valuable when applied to the noncoding genome, where driver mutations in promoters or enhancers are relatively rare or have yet to be discovered. An increasing number of cancers have undergone whole-genome sequencing. Compared to WES, criteria for driver mutations in noncoding regions are less clear, and this method could potentially provide new noncoding driver CDNs. Observing the same mutation in more than one cancer specimen is empirically unusual, and the authors provide a solid quantitative analysis that indicates many recurrent mutations are likely to be cancer-driver mutations.

      We are again grateful for these comments, which prompted us to expand a paragraph in the Discussion, reproduced below.

      The CDN approach has two additional applications. First, it can be used to find CDNs in non-coding regions. Although the number of whole-genome sequences is at present still insufficient for systematic CDN detection, our preliminary analysis suggests that the density of CDNs in non-coding regions is orders of magnitude lower than in coding regions. Second, CDNs can be used in cancer screening, with the advantage of efficiency because fewer mutations are targeted. For the same reason, the false positive rate should also be much lower; indeed, it should be far lower than that of gene-based screens, which often show false positive rates of >50% (supplement File S1).

      We are again grateful that Reviewer #2 has highlighted the potential value of our study in finding cancer drivers in non-coding regions. A major challenge in this area lies in defining the appropriate L value as presented in Eq. 10. In the main text, we used a gamma distribution to account for the variability of mutation rates across sites in coding regions. For non-coding regions, we will categorize sites based on biological annotations. The goal is to set different i* cutoffs for different genomic regions (such as heterochromatin versus euchromatin, GC-rich regions or centromeric regions) and to avoid false-positive CDN calls in repetitive regions (Elliott and Larsson 2021; Peña et al. 2023).
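      As a hedged illustration of why region-specific rate models shift the recurrence cutoff, the sketch below assumes that the number of tumors carrying a mutation at a site is Poisson with a gamma-distributed, site-specific rate, so counts follow a negative binomial distribution. It returns the smallest threshold i for which the expected number of sites reaching that count by chance, out of n_sites, drops below alpha. The names n_sites, mean_hits, shape and alpha, and this decision rule, are hypothetical; the sketch is not Eq. 10 of the manuscript, only an analogue of an i* cutoff.

```python
import math


def nb_pmf(x: int, mean: float, shape: float) -> float:
    """P(X = x) when X ~ Poisson(lam) and lam ~ Gamma(shape, scale=mean/shape)."""
    scale = mean / shape
    log_p = (math.lgamma(x + shape) - math.lgamma(x + 1) - math.lgamma(shape)
             + x * math.log(scale / (1.0 + scale))
             - shape * math.log1p(scale))
    return math.exp(log_p)


def tail_prob(i: int, mean: float, shape: float) -> float:
    """P(X >= i) for the gamma-Poisson (negative binomial) count."""
    return max(0.0, 1.0 - sum(nb_pmf(x, mean, shape) for x in range(i)))


def recurrence_cutoff(n_sites: int, mean_hits: float, shape: float,
                      alpha: float = 1.0, max_i: int = 1000) -> int:
    """Smallest i with expected chance sites n_sites * P(X >= i) below alpha (hypothetical rule)."""
    for i in range(1, max_i + 1):
        if n_sites * tail_prob(i, mean_hits, shape) < alpha:
            return i
    raise ValueError("no cutoff found below max_i")


# Large shape ~ nearly Poisson rates; small shape ~ strong rate heterogeneity across sites.
print(recurrence_cutoff(10_000_000, 0.01, 1000.0))  # 4
print(recurrence_cutoff(10_000_000, 0.01, 0.1))     # 6
```

      In this toy setting, with ten million sites and a mean of 0.01 hits per site, the cutoff rises from 4 under nearly Poisson rates to 6 under strong rate heterogeneity, which is the kind of shift one would expect when regions with different rate distributions are treated separately.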

      References

      Bányai L, Trexler M, Kerekes K, Csuka O, Patthy L. 2021. Use of signals of positive and negative selection to distinguish cancer genes and passenger genes. Elife 10:e59629.

      Danesi R, Fogli S, Indraccolo S, Del Re M, Dei Tos AP, Leoncini L, Antonuzzo L, Bonanno L, Guarneri V, Pierini A, et al. 2021. Druggable targets meet oncogenic drivers: opportunities and limitations of target-based classification of tumors and the role of Molecular Tumor Boards. ESMO Open 6:100040.

      Dang CV, Reddy EP, Shokat KM, Soucek L. 2017. Drugging the “undruggable” cancer targets. Nat Rev Cancer 17:502–508.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Erazo-Oliveras A, Muñoz-Vega M, Mlih M, Thiriveedi V, Salinas ML, Rivera-Rodríguez JM, Kim E, Wright RC, Wang X, Landrock KK, et al. 2023. Mutant APC reshapes Wnt signaling plasma membrane nanodomains by altering cholesterol levels via oncogenic β-catenin. Nat Commun 14:4342.

      Gerasimavicius L, Livesey BJ, Marsh JA. 2022. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 13:3895.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Marutani M, Tonoki H, Tada M, Takahashi M, Kashiwazaki H, Hida Y, Hamada J, Asaka M, Moriuchi T. 1999. Dominant-negative mutations of the tumor suppressor p53 relating to early onset of glioblastoma multiforme. Cancer Res 59:4765–4769.

      Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, Baxter EJ, Massie CE, Papaemmanuil E, Menon S, et al. 2015. Effect of Mutation Order on Myeloproliferative Neoplasms. N Engl J Med 372:601–612.

      Peña MV de la, Summanen PAM, Liukkonen M, Kronholm I. 2023. Chromatin structure influences rate and spectrum of spontaneous mutations in Neurospora crassa. Genome Res 33:599–611.

      Takeda H, Wei Z, Koso H, Rust AG, Yew CCK, Mann MB, Ward JM, Adams DJ, Copeland NG, Jenkins NA. 2015. Transposon mutagenesis identifies genes and evolutionary forces driving gastrointestinal tract tumor progression. Nat Genet 47:142–150.

      de Vries A, Flores ER, Miranda B, Hsieh H-M, van Oostrom CThM, Sage J, Jacks T. 2002. Targeted point mutations of p53 lead to dominant-negative inhibition of wild-type p53 function. Proc Natl Acad Sci USA 99:2948–2953.

      Waarts MR, Stonestrom AJ, Park YC, Levine RL. 2022. Targeting mutations in cancer. J Clin Invest 132:e154943.

      Wu C-I, Ting C-T. 2004. Genes and speciation. Nat Rev Genet 5:114–122.

      Zhang L, Shay JW. 2017. Multiple Roles of APC and its Therapeutic Implications in Colorectal Cancer. J Natl Cancer Inst 109:djw332.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The authors proposed a framework to estimate the posterior distribution of parameters in biophysical models. The framework has two modules: the first MLP module is used to reduce data dimensionality and the second NPE module is used to approximate the desired posterior distribution. The results show that the MLP module can capture additional information compared to manually defined summary statistics. By using the NPE module, the repetitive evaluation of the forward model is avoided, thus making the framework computationally efficient. The results show the framework has promise in identifying degeneracy. This is an interesting work.

      We thank the reviewer for the positive comments made on our manuscript. 

      Reviewer #1 (Recommendations For The Authors): 

      I have some minor comments. 

      (1) The µGUIDE framework has two modules, MLP and NPE. Why are the two modules trained jointly? The MLP module is used to reduce data dimensionality. Given that the number of features is fixed to 6 for all models, why does one need different MLPs? This module should, in principle, be general-purpose and independent of the model used.

      The MLP must be trained together with the NPE module to maximise inference performance in terms of accuracy and precision. Although the number of features predicted by the MLP was fixed to six, the characteristics of these six features can be very different, depending on the chosen forward model and the available data, as we showed in Appendix 1 Figure 1. Training the MLP independently of the NPE would result in suboptimal performance of µGUIDE, with potentially higher bias and variance of the predicted posterior distributions. We have now added these considerations in the Methods section.

      (2) The authors mentioned at L463 that all the 3 models use 6 features. From L445 to L447, it seems model 3 has 7 unknown parameters. How can one use 6 features to estimate 7 unknowns? 

      Thank you for pointing out the lack of clarity regarding the parameters to estimate in this section. Model 3 is a three-compartment model whose parameters of interest are the signal fraction and diffusivity of water diffusing in the neurite space (fn and Dn), the neurite orientation dispersion index (ODI), the signal fraction in cell bodies (fs), a proxy for soma radius and diffusivity (Cs), and the signal fraction and diffusivity in the extracellular space (fe and De). The signal fractions are constrained by the relationship fn + fs + fe = 1; hence fe is calculated from the estimated fn and fs as fe = 1 - fn - fs. This leaves six parameters to estimate: fn, Dn, ODI, fs, Cs and De. We have clarified this in the revised version of the paper.

      (3) L471: "Rician noise" is not a proper term. The Rician distribution is the distribution of pixel intensities observed in the presence of noise, and it results from magnitude reconstruction. See "Noise in magnitude magnetic resonance images" published in 2008. I assume that real-valued Gaussian noise is added to the simulated data.

      We apologize for the confusion. We added Gaussian noise to the real and imaginary parts of the simulated signals and then used the magnitude of this noisy complex signal for our experiments. We rephrased the sentence for more clarity.
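      The noise model described in this reply can be sketched in a few lines of NumPy. The sketch assumes a real-valued, noise-free signal (zero phase) and defines the noise standard deviation as sigma = s0 / snr, with s0 the unattenuated (b = 0) signal; the function name and both choices are illustrative assumptions rather than the exact settings used in the manuscript.

```python
import numpy as np


def add_magnitude_noise(signal, snr, s0=1.0, rng=None):
    """Return |signal + n_re + 1j * n_im| with n_re, n_im ~ N(0, sigma^2) and sigma = s0 / snr.

    The input is treated as a noise-free, zero-phase magnitude signal; each output
    value then follows a Rician distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = s0 / snr
    signal = np.asarray(signal, dtype=float)
    noise_re = rng.normal(0.0, sigma, size=signal.shape)   # noise on the real channel
    noise_im = rng.normal(0.0, sigma, size=signal.shape)   # independent noise on the imaginary channel
    return np.abs(signal + noise_re + 1j * noise_im)


rng = np.random.default_rng(0)
clean = np.full(100_000, 0.05)                 # a strongly attenuated signal
noisy = add_magnitude_noise(clean, snr=25, rng=rng)
print(noisy.mean())                            # above 0.05 because of the Rician noise floor
```

      Because the magnitude of a complex Gaussian variable is Rician rather than Gaussian, strongly attenuated signals acquire a positive bias (the noise floor), while at high SNR the distribution is close to Gaussian around the true value.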

      (4) L475: why is thinning not used in MCMC? In Figure 3, the MCMC results are more biased than those of µGUIDE; is this related to the absence of thinning in MCMC?

      We followed the recommendations by Harms et al. (2018) for the MCMC experiments. They analysed the impact of thinning (among other parameters) on the estimated posterior distributions. Their findings indicate that thinning is unnecessary and inefficient, and they recommend using more samples instead. For further details, we refer the reviewer to their publication, along with the theoretical works they cite. We have now added this note in the Methods section.
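      A toy calculation illustrates why thinning does not help for a fixed number of draws. It assumes a stationary AR(1) chain with lag-1 autocorrelation rho, for which the integrated autocorrelation time is (1 + rho) / (1 - rho); keeping every k-th draw yields another AR(1) chain with autocorrelation rho**k. The AR(1) model is a stand-in chosen for illustration, not one of the MCMC chains in the manuscript.

```python
def ar1_ess(n_draws: int, rho: float) -> float:
    """Effective sample size of n_draws from a stationary AR(1) chain with lag-1 autocorrelation rho."""
    return n_draws * (1.0 - rho) / (1.0 + rho)


def thinned_ess(n_draws: int, rho: float, k: int) -> float:
    """ESS after keeping every k-th draw: n_draws // k draws with autocorrelation rho**k."""
    return ar1_ess(n_draws // k, rho ** k)


# Same 10,000 draws, increasingly aggressive thinning.
for k in (1, 5, 10, 50):
    print(k, round(thinned_ess(10_000, 0.9, k), 1))
```

      For rho = 0.9 the effective sample size decreases steadily as k grows, so in this model thinning only saves storage, which is consistent with the recommendation to keep more samples instead.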

      (5) Did the authors try model-fitting methods with different initializations to get a distribution of the parameters? Like the paper "Degeneracy in model parameter estimation for multi‐compartmental diffusion in neuronal tissue". For the in vivo data, it is informative to see the model-fitting results.

      No, we did not try model-fitting methods with different initializations because such methods provide only a partial description of the solution landscape, which can be interpreted as a partial posterior distribution. Although this approach can help to highlight the problem of degeneracy, it does not provide a complete description of all potential solutions. In contrast, MCMC estimates the full posterior distribution, offering a more accurate and precise characterization of degeneracies and uncertainties compared to model-fitting methods with varying initializations. Hence, we decided to use MCMC as benchmark. We have now added these considerations to the Discussion section. 

      Reviewer #2 (Public Review): 

      Summary: 

      The authors improve the work of Jallais et al. (2022) by including a novel module capable of automatically learning feature selection from different acquisition protocols inside a supervised learning framework. Combining the module above with an estimation framework for estimating the posterior distribution of model parameters, they obtain rich probabilistic information (uncertainty and degeneracy) on the parameters in a reasonable computation time. 

      The main contributions of the work are: 

      (1) The whole framework allows the user to avoid manually defining summary statistics, which may be slow and tedious and affect the quality of the results. 

      (2) The authors tested the proposal by tackling three different biophysical models for brain tissue and using data with characteristics commonly used by the diffusion-MRI microstructure research community. 

      (3) The authors validated their method well with the state-of-the-art. 

      The main weakness is: 

      (1) The methodology was tested only in scenarios with a signal-to-noise ratio (SNR) equal to 50. It would be interesting to show results with lower SNR and without noise, to demonstrate that the method can detect the model's inherent degeneracies and how degeneracy increases when strong noise is present. I suggest expanding the Figure in Appendix 1 to include this information.

      The authors showed the utility of their proposal by computing complex parameter descriptors automatically in an achievable time for three different and relevant biophysical models. 

      Importantly, this proposal promotes tackling, analysing, and considering the degenerate nature of the most widely used models in brain microstructure estimation.

      We thank the reviewer for these positive remarks. 

      Concerning the main weakness highlighted by the reviewer: in our submitted work, we presented results both without noise and with a signal-to-noise ratio (SNR) equal to 50 (similar to the SNR in the experimental data analysed). Figure 5 shows exemplar posterior distributions obtained in a noise-free scenario, and Table 1 reports the number of degeneracies for each model across 10,000 noise-free simulations. These results highlight that the presence of degeneracies is inherent to the model definition. Figures 3, 6 and 7 present results obtained with an SNR of 50. We acknowledge that results with lower SNR were not included in the initial submission. To address this, we added a figure in the appendix illustrating the impact of noise on the posterior distributions. Specifically, Figure 1A of Appendix 2 shows posterior distributions estimated from signals generated using an exemplar set of model parameters with varying noise levels (no noise, SNR=50 and SNR=25). Figure 1B presents uncertainty values obtained from 1000 simulations for each noise level. We observe that uncertainty increases as the SNR decreases. Noise in the signal contributes irreducible variance, so confidence in the estimates decreases as the noise level increases.

      Reviewer #2 (Recommendations For The Authors):  

      Some suggestions: 

      Panel A of Figure 2 may deserve a better explanation in the Figure's caption. 

      We agree that the description of panel A of Figure 2 was too brief and have added more explanation to the figure’s caption.

      The caption of Figure 3 should mention that the panel's titles are the parameters of the used biophysical models. 

      We added in the caption of figure 3 that the names of the model parameters are indicated in the titles of the panels. We apologise for the confusion it may have created.

      In equation (3), the authors should indicate the summation index. 

      We apologise for not putting the summation index in equation 3. We added it in the revised version.

      In line 474, the authors should discuss if the systematic use of the maximum likelihood estimator as an initializer for the sampling does not bias the computed results. 

      Concerning the MCMC estimations, we followed the recommendations of Harms et al. (2018). They investigated the use of the maximum likelihood estimator (MLE) as a starting point and concluded that starting from the MLE allows sampling to begin in the stationary distribution of the Markov chain, removing the need for some burn-in. Additionally, they showed that initializing the sampling from the MLE has the advantage of removing salt-and-pepper-like noise from the resulting mean and standard deviation maps. We have now added this note in the Methods section.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off-state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we will change the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in the absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We will make this point clearer in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on / off switch of CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorder 2022) with the loss of pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the manuscript so that our findings in the current work are not misinterpreted.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrate that this burst-dependent pause relies on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, it does not exclude a participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1G-J, after observations by Matsumoto et al., 2001- PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016), we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCINs throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are adding a sentence to make this clearer in the manuscript.

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we will add a statement discussing the potential contribution of receptors beyond D1/D5.

    1. Author response:

      We thank the editor and reviewers for their feedback. We believe we can address the substantive criticisms in full. First, we will provide a more explicit theoretical basis for the method. Second, we believe that criticisms based on assumptions about phase consistency across time points are not well founded and can be answered. Finally, in response to some reviewer comments, we will improve the surrogate testing of the method.

      We will enhance the theoretical justification for the application of higher-order singular value decomposition (SVD) to the problem of irregular sampling of the cortical area. The initial version of the manuscript was written to allow informal access to these ideas (where possible), but the reviewers found a more rigorous account appropriate. We will add an introduction to modern developments in the use of functional SVD in geophysics, meteorology and oceanography (e.g., empirical orthogonal functions), quantitative fluid dynamics (e.g., dynamic mode decomposition) and computational chemistry. SVD has recently been used in neuroscience studies (e.g., cortical eigenmodes). To our knowledge, our work is the first application of higher-order SVD to a neuroscience problem. We use it here to solve an otherwise (apparently) intractable problem, i.e., how to estimate the spatial frequency (SF) spectrum on a sparse and highly irregular array with broadband signals.

      We will clarify the methodological strategy in more formal terms in the next version of the paper. Essentially, SVD allows a change of basis that greatly simplifies quantitative analysis. Here it avoids estimating SF across millions of data points (triplets of contacts, at each sample), each of which contains multiple overlapping signals plus noise (noise here defined in the context of SF estimation) and which are inter-correlated across a variety of known and unknown observational dimensions. Rather than simply averaging over samples, which would wash out much of the real signal, SVD allows the signals to be decomposed in a lossless manner (up to the choice of the number of eigenvectors at which the SVD is truncated). The higher-order SVD we have implemented reduces the size of the problem, allowing SF to be quantified over hundreds of components, each of which is guaranteed certain desirable properties: they explain known (and the largest) amounts of variance of the original data, and they are orthonormal. This last property allows us to proceed as if the observations are independent. SF estimates are made within this new coordinate system.
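      For readers unfamiliar with the higher-order SVD, the NumPy sketch below shows a generic truncated HOSVD: each mode of a multi-way array is unfolded into a matrix, its leading left singular vectors are kept as an orthonormal basis, and the array is projected onto those bases to give a small core tensor. The array shape (contacts x time x trials) and the ranks are hypothetical, and the sketch is not the implementation used in the manuscript.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index axis `mode`, columns run over all other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors, explained) for a truncated HOSVD.

    factors[n] holds the leading ranks[n] left singular vectors of the mode-n
    unfolding (orthonormal columns); explained[n] is the fraction of that
    unfolding's squared Frobenius norm ("energy") captured by those vectors.
    """
    factors, explained = [], []
    for mode, rank in enumerate(ranks):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
        explained.append(float(np.sum(s[:rank] ** 2) / np.sum(s ** 2)))
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)   # project onto the reduced basis
    return core, factors, explained


def reconstruct(core, factors):
    """Map the core tensor back to the original coordinates."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


rng = np.random.default_rng(0)
data = rng.standard_normal((8, 50, 20))   # hypothetical contacts x time x trials
core, factors, explained = truncated_hosvd(data, ranks=(4, 10, 5))
print(core.shape)                          # (4, 10, 5)
```

      Keeping all singular vectors in every mode reconstructs the data exactly; truncation trades a controlled loss of energy, reported per mode in explained, for a much smaller problem.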

      We will also more concretely formalise the relation between Fourier analysis and previous observations of eigenvectors of phase that are smooth gradients.
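      One standard way to formalise this relation, sketched below under simplifying assumptions, is the circulant-matrix result: if activity on a uniformly sampled ring has a translation-invariant covariance, the eigenvectors of that covariance are Fourier modes and the eigenvalues are the power spectrum. If power falls with SF, the eigenvectors ordered by eigenvalue are therefore smooth gradients of increasing SF. The ring geometry, number of contacts and the 1/(1 + SF) spectrum are illustrative assumptions; the sEEG case is neither uniform nor periodic.

```python
import numpy as np

n = 16                                   # hypothetical contacts on a uniform ring
k = np.arange(n)
sf = np.minimum(k, n - k)                # spatial frequency of each DFT bin
power = 1.0 / (1.0 + sf)                 # assumed spectrum: power falls with SF

# Translation-invariant covariance whose spectrum is `power` (a circulant matrix)
lag_cov = np.fft.ifft(power).real
cov = np.array([[lag_cov[(j - i) % n] for j in range(n)] for i in range(n)])

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Dominant spatial frequency of each eigenvector, in eigenvalue order
dominant_sf = [int(sf[np.argmax(np.abs(np.fft.fft(v)))]) for v in eigvecs.T]
print(dominant_sf)
```

      The leading eigenvector is spatially constant, the next two form a sine/cosine pair at SF 1, and so on, which is the smooth-gradient pattern referred to above.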

      We will briefly review Fourier methods designed to deal with non-uniform sampling. Sampling can be placed on a continuum running from uniform, through non-uniform, irregular and highly irregular, to noise; these methods address the non-uniform part of that continuum. They are well suited to, for example, interpolating between EEG electrodes to produce a uniform array for application of the fast Fourier transform (Alamia et al., 2023). However, a survey across a range of applied mathematics fields suggests that no method exists for the degree of irregular sampling found in the sEEG arrays at issue here. In particular, the sparseness of the contact coverage presents an insurmountable hurdle to standard methods. While methods exist for sparse samples (e.g., Margrave & Ferguson, 1999; Ying 2009), these require well-defined oscillatory behavior, as in seismographic analysis. Given the problems of highly irregular sampling, sparse sampling and broadband, nonstationary signals, we have attempted a solution via the novel methods introduced in the current manuscript. We were able to leverage previous observations regarding the relation between eigenvectors of cortical phase and Fourier analysis, as we outline in the manuscript.
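      To make the contrast concrete, here is a hedged sketch of the simplest non-uniform approach, a direct non-uniform DFT evaluated at arbitrary sample positions. It reproduces the FFT on a uniform grid, but with a handful of hypothetical irregular positions a single plane wave leaks power into every other frequency bin. The positions, sample count and frequencies are invented for illustration, and this is not one of the cited sparse methods.

```python
import numpy as np


def nudft(values, positions, freqs):
    """Direct non-uniform DFT: X(f) = sum_j values[j] * exp(-2*pi*i*f*positions[j])."""
    return np.exp(-2j * np.pi * np.outer(freqs, positions)) @ values


rng = np.random.default_rng(2)

# On a uniform grid the direct sum reproduces the FFT
n = 32
grid = np.arange(n) / n
x_uniform = rng.standard_normal(n)
matches_fft = np.allclose(nudft(x_uniform, grid, np.arange(n)), np.fft.fft(x_uniform))

# Hypothetical sparse, irregular sampling of one plane wave (3 cycles per unit length)
positions = np.sort(rng.uniform(0.0, 1.0, 12))
wave = np.exp(2j * np.pi * 3 * positions)
spectrum = np.abs(nudft(wave, positions, np.arange(11)))
print(matches_fft, np.round(spectrum, 2))  # peak at f = 3, non-zero leakage elsewhere
```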

      We will extend the current 1-dimensional surrogate data to better demonstrate that the method does indeed correctly detect the ordinal relations in power on different parts of the SF spectrum. We will include the effects of a global reference signal. Simulations of cortical activity are an expensive way to achieve this goal. While the first author has published in this area, such simulations are partly a function of the assumptions put into them (i.e., spatial damping, boundary conditions, parameterization of connection fields). We will therefore use surrogate signals derived from real cortical activity to complete this task.
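      A minimal sketch of the kind of 1-dimensional surrogate described above, assuming a regular line of contacts, an imposed spectrum with power proportional to 1/(1 + |SF|) and a constant complex reference added at every contact. Because the DFT is linear, a spatially constant reference only changes the SF = 0 bin here. The real pipeline normalises phase to unit length, which is nonlinear, so there a reference could also affect other bins; that is the effect the planned surrogate tests would need to measure.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64                                        # hypothetical contacts on a regular line
k = np.fft.fftfreq(n, d=1.0 / n)              # integer spatial-frequency bins
amplitude = 1.0 / np.sqrt(1.0 + np.abs(k))    # assumed power ~ 1 / (1 + |k|)
coeffs = amplitude * np.exp(1j * rng.uniform(-np.pi, np.pi, n))  # random phases
field = np.fft.ifft(coeffs)                   # complex surrogate field in space

reference = 0.7 * np.exp(0.3j)                # hypothetical global reference signal
power_raw = np.abs(np.fft.fft(field)) ** 2
power_ref = np.abs(np.fft.fft(field + reference)) ** 2

# Mean power at each |SF| falls monotonically by construction
by_sf = [power_raw[np.abs(k) == s].mean() for s in range(n // 2 + 1)]
print(np.round(by_sf[:5], 3), power_raw[0], power_ref[0])
```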

      Some more specific issues raised:
      (1) Application of the method to general neuroscience problems:
      The purpose of the manuscript was to estimate the SF spectrum of phase in the cortex over a range where this was previously not possible. The purpose was not specifically to introduce a new method of analysis that might be immediately applicable to a wide range of available data-sets. Indeed, the specifics of the method are designed to overcome an otherwise intractable disadvantage of sEEG (irregular spatial sampling) in order to take advantage of its good coverage (compared to ECoG) and low volume conduction (compared to extra-cranial methods). On the other hand, the developing field of functional SVD, as a set of methods for solving difficult problems, should be of general interest to neuroscientists. We will make these points explicit in the next version of the manuscript. To make the method more accessible, we will also publish code for the key routines (construction of triplets of contacts, Morlet wavelets, calculation of higher-order SVD, calculation of SF).
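      For readers wondering what a per-triplet SF estimate can look like, the following hedged sketch fits a local plane wave to the phases at three contacts and returns its SF. It is not the authors' routine: it assumes the contacts have been projected onto a common 2-D plane, that the wave is locally planar, and that the true phase differences between contacts are smaller than pi. The function names and example geometry are hypothetical.

```python
import math


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def triplet_sf(positions, phases):
    """SF (cycles per unit distance) of a plane wave fitted to three contacts.

    positions: three (x, y) tuples in a common 2-D plane (hypothetical projection)
    phases: phase in radians at each contact, same time and temporal frequency
    """
    (x0, y0), (x1, y1), (x2, y2) = positions
    dx1, dy1, dx2, dy2 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    d1 = wrap(phases[1] - phases[0])   # assumes |true difference| < pi
    d2 = wrap(phases[2] - phases[0])
    det = dx1 * dy2 - dx2 * dy1
    if abs(det) < 1e-12:
        raise ValueError("contacts are (nearly) collinear")
    # Solve [dx1 dy1; dx2 dy2] @ [kx, ky] = [d1, d2] by Cramer's rule
    kx = (d1 * dy2 - d2 * dy1) / det
    ky = (dx1 * d2 - dx2 * d1) / det
    return math.hypot(kx, ky) / (2 * math.pi)


# Hypothetical plane wave: 0.05 cycles/mm travelling at 30 degrees, phase offset 0.8 rad
k_true = 2 * math.pi * 0.05
kx_true = k_true * math.cos(math.radians(30))
ky_true = k_true * math.sin(math.radians(30))
contacts = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0)]  # mm
phases = [wrap(kx_true * x + ky_true * y + 0.8) for x, y in contacts]
print(triplet_sf(contacts, phases))  # close to 0.05
```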

      (2) Novelty:
      We agree with the third reviewer: if our results can convince, then the study will have an impact on the field. While phase interactions at a variety of scales have been studied, for example in the labs of Fries, Singer, Engel, Nauhaus, Logothetis and others, this work does not quantify the relative power of the different spatial scales. Additionally, the research of Freeman et al. has quantified only portions of the SF spectrum of the cortex, or used EEG to estimate low SFs. We would appreciate any pointers to specific literature on the topic the current research contributes to, namely, the SF spectrum of activity in the cortex.

      (3) Further analyses:
      The main results of the research are relatively simple: SF-power falls monotonically with SF, and this effect occurs across the range of temporal frequencies. We provide each individual participant’s curves in the supplementary Figures. Visual inspection shows that the main result for the example participant is recapitulated uniformly across participants. One is rarely in this position in neuroscience research, and we will make this explicit in the text.

      The research stands or falls by the adequacy of the method to estimate the SF curves. For this reason, most statistical analyses and figures were reserved for ruling out confounds and exploring the limits of the methods. However, for the sake of completeness, we will now include the SF vs. SF-power correlations and their significance in the next version, for each participant at each frequency.
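      The response does not specify which correlation statistic will be reported. One option, sketched here as an assumption, is Spearman's rank correlation between SF and SF-power, with an exact permutation p-value that suits the small number of SF bins. The SF bins and power values below are invented for illustration.

```python
import itertools
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def exact_permutation_p(x, y):
    """Two-sided p-value over all permutations of y (feasible only for small n)."""
    observed = abs(spearman(x, y))
    perms = list(itertools.permutations(y))
    hits = sum(abs(spearman(x, p)) >= observed - 1e-12 for p in perms)
    return hits / len(perms)


# Hypothetical SF bins and SF-power for one participant at one temporal frequency
sf_bins = [1, 2, 3, 4, 5, 6]
sf_power = [9.1, 6.4, 5.2, 3.9, 3.1, 2.2]
rho = spearman(sf_bins, sf_power)
p_value = exact_permutation_p(sf_bins, sf_power)
print(rho, p_value)
```

      With six bins the smallest attainable two-sided p-value is 2/720, which is worth keeping in mind when choosing how finely to bin SF.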

      Since the main result was uniform across participants, and since we did not expect that there was anything of special significance about the delayed free recall task, we conclude that more participants or more tasks would not add to the result. As we point out in the manuscript, each participant is a test of the main hypothesis. The result is also consistent with previous attempts to quantify the SF spectrum, using a range of different tasks and measurement modalities (Barrie et al., 1996; Ramon & Holmes 2015; Alexander et al., 2019; Alexander et al., 2016; Freeman et al., 2003; Freeman et al. 2000). The search for those rare sEEG participants with larger coverage than the maximum here is a matter of interest to us, but will be left for a future study.

      (4) Sampling of phase and its meaningfulness:
      The wavelet methods used in the present study have excellent temporal resolution but poor frequency resolution. We additionally oversample the frequency range to produce visually informative plots (usually in the context of time by frequency plots; see Alexander et al., 2006; 2013; 2019). But it is not correct that the methods for estimating phase assume a narrow frequency band. Rather, the poor frequency resolution of short time-series Morlet wavelets makes the methods robust to the exact shape of the waveforms: the signal need only be approximately sinusoidal, i.e., rise and fall. We use methods with excellent resolution in the time domain because previous work (Alexander et al., 2006; Patten et al. 2012) has shown that traveling wave events can last only one or two cycles, i.e., they are not oscillatory in the strict sense but are non-stationary events. So while short time-window Morlet wavelets have poor frequency resolution, this is precisely why they do not assume narrow-band sinusoidal waveforms in the signal. We strongly disagree that our analysis requires very strong assumptions about oscillations (see last point in this section).
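      The trade-off described above can be seen in a short sketch of phase estimation with a complex Morlet wavelet. The number of cycles (3), sampling rate and test signal are assumptions, not the parameters used in the study. With sigma_t = n_cycles / (2*pi*f), the spectral standard deviation is f / n_cycles, i.e., about 3.3 Hz at 10 Hz for 3 cycles, yet the phase of a 10 Hz signal is still recovered accurately.

```python
import numpy as np


def morlet_phase(signal, fs, freq, n_cycles=3.0):
    """Instantaneous phase at `freq` from convolution with a complex Morlet wavelet."""
    sigma_t = n_cycles / (2 * np.pi * freq)       # temporal s.d. of the Gaussian
    half = int(round(5 * sigma_t * fs))           # truncate at +/- 5 s.d.
    t = np.arange(-half, half + 1) / fs           # odd length, centred on zero
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2 * sigma_t**2))
    return np.angle(np.convolve(signal, wavelet, mode="same"))


def spectral_sd(freq, n_cycles=3.0):
    """Frequency s.d. (Hz): sigma_f = 1 / (2*pi*sigma_t) = freq / n_cycles."""
    return freq / n_cycles


fs = 1000.0                                       # hypothetical sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
signal = np.cos(2 * np.pi * 10 * t + 0.4)         # 10 Hz test signal, phase offset 0.4 rad
phase = morlet_phase(signal, fs, 10.0)
print(phase[500], spectral_sd(10.0))              # ~0.4 rad at t = 0.5 s; ~3.3 Hz
```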

      Our hypothesis was about the SF spectrum of the phase. Where the phase measurement is noise-like at some location, frequency and time, this noise will contribute little to the low SF parts of the spectrum compared to the high SFs. Our hypothesis also concerned whether it was reasonable to interpret the existing literature on low SF waves in terms of cortically localised waves or small numbers of localised oscillators. This required us to show that low SFs dominate, and therefore that this signal must dominate any extra-cranial measurements of apparent low SF traveling waves. It does not require us to demonstrate that the various parts of the SF spectrum are meaningful in the sense of functionally significant. This has been shown elsewhere (see references to traveling waves in the manuscript, to which we will also add a brief survey of research on phase dynamics).

      The calculation of phase can be bypassed altogether to achieve the initial effect described in the introduction to the methods (Fourier-like basis functions from SVD). The observed eigenvectors, increasing in spatial frequency with decreasing eigenvalues, can be reproduced by applying Gaussian windows to the raw time-series (D. Alexander, unpublished observation). For example, an SVD of the raw time-series windowed over 100ms reproduces much the same spatial eigenvectors (except that they come in pairs, recapitulating the real and imaginary parts of the signal). The result is much the same as when first estimating the phase at 10Hz using Morlet wavelets and then applying the SVD to the unit-length complex phase values.
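      A hedged sketch of the observation about pairs: for a single real-valued travelling wave in a Gaussian-windowed 100 ms segment, the contacts x time matrix has exactly two non-zero singular values, and its two spatial singular vectors span the sine and cosine of the wave. The same wave written as a complex signal has rank one. The contact layout, taper width and wave parameters are hypothetical; this illustrates the linear algebra only, not the authors' analysis.

```python
import numpy as np

fs = 1000.0
t = np.arange(-0.05, 0.05, 1 / fs)                # 100 ms window
x = np.arange(20) / 20.0                          # 20 hypothetical contacts over 1 spatial unit
taper = np.exp(-t**2 / (2 * 0.02**2))             # Gaussian window, s.d. 20 ms (assumed)

# Real-valued travelling wave: 10 Hz, 1 cycle per spatial unit
wave = np.sin(2 * np.pi * (x[:, None] - 10.0 * t[None, :])) * taper[None, :]
u, s, _ = np.linalg.svd(wave, full_matrices=False)

# The same wave as a complex signal
cwave = np.exp(2j * np.pi * (x[:, None] - 10.0 * t[None, :])) * taper[None, :]
s_complex = np.linalg.svd(cwave, compute_uv=False)

print(np.round(s[:3] / s[0], 6), np.round(s_complex[:2] / s_complex[0], 6))
```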

      (5) Other issues to be addressed and improved:
      Clarity on which experiments were analyzed (starting in the abstract); discussion of frequencies above 60Hz, including caution in their interpretation because of spike-waveform artefacts, or their potential use as an index of multi-unit spiking; and discussion of whether the ad hoc, quasi-random sampling achieved by sEEG contacts somehow inflates the low SF estimates.

      References (new)
      Patten TM, Rennie CJ, Robinson PA, Gong P (2012) Human Cortical Traveling Waves: Dynamical Properties and Correlations with Responses. PLoS ONE 7(6): e38392. https://doi.org/10.1371/journal.pone.0038392
      Margrave GF, Ferguson RJ (1999) Wavefield extrapolation by nonstationary phase shift. Geophysics 64(4): 1067-1078.
      Ying L (2009) Sparse Fourier Transform via Butterfly Algorithm. SIAM Journal on Scientific Computing 31(3): 1678-1694.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K+. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cells in the pellets and that this change in membrane potential is the driving force for the T2 (and T1, supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch-clamp recordings, which is the gold standard. With 80 mM KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed, with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of protons on macromolecules to those in free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line, and similar but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.
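      For context on how equimolar K+/Na+ substitution is expected to shift the resting potential, the Goldman-Hodgkin-Katz voltage equation can be evaluated as below. All concentrations, the permeability ratios (P_K : P_Na : P_Cl = 1 : 0.04 : 0.45) and the temperature are illustrative assumptions, not values measured for SH-SY5Y cells, so the printed potentials should not be compared directly with the patch-clamp recordings.

```python
import math

R = 8.314462618      # gas constant, J / (mol K)
F = 96485.33212      # Faraday constant, C / mol


def ghk_mv(k_o, na_o, cl_o, k_i, na_i, cl_i, p_k=1.0, p_na=0.04, p_cl=0.45, temp_k=310.15):
    """Goldman-Hodgkin-Katz potential in mV (concentrations in mM, relative permeabilities)."""
    num = p_k * k_o + p_na * na_o + p_cl * cl_i      # anion term: inside on top
    den = p_k * k_i + p_na * na_i + p_cl * cl_o
    return 1000.0 * R * temp_k / F * math.log(num / den)


# Hypothetical equimolar substitution: K+ replaces Na+ so that [K]o + [Na]o = 150 mM
for k_o in (5, 20, 40, 80):
    v = ghk_mv(k_o=k_o, na_o=150 - k_o, cl_o=150, k_i=140, na_i=12, cl_i=10)
    print(f"[K+]o = {k_o:>2} mM -> {v:6.1f} mV")
```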

      Weaknesses:

      While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, for which they provide definitive proof. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same timecourse as the KCl depolarization changes (and vice versa for hyperosmotic changes, which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two-compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227, 1991). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we will additionally discuss the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and to address Reviewer #2’s suggestion, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

      Finally, when [K+]-induced membrane potential changes are involved, factors other than cell volume changes also appear to influence T2 changes. Our ongoing study shows that T2 changes differ (for the same volume changes) between two situations: pure osmotic volume changes vs. [K+]-induced volume changes (e.g., hypoosmotic vs. depolarization). Furthermore, this study suggests that mechanisms such as changes in free (primarily intracellular) and bound water within a voxel play an important role in generating this T2 difference. Our group is preparing a manuscript for this follow-up study and will report on it shortly.

      So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in our first response above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T2 and PSR) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      There are a few smaller issues that should be addressed.

      (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      We appreciate the reviewer’s suggestion regarding imaging sequences. We would like to clarify that dictionaries were used for fitting in vivo T2 decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T2 maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      The T2 decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. This phenomenon is inherent to the pulse sequence rather than being artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T2 decay curve using the technique developed by McPhee and Wilman (2017).
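      To illustrate why a slice-selective multi-echo train need not decay monoexponentially, the sketch below implements a basic extended phase graph (EPG) simulation of a CPMG echo train in the formalism associated with Hennig (1991). It assumes a single, uniform refocusing angle and ignores the slice-profile integration and fitting used by McPhee and Wilman (2017); all tissue and timing values are hypothetical. With a perfect 180 degree pulse the echoes follow exp(-TE/T2) exactly, whereas with a 120 degree pulse the first echo is reduced to sin^2(60 degrees) of the ideal and the second echo is larger than the first, because stimulated echoes add to the signal.

```python
import numpy as np


def rf_rotation(alpha, phi):
    """EPG rotation matrix acting on (F+, F-, Z) for flip angle alpha and RF phase phi."""
    c2, s2, s = np.cos(alpha / 2) ** 2, np.sin(alpha / 2) ** 2, np.sin(alpha)
    e = np.exp(1j * phi)
    return np.array([
        [c2, e**2 * s2, -1j * e * s],
        [np.conj(e) ** 2 * s2, c2, 1j * np.conj(e) * s],
        [-0.5j * np.conj(e) * s, 0.5j * e * s, np.cos(alpha)],
    ])


def relax_and_dephase(state, tau, t1, t2):
    """Relax for `tau`, then apply one unit of crusher-gradient dephasing."""
    e1, e2 = np.exp(-tau / t1), np.exp(-tau / t2)
    state = state.copy()
    state[0:2] *= e2
    state[2] *= e1
    state[2, 0] += 1.0 - e1                    # longitudinal recovery toward M0 = 1
    state[0] = np.roll(state[0], 1)            # F+ states move to higher order
    state[1] = np.roll(state[1], -1)           # F- states move to lower order
    state[1, -1] = 0.0
    state[0, 0] = np.conj(state[1, 0])         # F+_0 is the conjugate of F-_0
    return state


def cpmg_echoes(n_echoes, esp, t1, t2, refocus_deg):
    """Echo magnitudes for a CPMG train with a uniform (possibly imperfect) refocusing angle."""
    state = np.zeros((3, 2 * n_echoes + 2), dtype=complex)
    state[2, 0] = 1.0                                        # equilibrium magnetization
    state = rf_rotation(np.pi / 2, np.pi / 2) @ state        # 90 deg excitation, 90 deg phase
    refocus = rf_rotation(np.deg2rad(refocus_deg), 0.0)      # refocusing, CPMG phase shift
    echoes = []
    for _ in range(n_echoes):
        state = relax_and_dephase(state, esp / 2, t1, t2)
        state = refocus @ state
        state = relax_and_dephase(state, esp / 2, t1, t2)
        echoes.append(abs(state[0, 0]))
    return np.array(echoes)


# Hypothetical tissue and sequence values (ms)
ideal = cpmg_echoes(8, esp=10.0, t1=1000.0, t2=100.0, refocus_deg=180.0)
imperfect = cpmg_echoes(8, esp=10.0, t1=1000.0, t2=100.0, refocus_deg=120.0)
print(np.round(ideal, 3))
print(np.round(imperfect, 3))
```

      Fitting methods of the McPhee and Wilman type go further by integrating such simulations over the slice profile and fitting T2 and the refocusing angle jointly.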

      (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently far from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We will clearly describe the imaging slice in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We will clarify this point in the revised manuscript to avoid any misunderstanding.
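      To gauge how much residual medium signal would bias a monoexponential estimate, the following sketch fits a log-linear monoexponential to a two-compartment decay sampled at 50 echo times. It assumes the no-exchange limit, a hypothetical pellet T2 of 60 ms, a hypothetical medium T2 of 1500 ms and a 5% medium fraction; none of these are values from the study, and the study's own fitting procedure may differ. A pure pellet signal returns its own T2, while any admixture of medium pushes the apparent T2 upward, toward the longer component.

```python
import numpy as np

te = np.linspace(5.0, 250.0, 50)   # 50 hypothetical echo times (ms)


def two_compartment(te, frac_pellet, t2_pellet, t2_medium):
    """Signal from a voxel mixing pellet and medium water, with no exchange between them."""
    return frac_pellet * np.exp(-te / t2_pellet) + (1 - frac_pellet) * np.exp(-te / t2_medium)


def mono_t2(te, signal):
    """Apparent T2 from a log-linear monoexponential fit."""
    slope, _ = np.polyfit(te, np.log(signal), 1)
    return -1.0 / slope


pure = mono_t2(te, two_compartment(te, 1.0, 60.0, 1500.0))
mixed = mono_t2(te, two_compartment(te, 0.95, 60.0, 1500.0))
print(f"pellet only: {pure:.1f} ms; with 5% medium: {mixed:.1f} ms")
```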

      (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCL solutions that I could find.

      As requested by the reviewer, we will include the absolute values in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K+ and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests, remarkably, that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we have already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we will additionally discuss the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and to consider the reviewer’s suggestion, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Molnar, Suranyi and colleagues have probed the genomic stability of Mycobacterium smegmatis in response to several anti-tuberculosis drugs as monotherapy and in combination. Unlike the study by Nyinoh and McFadden (http://dx.doi.org/10.1002/ddr.21497, which should be cited), the authors use a sub-lethal dose of antibiotic. While this is motivated by sound technical considerations, the biological and therapeutic rationale could be further elaborated.

      In the mutation accumulation experiments, we needed to ensure continuous and reproducible growth of a small number of colonies across multiple passages. This technical requirement necessitated the use of sublethal drug concentrations. However, sublethal doses also have biological relevance. Noncompliance with prescribed antibiotic regimens and the presence of antibiotic residues in food due to the extensive use of antibiotics in agricultural mass production are two obvious sources of prolonged exposure to sublethal antibiotics.
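      For orientation, the per-site rate from a mutation accumulation design is commonly computed as mutations / (lines x generations per line x genome size), with generations per single-colony passage approximated as log2 of the number of cells in a colony. The sketch below uses invented numbers (30 mutations, 20 lines, 50 passages, 10^8 cells per colony) and an approximate M. smegmatis genome size of 7 Mb; it is not a re-analysis of the study's data.

```python
import math


def generations_per_passage(cells_per_colony):
    """Generations from a single cell to a colony of this size, assuming binary fission."""
    return math.log2(cells_per_colony)


def ma_mutation_rate(n_mutations, n_lines, generations_per_line, genome_size_bp):
    """Per-base-pair, per-generation rate pooled across mutation accumulation lines."""
    return n_mutations / (n_lines * generations_per_line * genome_size_bp)


# Hypothetical values; the genome size is an approximate placeholder
gens = 50 * generations_per_passage(1e8)           # e.g., 50 single-colony passages
rate = ma_mutation_rate(n_mutations=30, n_lines=20, generations_per_line=gens,
                        genome_size_bp=7.0e6)
print(f"{gens:.0f} generations per line, {rate:.2e} mutations per bp per generation")
```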

      The results the authors obtain are in line with papers examining the genomic mutation rate in vitro and from patient samples in Mycobacterium tuberculosis, in vitro in Mycobacterium smegmatis and in vitro in Mycobacterium tuberculosis (although the study by HL David (PMID: 4991927) is not cited). The results are confirmatory of previous studies.

      The two cited studies, along with several others, did not distinguish between genetic mutations and phenotypic responses to drug exposure (the fluctuation test alone is not suitable for this). Therefore, their objectives are not comparable to ours, which specifically investigated whether resistant colonies carry adaptive mutations. Nevertheless, we acknowledge the relevance of these studies and have now cited them in the appropriate sections in the text.

      It is therefore puzzling why the authors propose the opposite hypothesis in the paper (i.e antibiotic exposure should increase mutation rates) merely to tear it down later. This straw-man style is entirely unnecessary.  

      The phenomenon of stress-inducible mutagenesis in bacterial evolution remains a topic of heated debate. The emergence of genetically encoded resistance may stem from either microevolution or the dissemination of pre-existing variants from polyclonal infections under drug pressure. We believe that the Introduction presents both of these hypotheses in a balanced manner to elucidate the rationale behind our mutation accumulation investigations.  

      The results on the nucleotide pools are interesting, but the statistically significant data is difficult to identify as presented, and therefore the new biological insights are unclear.

      We now indicate statistical significance in the figure, in addition to the detailed statistical analysis of all dNTP measurements provided in Table S5.

      Finally, the authors show that a fluctuation assay generates mutations with higher frequencies than the genetic stability assays, confirming the well-known effect of phenotypic antibiotic resistance.

      What we show is that the fluctuation assay generated bacteria that tolerated the applied antibiotic without developing mutations. Conclusions about mutation rates are often drawn from fluctuation assays without confirming genetic-level changes, even though these assays capture both phenotypic and genotypic alterations. By combining genome sequencing with fluctuation assays, our approach emphasizes the importance of distinguishing between these changes. While fluctuation assays remain valuable, inexpensive, and simple tools for evaluating the response of bacterial populations to various selective environments, they should not be considered definitive indicators of genetic changes.

      Recommendations For The Authors:

      The quality of the figures can be significantly improved. In Figure 1, cell lengths can be shown on separate histograms or better still as violin plots to enable better comparisons.

      Thank you for the suggestion. We have revised the data presentation accordingly.

      Details for statistical tests should be provided in the figure legend.  

      Statistical details are now added in the figure legend.

      In Figure 2, the number of data points is not mentioned.

      Statistical information is now added to the new Figure 2, which has been revised extensively based on suggestions from all Referees.

      The data in Figure 3 would be much easier to comprehend as a heatmap.  

      The figure we provided is a color gradient table representing different gene expression levels, with numerical data and statistical significance indicated within the color boxes, which expands the information content of a traditional heatmap. In response to the Referee's suggestion, we also prepared a hierarchical clustering heatmap, demonstrating that the grouping of rows and columns based on functional information in the original figure is consistent with the clustering pattern observed in the heatmap (Figure S5). As the original figure is more informative and better structured, we have retained it in the main text and included the new heatmap in the supplementary materials.

      No statistical tests are provided for Figure 4.

      We now indicate statistical significance in the figure and describe the statistical analysis in the figure legend, as suggested. Additionally, Table S5 is dedicated to the statistical analysis of the dNTP data.  

      Reviewer #2 (Public Review):

      In this study, the authors assess whether selective pressure from drug chemotherapy influences the emergence of drug resistance through the acquisition of genetic mutations or through phenotypic tolerance. I commend the authors on their approach of utilizing the mutation accumulation (MA) assay as a means to answer this question, and whole genome sequencing of clones from the assay convincingly demonstrates low mutation rates in Mycobacteria exposed to sub-inhibitory concentrations of antibiotics. Also, quantitative PCR highlighted the upregulation of DNA repair genes in Mycobacteria following drug treatment, implying the preservation of genomic integrity via specific repair pathways.

      Even though the findings stem from M. smegmatis exposure to antibiotics under in vitro conditions, this is still relevant in the context of the development of drug resistance so I can see where the authors' train of thought was heading in exploring this. However, I think important experiments to perform to more fully support the conclusion that resistance is largely associated with phenotypic rather than genetic factors would have been to either sequence clones from the ciprofloxacin tolerance assay (to show absence/ minimal genetic mutations) or to have tested the MIC of clones from the MA assay (to show an increase in MIC).

      Thank you for acknowledging the value of the manuscript and for the insightful suggestions for improvement. We agree on the necessity to directly connect the mutation accumulation experiments with the tolerance assay, and we have performed both suggested additional experiments.

      (1) We repeated the ciprofloxacin tolerance assay (Figure S6) using a large number of plates to gather enough cells for genomic DNA extraction and whole genome sequencing. The sequencing confirmed the absence of mutations in bacteria grown in both 0.3 and 0.5 µg/ml ciprofloxacin. We integrated this result into the revised manuscript text, and the sequencing data are available at the European Nucleotide Archive (ENA) under project number PRJEB71590.

      (2) We resuscitated three different clones from the MA assays stored at -80°C and tested the MIC of the respective drugs. The results are presented in Figure 2C. Except for EMB, we observed an increase in MIC values across the treatments.

      There seems to be a disconnect between making these conclusions from experiments conducted under different conditions, or perhaps the authors can clarify why this was done.  

      Molecular biology analysis methods are not easily compatible with long-term mutation accumulation experiments, or at least we could not establish the necessary conditions. When DNA or RNA extraction was required, we had to adjust the experimental scale for further analysis, which could be done in liquid culture. We believe that the suggested critical back-and-forth control experiments have significantly improved the comparability of the results.

      With regards to the sub-inhibitory drug concentration applied, there is significant variation in the viability as calculated by CFUs following the different treatments and there is evidence that cell death greatly affects the calculation of mutation rate (PMCID: PMC5966242). For instance, the COMBO treatment led to 6% viability whilst the INH treatment led to 80% cell viability. Are there any adjustments made to take this into account?

      We agree with and have been aware of the notion that cell death affects the calculation of the mutation rate. We included treatment optimization data on agar plates (Table 1 and Figure S2), which now demonstrate that the applied subinhibitory drug concentrations resulted in ≤10% viability across all treatments in the MA assay. This minimizes the potential discrepancy in the mutation rate calculation caused by variable cell death.  

      It would also be useful to the reader to include a supplementary table of the SNPs detected from the lineages of each treatment - to determine if at any point rifampicin treatment led to mutations in rpoB, isoniazid to katG mutations, etc.  

      Overall, while this study is tantalizingly suggestive of phenotypic tolerance playing a leading role in drug resistance (and perhaps genetic mutations a sub-ordinate role) a more substantial link is needed to clarify this.

      The SNPs identified from the lineages of each treatment are compiled in the 'unique_muts.xls' file within the Figshare document bundle that was originally enclosed with the manuscript. In response to your suggestion, we have now added a simplified version of this data set in Table S2, listing the detected SNPs. Notably, no confirmed adaptive mutation developed in our experiments; rifampicin treatment did not result in mutations in rpoB, nor did isoniazid lead to mutations in katG.

      Recommendations For The Authors:

      I would suggest moving Figure 1 to the supplementary - it shows that cell wall targeting drugs cause cell shortening and DNA replication targeting drugs cause cell elongation as would be expected and this is simply a secondary observation, not one that is central to the paper.  

      We agree that this is not a novel or unexpected observation. However, we used it as an indicator of drug effectiveness, particularly for bacteriostatic cell wall-targeting drugs in liquid culture that induced moderate cell death. Following Reviewer 1's suggestions, we extensively revised the figure to better convey our intended message. We believe the updated version now more clearly demonstrates the drugs' impact, and for this reason, we have opted to keep it in the main text.

      Figure 2 and Table 2 show the same data so this can be combined as a paneled figure or one moved to the supplementary. It would be useful to include a diagram of how the MA assay was conducted, similar to the CIP tolerance assay figure.

      Thank you for the suggestions. We have added a diagram to Figure 2 explaining the MA assay (Figure 2A), as well as the MIC experiment conducted on the MA cells (Figure 2C). To avoid redundancy, Table 2 has been removed.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes how antibiotics influence genetic stability and survival in Mycobacterium smegmatis. Prolonged treatment with first-line antibiotics did not significantly impact mutation rates. Instead, adaptation to these drugs appears to be mediated by upregulation of DNA repair enzymes. While this study offers robust data, findings remain correlative and fall short of providing mechanistic insights.

      Strengths:

      The strength of this study is the use of genome-wide approaches to address the specific question of whether or not mycobacteria induce mutagenic potential upon antibiotic exposure.

      Weaknesses:

      The authors suggest that the upregulation of DNA repair enzymes ensures a low mutation rate under drug pressure. However, this suggestion is based on correlative data, and there is no mechanistic validation of their speculations in this study.

      Furthermore, as detailed below, some of the statements made by the authors are not substantiated by the data presented in the manuscript.

      Finally, some clarifications are needed for the methodologies employed in this study. Most importantly, reduced colony growth should be demonstrated on agar plates to indicate that the drug concentrations calculated from liquid culture growth can be applied to agar surface growth. Without such validations, the lack of induced mutation could simply be due to the fact that the drug concentrations used in this study were insufficient.

      Thank you for appreciating the manuscript's merits and for the instructive suggestions. We agree that demonstrating reduced colony growth on agar plates is important to validate the relevance of the drug concentrations used in the study. In response, we have added the treatment optimization data on agar plates in Figure S2 and reorganized Table 1 to show the decrease in CFU achieved with the applied subinhibitory drug concentrations.

      We acknowledge that the observed upregulation of DNA repair enzymes and the low mutation rates under drug pressure represent correlative data. We removed the reference to mechanism from the abstract and avoided presenting the qPCR results as a mechanistic explanation in the text. We have only raised the possibility that correlation could be a causal relationship: "The observed upregulation of the relevant DNA repair enzymes might account for the low mutation rate even under drug pressure." We recognize the necessity for a new series of targeted experiments to provide mechanistic explanations. We added the following text to the Discussion:

      “The observed activation of DNA repair processes likely mitigates mutation pressure, ensuring genome stability. However, to confirm this hypothesis, these investigations should be conducted using genetically modified DNA repair mutant strains.”

      In the current manuscript, we aim to convincingly demonstrate that long-term antibiotic pressure did not induce the occurrence of new adaptive mutations.

      Recommendations For The Authors:

      Additional specific comments are:

      Page 2. Do not italicize "Mycobacteria", which is not considered a scientific name.

      Corrected.

      Page 4. "Bacto pepcone" is a typo.

      Corrected.

      Page 6. "Quiagen" is a typo.

      Corrected.

      Page 9. In Table 1, RIF being described as a protein synthesis inhibitor is misleading.

      Corrected.

      Page 9. The statement "Specifically, following RIF, CIP, and MMC treatments, we observed cells elongating by more than twofold, whereas INH and EMB treatments led to a reduction in cell length." cannot be justified by Figure 1, as the cell length information is not conveyed in this figure.

      Thank you for pointing this out; the revised Figure 1 conveys the cell length information.

      Page 10. If the experiment shown in Figure S1 was done in an acidic growth condition, the figure legend should clearly indicate the fact. Additionally, the assay condition should be described in detail in the Methods section.

      Thank you, the required information is now included in both the figure legend and the Methods section.

      Page 10. If PZA does not work against M. smegmatis, it seems pointless to add it to the COMBO treatment. Please clarify why it was included in the drug combination experiment.

      We added the following text to clarify the use of PZA: “Regardless of its inefficacy as a monotherapy, we included PZA in the combination treatment, as we could not rule out the possibility that PZA interacts with the other three drugs or that PZA elimination mechanisms are equally active in M. smegmatis under this regimen.”

      Page 10. Generation times calculated from liquid culture cannot be applied to colony growth on an agar plate. The growth behaviors on a solid surface will be totally different from planktonic suspension growth. The numbers of generations indicated here will be inaccurate.

      You are absolutely right. We conducted an experiment to calculate the number of generations on plates under the same conditions as used in the MA assay. We found, indeed, a different (doubled) generation time from what was determined in liquid culture. We have adjusted the mutation rates accordingly.
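      As a rough illustration of why a doubled generation time changes the reported rates: in a mutation accumulation design, the per-site, per-generation rate is commonly estimated as the number of detected mutations divided by (sites examined × number of lines × generations per line). If the generation time on plates is twice that in liquid culture, each passage spans half as many generations, so the estimated rate doubles. The sketch below is a minimal illustration under that standard formula; all numbers in it are hypothetical placeholders, not data from this study.

```python
def mutation_rate(mutations, sites, lines, generations_per_line):
    """Per-site, per-generation mutation rate for a mutation accumulation design.

    Assumes every line was passaged for the same number of generations.
    """
    return mutations / (sites * lines * generations_per_line)


# Hypothetical values for illustration only (not the study's data).
MUTATIONS = 12                  # total mutations detected across all lines
SITES = 7_000_000               # callable sites, roughly an M. smegmatis-sized genome
LINES = 10                      # number of MA lines
PASSAGES = 20                   # passages per line
GENS_PER_PASSAGE_LIQUID = 20.0  # estimate derived from liquid-culture doubling time

# A doubled generation time on agar means half as many generations per passage.
gens_liquid = PASSAGES * GENS_PER_PASSAGE_LIQUID
gens_agar = gens_liquid / 2

mu_liquid = mutation_rate(MUTATIONS, SITES, LINES, gens_liquid)
mu_agar = mutation_rate(MUTATIONS, SITES, LINES, gens_agar)

print(f"liquid-based estimate: {mu_liquid:.2e}; agar-based estimate: {mu_agar:.2e}")
```

      The mutation count is unchanged between the two estimates; only the denominator shrinks, which is why correcting the generation number alone doubles the per-generation rate.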

      Page 12. Was the experiment shown in Figure 3 done in a liquid culture? If so, the transcriptional profile could be different from the experiment shown in Figure 2, which was done on an agar plate.

      Yes, the experiment shown in Figure 3 was conducted in liquid culture. We acknowledge that the transcriptional profile could differ from the experiment shown in Figure 2, which was performed on an agar plate. However, technical limitations required us to use liquid cultures for these experiments.

      Page 14. Regarding the statement "INH and EMB coincided with a decreased concentration of these [dCTP and dTTP] nucleotides", by examining Table S5, I do not see any statistical reductions in dCTP and dTTP levels.

      Thank you for bringing this to our attention. We have made the necessary corrections to ensure that the text and data are now aligned.

      Page 14. Similarly to the comment above, the statement "RIF, CIP and MMC treatments promoted an increase in the dCTP and dTTP pools" is misleading as each drug seems to increase either dCTP or dTTP, not both.

      Same as above.

      Page 14. The authors state, "a larger overall dNTP pool size coincides with a larger cell size and vice versa (Figure 4H)". Please indicate the unit of the pool size for the graph shown in Figure 4H. According to the legend, I assume that it refers to the concentration. The term "pool size" may be misleading as it implies quantity rather than concentration.

      Page 15. Figure 4H is impossible to understand. The left y-axis label looks as if it is a ratio of cell length to volume. There is no point in having these three data on a single graph. Please separate them into individual graphs. Also, what is the spacing between the tick marks? The data also seem inconsistent with the values given in Table S1. For example, the mean volume of COMBO is larger than the control (according to Table S1), and yet the graph in Figure 4H indicates that COMBO's relative length is less than 1.

      Thank you for your feedback. We have corrected these and created what we hope is a clearer figure.

      Figure S1. Clarify what the gray shade in the graph represents.

      The gray shade was unnecessary, so we removed it when recoloring the figure to ensure a more coherent color scheme across the different treatments.

      Figure S1. Relative viability cannot be determined by OD600. CFU needs to be determined to assess cell viability.

      Thank you. We replaced the incorrect term "viability" with "growth inhibition".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes the induction of SIV-specific NAb responses in rhesus macaques infected with SIVmac239, a neutralization-resistant virus. Typically, host NAb responses are not detected in animals infected with SIVmac239. In this work, seventy SIVmac239-infected macaques were retrospectively screened for NAb responses, and a subset of nine animals was identified as NAb inducers. The viral genomes from 7/9 animals that induced NAb responses were found to encode a nonsynonymous mutation in the Nef gene (amino acid G63E). In contrast, the Nef G63E mutation was found in only 2/19 NAb non-inducers, suggesting that the Nef G63E mutation is selected in NAb inducers. Measurement of Nef G63E frequencies in plasma viruses suggested that Nef G63E selection preceded NAb induction. The Nef G63E mutation was found to mediate escape from Nef-specific CD8+ T-cell responses. To examine the functional phenotype of the Nef G63E mutant, its effect on downmodulation of Nef-interacting host proteins was examined. Infection of rhesus and cynomolgus macaque CD4+ T cell lines with WT or Nef G63E mutant SIV suggested that the Nef mutant reduces S473 phosphorylation of AKT. Using a flow cytometry-based proximity ligation assay, it was shown that the Nef G63E mutation reduced binding of Nef to PI3K p85/p110 and to the mTORC2 components GβL/mLST8 and MTOR (mTORC2 being the kinase complex responsible for AKT S473 phosphorylation). In vitro B-cell Nef invasion assays and in vivo imaging/flow cytometry-based assays were employed to suggest that Nef from infected cells can target Env-specific B cells. Lastly, it was determined that NAb inducers have significantly higher Env-specific B-cell responses after Nef G63E selection when compared to NAb non-inducers. Finally, a corollary was drawn between the Nef G63E-associated B-cell/NAb induction phenotype and activated PI3K delta syndrome (APDS), which is caused by activating gain-of-function mutations in PI3K, to suggest that Nef G63E-mediated induction of NAb responses is reciprocal to APDS.

      Strengths:

      This study aims to understand the viral-host interaction that governs NAb induction in SIVmac239-infected macaques; this could enable identification of determinants important for induction of NAb responses against hard-to-neutralize tier-2/3 HIV variants. The finding that SIV-specific B-cell responses are induced following selection of the Nef G63E CD8+ T-cell escape mutant argues for an evolutionary trade-off between CTL escape and NAb induction. Exploitation of such a cellular-humoral immune axis could be important for HIV/AIDS vaccine efforts.

      Although more validation and mechanistic basis are needed, the corollary between PI3K hyperactive signaling during autoimmune disorders and Nef-mediated abrogated PI3K signaling could help identify novel targets and modalities for targeting immune disorders and viral infections.

      We are grateful for the supportive and insightful comments. The work did seem to unintentionally highlight a conceptual link between extrinsic and intrinsic immune perturbations. We will keep working on both fronts, aiming for synergy between them.

      Weaknesses:

      Although the paper does have strengths in principle, its weakness is that the mechanistic basis of Nef-mediated induction of NAb responses is not directly examined. For example, it remains unclear whether SIVmac239 with an engineered G63E mutation in Nef would induce faster and more potent NAb responses. A macaque challenge study is needed to address this point.

      We appreciate the point. Our access to macaques for de novo experiments is limited. As partially discussed in the first submission, the identified Nef phenotype selected after acute infection confers an enhanced CD4+ T cell-killing effect (revised Fig 4F), and de novo infection with the mutant would likely shift the course of infection toward rapid disease/AIDS progression with generalized immune failure, by boosting acute-phase CD4+ T-cell destruction. In other words, de novo infection with the mutant may not serve as a straightforward reconstitution experiment. It appears equally critical to characterize the mutant's effect on immune signaling in vitro, and the current work focuses on this as a first step. We will pursue reconstitution experiments, with an emphasis on pharmacology, in future studies.

      As presented, the central premise of the paper involves infected cell-generated Nef (WT or G63E mutant) being targeted to adjacent Env-specific B cells. However, it remains unclear how this transfer takes place. Direct evidence demonstrating that CD4+ T cell-associated and/or cell-free Nef is transferred to B cells is needed to address this concern.

      We appreciate the point, which was also raised by Reviewers 2 and 3. We have performed three sets of in vitro reconstitution experiments addressing, graphically and functionally, how Nef transfer from CD4+ T cells to B cells can be modulated (new Fig 6), and have edited the text accordingly.

      The interaction between Nef and PI3K signaling components (p85, p110, GβL/mLST8, and MTOR) has been explored using the PLA assay; however, this requires validation using additional biochemical and/or immunoprecipitation-based approaches. For example, is Nef (WT or mutant form) sufficient to affect PI3K-induced phosphorylation of Akt in an in vitro kinase assay? Moreover, details regarding the binding of WT vs mutant Nef to PI3K signaling components are lacking in this study. Lastly, it is unclear whether the interaction of Nef with PI3K signaling components is a conserved function of all primate lentiviruses or an SIV-specific phenotype.

      We appreciate the point. Co-immunoprecipitation analysis via pulldown of the mTORC2-intrinsic cofactor Sin1 (revised Fig 4E) showed decreased G63E-Nef binding; combined with the initial manipulation results (Fig 4C), this should make the statement more robust. Because Sin1 is intrinsic to mTORC2 but not to mTORC1, this strengthens the results. Phosflow is by now a standard readout for pAkt itself. Conservation in relation to sequence variation will be addressed in future studies; we briefly mention this in the revision (Lines 390-391).

      It has been previously reported that the region of Nef encoding glycine at position 63 is not conserved in HIV-1 (Schindler et al, Journal of Virology 2004). Does HIV-1 Nef therefore also contribute to the induction of NAb responses in humans, or is the observed phenotype specific to SIV?

      We appreciate the point, and do not have an answer at the moment. We will explore in our HIV-1-infected patient cohort (Hau et al, AIDS 2022) and in other settings whether corresponding phenotypes exist. We have mentioned this point in the revised manuscript (Lines 392-393).

      Reviewer #2 (Public Review):

      It is well known that human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses, but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain. They identified a subgroup of animals that showed significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9), but in only a minority of the control group (2/19), SIVmac variants containing a CD8+ T-cell escape mutation, G63E/R, in the viral Nef gene emerged. They further show that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signaling. The authors propose that this SIVmac239 nAb induction is reciprocal to the antibody dysregulation caused by a previously identified human PI3K gain-of-function (Ref). Altogether, the results suggest that PI3K signaling plays a key role in B-cell maturation and the generation of effective nAb responses.

      Strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. Weaknesses are that only G63E, and not G63R, which also emerged in most animals, was examined in most functional assays. Some effects of the G63E mutation seem modest, and comparison to a grossly nef-defective SIVmac construct would be desirable to better assess the impact of the mutation on Nef-mediated stimulation of PI3K. While the impact of this Nef mutation on PI3K and the association with improved nAb responses are largely convincing, the results on the potential impact of soluble Nef on neighboring B cells are much less clear. SIVmac239 infects and manipulates helper CD4 T cells, and these are essential for the activation and differentiation of B cells into antibody-producing plasma cells and for effective humoral immune responses. Without additional functional evidence that Nef indeed specifically targets and manipulates B cells, these results should be interpreted, and conclusions drawn, with much greater caution. Finally, the presentation of the results and conclusions is in parts very convoluted and difficult to comprehend. Editing to improve clarity is highly recommended.

      We are very grateful for the supportive and visionary review and suggestions. Experiments have been performed to address the points raised. Arriving at the overall scheme inevitably required combining interdisciplinary elements (NAbs, B cells, CD4+ T cells, CD8+ T cells, viral escape, immunosignaling, IEI as an extrapolation, and microscopy implementations), and some sections were likely convoluted as a result. We streamlined certain portions, edited the writing throughout, and hope that the manuscript is now more straightforward.

      Reviewer #2 (Recommendations For The Authors):

      As outlined in the public review, I found the results potentially very interesting, but parts of the manuscript are much more complex and confusing than necessary. In addition, the methods used to assess the potential impact of soluble Nef on neighboring B cells in vivo were difficult to evaluate, and altogether this part was not convincing. I have the following specific suggestions:

      We are very grateful for the scholarly review and for the encouraging and constructive comments on this work. In the revision, we designed experiments addressing the properties of Nef transfer to complement the in vivo B-cell data. The recommendations have been addressed as follows.

      (1) Title: "AIDS virus-neutralizing antibody induction reciprocal to a PI3K gain-of-function disease". I think this title hardly reflects the data; SIVmac causes simian AIDS and is not "the AIDS virus", and the second part is more appropriate for the discussion than for the title (and the abstract).

      We appreciate the point. The original intent of the title was to conceptually bridge two fields, virus-host interaction and inborn errors of immunity/immunosignaling, within an original article. Some papers (e.g., Mudd et al, Nature 2012) do use the term "AIDS virus", and we chose it at initial submission for simplicity for non-virologists.

      That being said, we understand the scholarly point raised, and feel that the initial aim can be well attained by retaining the key host effector PI3K in the title, as in the revised submission titled “SIV-specific neutralizing antibody induction following selection of a PI3K drive-attenuated nef variant”.

      (2) Abstract and throughout: As the authors show, SIVmac is not generally "neutralization resistant"; difficult to neutralize is more appropriate and should be used throughout. Also, the abstract and other parts are more complicated than necessary.

      We appreciate the point. HIV/SIV Env immunology studies use "neutralization-resistant" for SIVmac239 (e.g., Mason et al, PLoS Pathog 2016), and autologous titer positivity of ~10% in a cohort of this size does appear low among lentiviruses. Nevertheless, as recommended, "difficult-to-neutralize" better describes the nature of the virus, and we have switched the term accordingly.

      In line with the title change, we revised the structure of the abstract as suggested: the main introductory sentence (Here we…) now describes the data rather than the extrapolation, and we modified the phrasing in the latter half.

      (3) The intro seems a bit biased. Immune evasion due to mutations and proviral integration that play key roles in viral persistence are not mentioned. nAbs are not known to efficiently control HIV or SIV replication in vivo (not even in the present study). Thus, a more "balanced" presentation of the role of nAbs in vivo is desirable.

      We agree with the comment. The Introduction in the first submission was condensed to present only examples of humoral immune perturbation across persistence-prone viral infections, and it is indeed much better to lay out the multiscale strategies by which lentiviruses establish viral persistence. We have added two passages, one on the fundamental integrating retroviral life cycle and another on the wide spectrum of accessory protein-driven perturbation. As pointed out, the endogenous induction observed here is of course not early enough to exert a suppressive effect on replication, unlike passive infusion of exogenous Abs. We have modified the text accordingly to improve the balance.

      (4) Lines 73-76: rephrase for clarity.

      We acknowledge the comment and have rephrased accordingly.

      (5) Line 92: "linked with sustained Env-specific B-cell responses after the mutant Nef selection". After or during in one case; the time frame varies enormously and this should be discussed.

      We appreciate the comment. The six Nef-G63E mutant-selecting NAb inducers subjected to B-cell analysis were those in which mutant selection preceded NAb induction in Fig 2D. That being said, we modified the text as suggested (Line 104 in the revised uploaded text). Text on the variation in timing has been added (Lines 378-383 in the revised uploaded text).

      (6) The authors should discuss G63R and include it in the functional analyses.

      We appreciate the comment. Discussion of Nef-G63R in the first submission was kept minimal because statistical significance for its selection was marginal. We generated a Nef-G63R mutant, and the results are added in Fig 4-Figure Supplement 2.

      (7) Lines 124/5: conservation only applies to SIVsmm/mac Nefs and this region is also frequently deleted/length-variable in primary HIV-1 Nefs.

      We appreciate the comment. We modified the description of the region accordingly (Lines 139-141 in the revised text).

      (8) Lines 153-155: Statement doesn't seem to make sense. The triple mutant Nef SIVmac construct was not attenuated for replication but specifically disrupted in CD3 down-modulation.

      We acknowledge the comment. We had meant that the resulting plasma viral load showed a decreasing trend (as in the Graphical Abstract of the work), which, in a simplistic view, should influence antigenicity for humoral immune responses. Yet it is true that viral replicative capacity was comparable to that of the wild type, as shown in Fig. 1. We have removed the related text and rephrased it (the reference remains cited in the Introduction).

      (9) Lines 178/9: levels in PI3K gain-of-function mice "with full disease phenotype (Avery et al., 2018)". This needs more information, e.g. what disease exactly are they talking about?

      We are grateful for the correction. We have added text introducing this congenital disease earlier, in the Introduction, and a detailed description has also been added to the Discussion.

      (10) Lines 186/7: "Env-stimulating high-MOI infection also accelerated phenotype appearance, with enhanced 50% reduction (Figure 4C, right)". Modify text and corresponding figure for clarity.

      We acknowledge the comment. We revised the text as follows: “A high-MOI SIV infection, comprising higher initial concentration of extracellular Env stimuli, also accelerated phenotype appearance from day 3 to day 1 post-infection with stronger pAkt reduction”.

      (11) The validity of the results described in the section "Targeting of lymph node Env-specific B cells by Nef in vivo" was difficult to assess. Altogether, however, I didn't find them convincing, especially since a negative control (e.g. macaques infected with nef-deleted SIVmac) are missing.

      We acknowledge the comment. As a purely experimental control, whole-Nef deletion could help establish subtracted baselines. Within this work, the staining itself should at least be highly specific (the mAb has been repeatedly validated in other applications, and the cytometry panel was designed to minimize spillover into the AF488 channel). In vivo, however, a direct comparison may be confounded because, upon Nef deletion, the loss of Nef's other pleiotropic effects tends to dominate, resulting in reduced viremia, robust CD8 responses, cytotoxic CD4 responses and increased binding Ab titers (Johnson et al., J Virol 1997; Gauduin et al., J Exp Med 2006; Fukazawa et al., Nat Med 2012; Adnan et al., PLoS Pathog 2016; etc.) and thus an altered trajectory. We will work on refining this methodology in future studies.

      (12) Lines 309-319: This paragraph made little sense to me (as did lines 328-331).

      We acknowledge the comment and have edited both sections.

      Reviewer #3 (Additional Reviewer):

      In this manuscript, Hiroyuki Yamamoto et al examined virus-specific antibody responses and identified a subgroup of nine individuals, out of seventy rhesus macaques of Burmese origin infected with SIVmac239, that developed neutralizing antibodies (NAb). The authors propose the emergence of a nef mutant (Nef-G63E) that impacts B cell maturation, resulting in PI3K gain-of-function.

      My major concerns are:

      The authors addressed, from different angles, the role of the emergence of the Nef-G63E mutant in individuals developing NAb. The manuscript is confusing and the rationale is not always clearly stated. This reflects the two aspects of the manuscript: (i) NAb identification in a subgroup of macaques and (ii) the identification of this nef mutation.

      We are grateful for the comprehensive and scholarly comments. As pointed out, the work needed to address how the influence of the observed viral immunosignaling phenotype may split into CD4-intrinsic (which might be your specialty) and B-cell-intrinsic effects. Based on your suggestions, we have acquired additional data and revised the manuscript as attached.

      The authors used both males (n=57) and females (n=13). However, no information on sex is given for NAb inducers versus non-NAb inducers. The notion of "highly pathogenic" is certainly not correct (see the introduction). Pathogenicity also depends on monkey origin: cynomolgus macaques are less sensitive to SIVmac239 or SIVmac251 than rhesus macaques (Ling B, AIDS 2002; Reimann KA, J Virol 2005; Cumont MC, J Virol 2008) or pigtailed macaques used in the US. The authors used Burmese macaques, so the dynamics of pathogenicity differ from those in rhesus macaques of Indian origin housed in the US. How many of the 61 animals were sacrificed? Here, the animals survive longer (more than one year), so the notion of "highly pathogenic" should be tempered.

      We appreciate the comment. We have accordingly added sex information to the Methods section (M/F: 8/1 in NAb inducers versus 49/12 in non-inducers; p > 0.99 by Fisher’s exact test). As pointed out, the frequency and rate of AIDS progression differ among macaques of different origins; however, we have previously reported reproducible MHC-I genotype-dependent AIDS progression in the Burmese rhesus macaques used here (Nomura, Yamamoto et al., J Virol 2012). Following this advice, we have softened the term to “pathogenic” in the revised manuscript and added a reference showing graded pathogenesis from a cell-death perspective (Cumont et al., J Virol 2008).
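      For readers who wish to check the sex-distribution comparison, a minimal sketch of a two-sided Fisher’s exact test follows. It assumes the 2×2 layout rows = NAb inducers / non-inducers and columns = male / female, uses only the Python standard library (not necessarily the software used in the study), and the function name fisher_exact_two_sided is hypothetical. Under the fixed margins of the observed table, the 8/1 split is the single most probable configuration, so the two-sided P value sums to essentially 1, consistent with p > 0.99.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Hypothetical helper: two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    row and column totals whose probability does not exceed that of the
    observed table.
    """
    row1 = a + b          # first row total (e.g. NAb inducers)
    col1 = a + c          # first column total (e.g. males)
    n = a + b + c + d     # grand total
    denom = comb(n, row1)

    def prob(x):
        # Probability that the top-left cell equals x, margins held fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Assumed layout: rows = NAb inducers, non-inducers; columns = male, female
p = fisher_exact_two_sided(8, 1, 49, 12)
print(f"P = {p:.3f}")
```

      Enumerating the hypergeometric distribution directly keeps the sketch dependency-free; a statistics library would give the same two-sided definition.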

      Furthermore, no data are provided on CD4 or CD8 T cell dynamics. In particular, the extent of T cell immunodeficiency may compromise the humoral response, so these data need to be shown. Indeed, previous reports have indicated that early CD4 T cell depletion is associated with a defective humoral response. Furthermore, Tfh cell depletion has been reported in several immune tissues that are essential for the B cell immune response, such as the spleen. This should therefore be discussed as an alternative mechanism for the absence of NAb. Indeed, the authors found higher and more persistent Env-specific plasmablasts in NAb inducers than in non-NAb inducers (figure 6). Why were twelve of the 61 individuals selected for assessing the anti-Env response (Supplemental S3 for figure 1, panel 1), and only eleven for western blots? No explanation is given in the text, and this needs to be clearly stated. See lines 108-110.

      We appreciate the comment. As in other sections, this study used available cryopreserved samples from a retrospective cohort, which also entails heterogeneity in data acquisition over time. We acknowledge that some supplementary data are particularly limited, which is also one reason they are presented in the supplementary information. We considered it essential to secure samples from Nef-G63E-selecting NAb inducers and viremic non-inducers, for which we obtained six and twelve animals, respectively, for the B-cell analysis.

      We (Nakane et al., PLoS ONE 2013) and others (Hirsch et al., J Virol 2004) have previously reported, based on western blotting, that SIV-infected rapid progressors tend to show serological failure (impaired binding-Ab bands on western blots). Therefore, to compare quantitative traits at this baseline stage (Fig 1), we judged it appropriate to compare NAb inducers with non-rapid-progressing (>60-wk survival) non-inducers. We have stated this in the revised manuscript (Results and Methods). Additionally, we have replaced the immunoblotting result with one that includes an additional non-inducer (n = 12). Please note that there is lot-to-lot variation in the strip-coated antigens (e.g., gp160), but the results are comparable (now covering 12 of the 13 animals with >60-wk survival).

      The authors indicated the frequencies of the Nef-G63E mutant in figure 2 panel C. However, the legend does not state how many NAb non-inducers were used to calculate this frequency. The authors state (line 127) "only in two of the nineteen NAb non-inducers, including one rapid progressor". Thus, different numbers of individuals are used throughout the manuscript. This needs to be clarified for readers, together with what each number refers to, as it is not consistent across the text and the analyses performed.

      We appreciate the comment and have added the number to the revised Fig 2C. As mentioned above, the heterogeneity in sample numbers across sections is indeed a limitation of the work, and we have noted this in the Discussion.

      The rationale for the sentence in lines 140-142 is unclear. Please clarify: "NAb induction is not associated with these MHC-I genotypes (P = 0.25 by Fisher's exact test, data not shown) but with the Nef-G63E mutation itself".

      We appreciate the comment. We have rephrased it as:

      “Ten of nineteen NAb non-inducers also had either of these alleles (Figure 1-figure supplement 1). This did not significantly differ with the NAb inducer group (P = 0.25 by Fisher’s exact test, data not shown), indicating that NAb induction was not simply linked with possession of these MHC-I genotypes but instead required furthermore specific selection of the Nef-G63E mutation.” (Lines 159-162).

      In supplemental figure 3, only 7 individuals were tested, while the authors state "Ten of nineteen NAb non-inducers also had either of these alleles". Why only seven? In NAb-inducing Burmese monkeys, the authors show specific T cells capable of recognizing the WT Nef peptide but not the G63E mutant peptide. Thus, Nef is immunogenic in vivo, generating T cells despite being mutated.

      In contrast, non-NAb-inducers lack Nef-specific T cells (supplemental figure 3, except R01-011, panel A). Although the authors propose an escape mutant for CD8 T cells, this is associated neither with an absence of immunogenicity nor with a difference in viral load compared with NAb inducers (panel C). Therefore, the conclusions should be revised, as this part of the manuscript is confusing. Please clarify the rationale linking NAb and Nef-specific CD8 T cells.

      We appreciate the comment. Of the eight non-inducers that were positive for the allele and did not select the Nef-G63E mutant, seven were available for analysis. Among the many CTL responses induced, this single Nef62-70 epitope-specific response is presumed not to have a large impact on viral control. This was discussed in a previous paper (Nomura, Yamamoto et al., J Virol 2012), which instead suggested a correlation with plasma viral load at the MHC-I haplotype level. We therefore consider that the CTL pressure-driven selection of the Nef-G63E mutant acted essentially as an immunosignaling trigger under persistent viremia. We have added this to the revised text (Line 172).

      In the next part of the manuscript, the authors assessed the function of this Nef-G63E mutant. The rationale for introducing ferritin in this part of the manuscript is not clear to the reader. Furthermore, only a subgroup of each group (NAb+ versus NAb-) is shown: 4 NAb-negative versus 6 NAb-positive.

      We appreciate the point. As introduced, Swingler et al. (Cell Host Microbe 2008) reported ferritin derived from HIV-infected macrophages as a potential B cell-disrupting factor. In that paper, viral load, ferritin and binding antibody titers were positively correlated. Our data show that SIVmac239-specific NAb induction already departs from such kinetics with respect to viral load (Fig 3-figure supplement 1C), and ferritin levels were measured in the available samples primarily for confirmation. We have added three more available samples to the NAb- group. (The six NAb+/G63E animals correspond to those with B-cell data in Figure 7.) The statistical results remain unaffected and robust, as shown in this version. The revised manuscript now includes an explanation of the rationale for the ferritin measurements.

      Similarly, whereas the authors observed an effect of the Nef mutant on pAkt Ser473 (less induced than with WT), they suggest that this may have an impact on T cell survival.

      We appreciate the point. In the first submission we showed a decrease in peripheral memory Tfh, but we agree that this is indirect. In the current revision we have examined apoptotic cell death, which increased with the Nef-G63E mutation (Figure 4F).

      The rationale for analyzing CXCR3-CXCR5+PD-1+ memory follicular Th (Tfh) cells is not clear. Moreover, the references are not cited adequately: these papers show an expansion. See the literature on depletion (Xu H, J Immunol. 2015; Moukambi F, PLoS Pathog. 2015; Yamamoto T, Sci Transl Med. 2015; Xu H, J Immunol. 2018; Moukambi F, Mucosal Immunol. 2019).

      We appreciate these points on in vivo CD4+ T cells.

      Peripheral memory Tfh cells were reported to correlate with Ab cross-reactivity in a human cohort (Locci et al., Immunity 2013), so we briefly examined this subset in the context of NAb induction here. We have stated this in the revised manuscript.

      Moukambi et al. (PLoS Pathog 2015; Mucosal Immunol 2019) clearly demonstrate acute-phase destruction. We have cited the suggested references that are not related to neonates or vaccines, including these two, in the revised manuscript. The biphasic dysregulation of Th cells (acute-phase destruction and chronic-phase adverse hyperexpansion) may indeed play a distinct role in the current phenotype, but this is beyond the aim of the current analysis. We have briefly mentioned this in the Discussion.

      Then, the authors assess the potential B-cell-intrinsic influence of the G63E-Nef phenotype. The rationale here is clearly indicated and consistent with figure 1, and this part is clearer. The dot plots should be revised and the markers used stated more clearly. The authors indicate that Nef invasion upregulates pAkt Ser473, assuming aberrant PI3K/mTORC2 signaling. What is the impact of the Nef-G63E mutant on pAkt Ser473 in an in vitro transfer model? This is not addressed for comparison.

      We appreciate these remarks and suggestions, which were also raised by Reviewers 1 and 2. We have performed three sets of in vitro reconstitution experiments addressing, both visually and functionally, how Nef transfer to B cells can be modulated (new Fig 6), and have edited the text accordingly.

      Minor points are:

      - the presence of references in the legend.

      - some Ab clones, such as CD38 and CD138, are listed in the table but not used; these are well known to be invalid B cell markers in monkeys.

      We appreciate the suggestions.

      References have been removed from the legends (Fig. 1, Fig. 3) and moved to the corresponding Methods section (Fig. 1).

      We were aware of this in advance (CD38/CD138) and included these markers in the memory B-cell panel only to check whether they showed any specific pattern. As expected, no notable pattern was observed in these NAb inducers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the effects of NFKB2 mutations on pituitary gland development through hypothalamic-pituitary organoids. The evidence supporting the main conclusions is solid, although analysis of additional clones to exclude inter-clone variability would strengthen the conclusions. Insight into the mechanism of action of NFKB2 during pituitary development is incomplete. This work will be of interest to endocrinologists and biologists working on pituitary gland development and disease.

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we share the view that reproducing the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame, both for the reason mentioned above (unavailability of the main research engineer experienced in the challenging methods of organoid differentiation) and because of the long turnaround time of these experiments (3 months for the whole differentiation starting from iPSCs). We therefore decided to publish on a single clone while we still aim to reproduce our results on at least a second one, and hope to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      Revised text: “Conversely, a limitation of this model is the long duration of the differentiation period (approximately 3 months) and the fact that not all hiPSC clones lead to full differentiation of hypothalamo-pituitary organoids despite similar conditions of culture. For these reasons, we could not include confirmation of our results on an independent clone in the present paper.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      NFKB mutations are thought to be one of the causes of pituitary dysfunction, but until now they could not be reproduced in mice and their pathomechanism was unknown. The authors used the differentiation of hypothalamic-pituitary organoids from human pluripotent stem cells to recapitulate the disease in human iPS cells carrying the NFKB mutation.

      Strengths:

      The authors achieved their primary goal of recapitulating the disease in human cells. In particular, the differentiation of the pituitary gland is closely linked to the adjacent hypothalamus in embryology, and the authors have again shown that this method is useful when the hypothalamus is suspected to be involved in pituitary abnormalities caused by genetic mutations.

      Weaknesses:

      On the other hand, the pathomechanism is still not fully understood. This study provides some clues to the pathomechanism, but further analysis of NFKB expression and experiments investigating the relevant factors in more detail may help to clarify it further.

      We thank this reviewer for acknowledging that we have reached our primary objective, in particular the fact that the HPO (hypothalamo-pituitary organoid) model allows recapitulation of the disease in human cells, including hypothalamic-pituitary interactions. Regarding the pathophysiological mechanism of the disease, we must admit that it remains incompletely understood. However, we have analysed more samples by RT-qPCR and further analysed RNA-seq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may play a role. We now provide several additional figures derived from these analyses, including a synthetic figure summarizing the most relevant observed effects (Fig. 14).

      Reviewer #2 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript and for the valuable comments, suggestions and questions, which are addressed point by point below.

      Summary:

      DAVID syndrome is a rare autosomal dominant disorder characterized by variable immune dysfunction and variable ACTH deficiency. Nine different families have been reported, and all have heterozygous mutations in NFKB2. The mechanism of NFKB2 action in the immune systems has been well-studied, but nothing is known about its role in the pituitary gland.

      The DAVID mutations cluster in the C-terminus of the NFKB2 and interfere with cleavage and nuclear translocation. The mutations are likely dominant negative, by affecting dimer function. ACTH deficiency can be life-threatening in neonates and adults, thus, understanding the mechanism of NFKB2 action in pituitary development and/or function is important.

      The authors use CRISPR/Cas gene editing of human iPSC-derived pituitary-hypothalamic organoids to assess the function of NFKB2 and TBX19 in pituitary development. Mutations in TBX19 are the most common, known cause of pituitary ACTH deficiency, and the mechanism of action has been studied in mice, which phenocopy the human condition. Thus, the TBX19 organoids can serve as a positive control. The Nfkb2<Lym1/Lym1> mouse model has a p.Y868* mutation that impairs cleavage of NFKB2 p100, and the immune phenotype mimics the patients with DAVID mutations, but no pituitary phenotype was evident. Thus, a human organoid model might be the only approach suitable to discover the etiology of the pituitary phenotype.

      Overall, the authors have selected an important problem, and the results suggest that the pituitary insufficiency in DAVID syndrome is caused by a developmental defect rather than an autoimmune hypophysitis condition. The use of gene editing in human iPSC-derived hypothalamic-pituitary organoids is significant, as there is only one example of this previously, namely studies on OTX2. Only a few laboratories have demonstrated the ability to differentiate iPSC or ES cells to these organoids, and the authors have improved the efficiency of differentiation, which is also significant.

      The strength of the evidence is excellent. However, the two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones makes the conclusions less compelling. Since the authors obtained two independent clones for NFKB2 it is not clear why only one clone was studied.

      We experienced difficulties obtaining an hiPSC population devoid of spontaneous differentiation while purifying this second clone, and did not want to delay the start of the experiments. This clone will be analysed in a follow-up study.

      Finally, the effect of TBX19 on early pituitary fate markers is somewhat surprising given the phenotype of the knockout mice and patients with mutations. Thus, the use of a single clone for that study is also worrisome.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but one that may still retain DNA-binding ability. This mutation had an effect on LHX3 and PITX1 similar to, but even more severe than, that of the TBX19 KI mutation. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. In contrast to the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines completely lacking TBX19 should help clarify these questions.

      Strengths:

      The authors make mutations in TBX19 and NFKB2 that exist in affected patients. The TBX19 p.K146R mutation is recessive and causes isolated ACTH deficiency. Mutations in this gene account for 2/3 of isolated ACTH deficiency cases. The NFKB2 p.D865G mutation is heterozygous in a patient with recurrent infections and isolated ACTH deficiency. NFKB2 mutations are a rare cause of ACTH deficiency, and they can be associated with the loss of other pituitary hormones in some cases. However, all reported cases are heterozygous.

      The developmental studies of organoid differentiation seem rigorous in that 200 organoids were generated for each hiPSC line, and 3-10 organoids were analyzed for each time point and genotype. Differentiation analysis relied on both RNA transcript measurements and immunohistochemistry of cleared organoids using light sheet microscopy. Multiple time points were examined, including seven time points for gene expression at the RNA level and two time points in the later stages of differentiation for IHC.

      TBX19-deficient organoids exhibit reduced levels of PITX1, LHX3, and POMC (ACTH precursor) expression at the RNA and IHC level, and there are fewer corticotropes in the organoids, as ascertained by POMC IHC.

      The NFKB2-deficient organoids have normal expression of the early pituitary transcription factor HESX1, but reduced expression of PITX2, LHX3, and POMC. Because there is no immune component in the organoid, this shows that NFKB2 mutations can affect corticotrope differentiation to produce POMC. RNA sequencing analysis of the organoids reveals potential downstream targets of NFKB2 action, including a potential effect on epithelial-to-mesenchymal-like transition and selected pituitary and hypothalamic transcription factors and signaling pathways.

      Weaknesses:

      There could be variation between individual iPSC lines that is unrelated to the genetically engineered change. While the authors check for off-target effects of the guide RNA at predicted sites using WGS, a better control would be to have independently engineered clones or to correct the engineered clone to wild type and show that the phenotypic effects are reversed.

      All NFKB2 patients are heterozygous for what appear to be dominant negative mutations that affect protein cleavage and nuclear localization of the processed protein as homo- or heterodimers. The organoids are homozygous for this mutation. Supplemental Figure 4 indicates that one heterozygous clone and two homozygous mutant clones were obtained. Analysis of these additional clones would strengthen the conclusions by showing reproducibility and the effect of mutant gene dosage.

      The main goal of this work was to evaluate whether and how the NFKB2 p.D865G mutation affects hypothalamic-pituitary organoid development, in order to determine whether these organoids would constitute a valuable model for studying DAVID syndrome.

      We thank this reviewer for noting that we identified an important question and used appropriate, novel and not widely used methods to address it, including CRISPR/Cas9 genome editing of iPSCs and disease modelling in iPSC-derived HPOs, which had not previously been reported by any team other than the one that initially described it. This allowed us to confirm our working hypothesis that DAVID syndrome is caused by a developmental defect rather than an autoimmune hypophysitis condition. We also agree that analysing more clones, generated from the same or different hiPSC lines and carrying homozygous, heterozygous or corrected mutations, will be necessary in the future.

      Reviewer #3 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript and for the valuable comments, suggestions and questions, which are addressed point by point below.

      Summary:

      This manuscript by Mac et al addresses the causes of pituitary dysfunction in patients with DAVID syndrome, which is caused by mutations in the NFKB2 gene and leads to ACTH deficiency. The authors seek to determine whether the mutation directly leads to altered pituitary development, as opposed to an autoimmune defect, by mutating human iPSCs and then establishing organoids that differentiate into pituitary tissue. They first seek to validate the system using a well-characterised mutation of the transcription factor TBX19, which also results in ACTH deficiency in patients. They then characterise altered pituitary cell differentiation in mutant NFKB2 organoids and show that these lack corticotrophs, which would lead to ACTH deficiency.

      Strengths:

      The conclusion of the paper that ACTH deficiency in DAVID syndrome is independent of an autoimmune input is strong.

      Weaknesses:

      (1) The authors correctly emphasise the importance of establishing the validity of an iPSC-based model in being able to recapitulate in vivo dysfunctional pituitary development through characterisation of a TBX19 knock-in mutation. Whilst this leads to the expected failure of functional corticotroph differentiation, other aspects of the normal pituitary differentiation pathway upstream of corticotroph commitment seem to have been affected in surprising ways. In particular, the loss of LHX3 and PITX1 in TBX19 mutant organoids compared with wild type requires explanation, especially as the mutant protein would only be expected to be expressed in a small proportion of anterior pituitary lineage cells.

      If the developmental expression profile of key transcription factors in mutant organoids does not recapitulate that which occurs in vivo, any interpretation of the relevance of expression differences in the NFKB2 organoids to the mechanism(s) leading to corticotroph function in vivo has to be questionable.

      See our response to Reviewer #2.

      It is notable that the manipulation of iPSC cells used to generate mutants through CRISPR/Cas9 editing is not applied to the control iPSC line. It is possible that these manipulations lead to changes to the iPSC cells that are independent of the mutations introduced and this may change the phenotype of the cells. A better control would have been an iPSC line with a benign knock-in (such as GFP into the ROSA26 locus).

      We agree that the issue of off-target mutations should be addressed. However, we performed whole genome sequencing on TBX19 KI and did not observe any pathogenic variants other than the intended edit. We also checked that clones isolated during the screening procedure that tested negative for editing retained the ability to generate pituitary cells. We chose to use the isogenic original hiPSC line because it could be compared with both TBX19 KI and NFKB2 KI simultaneously, thereby reducing the workload and cost of the experiments. Any other knock-in, such as GFP into the ROSA26 locus, would carry the same risk of off-target mutations, though presumably at other sites in the genome.

      (2) In the results section of the manuscript the authors acknowledge that hypothalamic tissue in the NFKB2 mutant organoid may be having an effect on the development of pituitary tissue. However, in the discussion the emphasis is entirely on pituitary autonomous mechanisms such as pituitary HESX1 expression or POMC gene regulation; in the conclusion of the abstract, a direct role for NFKB2 in pituitary differentiation is described. Whilst the data here may suggest a non-immune mediated alteration in pituitary function in DAVID syndrome, if this is due to alteration of the developing hypothalamus then this is not direct. A fuller discussion of the potential hypothalamic contribution and/or further characterisation of this aspect is warranted.

      We agree with this reviewer that the contributions of both the developing hypothalamic and pituitary tissues should be taken into account. We performed further experiments and analysed the effect of both mutations on the expression of hypothalamic growth factors. These results are displayed in new figure 10. The role of the hypothalamus is now clearly mentioned and highlighted in the Discussion.

      (3) qRT-PCR data presented in Figure 6A shows negligible alteration of HESX1 expression at all time points in NFKB2 mutant organoids. This is not consistent with the 2-fold increase in HESX1 expression described in day 48 organoids found by bulk RNA sequencing.

      How do the authors reconcile these results and why is one result focused on in the discussion where a potential mechanism for a blockade of normal pituitary cell differentiation is suggested? Further confirmation of HESX1 expression is required.

      In the previous version of the manuscript, the HESX1 fold-change ratio between NFKB2 KI and WT at d48 was 2.06 (p=0.22). However, the type of representation used for expression kinetics (values relative to the expression peak in WT) and the scale made this difficult to see. In the new version of the manuscript, we analysed more samples from the same experiments, and the new figure (now 6B) shows a significant increase in HESX1 expression (Fc = 2.46, p=0.019) in NFKB2 KI.

      Also, the qPCR results come from at least two different experiments, whereas the RNA-seq results come from a single one. For RT-qPCR, 6 HPOs per genotype were picked and analysed. As we found that only 60-70% of organoids show signs of pituitary cell differentiation, we preselected organoids based on RT-qPCR expression of selected markers (SOX2, HESX1, PITX1, LHX3, TBX19, POU1F1 and POMC) to avoid sending “empty” HPOs for bulk RNA-seq. We compared HESX1 expression ratios obtained by the two techniques on the same samples (those used for RNA-seq) and found values of 2.19 (p=0.03) and 1.83 (p=0.061) for RNA-seq and RT-qPCR, respectively. This is illustrated in Supplementary Figure 7. Our new results thus demonstrate an increase in HESX1 expression in NFKB2 KI from d27 to d75.

      (4) Throughout the authors focus on POMC gene expression and ACTH antibody immunopositive as being indicative of corticotroph cell identity. In the human fetal pituitary melanotrophs are present and most ACTH antibodies are unable to distinguish these cells from corticotrophs. Is the antibody used specifically for ACTH rather than other products of the POMC gene? It is unlikely that all the ACTH-positive cells are melanotrophs, nevertheless, it is important to know what the proportions of the 2 POMC-positive cell types are. This could be distinguished by looking for the expression of NeuroD1, which would also define whether corticotrophs are committed but not fully differentiated in the NFKB2 mutant organoids. In support of an effect on corticotrophs, it is notable that CRHR1 expression (which would be expected to be restricted to this cell type) is reduced by 84% in bulk RNAseq data (Table 1) and this may be an indicator of the loss of corticotrophs in the model.

      The antibody we used is directed against ACTH. In HPOs, PAX7 expression was barely detected throughout the experiment. Moreover, although PCSK2 transcripts were observed, their expression started very early (d27) and remained constant, suggesting that this gene is expressed in hypothalamic rather than pituitary cells. Together, these observations suggest that melanotrophs are very unlikely to be present in HPOs.

      (5) Notwithstanding the caveats about whether the organoid model recapitulates in vivo pituitary differentiation (see 1 above) and whether the bulk RNAseq accurately reflects expression levels (see 3 above), there are potentially some extremely interesting changes in gene expression shown in Table 1 which warrant further discussion. For example, there is a 25-fold reduction in POU1F1 expression which may be expected to reflect a loss of somatotrophs in the organoid (and possibly lactotrophs) and highlights the importance of characterising the effect of NFKB2 on other anterior pituitary cell types within the organoid. If somatotrophs are affected, this may be relevant to the organoids as a model of DAVID syndrome as GH deficiency has been described in some individuals with NFKB2 mutations. The huge increase in CGA expression may reflect a switch in cell fate to gonadotrophs, as has been described with a loss of TPIT in the mouse. These are examples of the changes that warrant further characterisation and discussion.

      We performed a more in-depth analysis of other pituitary lineages (mainly somatotrophs). We confirmed the strong reduction in PROP1 and POU1F1 expression in NFKB2 KI organoids. Although the strong increase in CGA expression in the mutant may raise the possibility of a redirection towards the gonadotroph lineage, the lack of change in NR5A1 expression may suggest otherwise.

      These results are now illustrated in figure 12 and discussed in a full paragraph.

      (6) How do the authors explain the lack of effect of NFKB2 mutation on global NFKB signalling?

      The most likely explanation is that p100/p52 is not involved in controlling the expression of other members of the NFKB signalling pathway. Therefore, the absence of a global alteration of the NFKB signalling pathway shows that the mutant p100/p52 protein is directly responsible for the observed phenotype.

      Recommendations for the authors:

      Reviewing editor summary of recommendation to authors:

      The use of hypothalamic-pituitary organoids can provide a fundamental understanding of pituitary gland development and differentiation. Their use to study human pituitary insufficiency is important, providing insight into the aetiology of disease and whether it implicates the hypothalamus or the anterior pituitary. To date, there is only one other example of their use in the literature, in which Matsumoto et al. (2019) used OTX2-mutant hypothalamic-pituitary organoids to understand the aetiology of pituitary hypoplasia driven by OTX2 mutations. This being the second example of gene editing in human iPSC-derived hypothalamic-pituitary organoids, these studies have improved the efficiency of differentiation previously published by Suga et al. (2011) for ES cells and Matsumoto et al. (2019) for iPS cells. In addition, they confirm that this method is useful, especially for studying hypothalamic involvement in human pituitary anomalies, owing to the concerted development of these two structures.

      The reviewers recognise the valuable insight provided into the mechanism of NFKB2 action during pituitary development and how this human organoid model might be one of the few or only approaches suitable to discover the aetiology of the pituitary phenotype.

      The reviewers agree that both the evidence provided from the organoid model, as well as the characterisation of the phenotype are incomplete. In particular, the strength of evidence would be improved by analysing additional independent clones for both NFKB2 as well as TBX19 gene-edited iPSCs. Additionally, analysis of NFKB2 expression both in vivo and in the organoids, as well as analysis for the NFKB2 targets put forward, would be a lot more informative to help understand this phenotype.

      The main recommendations discussed are summarised here and the reviewers have elaborated on these points in their individual reviews:

      The two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones, unrelated to the mutation, makes the conclusions less compelling. Two independent homozygous clones were obtained for NFKB2 but only one was used, so analysis of the second clone would strengthen the findings. A heterozygous clone was also obtained and given all NFKB2 patients are heterozygous for what appears to be dominant negative mutations, the heterozygous clone ought to be analysed. Analyses of these additional clones would give more strength to the conclusions, showing reproducibility and the effect of mutant gene dosage. The reviewers provide excellent suggestions for alternative controls for the engineered iPSC lines in their specific comments.

      The effect of TBX19 mutation on early pituitary fate markers LHX3 and PITX1 is surprising given the phenotype of the knockout mice and patients with mutations. If the developmental profile of essential transcription factors does not recapitulate the in vivo expression in this well-characterised mutant, this brings the organoid model into question. Thus, analysis of a further clone for the study of mutant TBX19 would be crucial. The validity of this control affects the interpretations relying on expression differences in the NFKB2-mutant organoids.

      The study has implicated NFKB2 in pituitary development, but more insight is needed to fully understand disease pathogenesis. The authors presented potential downstream targets of NFKB2 action, including transcription factors and key signalling pathway components; further analyses of NFKB2 expression and experiments investigating the relevant factors in more detail will help elucidate this point.

      Discerning between the hypothalamus and pituitary tissue is fundamental to interpreting phenotypes: (i) To pinpoint the primary tissue affected by NFKB2 deficiency, staining for NFKB2 during development in vivo will determine if this is expressed both in the developing hypothalamus and anterior pituitary gland or only one of these tissues. (ii) Using markers of hypothalamus and pituitary to discern between these two tissues in organoids, will provide a lot of valuable information where expression changes are presented. This would help discern the contribution of the developing hypothalamus as this is still unclear and has not been discussed. Knowing which tissue compartments NFKB2 is expressed in the organoids would also be of great value.

      The organoids provide an opportunity to characterise the effects of NFKB2 on other pituitary cell types, since the bulk RNAseq presents intriguing changes indicating that not only corticotrophs may be affected. This may be of relevance to patients, who can have additional pituitary hormone deficiencies. If NFKB2 is expressed in the pituitary, demonstrating expression in the different cell types in vivo as well as in the organoids would help interpret the phenotype. Is it expressed only in corticotrophs/corticotroph precursors, or in additional endocrine cells?

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we share the view that reproducing the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame for the reason mentioned above (unavailability of the main research engineer knowledgeable in the challenging methods of organoid differentiation) and because of the long turnaround time of these experiments (3 months for the whole differentiation starting from hiPSCs). We therefore decided to publish on a single clone, while still aiming to reproduce our results on at least a second one; we hope to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      We have analysed more samples by RT-qPCR and further analysed the RNASeq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may play a role. Specifically, we now show the effect of the NFKB2 mutation on hypothalamic growth factors and pituitary progenitor differentiation (figure 10), on different stages of corticotroph maturation (figure 11) and on PROP1/POU1F1-dependent lineages (figure 12). We compared our results with publicly available ChIPseq data on p52 transcriptional targets (figure 13). We have provided several additional figures derived from these analyses, including a synthetic figure summarizing the most relevant observed effects (Fig. 14).

      Reviewer #1 (Recommendations For The Authors):

      In organoids, it is essential to stain for NFKB: is it the hypothalamus or the pituitary that expresses NFKB, and if the pituitary, is it the corticotroph itself or the surrounding cells? If immunostaining is not available, FISH or RNAscope can be used to look at expression.

      Figure 7 shows stronger expression of p100/p52 in pituitary progenitors, and some expression in the hypothalamic part of the organoid. Owing to the current lack of biological material and the length of the experimental procedure, we could not yet determine which differentiated cell types express p100/p52, but this is clearly something we will look at in further experiments.

      Regarding Figure 7, NFKB2 (D865G/D865G) shows no LHX3 expression already at day 48. It would be better to look at expression including PITX1 at an earlier time point to see at what point differentiation is impaired.

      RT-qPCR results show no statistically significant changes in PITX1 (Fc=0.58, p=0.25) or LHX3 (Fc = 0.15; p=0.22) expression at d27, although there was a tendency towards downregulation.

      Is it really just a species difference that NFKB2-deficient mice do not have abnormal pituitary function? This needs to be discussed in the manuscript.

      Nfkb2 Lym1/Lym1 mice and the NFKB2 KI model carry different but functionally very similar mutations, as both lead to abnormal processing of p100 and a strong reduction in p52 content. In mice, these mutations are more severe than the complete absence of the Nfkb2 gene product, and they have been called “super repressors”. It is therefore surprising that no pituitary phenotype has been observed in mice. In our opinion, this constitutes a strong argument in favour of an inter-species difference, at least for the pathogenicity of this type of mutation.

      This point is now addressed in the Discussion.

      Just looking at changes in gene expression by qPCR and bulk RNA-seq does not give enough information about localisation. We wish RNA-seq had at least been separated by FACS first. For example, FACS can separate the anterior pituitary and hypothalamus by EpCAM positivity/negativity (PMID: 35903276), so we would like to see gene expression in such separated samples.

      This is a pertinent suggestion. We are aware of these techniques and hope to include them in future studies.

      For Figures 2 and 6, just looking at changes in gene expression by qPCR does not provide localisation information, so either (1) immunostaining for LHX3 and NKX2.1 should be shown in each aggregate as in FigS3, or (2) qPCR should be performed on the FACSed cells.

      PITX1, LHX3 (as confirmed by our immunofluorescence data) and HESX1 are only expressed in non-neural tissue. TBX19 could be expressed in the hypothalamic part of the organoid, but we observed very little immunostaining outside the outermost layers of organoids (i.e. pituitary tissue). The antibody we used to detect corticotrophs only recognizes ACTH, and therefore only marks pituitary cells.

      In addition, pathway and gene ontology analyses should be performed.

      Pathway and gene ontology analyses have been performed. However, as organoids consist of two different tissues, the analysis of over 4800 differentially expressed genes did not give very informative results, apart from an impairment of retinoic acid signalling that we are currently investigating.

      Reviewer #2 (Recommendations For The Authors):

      The differentiation of iPSC to organoids could be variable. The authors indicate that 200 organoids were analyzed for each line, and 3-10 organoids were analyzed per time point, genotype, and assay. Is it clear that 100% of the organoids differentiate to produce corticotropes? Please clarify.

      In our experiments, almost 90% of organoids give rise to non-neural ectoderm, as demonstrated by PITX1 expression. However, depending on experiments, only 60-70% of organoids give rise to pituitary progenitors (LHX3+) and subsequently to corticotropes. This has been clarified in the text.

      For TBX19, it seems surprising that there is an effect on PITX1 and LHX3 expression, since TBX19 expression is normally activated after these genes are expressed. An effect of TBX19 on EMT would also be surprising as the knockout mice do not have dysmorphology of the stem cell niche. The only evidence for an effect is the reduced IHC for E-cadherin. If this is an important point, the authors should examine other EMT markers such as Zeb2. The TBX19 knockout mice appear to form corticotropes based on the expression of NeuroD1, even though they lack TBX19 and POMC expression. It would be reassuring to see that NeuroD1 is normally expressed in the TBX19 mutant organoids.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but potentially with a still functional DNA-binding ability. This mutation had a similar effect on LHX3 and PITX1 as the TBX19 KI mutation, although it was even more severe. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. In contrast to the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines with total absence of TBX19 should help clarify these questions.

      Apart from the lack of change in ZEB2 expression in TBX19 KI (Fc = 1.15; p = 0.35), we did not look further for changes in EMT markers in TBX19 KI. However, we added a more detailed analysis of EMT marker expression in NFKB2 KI based on RNAseq results (see table 2).

      Owing to a lack of material, we could not confirm NEUROD1 expression by immunostaining. However, RT-qPCR showed no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64).

      NFKB2 IHC was markedly reduced in NFKB2 D865G/D865G organoids. Based on previous experiments, the mutant protein should be expressed but not activated by proteolytic cleavage. It is possible that the antibody has a different affinity for the mutant protein and/or the uncleaved protein may be unstable. Can this be clarified? The mRNA for mutant NFKB2 appears unchanged in Table 1.

      This is puzzling indeed. We did not notice any change in NFKB2 expression from d27 to d105, and no significant change between WT and NFKB2 KI either. Although the antibody we used recognizes both p100 and p52, we cannot rule out the possibility that p100/p52 is degraded by pathways other than the proteasome. Another possibility is that interactions of p100 with other proteins may decrease the accessibility of the epitope to the antibody.

      The RNA sequencing data from the NFKB2 organoids is intriguing. It suggests that the NFKB2 mutation may have a modest effect on Tbx19 transcription but not Neurod1. It also suggests there are hypothalamic effects, i.e. altered expression of hypothalamic markers in mutant organoids. Is NFKB2 expressed in the developing hypothalamus? Can normal NEUROD1 IHC be confirmed? It is also intriguing that there may be an effect on EMT. However, there seem to be some discrepancies in the direction of effect on these markers. Please clarify.

      This is related to the point just above. p100/p52 is described as a ubiquitously expressed protein. We think that it is expressed in the hypothalamic part of the organoids, but at a lower level than in pituitary progenitors.

      As mentioned before, we could not yet confirm NEUROD1 expression by immunostaining, but RT-qPCR clearly showed there was no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64) or NFKB2 KI (Fc = 0.88; p = 0.5). However, we investigated other markers of different stages of corticotroph differentiation (see figure 11) and found that the later stages are most affected.

      Concerning the EMT, we also found changes in the expression of other markers that are shown in Table 2 and discussed further in the text.

      Cytokines have been proposed to play important roles in pituitary differentiation, i.e. IL6. Is there any evidence for an altered cytokine or chemokine expression in the NFKB2 organoids?

      We did not see any change in IL6 expression in NFKB2 KI (Fc = 2.34; p = 0.55), but RNAseq shows a strong increase in IL6R (Fc = 8.89; p = 2.13e-09). However, at this point, the relevance of these observations remains elusive.

      Minor:

      Some patients with DAVID syndrome have pituitary hypoplasia. The authors measure organoid size and find no differences based on genotype. However, each organoid probably has a variable amount of tissue differentiated to pituitary and hypothalamic fates, therefore, the volume of the whole organoid may not be a good proxy for the amount of pituitary tissue.

      We are aware of this issue. However, for most pituitary genes measured by RT-qPCR (PITX1, LHX3, TBX19), the deltaCt values did not drastically vary for a given time point/genotype, suggesting a stable pituitary/hypothalamic ratio.
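      The ΔCt argument above can be illustrated numerically. The response does not state how the reported fold changes (Fc) were computed, so the sketch below assumes the widely used 2^-ΔΔCt (Livak) method with a single reference gene; the function names and Ct values are hypothetical. ΔCt is computed per organoid, its spread within one genotype and time point is the quantity the argument relies on, and the fold change compares mean ΔCt between genotypes.

```python
import statistics


def delta_ct(target_ct: float, reference_ct: float) -> float:
    """ΔCt for one sample: Ct of the target gene minus Ct of the reference gene."""
    return target_ct - reference_ct


def spread(dcts: list[float]) -> float:
    """Sample standard deviation of ΔCt within one genotype/time point."""
    return statistics.stdev(dcts)


def fold_change(mutant_dcts: list[float], wt_dcts: list[float]) -> float:
    """Assumed Livak 2^-ΔΔCt fold change of mutant relative to WT.

    ΔΔCt = mean ΔCt(mutant) - mean ΔCt(WT); a lower ΔCt means more transcript.
    """
    ddct = statistics.mean(mutant_dcts) - statistics.mean(wt_dcts)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical (target Ct, reference Ct) pairs, one pair per organoid.
    wt = [delta_ct(t, r) for t, r in [(25.0, 18.0), (25.2, 18.1), (24.9, 17.8)]]
    ki = [delta_ct(t, r) for t, r in [(24.0, 18.0), (24.1, 18.0), (23.9, 17.9)]]
    print(f"WT ΔCt spread: {spread(wt):.2f}")
    print(f"KI vs WT fold change: {fold_change(ki, wt):.2f}")
```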

      Figure 9 shows whole transcriptome data for the NFKB2 organoids, and Table 1 lists the data for selected genes. There appears to be disagreement between the significance cut-offs used in the figure and the table. Please adjust.

      We removed the fold-change cut-offs to improve clarity.

      elife120868_0_supp_2945725_rxl2z4. "haft" appears several times, but it should be "half".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      The contribution of individual residues is shown in Figure 3c, which highlights one of the strengths of this RBM implementation: it is interpretable in a physically meaningful way. However, there are several decisions here whose justification is not entirely clear.

      i) Some of the residues in Fig 3c are stated as "relevant" for aminoacylated PG production. But is this the only such hidden unit? Or are there others that are sparse, bimodal, and involve "relevant" AA?

      Thanks for bringing this important question to our attention. In fact, this was the only hidden unit involving the combination of positions 152 and 212. Although we do not know all the amino acids relevant to this catalytic process, the residues we uncovered have been shown experimentally to be critical for the catalytic function of two MprF variants; since our protein of interest carries out this function, any domain that did not contain these residues was excluded. We cannot rule out that the excluded domains perform similar catalytic functions, but we found it unlikely, considering that the amino acids found in the negative portion of the weight were chemically unlikely to form a complex with the amino acid lysine. We have clarified in the text that this selection is probably a subset of all important amino acids; however, this selection provided predictive power.

      ii) In order to filter the sequences for the second stage, only those that produce an activation over +2.0 in this particular hidden unit were taken. How was this choice made?

      The threshold of +2.0 was chosen because it split the bimodal distribution into two distinct distributions.
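      The filtering step can be sketched in a few lines. This assumes the standard RBM definition in which the input (activation) of one hidden unit for an aligned sequence is the sum, over alignment columns, of the weight that unit assigns to the residue observed in each column, and keeps sequences whose activation exceeds +2.0. It is a minimal sketch: the weight table, the toy three-column unit and the function names are hypothetical, and training the RBM is not shown. In the actual model, each of the 300 hidden units would carry one such table spanning the full alignment.

```python
# A hidden unit is represented as one dict per alignment column mapping a residue
# (or "-" for a gap) to its weight; residues absent from a dict contribute 0.
HiddenUnit = list[dict[str, float]]


def activation(seq: str, unit: HiddenUnit) -> float:
    """Input to one hidden unit: I(v) = sum_i w_i(v_i) over alignment columns."""
    if len(seq) != len(unit):
        raise ValueError("sequence must be aligned to the model length")
    return sum(column.get(residue, 0.0) for residue, column in zip(seq, unit))


def filter_by_activation(
    seqs: list[str], unit: HiddenUnit, threshold: float = 2.0
) -> list[str]:
    """Keep sequences whose activation on this hidden unit exceeds the threshold."""
    return [s for s in seqs if activation(s, unit) > threshold]


if __name__ == "__main__":
    # Hypothetical 3-column unit rewarding K at column 0 and D at column 2.
    toy_unit: HiddenUnit = [{"K": 1.5}, {}, {"D": 1.0, "A": -0.5}]
    aligned = ["KAD", "AAD", "KAA", "K-D"]
    for s in aligned:
        print(s, activation(s, toy_unit))
    print("kept:", filter_by_activation(aligned, toy_unit))
```

With the toy unit, "KAD" and "K-D" have an activation of 2.5 and are kept, while "AAD" and "KAA" (activation 1.0) are discarded.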

      iii) How many sequences are in the set before and after this filtering? On the basis of the strength of the results that follow I expect that there are good reasons for these choices, but they should be more carefully discussed.  

      We started with 11,507 sequences, and after filtering we had 7,890 to train our model with. We think this number still supports robust statistics. This is noted in the Dataset acquisition and pre-processing section of the Methods.

      iv) Do the authors think that this gets all of the aminoacylated PG enzymes? Or are some missed?

      This is an interesting question that prompted us to do further analysis. We have added a new supplemental figure providing more details on this question. Based on the Uniprot-derived annotations and the Pfam domain-based analysis of these sequences, the large majority of the excluded sequences were proteins that included the LPG_synthase_C domain but not the transmembrane flippase domain required by the MprF class of enzymes, and were instead accompanied by different domains that seem less relevant to our enzyme of interest. It is true, though, and related to question (i), that variants which retain the functionality despite lacking the experimentally determined key catalytic residues could have been excluded by this method, but such sequences could still reasonably be excluded because of their dissimilarity to MprF from Streptococcus agalactiae.

      However, some similar criticisms from the last point occur here as well, namely the selection of which weights should be used to classify the enzymes' function. Again the approach is to identify hidden unit activations that are sparse (with respect to the input sequence), have a high overall magnitude, and "involve residues which could be plausibly linked to the lipid binding specificity."

      (i) Two hidden units are identified as useful for classification, but how many candidates are there that pass the first two criteria? Indeed, how many hidden units are there?

      We note in the Model training section of the Methods that our final model had 300 hidden units in total. As to the first part of your question, rather than systematically testing the predictive power of all other hidden units for this task, we used these weights because of their connection to a proposed lipid-binding pocket found through docking experiments with Autodock. While another weight might provide predictive power, it might lack this critical secondary information. Moreover, the direction of our research required us first to find weights that plausibly involved the lipid-binding pocket, and only then to use these weights to propose MprF variants to test for our novel functionality. Given the limited information we had early in the research process, working in reverse would have produced too many options for experimental testing with less mechanistic justification. We included a brief explanation of our rationale in the section “Restricted Boltzmann Machines can provide sensitive, rational guidance for sequence classification” in the updated manuscript.

      ii) The criterion "involve residues which could be plausibly linked to the lipid binding specificity" is again vague. Do all of the other candidate hidden units *not* involve significant contributions from substrate-binding residues? Maybe one of the other units does a better job of discriminating substrate specificity. (As indicated in Figure 8, there are examples of enzymes that confound the proposed classification.) Why combine the activations of two units for the classification, instead of 1 or 3 or...?

      It is true that the other hidden units do not involve significant contributions from substrate-binding residues, and we will clarify this. The weights found through this RBM methodology are biased to be probabilistically independent, meaning that, by design of the model, the residues and amino acids implicated by each weight are not shared among the other weights. We will update the Model Weight selection section to clarify that the weights we chose had more significantly weighted residues overlapping with the residues near the lipid-binding region than the other weights we checked. We combined these two because they were the only ones that both overlapped with these residues and predicted lipid activity for the few sequences we had detailed knowledge of at the time of decision (Figure 5b).

      The Model Weight section reads as follows:

      “Weights were chosen which involved sequence coordinates implicated in our function of interest. Specifically, we took the locations identified through Autodock (Hebecker et al., 2015) where the lipid was likely to interact, together with a small radius around this region, to select a small set of coordinates. We chose the only weights that both overlapped with multiple residues within this radius and had predictive power (separation) for the three examples we had to start with.”

      Author Recommendations:

      The manuscript will likely be read by many membrane biologists/biochemists, and they might like to better understand how the RBM might be useful in their own approach. Here are some suggestions along these lines. The overall goal is to explain the RBM in *plain English* - the mathematical description in Eqs 2-4 is not easily interpretable.

      (1a) Explain that the RBM is a two-layer structure, in which one layer is the "visible" elements of the input sequence, and the other is called "hidden units." Connections are only made between visible and hidden units, but all such connections are made.

      (1b) The strengths of these connections are called "weights", and are determined in a statistical way based on a large set of input sequences. Once parametrized, the RBM is capable of capturing correlations among many positions in an input sequence - a significant advantage over the DCA approach.

      We agree with this assessment, and have updated the section of the text where we introduce the RBM with a non-technical explanation of what this method is doing. It reads as:

      “The design of this RBM can be seen in Figure 4, where the model architecture is represented by purple dots and green triangles. The dots are the “visible” layer, which takes in input sequences and encodes them into the “hidden” layer, where each triangle represents a separate hidden unit. The lines connecting the visible and hidden layers show that each hidden unit can see all the visible units (the statistics are global), but it cannot see any of the other hidden units, meaning the hidden units are conditionally independent of one another given the visible layer. This global model with conditionally independent hidden units (see also the marginal distribution form shown in Equation 3) has the following useful properties: higher-order couplings between...”

      (1c) Although strictly true that the DCA model is a Boltzmann machine, it's not a typical Boltzmann machine, because all of the units are visible. Typically a Boltzmann machine would also include hidden units, in order to increase its capacity/power. 

      We have clarified the relationship between DCA and Boltzmann machines, and this section now reads as:

      This class of models is closely related to another model termed the Boltzmann machine. The Boltzmann machine formulation is in turn related to the Potts model from physics, which has been successfully applied in biology to elucidate important residues in protein structure and function (Morcos et al., 2011) and to carefully tune enzyme specificity in bacterial two-component regulatory systems (Cheng et al., 2014; Jiang et al., 2021). The Boltzmann machine-like formulation from Morcos et al. (2011), termed Direct Coupling Analysis (DCA), stores patterns...

      (1d) Throughout, the authors refer to the activation of the hidden units as weights, but this is not a typical usage of this terminology. Connections between units are weights and have two subscripts. Given an input sequence, the sum over these weights for a given hidden unit is its activation (Eq. 1). I suggest aligning the description with the typical usage in order to make the presentation easier to follow. Hereafter I will refer to these hidden unit activations as simply activations. 

      We agree with you, the hidden units are a collection of edge weights. We have modified the terminology in the text and in our figures to consistently refer to the collections of weights as hidden units and refer to the hidden unit outputs given a sequence input as activations.

      (1e) How many hidden units are there?

      The final model was trained with 300 hidden units.

      (2) It is redundant to say that lipids are both amphiphiles and hydrophobic...amphiphile already means hydrophobic plus hydrophilic. 

      This is true, we have edited the manuscript to reflect this.

      (3) What does this mean, and what's the point of this remark? "They [lipids] are relatively smaller than other complex biomolecules, such as proteins, thereby allowing a larger portion of their surface to interact with other macromolecules." 

      We have removed this sentence.

      Reviewer 2 (Author Recommendations):

      While the idea of filtering out a part of the sequence data obtained with BLAST makes sense per se, it would be nice if the authors could comment on the nature of the sequences corresponding to the left peak in Figure 3b. It is hypothesised in conclusion that these sequences could lack any catalytic function. Could the authors experimentally check that this is the case or provide further evidence for this hypothesis?

      Yes, in this revision we provide further evidence in a new supplementary figure, S2. At the time, we performed a domain analysis of the sequences we excluded: most of these sequences lacked the flippase domain associated with MprF function and were instead combined with different domains. On this basis, we excluded them as not relevant to the MprF from Streptococcus agalactiae we were interested in. Although some relevant sequences might have been excluded, our assessment is that reducing the set of sequences increased specificity.

      A key step in the RBM-based approach is the identification of "meaningful" hidden units, i.e. whose values are related to biological function. In Methods, the authors explain how they selected these units based on the L1 norms of the weights and the region of interaction with the lipid. While these criteria are reasonable, I wonder whether they are too stringent. In particular, one could think that regions in the proteins not in direct contact with the lipid could also be important for binding. It is known for instance that the length of loops can affect flexibility and help regulate activity in some catalytic enzymes. So my question is: if one relaxes the criterion about the coordinates of large weight values, what happens? Are other potentially interesting hidden units identified?

      We completely agree that other regions of the protein are likely involved in determining enzyme specificity, and that focusing solely on regions that interact with the lipid may miss important contributions to catalytic function. We hypothesize that the flippase domain itself and its interaction with the catalytic domain are involved, especially considering the concerted mechanism by which they must operate. We are currently investigating these hypotheses, which will be the subject of future work. As an initial step, we present this work with restricted information that led to concrete predictions. We focused on the lipid binding pocket because it was one of the few pieces of information we had from the start, but, as the reviewer suggests, we plan to follow up this research to identify other relevant hidden units and domains. 
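      The two selection criteria discussed in this exchange (a large L1 norm of a hidden unit's weights, and large weights located in the lipid-interacting region) can be sketched as below, together with the relaxed variant the reviewer asks about. This is a hypothetical illustration only: weights are assumed to be stored as weights[i][a][mu] (position, residue, hidden unit), and the pocket positions, the top-k rule and the cut-off n_units are placeholders, not the thresholds used in the Methods. Setting require_pocket=False drops the positional criterion and keeps every high-norm unit.

```python
def l1_norm(weights, mu):
    """L1 norm of all weights attached to hidden unit mu."""
    return sum(abs(w[mu]) for pos in weights for w in pos)


def top_positions(weights, mu, k):
    """Indices of the k positions contributing most to unit mu's L1 norm."""
    scores = [(sum(abs(w[mu]) for w in pos), i) for i, pos in enumerate(weights)]
    scores.sort(reverse=True)
    return {i for _, i in scores[:k]}


def select_hidden_units(weights, pocket, n_units, k=5, require_pocket=True):
    """Rank hidden units by L1 norm, keep the top n_units, and optionally
    require that at least one of their top-k positions lies in `pocket`
    (a hypothetical set of lipid-interacting alignment positions)."""
    n_hidden = len(weights[0][0])
    ranked = sorted(range(n_hidden), key=lambda mu: l1_norm(weights, mu), reverse=True)
    selected = []
    for mu in ranked[:n_units]:
        if not require_pocket or top_positions(weights, mu, k) & pocket:
            selected.append(mu)
    return selected
```

      Comparing the output of the strict and relaxed calls on the trained weights would show which additional high-norm hidden units are admitted once the positional constraint is lifted.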

      From a purely machine-learning point of view, it would be good to see more about cross-validation of the model. More precisely, could the authors show the log-likelihood of test set data compared to the one of training sequence data?

      We agree this is an important piece of information, and we will update our methods section accordingly. We performed a parameter sweep to select the parameters used in our final model; in that testing, with a random 80/20 training/test split, we obtained a training log-probability loss of -0.91 and a test loss of -0.98. However, for our final model we used all available data and did not perform a split; including the additional data did not change the result dramatically, and the weight structure and composition were consistent with the results presented in the paper.
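      A minimal sketch of the hold-out comparison described in this response, assuming only that the trained model can score a sequence through a per-sequence log-probability function. The callable log_prob below is a hypothetical stand-in; for an RBM an exact log-likelihood requires the partition function, so in practice this score is usually an approximation. The sketch shows the random 80/20 split and the train-versus-test average that the reviewer asked to see.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly split items into (train, test) with a reproducible seed."""
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(test_fraction * len(items)))
    test = [items[i] for i in idx[:n_test]]
    train = [items[i] for i in idx[n_test:]]
    return train, test


def mean_log_prob(seqs, log_prob):
    """Average per-sequence score under a (hypothetical) model scorer."""
    return sum(log_prob(s) for s in seqs) / len(seqs)
```

      A test average close to the training average, as with the -0.91 and -0.98 values reported above, indicates that the model is not strongly overfitting the training sequences.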

      Reviewer 3 (Public Review):

      In many of the analyzed strains, the presence of the lipid species Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG is correlated with the presence of the MprF enzyme(s), but one should keep in mind that a multitude of other membrane proteins are present that could in theory also be involved in their synthesis. Therefore, there is no direct evidence that the MprF enzymes are linked to the synthesis of these lipid species. Although it is unlikely that other enzymes are involved, this weakens the connection between the observed lipids and the type of MprF. 

      While there are a number of proteins found on the membrane that could play a role, we have specifically used a background strain that has a transposon in mprF that makes the bacteria incapable of synthesizing Lys-lipids (Figure 7B) unless complemented back with a functional MprF (Figure 7D-E). This led us to conclude that MprF is responsible for Lys-lipid synthesis.

      Related to this, in a few cases MprF activity is tested, but the manuscript does not contain any information on protein expression levels. Heterologous expression of membrane proteins is in general challenging, and for various reasons proteins sometimes end up not being expressed at all. As an example, the absence of activity for E. faecalis MprF1 and E. faecium MprF2 could very well be explained by the complete absence of the protein.

      The genes were expressed from the same plasmid to control for expression. While we did not run a western blot to examine expression levels, the plasmid backbone was used as a control for protein expression. Previous research supports the conclusion that E. faecalis MprF1 and E. faecium MprF2 do not synthesize Lys-lipids and most likely play a different role in the cell membrane. 

      The title is somewhat misleading. The sequence statistics and machine learning categorized the MprFs, but the identification of a novel lipid species was a coincidence while checking/confirming the categorization. 

      We believe the title is appropriate given that Enterococcus dispar was identified through computational methods, which led to the discovery of Lys-Glc2-DAG. In other words, the selection of candidate organisms producing MprF-related lipids was driven by predictions from the computational method. We agree, however, that the discovery was unexpected, but it would not have happened without the candidate organisms suggested by the methodology presented here.  

      Please read the manuscript one more time to correct textual errors.  

      The example of the role of LPS in delivering siRNA to targeted cancer cells is a bit far-fetched, as LPS is very different from the lipids discussed here. I would rather focus on the role of lysyl-lipids in antibiotic resistance in the introduction.  

      We included LPS here to explain that natural lipids and other components of the bacterial cell membrane could be used in drug delivery systems. While it is true that LPS is quite different from Lys-lipid compounds, our goal was to emphasize that the bacterial domain is a rich, untapped source of lipids that could be used in biotechnology. In this way we intended the statement to apply broadly to bacterial lipids and to the importance of their continued study for diverse applications such as pharmaceuticals.

      The MS identification of Lys-Glc2-DAG is convincing, especially in combination with the fragmentation data, but the ion counts suggest low abundance. The observation would be strengthened if Lys-Glc2-DAG species with different acyl-chain configurations were also identified. This should then be mentioned or visualized in the manuscript. 

      We agree and have added an updated Figure 8A to demonstrate the presence of different acyl-chain configurations in Enterococcus dispar.  

      Further analysis of the Enterococcus strains shows the presence of the three lipids Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG, although Lys-Glc-DAG is only detected in trace amounts. This raises questions about the specificity of MprF for the substrate Glc-DAG. If the ratio of Glc2-DAG to Glc-DAG abundance is similar to the ratio of Lys-Glc2-DAG to Lys-Glc-DAG abundance, this would support the conclusion that the enzyme has equal affinity for both. However, if there is a rather large amount of Glc-DAG but only a small amount of Lys-Glc-DAG, the production of Lys-Glc-DAG might be a side reaction. 

      The reviewer raises a relevant point of discussion; however, a clear resolution may need to be part of future work, as we do not use spike-in controls when performing lipid extractions. Because of this, it is not possible for us to compare lipid levels across different samples. We now include a note clarifying this in the discussion section.  

      The plotting of the MprF sequence variants using the chosen RBM weights reveals a rather complex distribution over the quadrants (Figure 8). It is rather unclear in Figure 8 why only 1 sequence is plotted for Enterococcus faecalis and faecium, while 2 different MprFs are present (and tested) for these two organisms. This should be clarified.  

      We agree this can be a source of confusion. We have clarified in the text that only the functional alleles are plotted in Figure 8, and that all Enterococcal alleles are plotted in Figure S3 regardless of function.

    1. Author response:

      Reviewer 1:

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

      We agree with the reviewer: Fgf signaling is an established pro-gliogenic pathway (Duong et al 2019), and Foxg1 overexpression is known to promote neurogenesis in cultured neural stem cells (Brancaccio et al 2019). Our study links these two mechanisms, as the Reviewer has summarized: (a) we demonstrate that FOXG1 acts by modulating Fgf signaling cell-autonomously within progenitors, through regulation of Fgfr3 levels; (b) loss of Foxg1 in postmitotic neurons results in upregulation of Fgf ligand expression (possibly via indirect mechanisms), which non-cell-autonomously increases Fgf signaling in progenitors. Our study was performed entirely in vivo.

      Proposed revision: We will revise the manuscript to reflect that Fgf18 may be an indirect target of FOXG1 in postmitotic neurons.

      Reviewer 2:

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Proposed revisions:

      (a) We will revise the manuscript to explain why we chose postnatal day 14 to examine the effects of Foxg1 deletion at E15.

      ● We have examined the transcriptomic dysregulation after Foxg1 deletion at E17.5, which is a reasonable period to identify potential direct targets. Furthermore, FOXG1 occupies the Fgfr3 locus in ChIP-seq performed at E15.5. Together, these support the interpretation that Fgfr3 is a direct target of Foxg1.

      ● As the Reviewer notes, we have investigated the possibility of increased proliferation of astrocytes and death of neurons and found no evidence that suggests these phenomena occur in the 3 days after loss of Foxg1. Cortical neurons are postmitotic and differentiated by E18.5, the stage at which we examined CC3 staining and found no difference in cell death in control and mutants (Supplementary Figure S2C, C’). The majority of progenitors (PAX6+ve cells) that lose Foxg1 at E15.5 express the gliogenic transcription factor NFIA by E18.5 (Figure 2C, C’), but hardly any express intermediate (neurogenic) progenitor marker TBR2 (Supplementary Figure S2B, B’). It is therefore unlikely that neurons are born from Foxg1 mutant progenitors and then die at a later stage.

      ● The cellular consequences of loss of Foxg1 require additional time to become detectable; e.g., it takes ~5 days for GFAP to be detected in astrocytes once they are born. The P14 timepoint also permits assessment of oligodendrogenesis, which begins after astrogliogenesis, and therefore allows a comprehensive assessment of the lineage of E15.5 Foxg1 null progenitors.

      (b) Thank you for pointing out that the last sentence of the results section implied (incorrectly) that ALL neurogenesis is not possible in the absence of Foxg1. We will modify this sentence (and the discussion) to reflect that this applies to E14/15 progenitors and late-born cortical neurons.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2-expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons, including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2-positive neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 labels more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of the original manuscript), which means that more than 12 could be found. The membrane-tethered reporters we used (UAS-FlyBow1.1, UAS-mCD8-RFP, UAS-MCFO, as well as UAS-DenMark:UAS-syd-1GFP) gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP, more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g. His-YFP is very strong) and perdurance of the reporter (e.g. the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression also causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to expression/cell or the number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2 FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis and cleaved neurotrophins promote cell survival is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a mild yet statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star to the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL is instead due to an increase in cell number. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the outcome would be the same: an increase in dopamine production when DNT-2 levels rise.

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult-specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ cells among TH-negative cells too (although not widely throughout the brain). This is not surprising, because: DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can also bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings of McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out which other neuron types DNT-2 maintains.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity, and a suggestive link to neuronal activity. Finding out the direct link to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this with future projects. Nevertheless, it is important to place our findings within this context, as it opens opportunities for discovery by the neuroscience community.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In mammals, the neurotrophin BDNF has an important function at glutamatergic synapses, so we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP signal in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; and optogenetic activation of DNT-2 neurons resulted in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, it is unlikely that glutamate released from DNT-2 neurons inhibits DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. These include the voltage-gated calcium channels located at the synapse, whose phosphorylation increases their opening probability. Thus, a rise in cAMP could facilitate neurotransmitter release, and a decrease would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      Our data demonstrate that DNT-2 and PAM neurons do influence each other, not merely potentially. We have shown that: DNT-2 neurons and PAMs are connected through circuitry; the DNT-2 receptors Toll-6 and Kek-6 are expressed in DANs, including PAMs; alterations in DNT-2 levels (both loss and gain of function) and loss of function of the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and Kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. Together, these data firmly demonstrate that DNT-2 neurons and PAMs influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figure 5C and D do contain Y-axis labels; all graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, the volume of individual boutons, and bouton number. These analyses were carried out using Imaris software. For dendritic branching patterns, the "Filament Autodetect" function was used. Here, dendrites were analysed by semi-automatically tracing each dendrite branch (i.e. with manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris "Spot" function. Here, a threshold is set to exclude small dots (e.g. background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, so the analysis is objective and large sample sizes can be analysed effectively. We have already provided a description of the use of Imaris in the methods section.

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, visualised with DNT-2>MCFO, were the same neurons in every individual brain we examined. This is because the somata of the neurons were not located in exactly the same position in every brain. Furthermore, the arborisation patterns were also different and unique for each individual brain. Thus, there is natural variability in the position of the soma and in the arborisation patterns. Such variability presumably results from a combination of developmental and activity-dependent plasticity.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Here the authors present their evidence linking the mitochondrial uniporter (MCU-1) and olfactory adaptation in C. elegans. They clearly demonstrate a behavioral defect of mcu-1 mutants in adaptation over 60 minutes and present evidence that this gene functions in the AWC primary sensory neurons at, or close to, the time of adaptation. 

      Strengths: 

      The paper is very well organized and their approach to unpacking the role of mcu-1 mutants in olfactory adaptation is very reasonable. The authors lean into diverse techniques including behavior, genetics, and pharmacological manipulation in order to flesh out their model for how MCU-1 functions in AWC neurons with respect to olfaction. 

      Weaknesses: 

      I would like to see the authors strengthen the link between mitochondrial calcium and olfactory adaptation. The authors present some gCaMP data in Figure 5 but it is unclear to me why this tool is not better utilized to explore the mechanism of MCU-1 activity. I think this is very important as the title of the paper states that "mitochondrial calcium modulates.." behavior in AWC and so it would be nice to see more evidence to support this direct connection. I would also like to see the authors place their findings into a model based on previous findings and perhaps examine whether mcu-1 is required for EGL-4 nuclear translocation, which would be straightforward to examine. 

      We agree that observing calcium levels inside the mitochondria would conclusively demonstrate that mitochondrial calcium directly affects neuropeptide secretion and behavior. We will attempt this with a mitochondrially targeted calcium indicator. We will also better integrate our findings with existing models in the literature, such as EGL-4 nuclear localization in AWC in response to prolonged odor exposure. Thank you for your comments.

      Reviewer #2 (Public review): 

      Summary: 

      In their manuscript, "Mitochondrial calcium modulates odor-mediated behavioural plasticity in C. elegans", Lee et al. aim to link a mitochondrial calcium transporter to higher-order neuronal functions that mediate memory and aversive learning behaviours. The authors characterise the role of the mitochondrial calcium uniporter, and a specific subunit of this complex, MCU-1, within a single chemosensory neuron (AWCOFF) during aversive odor learning in the nematode. By genetically manipulating mcu-1 as well as using pharmacological activators and blockers of MCU activity, the study presents compelling evidence that the activity of this individual mitochondrial ion transporter in AWCOFF is sufficient to drive animal behaviour through aversive memory formation. The authors show that perturbations to mcu-1 and MCU activity prevent aversive learning to several chemical odors associated with food absence. The authors propose a model, experimentally validated at several steps, whereby an increase in MCU activity during odor conditioning stimulates mitochondrial calcium influx and an increase in mitochondrial reactive oxygen species (mtROS) production, triggering the release of the neuropeptide NLP-1 from AWC, all of which are required to mediate future avoidance behaviour of the chemical odor. 

      Strengths: 

      Overall, the authors provided robust evidence that mitochondrial function, mediated through MCU activity, contributes to behavioural plasticity. They also demonstrated that ectopic MCU activation or mtROS during odor exposure could accelerate learning. This is quite profound, as it highlights the importance of mitochondrial function in complex neuronal processes beyond their general roles in the development and maintenance of neurons through energy homeostasis and biosynthesis, amongst their other cell-non-specific roles. 

      Weaknesses: 

      While the manuscript is generally robust, there are some concerns that should be addressed to improve the strength of the proposed model: 

      (1) Throughout the manuscript, it is implied that MCU activation caused by odor conditioning changes mitochondrial calcium levels. However, there is no direct experimental evidence of this. For example, the authors write on p.10 "This shows that H2O2 production occurs downstream of MCU activation and calcium influx into the mitochondria", and on p. 11, the statement that prolonged exposure to odors causes calcium influx. Because this is a key element of the proposed model, experimental evidence would be required to support it. 

      We are planning to measure mitochondrial calcium levels directly by using a mitochondrially targeted calcium indicator. We agree that this is a key element of our model.

      (2) Some controls are missing. For example, Figure 1h requires a heat-shock-only control in WT and mcu-1 (non-transgenic) backgrounds to ensure that the heat-shock stress does not interfere with odor learning. 

      We will repeat the experiments with the necessary controls.

      (3) Lee et al. propose that mcu-1 is required at the adult stage to accomplish odor learning because inducing mcu-1 expression at larval stages did not rescue the phenotype of mcu-1 mutants during adulthood. However, the requirement of MCU for odor learning was narrowed down to a 15-minute window at the end of odor conditioning (Figure 5c). Is it possible that MCU-1 protein levels decline after larval induction so that MCU-1 is no longer present during adulthood when odor conditioning is performed? 

      Yes, we also noted that early induction of MCU-1 does not restore learning, and we hypothesized that MCU-1 protein may be subject to high turnover. MCU-1 induced during larval stages may no longer exist by the time odor conditioning is performed, although we have not confirmed this. We briefly noted this in the Discussion, and we will discuss it further in the revision. Thank you.

      (4) There is a limited learning effect observable after 30 minutes, and a very pronounced effect in all animals after 90 minutes. The authors very carefully dissect the learning mechanism at 60 minutes of exposure and distinguish processes that are relevant at 60 minutes from those important at 30 minutes. Some explanation or speculation as to why the processes crucial at the 60-minute mark are redundant at 90 minutes of exposure would be important. 

      I think this is in line with Reviewer #1’s comments that we should discuss our findings more in relation to existing models in the literature. We will do this in our revision.

      (5) Given the presumably ubiquitous function of mcu-1/MCU in mitochondrial calcium homeostasis, it is remarkable that its perturbation impacts only a very specific neuronal process in AWC at a very specific time. The authors should elaborate on this surprising aspect of their discovery in the discussion. 

      We will discuss the implication further in our revised manuscript.

      (6) Associated with the above comment, it remains possible that mcu-1 is required in coelomocytes for their ability to absorb NLP-1::Venus (Figure 3B), and the AWC-specific role of mcu-1 for this phenotype should be determined. 

      To confirm that mcu-1 is not required for coelomocyte uptake, we can stimulate NLP-1::Venus secretion in mcu-1 mutants by adding H2O2 and then check whether Venus accumulates in the coelomocytes. We will include this in our revised manuscript. Thank you for your comments.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports a role for the mitochondrial calcium uniporter gene (mcu-1) in regulating associative learning behavior in C. elegans. This regulation occurs by mcu-1-dependent secretion of the neuropeptide NLP-1 from the sensory neuron AWC. The authors report a post-developmental role for mcu-1 in AWC to promote learning. The authors further show that odor conditioning leads to increases in NLP-1 secretion from AWC, and that interfering with mcu-1 function reduces NLP-1 secretion. Finally, the authors show that NLP-1 secretion increases when ROS levels in AWC are genetically or pharmacologically elevated. The authors propose that mitochondrial calcium entry through MCU-1 in response to odor conditioning leads to the generation of ROS and the subsequent increase in neuropeptide secretion to promote conditioned behavior. 

      Strengths: 

      (1) The authors show convincingly that genetically or pharmacologically manipulating MCU function impacts chemotaxis in a conditioned learning paradigm. 

      (2) The demonstration that the secretion of a specific neuropeptide can be up-regulated by MCU, ROS and odor conditioning is an important and interesting advance that addresses mechanisms by which neuropeptide secretion can be regulated in vivo. 

      Weaknesses: 

      (1) The authors' conclusion that mcu-1 functions in the AWC-on neuron is not adequately supported by their rescue experiments. The promoter they use for rescue drives expression in AWC-on and in a number of additional neurons that are themselves implicated in adaptation, leaving open the possibility that mcu-1 functions non-autonomously, rather than autonomously in AWC, to regulate this behavior. 

      We recognized this as well, and we now have a promoter construct more specific to AWCON (str-2). Using this more specific promoter, we will confirm that the role of mcu-1 is indeed AWCON-specific in our revised manuscript.

      (2) The authors conclude MCU promotes neuropeptide release from AWC by controlling calcium entry into mitochondria, but they did not directly examine the effects of altered MCU function on calcium dynamics either in mitochondria or in the soma, even though they conducted calcium imaging experiments in AWC of wild type animals. Examination of calcium entry in mitochondria would be a direct test of their model.

      We agree. As stated above in response to Reviewers #1 and #2, we will include mitochondrial calcium measurements in our revised manuscript.

      (3) The authors' conclusion that mitochondrial-derived ROS produced by MCU activation drives neuropeptide release does not appear to be experimentally supported. A major weakness of this paper is that experiments addressing whether mcu-1 activity indeed produces ROS are not included, leaving unanswered the question of whether MCU is the endogenous source of ROS that drives neuropeptide secretion.

      We can test this using the mitochondrially targeted redox indicator roGFP, and we will include these data in the revised manuscript. Thank you for your comments.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing how habituation can be realized from both dynamical and information-theoretic perspectives. The authors find that negative feedback combined with a slow storage mechanism is sufficient to explain habituation.

      Strengths:

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work.

      Weaknesses:

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation refers not only to a decrease in responsiveness upon repeated stimulation; as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; and (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery.

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model.

      We are very grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; and 5) intensity sensitivity. Here, we follow the terminology of bioRxiv 2024.08.04.606534, the paper highlighted by the referee. Regarding hallmark 6), subliminal accumulation, we believe that our model can capture it as well, but more analyses are needed to substantiate this claim. We will include a discussion of these points in the revised version.

      Notably, in line with the discussion in bioRxiv 2024.08.04.606534, we also think that feature 10), long-term habituation, is ambiguous and that its appearance might simply be related to the other features discussed above. In the revised version, we will detail our take on this aspect in relation to the presented model.

      All other hallmarks require the presence of multiple stimuli and, as a consequence, they cannot be observed within our model, but are interesting lines of research for future investigations. We believe that this addition will help clarify the validity of the model and the relevance of our result, consequently improving the quality of our manuscript.

      Furthermore, the habituated steady-state response is only approximately 20% lower than the initial response and seems to be reached after just 3-4 pulses; the subsequent change in response amplitude appears negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization, and is there a parameter range where more pronounced habituation responses can be observed?

      The referee is correct, but this is solely a consequence of the specific set of parameters we selected, which we chose for visualization purposes. In the next version, when we discuss the different emerging behaviors that characterize habituation, we will also present a set of parameters for which habituation is more pronounced, and we will justify this new choice.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as defined in bioRxiv 2024.08.04.606534 for example, we can say that the system is habituated after a few stimuli for the set of parameters selected in the first version of the manuscript. We will also discuss this aspect in the Supplemental Material of the revised version, as it will also be important to appreciate the hallmarks of habituation listed above.

      The same is true for the information content (Figure 2f): already at the first pulse, I_U,H ≈ 0.7, and it increases only negligibly afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time, and the system uses it to form predictions about its responses. In the model presented by the authors, the system seems to already carry information about the environment, and this information hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and from the presented results it is hard to determine whether the proposed model can actually explain the basic features of habituation mentioned above.

      The point about information is more subtle. We can definitely choose a set of parameters for which the information gain is higher and we will show it in the Supplemental Material of the revised version. However, as the reviewer correctly points out, it is difficult to give an interpretation of the specific value of I_U,H for such a minimal model.

      We also remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales, as discussed in the text), we do not observe the transient gain of information associated with the first stimulus; as a result, the mutual information shows a discontinuous behavior resembling the dynamics of the readout.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. We will discuss analogies and differences in the revised version of the main text. The main difference is the fact that information-theoretic aspects of habituation are not discussed in the presented references, while the idea of this work is to elucidate exactly the interplay between information gain and habituation dynamics.

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the progressive reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, defined as the difference between the mutual information between signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts, as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain, as defined in the paper, a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? In other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time-series mutual information between stimulus and readout?

      This is an important and delicate aspect to discuss. We considered the mutual information with a prolonged stimulation when building the Pareto front, by maximizing this quantity while minimizing the dissipation. The observation that the Pareto front lies in the vicinity of the maximum of the information gain hints at the fact that reducing the information gain by increasing the mutual information at each stimulation will require more energy. However, we did not thoroughly explore this aspect by considering all sources of dissipation and the fact that habituation is, anyway, a dynamical phenomenon. In the revised version, we will clarify this point, extending our analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain mutual information, multiple observations of the same stimulus should be reflected in accumulated information that, in turn, drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models of adaptation in living systems, e.g., adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid any confusion between the usual definition of (perfect) adaptation and habituation. At any rate, we will add this clarification in the revised version.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the referee for giving us the opportunity to deepen this aspect of the manuscript. We decided to minimize \delta Q_R since this dissipation is unavoidable. In fact, given the existence of two different pathways implementing sensing and feedback, any input will result in dissipation by the receptor. This energy consumption is reflected in \delta Q_R. Conversely, the dissipation associated with the storage is always zero in the limit of a fast memory. However, we know that such a limit is pathological and leads to no habituation. Therefore, in the revised version we will discuss other choices for our optimization approach, along with their strengths and limitations.

      The dependence of the Pareto front on the stimulus strength is shown in the Supplemental Material, but not in relation to habituation and information gain. We will strengthen this part in the revised version of the manuscript, elaborating more on the connection between optimality, information gain, and dynamical behavior.

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels?

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, the fact that, without any explicit biological details, our minimal model is able to capture the features of a complex neural system just by looking at the PCs is non-trivial. The 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a-priori expectations on what it should represent. Depending on the behavior of higher-order PCs, we may include them in the revised version if any interesting results arise.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination.

      We thank the referee for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed:

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the referee for this suggestion. The revised version will present a modified abstract in line with the reviewer’s proposal.

      (2) Several clarifications are needed on the treatment of energy dissipation.

      - When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the referee for this typo. Indeed, \sigma sets the energy scale of the feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., together with \kappa in Eq. (1). We will fix this issue in the revised version. Moreover, we will check the entire manuscript to be sure that all formulas are consistent.

      - I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on <H>, however, is not fully clear. If the environment were static and the memory block was absent, the term with <H> would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway from the one affected by the storage, its presence drives the receptor out of equilibrium. By eliminating the memory block, we would necessarily also eliminate the pathway associated with the storage effect (“internal pathway” in the manuscript). In this case, the receptor is a 2-state, 1-pathway system and, as such, it always satisfies an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript no longer holds, and the receptor does not exhibit any dissipation. Our choice to model two different pathways was biologically motivated. We will make this crucial aspect clearer in the revised manuscript.

      - Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      In the current version of the manuscript, we employed the scheme of a controlled birth and death process to model the coupled process of readout and storage production. Since we are not dealing with a detailed biochemical underlying network, we used this coarse-grained description to capture the main features of the dynamics. In this sense, the considered reactions produce and destroy a molecule from a certain pool even if they are controlled in different ways by the readout. However, we completely agree with the point of view of the referee and will analyze our results following their suggestion.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics?

      The initial stimulus is indeed stochastic, with a time-independent average. The model response depends on the pre-stimulus level, since it also sets the stationary storage concentration before the first “strong” stimulation arrives. This dependence is not crucial for our results but deserves proper discussion, as the referee correctly pointed out. We will clarify this point in the revised version of this study.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity here. In fact, Δ⟨S⟩ is not strictly zero but equal to 0.15% at the final point; due to rounding, it appears as 0% in the plot, and we will fix this in the revised version. Note that a finite degree of habituation at such a small Δ⟨S⟩ signals a nonlinear dependence of Δ⟨U⟩ on Δ⟨S⟩, not a contradiction. We will clarify this aspect in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigates a dietary intervention that employs a smartphone app to promote meal regularity, which may be useful. Despite no observed changes in caloric intake, the authors report significant weight loss. While the concept is very interesting and deserves to be studied due to its potential clinical relevance, the study's rigor needs to be revised, notably for its reliance on self-reported food intake, a highly unreliable way to assess food intake. Additionally, the study theorizes that the intervention resets the circadian clock, but the study needs more reliable methods for assessing circadian rhythms, such as actigraphy.

      Thank you for the positive yet critical feedback on our manuscript. We are pleased with the assessment that our study is very interesting and deserves to be continued. We have addressed the points of criticism mentioned and discussed the limitations of the study in more detail in the revised version than before.

      Nevertheless, we would like to note that one condition for our study design was that the participants were able to carry out the study in their normal everyday environment. This means that it is not possible to fully objectively record food intake, especially not over a period of eight weeks. In our view, self-reporting of food intake is therefore unavoidable and also forms the basis of comparable studies on chrononutrition. We believe that recording data with a smartphone application at the moment of eating is a reliable means of recording food consumption and is better suited than, for example, questionnaires, which have to be completed retrospectively. Objectivity could be improved by uploading photographs of the food consumed. However, even this only provides limited protection against underreporting, as participants could omit photos of individual meals, snacks, or second servings. Sporadic indirect calorimetric measurements can help to identify underreporting, but they cannot replace real-time self-reporting via a smartphone application.

      Our data show that at the behavioral level, the rhythms of food intake are significantly less variable during the intervention. Our assumption that precise mealtimes influence the circadian rhythms of the digestive system is not new and has been confirmed many times in animal and human studies. It can therefore be assumed that comparable effects also apply to the participants in our study. Of course, a measurement of physiological rhythms is also desirable for a continuation of the study. However, we suspect that cellular rhythms in tissues of the digestive tract in particular are decisive for the changes in body weight. The characterization of these rhythms in humans is at best indirectly possible via blood factors. Reduced variability of the sleep-wake rhythm, which is measured by actigraphy, may result from our intervention, but in our view is not the decisive factor for the optimization of metabolic processes.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Reviewer #1 (Public Review):

      Wilming and colleagues set out to determine the impact of regularity of feeding per se on the efficiency of weight loss. The idea was to determine whether consuming 2-3 meals within individualized time frames, as opposed to feeding stochastically throughout the circadian period, causes weight loss.

      The methods are rigorous, and the research is conducted using a two-group, single-center, randomized-controlled, single-blinded study design. The participants were aged between 18 and 65 years old, and a smartphone application was used to determine preferred feeding times, which were then used as defined feeding times for the experimental group. This adds strength to the study since restricting feeding within preferred/personalized feeding windows will improve compliance and study completion. Following a 14-day exploration phase and a 6-week intervention period in a cohort of 100 participants (inclusive of both the controls and the experimental group that completed the study), the authors conclude that when meals are restricted to 45min or less durations (MTVS of 3 or less), this leads to efficient weight loss. Surprisingly, the study excludes the impact of self-reported meal composition on the efficiency of weight loss in the experimental group. In light of this, it is important to follow up on this observation and develop rigorous study designs that will comprehensively assess the impact of changes (sustained) in dietary composition on weight loss. The study also reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Perhaps the most important observation is that personalized interventions that cater to individual circadian needs will likely result in more significant weight loss than when interventions are mismatched with personal circadian structures.

      We would like to thank the reviewer for the positive assessment of our study.

      (1) One concern for the study is its two-group design; however, single-group cross-over designs are tedious to develop, and an adequate 'wash-out' period may be difficult to predict.

      A cross-over design would of course be highly desirable and, if feasible, could provide more robust data than a two-group design. However, we have strong doubts about its feasibility. Not only is it difficult to determine a washout period long enough to avoid carry-over effects of metabolic changes, but we would also expect participants who start with the TTE intervention to continue, consciously or unconsciously, to adhere to certain eating times in the next phase, when they are asked to eat as they did before the study.

      In a certain way, however, our study approximates one arm of a cross-over design. During the follow-up period, some participants reported that they had started eating at more irregular times again, which is comparable to the mock treatment of the control subjects, and these participants regained weight.

      (2) A second weakness is not considering different biological variables and racial and ethnic diversity and how these might impact outcomes. In sum, the authors have achieved the aims of the study, which will likely help move the field forward.

      We have now added analyses of the age and gender of the participants and found no correlation of either with weight loss. The sample size of this pilot study was too small for a reliable analysis of the influence of ethnic diversity. If the study is continued with a larger sample size, this type of analysis will certainly be included.

      We are pleased with the assessment that we have achieved our goals and are helping to advance the field.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the effects of the timing of dietary occasions on weight loss and well-being with the aim of explaining whether a consistent, timely alignment of dietary occasions throughout the days of the week could improve weight management and overall well-being. The authors attributed these outcomes to a timely alignment of dietary occasions with the body's own circadian rhythms. However, the only evidence the authors provided for this hypothesis is the assumption that the individual timing of dietary occasions of the study participants identified before the intervention reflects the body's own circadian rhythms. This concept is rooted in the understanding of dietary cues as a zeitgeber for the circadian system, potentially leading to more efficient energy use and weight management. Furthermore, the primary outcome, body weight loss, was self-reported by the study participants.

      Strengths:

      The innovative focus of the study on the timing of dietary occasions rather than daily energy intake or diet composition presents a fresh perspective in dietary intervention research. The feasibility of the diet plan, developed based on individual profiles of the timing of dietary occasions identified before the intervention, marks a significant step towards personalised nutrition.

      We thank the reviewer for the generally positive assessment of our study and for sharing the view that our personalized approach represents an innovative step in chrononutrition.

      Weaknesses:

      (1) Several methodological issues detract from the study's credibility, including unclear definitions not widely recognized in nutrition or dietetics (e.g., "caloric event"), lack of comprehensive data on body composition, and potential confounders not accounted for (e.g., age range, menstrual cycle, shift work, unmatched cohorts, inclusion of individuals with normal weight, overweight, and obesity).

      We have replaced the term "caloric event" with "calorie intake occasion" and otherwise revised our manuscript with regard to other terminology in order to avoid ambiguity.

      We agree with the reviewer that body composition is a very important parameter to investigate. Such investigations will definitely be part of the future continuation of the study. In this pilot study, we aimed to clarify in principle whether our intervention approach has an effect. Since we believe this is the case, we intend to investigate in future studies which physiological mechanisms explain the observed weight loss.

      These future studies will also include further parameters in the analyses. However, in response to the reviewer's suggestions, we have already completed analyses of the participants' age and gender, which show that neither variable influences weight loss.

      In our view, the menstrual cycle should not have a major influence on the effectiveness of a 6-week intervention.

      The inclusion of shift workers is not a problem from our point of view. If their work shifts allow them to follow their personal eating schedule, we see no violation of our hypothesis. If this is not the case, as our data in Fig. 1G show, we do not expect any weight loss. Nevertheless, the reviewer is of course right that shift work can generally be a confounding factor and have an influence on weight loss success. To our knowledge, none of the 100 participants evaluated were shift workers. In a continuation of the study, however, shift work should be an exclusion criterion. Yet, our intervention approach could be of great interest for shift workers in particular, as they may be at a particularly high risk of obesity due to irregular eating times. A separate study with shift workers alone could therefore be of particular interest.

      The mismatch in baseline BMI between the remaining 67 EG and 33 CG participants is discussed in detail in section "3.1 Limitations". Although this is a limitation, it does not cast serious doubt on the effectiveness of the intervention, as a subgroup analysis shows that intervention subjects lose more weight than control subjects with the same BMI.

      The inclusion of a wide BMI range was intentional. Our hypothesis is that reduced temporal variability in eating times optimizes metabolism and therefore excess body weight is lost (which we would like to investigate specifically in future studies). We hypothesize that people living with a high BMI will experience greater optimization than people with a lower BMI. Our data in Figs. 1H and S2I suggest that this assumption is correct.

      (2) The primary outcome's reliance on self-reported body weight and subsequent measurement biases further undermines the reliability of the findings.

      Self-reported data is always more prone to error than objectively measured data. With regard to body weight, the Covid-19 pandemic severely restricted direct contact with the participants during the study. However, the initial body weight (T0), the body weight at the end of the exploration phase (T1) and the final body weight (T2) were measured during video calls in the (virtual) presence of the study staff. These are the measurement points that were decisive for our analyses; intermediate self-reported measurements were not considered in the analyses. We have added to the Materials & Methods section that video calls were used to minimize the risk of misreporting.

      (3) Additionally, the absence of registration in clinical trial registries, such as the EU Clinical Trials Register or clinicaltrials.gov, and the multiple testing of hypotheses which were not listed a priori in the research protocol published on the German Register of Clinical Trials impede the study's transparency and reproducibility.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trials Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: “The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place.

      Furthermore, in our view, we did not provide less information on planned analyses than is usual, and all our analyses were covered by the information in the study registry. In the study register, we stated the hypothesis that “strict adherence to [personalized] mealtimes will lead to a strengthening of the circadian system in the digestive tract and thus to an optimization of the utilization of nutrients and ultimately to the adjustment of body weight to an individual ideal value.”

      In our view, numerous analyses are necessary to test this hypothesis. We investigated whether it is the adherence to eating times that is related to the observed weight loss (Fig. 1), or possibly other variables resulting from adherence to the meal schedule (Fig. 3). In addition, we analyzed whether the intervention optimized the utilization of nutrients, based on the food composition and number of calories during the exploration and intervention phases (Fig. 2). We investigated whether the personalization of meal schedules plays a role (Fig. 3). And we attempted to analyze whether body weight adjusts to an individual ideal value by correlating the original BMI with weight loss. Only the hypothesis that the circadian system in the digestive tract is strengthened has not yet been directly investigated, a fact that is listed as a limitation. It can, however, be assumed that this has happened, as the Zeitgeber “food” lost considerable variability as a result of the intervention. The analyses on general well-being are covered in the study protocol by the listing of secondary endpoints.

      Beyond that, we did not analyze any hypotheses that were not formulated a priori.

      For these reasons, we see no limitation of transparency or reproducibility and no conflict with the applicable requirements and regulations.

      Achievement of Objectives and Support for Conclusions:

      (4) The study's objectives were partially met; however, the interpretation of the effects of meal timing on weight loss is compromised by the weaknesses mentioned above. The evidence only partially supports some of the claims due to methodological flaws and unstructured data analysis.

      We hope that we have been able to dispel uncertainties regarding some interpretations through supplementary analyses and the addition of some methodological details.

      Impact and Utility:

      (5) Despite its innovative approach, significant methodological and analytical shortcomings limit the study's utility. If these issues were addressed, the research could have meaningful implications for dietary interventions and metabolic research. The concept of timing of dietary occasions in sync with circadian rhythms holds promise but requires further rigorous investigation.

      We are pleased with the assessment that our data to date is promising. We hope that the revised version already clarifies some of the doubts about the data available so far. Furthermore, we absolutely agree with the reviewer: the present study serves to verify whether our intervention approach is potentially effective for weight loss, which we believe is the case. In the next steps, we plan to include extensive metabolic studies and to address the limitations of the present study.

      Reviewer #3 (Public Review):

      The authors tested a dietary intervention focused on improving meal regularity in this interesting paper. The study, a two-group, single-center, randomized, controlled, single-blind trial, utilized a smartphone application to track participants' meal frequencies and instructed the experimental group to confine their eating to these times for six weeks. The authors concluded that improving meal regularity reduced excess body weight despite food intake not being altered and contributed to overall improvements in well-being.

      The concept is interesting, but the need for more rigor is of concern.

      We would like to thank the reviewer for the interest in our study.

      (1) A notable limitation is the reliance on self-reported food intake, with the primary outcome being self-reported body weight/BMI, indicating an average weight loss of 2.62 kg. Despite no observed change in caloric intake, the authors assert weight loss among participants.

      As described above in our response to Reviewer #2, body weight was assessed during video calls in the (virtual) presence of study staff, so that the risk of misreporting is minimized. We have added this information to the manuscript.

      When recording food intake, we had to weigh up the risk of misreporting against the risk of a lack of validity in a permanently monitored setting. It was important to us to investigate the effectiveness of the intervention in the participants' everyday environment and not in a laboratory setting in order to be able to convincingly demonstrate its applicability in everyday life. The restriction of self-reporting is therefore unavoidable in our view and must be accepted. It can possibly be reduced by photographing the food, but even this is not a complete protection against underreporting, as there is no guarantee that everything that is ingested is actually photographed.

      However, our analyses show that the reporting behavior of individual participants did not change significantly between the exploration and intervention phases. We do not assume that participants who underreported did so only during the exploration phase (eating more than reported only in this study phase) and reported correctly in the intervention phase (when they then indeed consumed fewer calories). We discuss this point in section "3.1 Limitations".

      (2) The trial's reliance on self-reported caloric intake is problematic, as participants tend to underreport intake; for example, in the NEJM paper (DOI: 10.1056/NEJM199212313272701), some participants underreported caloric intake by approximately 50%, rendering such data unreliable and hence misleading. More rigorous methods for assessing food intake are available and should have been utilized. Merely acknowledging the unreliability of self-reported caloric intake is insufficient as it would still leave the reader with the impression that there is no change in food intake when we actually have no idea if food intake was altered. A more robust approach to assessing food intake is imperative. Even if a decrease in caloric intake is observed through rigorous measurement, as I am convinced a more rigorous study would unveil testing this paradigm, this intervention may merely represent another short-term diet among countless others that show that one may lose weight by going on a diet, principally due to heightened dietary awareness.

      The risks of self-reporting, our considerations, and our analysis of participants' reporting behavior and caloric intake over the course of the study are discussed in detail both in our responses above and in the manuscript. 

      With regard to the reviewer's second argument, we have largely adapted the study protocol of the control group to that of the experimental group. Apart from the fact that the control subjects were not given guidelines on eating times and were instead only given a very rough time window of 18 hours for food intake, the content of the sessions and the measurement methods were the same in both groups. This means that the possibility of increased nutritional awareness was equally present in both groups, but only the participants in the experimental group lost a significant amount of body weight.

      In future continuations of the study, further follow-up after an even longer period than four weeks (e.g. after 6 months) can be included in the protocol in order to examine whether the effects can be sustained over a longer period.

      (3) Furthermore, the assessment of circadian rhythm using the MCTQ, a self-reported measure of chronotype, may not be as reliable as more objective methods like actigraphy.

      The MCTQ is a validated means of determining chronotype and its results are significantly associated with the results of actigraphic measurements. In our view, the MCTQ is sufficient to test our hypothesis that matching the chronobiological characteristics of participants is beneficial. Nevertheless, measurements using actigraphy could be of interest, for example to correlate the success of weight loss with parameters of the sleep-wake rhythm.

      (4) Given the potential limitations associated with self-reported data in both dietary intake and circadian rhythm assessment, the overall impact of this manuscript is low. Increasing rigor by incorporating more objective and reliable measurement techniques in future studies could strengthen the validity and impact of the findings.

      The body weight data was not self-reported; the measurements were taken in the (virtual) presence of study staff. Although optimization might be possible (see above), we currently see no feasible way of recording all calorie intake occasions in the participants' natural environment over a period of several weeks (or possibly longer, as noted by the reviewer) other than self-report. For the future continuation of the study, we are planning occasional indirect calorimetry measurements that can provide information about the actual amount of food consumed in different phases of the study. These can reveal errors in the self-reports but cannot replace daily data collection by self-report.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      This interesting and timely study by Wilming and colleagues examines the effect of regularity vs. irregularity of feeding on body weight dynamics and BMI. Rigorous assessments of this question in humans are needed, and this study provides one. The study is well-designed, with a 14-day exploration phase followed by 6 weeks of intervention, and it is commendable to see the number of participants (100) who completed the study. A follow-up assessment 4 weeks after the conclusion of the study shows maintained weight loss in a subset of Experimental Group (EG) participants who continued with regular meals. There are several key observations, including that particular meals (lunch and dinner), when restricted to windows of 45 min or less (MTVS of 3 or less), lead to efficient weight loss, as well as correlations between baseline BMI and weight loss. The authors also exclude an impact of self-reported meal composition on the efficiency of weight loss in the EG in the context of this study. The study reports interesting effects of regularity of feeding on eating behavior, which appear to be independent of weight loss. Finally, the authors highlight an important point: attention should be paid to personalized feeding and circadian windows, as personalized interventions that cater to individual circadian structures will result in more significant weight loss. This is an important concept that needs to be brought to light. There are only a few minor comments listed below:

      Minor comments:

      (1) The authors may provide explanations for the reduction in the MTVS in the EG and the increase in the same for the Control Group (CG). The increases in MTVS in CG are surprising (lines 105-106) because it is assumed that there is no difference in CG eating patterns prior to and during the study.

      As the reviewer correctly states, our assumption was that the MTVS should not change before and during the study, but we could not rule this out: in the meetings with the study staff, the CG subjects were not given any indication regarding the regularity of food intake within the fixed time window, i.e. they were not instructed to continue eating exactly as before, because such an instruction might have prompted participants to adhere to a schedule as precisely as possible. We did observe a statistically significant worsening of the MTVS in the CG, but it amounted to less than 0.6 MTVS, i.e. a time span of only approx. ±7.5 min, and remained within MTVS 3. Since there were no correlations between the measured MTVS and the weight of the subjects in the CG, and a change of about half an MTVS value has only a minor effect on weight, we do not attribute great significance to the observed deterioration in the MTVS.

      (2) There would be greater clarity for the readers if the authors clearly defined the study design in detail at the outset of the study, e.g., in section 2.1.

      We have included a brief summary of the study design at the end of the introduction so that the reader is already familiar with it at the beginning of the manuscript without having to switch to the material and methods section.

      (3) The data in Fig S2H is important and informs readers that the regularity of lunch and dinner is more related to body weight changes than that of breakfast. These data should be incorporated in the main figures. In addition, the analyses of the Table S7 data (indicating that an MTVS of no greater than 3, i.e. within ±45 min of the meal-timing window, is associated with efficient weight loss) should be represented in a figure panel in the main figures.

      As suggested by the reviewer, we have moved Fig. S2H to the main Fig. 1. In addition, Table S7 is now included as main Table 1 of the manuscript rather than as a supplementary table.

      (4) The authors state in lines 222-223 that "weight changes of participants were not related to one of these changes in eating characteristics (Fig. 3B-D, Tab. S6)", referring to the shortening of feeding windows as noted in the EG group. This is a rather simplistic statement, which should be amended to include that weight changes may not relate to changes in eating characteristics per se but likely relate to changes in metabolic programming, for instance, energy expenditure increases, which have been shown to associate with these changes in eating characteristics. This is important to note.

      We have changed the wording at this point to make clear that, in the Results section, we are referring only to the statistical analysis, which showed no correlation between the eating time window and weight loss in our sample. In addition, we now explicitly mention the change in metabolic programming, correctly noted by the reviewer, in the discussion at the end of section 3.

      (5) Please provide more background and details on the attributes that define individual participant chronotypes in the manuscript before discussing datasets, e.g., mSP and mEP. This relates to the narrative in lines 228-230: "Indeed, our data show that the later the chronotype of participants (measured by the MCTQ mid-sleep phase, mSP [24]), the later their mid-eat phase (mEP) on weekends (Fig. 3E, Tab. S6), with the mSP and mEP being almost antiphasic on average (Fig. 3F, Tab. S10)." This will help readers unfamiliar with circadian biology/chronobiology research understand the contents of this manuscript, particularly Fig 3.

      We have explained the new chronobiology terms that appear in the chapter better in the revised version so that they are easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify Terminology: Define or avoid using ambiguous terms such as "caloric event" to prevent confusion, especially for readers less familiar with chronobiology. Consider providing clear explanations or opting for more widely understood terms.

      We have replaced "caloric event" with “calorie intake occasion” and explain various chronobiology terms better, so that hopefully readers from other disciplines can now follow the text more easily.

      (2) Detailed Methodological Descriptions: Improve the transparency of your methods, especially concerning the measurement of primary and secondary outcomes. Address the concerns raised about the reliability of self-reported weight and the potential biases in measurement methods.

      In the section "3.1 Limitations", we have examined the aspect of the reliability of self-reported data and our measures to reduce this uncertainty in more detail. We have also added further details on the measurement of outcomes in the materials and methods section.

      (3) Address Participant Selection Criteria: Reevaluate the inclusion criteria and consider discussing the implications on the study's findings of the broad age range, the inclusion of shift work, unmatched cohorts, and inclusion of individuals with normal weight, overweight, and obesity. Provide a subgroup analysis or discuss how BMI might have influenced the results. Even though this is an additional post-hoc analysis, it would directly address one of the major weaknesses of the study design.

      We have supplemented the analyses and now show in Fig. S2G that neither age nor gender had any influence on weight loss as a result of the intervention. To our knowledge, none of the 100 participants evaluated were shift workers. Even if shift workers had participated without our knowledge, we do not consider this a problem as long as their shifts allow them to keep to certain eating times. The mismatch in baseline BMI between the remaining 67 EG and 33 CG participants is discussed in detail in section "3.1 Limitations". Our previous analysis in Fig. S2I already showed a negative correlation between baseline BMI and weight loss, an interesting result, as it shows that people with a high BMI particularly benefit from the intervention. In addition, the subgroup analysis in Fig. S2J already showed that in all strata the BMI of EG subjects decreased more than that of CG subjects with the same initial BMI. We do not consider the wide dispersion of the BMIs of the included participants to be a weakness of the study design. On the contrary, it allows us to make a statement about which target group the intervention is particularly suitable for.

      (4) Improve Statistical Analysis: If not already done, involve a biostatistician to review the statistical analyses, particularly concerning post-hoc tests, correlation analyses, and the handling of measurement biases. Ensure that deviations from the original study protocol are clearly documented and justified.

      All analyses were planned together with a statistician, who also reviewed and approved them.

      (5) Data Interpretation and Speculation: Limit speculation and clearly distinguish between findings supported by your data from hypotheses and future directions. Ensure that discussions about the implications of meal timing on metabolism are supported by evidence with adequate references and clearly state where further research is needed.

      We have revised the discussion and, especially through the detailed discussions of the limitations, we have emphasized more clearly what has been achieved and what still needs to be proven in future studies.

      (6) Clinical Trial Registration: Address the lack of registration in the EU Clinical Trials Register and clinicaltrials.gov. Discuss its potential implications on the study's transparency and how it aligns with current requirements and regulations.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trials Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: “The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place before it began, and we see no limitation of transparency and no conflict with the applicable requirements and regulations.

      (7) Use of Sensitive and Current Terminology: Update the manuscript to reflect the latest recommendations regarding the language used to describe obesity and patients living with obesity. This ensures respect and accuracy in reporting and aligns with contemporary standards in the field.

      We updated the manuscript accordingly.

      (8) Strengthen the Introduction: Expand the literature review to include more recent and relevant studies that contextualise your work within the broader field of chrononutrition. This could help clarify how your study builds upon or diverges from existing research.

      We have included further studies in the introduction that aim to reduce body weight by restricting food intake to certain time periods. We have also more clearly contrasted the designs of these studies with the design of our study.

      (9) Clarify Discrepancies and Errors: Address any inconsistencies, such as the discrepancy in meal timing instructions (90 minutes reported in the conclusion vs. 60 minutes reported in the methods), and ensure all figures, tables, and statistical analyses are correctly referenced and described.

      The first point mentioned by the reviewer is not an inconsistency. To ensure the feasibility of the intervention, each participant was initially given a time window of ±30 minutes (60 min) around the specified eating time. Our later analyses show that even a time window of ±45 minutes (90 min) around the specified eating time is sufficient for efficient weight loss (see results in Table 1).

      We have checked all references to figures, tables and statistical analyses and updated them if necessary.

      (10) Discuss Limitations and Bias: More thoroughly discuss the limitations of your study, including the potential impacts of biases and how they were mitigated. Additionally, consider the effects of including shift workers and how this choice impacts the applicability of your findings.

      Section “3.1 Limitations” has now been supplemented by a number of points and discussions. As described above, we do not consider the inclusion of shift workers to be a limitation as long as they are able to adhere to the specifications of the eating time plan. We cannot derive any indications to the contrary from our data.

      (11) Consider Publishing Separate Manuscripts: If the study encompasses a wide range of outcomes or post-hoc analyses, consider separating these into distinct publications to allow for a more focused and detailed exploration of each set of findings.

      We will take this advice into consideration for future publications on the continuation of the study. As this is a pilot study that is intended to clarify whether and to what extent the intervention is effective, we believe it makes sense to report all the data in a publication.

      (12) By addressing these recommendations, the authors can significantly improve their manuscript's clarity, reliability, and impact. This would not only support the dissemination of their findings but also would contribute valuable insights into the growing field of chrononutrition.

      We hope that we have satisfactorily answered, discussed, and implemented the reviewer's points in the manuscript, so that its clarity, reliability, and impact have improved and it can make a valuable contribution to the field of chrononutrition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The report describes the control of the activity of the RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. Repressive binding of K3 to the kinase prevents phosphorylation of its recognised substrate, EIF2α (the α subunit of the Eukaryotic Initiation Factor 2). The interaction of K3 is probed by saturation mutation within four regions of PKR chosen by modelling the molecules' interaction. They identify K3-resistant PKR variants, which show that the K3/EIF2α-binding surface of the kinase is malleable. This is reasonably interpreted as indicating the potential adaptability of this antiviral protein to combat viral virulence factors.

      Strengths:

      This is a well-conducted study that probes the versatility of the antiviral response to escape a viral inhibitor. The experimentation is very diligent, generating and screening a large number of variants to recognise the malleability of residues at the interface between PKR and K3.

      Weaknesses:

      (1) These are minor. The protein interaction between PKR and K3 has been previously well-explored through phylogenetic and functional analyses and molecular dynamics studies, as well as with more limited site-directed mutational studies using the same experimental assays.

      Accordingly, these findings largely reinforce what had been established rather than making major discoveries.

      First, thank you for your thoughtful feedback. We agree that our results are concordant with previous findings and recognize the importance of emphasizing what we find novel in our results. We have revised the introduction (lines 65-74 of the revised_manuscript.pdf) to emphasize three findings of interest: (1) the PKR kinase domain is largely pliable across its substrate-binding interface, a remarkable quality that is most fully revealed through a comprehensive screen, (2) we were able to differentiate variants that render PKR nonfunctional from those that are susceptible to Vaccinia K3, and (3) we observe a strong correlation between PKR variants that are resistant to K3 WT and K3-H47R.

      There are some presumptions:

      (2) It isn't established that the different PKR constructs are expressed equivalently so there is the contingency that this could account for some of the functional differences.

      This is an excellent point. We have revised the manuscript to raise this caveat in the discussion (lines 247-251). One indirect reason to suppose that expression differences among our PKR variants are not a dominant source of variation is that we did not observe much variation in kinase activity in the absence of K3.

      (3) Details about the conformation of PKR used to model the interaction aren't given, so it isn't clear how accurately the model captures the active kinase state. This is important for the interaction with K3/EIF2α.

      We have expanded on Supplemental Figure 12 and our description of the AlphaFold2 models in the Materials and Methods section (lines 573-590). We clarify that these models may not accurately capture the phosphoacceptor loop of eIF2α (residues Glu49-Lys60) and the PKR β4-5 linker (Asp338-Asn350) as these are highly flexible regions that are absent in the existing crystal structure complex (PDB 2A1A) and have low AlphaFold2 confidence scores (pLDDT < 50). We also noted, in the Materials and Methods section and in the caption of Figure 1, that the modeled eIF2α closely resembles the crystal structure of standalone yeast eIF2α, which places the Ser51 phosphoacceptor site far from the PKR active site. Thus, we expect there are additional undetermined PKR residues that contact eIF2α.

      (4) Not all regions identified to form the interface between PKR and K3 were assessed in the experimentation. It isn't clear why residues between positions 332-358 weren't examined, particularly as this would have made this report more complete than preceding studies of this protein interaction.

      Great questions. We designed and generated the PKR variant library based on the vaccinia K3 crystal structure (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A), in which PKR residues 338-350 are absent. After the genesis of the project, we generated the AlphaFold2-predicted complex of PKR and vaccinia K3, and have become very interested in the β4-β5 linker, a highly diverse region across PKR homologs which includes residues 332-358. However, this region remains unexamined in this manuscript.

      Reviewer #2 (Public Review):

      Chambers et al. (2024) present a systematic and unbiased approach to explore the evolutionary potential of the human antiviral protein kinase R (PKR) to evade inhibition by a poxviral antagonist while maintaining one of its essential functions.

      The authors generated a library of 426 single-nucleotide polymorphism (SNP)-accessible non-synonymous variants of PKR kinase domain and used a yeast-based heterologous virus-host system to assess PKR variants' ability to escape antagonism by the vaccinia virus pseudo-substrate inhibitor K3. The study identified determinant sites in the PKR kinase domain that harbor K3-resistant variants, as well as sites where variation leads to PKR loss of function. The authors found that multiple K3-resistant variants are readily available throughout the domain interface and are enriched at sites under positive selection. They further found some evidence of PKR resilience to viral antagonist diversification. These findings highlight the remarkable adaptability of PKR in response to viral antagonism by mimicry.

      Significance of the findings:

      The findings are important with implications for various fields, including evolutionary biology, virus-host interfaces, genetic conflicts, and antiviral immunity.

      Strength of the evidence:

      Convincing methodology using state-of-the-art mutational scanning approach in an elegant and simple setup to address important challenges in virus-host molecular conflicts and protein adaptations.

      Strengths:

      Systematic and Unbiased Approach:

      The study's comprehensive approach to generating and characterizing a large library of PKR variants provides valuable insights into the evolutionary landscape of the PKR kinase domain. By focusing on SNP-accessible variants, the authors ensure the relevance of their findings to naturally occurring mutations.

      Identification of Key Sites:

      The identification of specific sites in the PKR kinase domain that confer resistance or susceptibility to a poxvirus pseudosubstrate inhibition is a significant contribution.

      Evolutionary Implications:

      The authors performed meticulous comparative analyses throughout the study between the functional variants from their mutagenesis screen ("prospective") and the evolutionarily-relevant past adaptations ("retrospective").

      Experimental Design:

      The use of a yeast-based assay to simultaneously assess PKR capacity to induce cell growth arrest and susceptibility/resistance to various VACV K3 alleles is an efficient approach. The combination of this assay with high-throughput sequencing allows for the rapid characterization of a large number of PKR variants.

      Areas for Improvement:

      (5) Validation of the screen: The results would be strengthened by validating results from the screen on a handful of candidate PKR variants, either using a similar yeast heterologous assay, or - even more powerfully - in another experimental system assaying for similar function (cell translation arrest) or protein-protein interaction.

      Thank you for your thoughtful feedback. We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and found that the results generally support our original findings. We have revised the manuscript to include these validation experiments (lines 117-119 of the revised_manuscript.pdf, Supplemental Figure 4).

      (6) Evolutionary Data: Beyond residues under positive selection, the screen would allow the authors to also perform a comparative analysis with PKR residues under purifying selection. Because they are assessing one of the most conserved ancestral functions of PKR (i.e. cell translation arrest), it may also be of interest to discuss these highly conserved sites.

      This is a great point. We do find that there are regions of the PKR kinase domain that are not amenable to genetic perturbation, namely in the glycine-rich loop and active site. We contrast the PKR functional scores at conserved residues under purifying selection with those under positive selection in Figure 2E (lines 141-143).

      (7) Mechanistic Insights: While the study identifies key sites and residues involved in vaccinia K3 resistance, it could benefit from further investigation into the underlying molecular mechanisms. The study's reliance on a single experimental approach, deep mutational scanning, may introduce biases and limit the scope of the findings. The authors may acknowledge these limitations in the Discussion.

      We agree that further investigation into the underlying molecular mechanisms is warranted and we have revised the manuscript to acknowledge this point in the discussion (lines 284-288).

      (8) Viral Diversity: The study focuses on the viral inhibitor K3 from vaccinia. Expanding the analysis to include other viral inhibitors, or exploring the effects of PKR variants on a range of viruses, would strengthen and expand the study's conclusions. Would the identified VACV K3-resistant variants also be effective against other viral inhibitors (from poxviruses or other viruses), or in the context of infection with different viruses? Without such evidence, the authors should ensure that the manuscript is specific about the scope of its conclusions.

      This is a fantastic question that we are interested in exploring in our future studies. In the manuscript we note a strong correlation between PKR variants that evade vaccinia wild-type K3 and the K3-H47R enhanced allele, but we are curious to know if this holds when tested against other K3 orthologs such as variola virus C3. That said, we have revised the manuscript to clarify this limitation to our findings and specify vaccinia K3 where appropriate.

      Reviewer #3 (Public Review):

      Summary:

      -  This study investigated how genetic variation in the human protein PKR can enable sensitivity or resistance to a viral inhibitor from the vaccinia virus called K3.

      -  The authors generated a collection of PKR mutants and characterized their activity in a high-throughput yeast assay to identify 1) which mutations alter PKR's intrinsic biochemical activity, 2) which mutations allow for PKR to escape from viral K3, and 3) which mutations allow for escape from a mutant version of K3 that was previously known to inhibit PKR more efficiently.

      -  As a result of this work, the authors generated a detailed map of residues at the PKR-K3 binding surface and the functional impacts of single mutation changes at these sites.

      Strengths:

      -  Experiments assessed each PKR variant against three different alleles of the K3 antagonist, allowing for a combinatorial view of how each PKR mutant performs in different settings.

      -  Nice development of a useful, high-throughput yeast assay to assess PKR activity, with highly detailed methods to facilitate open science and reproducibility.

      -  The authors generated a very clean, high-quality, and well-replicated dataset.

      Weaknesses:

      (9) The authors chose to focus solely on testing residues in or near the PKR-K3 predicted binding interface. As a result, there was only a moderately complex library of PKR mutants tested. The residues selected for investigation were logical, but this limited the potential for observing allosteric interactions or other less-expected results.

      First, we greatly appreciate all your feedback on the manuscript, as well as raising this particular point. We agree that this is a moderately complex library of PKR variants, from which we begin to uncover a highly pliable domain with a few specific sites that cannot be altered. We have revised the manuscript to raise this limitation (lines 284-288 of the revised_manuscript.pdf) and encourage additional exploration of the PKR kinase domain.

      (10) For residues of interest, some kind of independent validation assay would have been useful to demonstrate that this yeast fitness-based assay is a reliable and quantitative readout of PKR activity.

      We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and generally found that the results support our original findings. We have revised the manuscript to include this validation experiment (lines 117-119, Supplemental Figure 4).

      (11) As written, the current version of the manuscript could use more context to help a general reader understand 1) what was previously known about these PKR and K3 variants, 2) what was known about how other genes involved in arms races evolve, or 3) what predictions or goals the authors had at the beginning of their experiment. As a result, this paper mostly provides a detailed catalog of variants and their effects. This will be a useful reference for those carrying out detailed, biochemical studies of PKR or K3, but any broader lessons are limited.

      Thank you for bringing this to our attention. We have revised the introduction of the manuscript to provide more context regarding previous work demonstrating an evolutionary arms race between PKR and K3 and how single residue changes alter K3 resistance (lines 51-64).

      (12) I felt there was a missed opportunity to connect the study's findings to outside evolutionary genetic information, beyond asking if there was overlap with PKR sites that a single previous study had identified as positively selected. For example, are there any signals of balancing selection for PKR? How much allelic diversity is there within humans, and are people typically heterozygous for PKR variants? Relatedly, although PKR variants were tested in isolation here, would the authors expect their functional impacts to be recessive or dominant, and would this alter their interpretations? On the viral diversity side, how much variation is there among K3 sequences? Is there an elevated evolutionary rate, for example, in K3 at residues that contact PKR sites that can confer resistance? None of these additions are essential, but some kind of discussion or analysis like this would help to connect the yeast-based PKR phenotypic assay presented here back to the real-world context for these genes.

      We appreciate this suggestion to extend our findings to a broader evolutionary context. There is little allelic diversity of PKR in humans, with all nonsynonymous variation listed in gnomAD being rare. (PKR shows sequence diversity in comparisons across species, including across primates.) Thus, barring the possibility of variation being present in under-studied populations, there is unlikely to be balancing selection on PKR in humans. Our expectation is that beneficial mutations in PKR for evading a pseudosubstrate inhibitor would be dominant, as a small amount of eIF2α phosphorylation is capable of halting translation (Siekierka, PNAS, 1984). There is a recent report citing PKR missense variants associated with dystonia that can be dominantly or recessively inherited (Eemy et al. 2020 PMID 33236446). Elde et al. 2009 (PMID 19043403) notes that poxvirus K3 homologs are under positive selection but no specific residues have been cited to be under positive selection. The lack of allelic diversity in PKR in humans notwithstanding, PKR could experience future selection in the human population as evidenced by its rapid evolution in primates, so we fully agree that a connection to the real-world context is useful. We have noted these topics in the discussion section (lines 289-294).

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticisms but ask for some clarifications and make some comments about the perceived weaknesses.

      (13) If the authors disagree with my summation that the findings largely replicate what was known, could they detail how the findings differ from what was known about this protein interaction and the major new insights stemming from the study? Currently, the abstract is a little philosophical rather than listing the explicit discoveries of the study.

      Thank you again for raising the need for us to clearly convey the novelty of our findings. We have revised the final paragraph in our introduction as described in comment #1.

      (14) As the experimental approach is well reported, it is unnecessary to confirm the proposed activity by, for instance, measures of Sui2 phosphorylation. However, previous reports have recognised that point mutants of PKR can be differentially expressed. The impact of this potential effect is unknown in the current experimentation as there are no measures of the expression of the different mutant PKR constructs. The large number of constructs used makes this verification onerous. The potential impact could be ameliorated by redundantly replacing each residue (hoping different residues have different effects on expression). Still, this limitation of the study should be acknowledged in the text.

      We greatly appreciate this comment and agree that this should be made clear in the text, which we have added to the discussion of the manuscript (lines 247-251).

      (15) Preceding findings and the modeling in this report recognise an involvement of the kinase insert region (residues 332 to 358) in PKR's interaction with K3, but this region is excluded from the analysis. These residues have been largely disregarded in the preceding analysis (the region is absent from the molecular structure of the kinase), so its inclusion here might have lent a more novel aspect or delivered a more complete investigation. Is there a justification for excluding this flexible loop?

      The PKR variant library was designed based on the crystal structure of K3 (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A). After the library was designed and made, we obtained complete predicted structures of PKR in complex with eIF2α and K3, which largely agree with the crystal structures but contain additional flexible loops that were not captured in the crystal structures. Though the library studied here does not explore variation in the kinase insert region, we are very interested in doing so in our future studies.

      (16) Could the explanation of the 'PKR functional score' be clarified? The description given within the legend of SF1 was helpful, so could this be replicated earlier in the main body of the text when introducing these experiments? e.g. As PKR activity is toxic to yeast, the number of cells in the pool expressing the functional PKR will decrease over time. Thus the associated barcode read count will also decrease, while the read count for the nonfunctional PKR will increase. This is termed the PKR function score, which will be relatively lower for cells transformed with less active PKR than those with more active PKR.

      Thank you for suggesting this clarification; we have revised the manuscript to define the PKR functional score more clearly (lines 106-109).

      (17) Another suggestion to clarify this term is to modify the figures. Currently, the intent of the first simulated graph in Fig 1E is clear but the inversion of the response (shown by the transposition of the colours) in the next graph (to the right) is less immediately obvious. Accordingly, the orientation of the 'PKR functional score' is uncertain. Could the authors add text to the rightmost graphic in Figure 1E by, for instance, indicating the PKR activity in the vertical column with text such as 'less active' (at the bottom), 'WT' (in the centre), and 'more activity' (at the top)? Also, the position of the inactive K296R mutant might be added to Figure 2A complementing the positioning of the active WT kinase in the first data graph of this kind.

      We appreciate your specific feedback on the figures of the manuscript; we have adjusted Figure 1E to clarify how we derive the PKR functional scores.

      (18) The authors don't use existing structures of PKR in their modelling. However, there is no information about the state of the PKR molecule used for modelling. Specific elements of the kinase domain affect its interaction with K3, so it would be informative to know the orientation of these elements in the model. Could the authors detail the state of pivotal kinase elements in their models? This could involve the alignment of the N- and C-lobes, the orientation of kinase spines (C- and R-spines), and the phosphorylation status of residues in the activation loop, or at least the position of this loop in relation to that adopted in the active dimeric kinase (e.g. PDB-2A1A, 3UIU or 6D3L). Alternatively, crystallographic structures of active and inactive PKR could be overlaid with the theoretical structure used for modelling (as supplementary information).

      We have revised the manuscript to describe the alignment of the predicted PKR-K3 complex with active and inactive PKR, and we have extended Supplemental Figure 12 with an overlay of the predicted structures on existing structures. We have also added a supplemental data file containing the RMSD values of PKR (from the predicted PKR-K3 complex) aligned to active (PDB 2A1A), inactive (PDB 3UIU), and unphosphorylated (PDB 6D3L) PKR (5_Structure-Alignment-RMSD-Values.xlsx). We have also provided the AlphaFold2 best model predictions for the PKR-eIF2α complex (6_AF2_PKR-KD_eIF2a.pdb) and the PKR-K3 complex (7_AF2_PKR-KD_VACV-K3.pdb). Across the RMSD values, the AlphaFold2 model of PKR most closely resembles unphosphorylated PKR (PDB 6D3L), though we note that the activation loop is absent from PDB 6D3L and 3UIU. We also aligned the Ser51 phosphoacceptor loop of the AlphaFold2 eIF2α model to PDB 1Q46 and found that the model reflects the pre-phosphorylation state. This loop is expected to interact with the PKR active site; this interaction is not captured in our model, and we state this explicitly in the caption of Figure 1 (lines 665-668).

      (19) Could some specific residues in Figure 7 be labelled (numbered) to orient the findings? Also, the key in this figure doesn't title the residues coloured white (RE red/black/blue). The white also isn't distinguished from the green (outside the regions targeted for mutagenesis).

      Excellent suggestion; we have revised this figure to label sites so as to orient the reader, and to clarify our categorization of PKR residues in the kinase domain.

      (20) Regarding the discussion, the authors adopt the convention of describing K3 as a pseudosubstrate. Although I realize it is common to refer to K3 as a pseudosubstrate, it isn't phosphorylated and binds slightly differently to PKR so alternative descriptors, such as 'a competitive binder', would more accurately present the protein's function. Possibly for this reason, the authors declared an expectation that evolution pressures should shift K3 to precisely mimic EIF2α. However, closer molecular mimicry shouldn't be expected for two reasons. The first is a risk of disrupting other interactions, such as the EIF2 complex. Secondly, equivalent binding to PKR would demote K3 to merely a stoichiometric competitor of EIF2α. In this instance, effective inhibition would require very high levels of K3 to compete with equivalent binding by EIF2α. This would be demanding particularly upon induction of PKR during the interferon response. To be an effective inhibitor K3 has to bind more avidly than EIF2α and merely requires a sufficient overlap with the EIF2α interface on PKR to disrupt this alternative association. This interpretation predicts that K3 is under pressure to bind PKR by a different mechanism than EIF2α.

      We appreciate your thoughtful point about the usage of the term pseudosubstrate. Ultimately, we've decided to continue using the term due to its historical usage in the field. The question of the optimal extent of mimicry in K3 is a fascinating one, and we greatly appreciate your thoughts. We wholly agree that K3 binding PKR more avidly than eIF2α would be preferable to perfect mimicry. In our Ideas and Speculation section, we propose that benefits of increasing PKR affinity may need to be balanced against a potential loss of host range resulting from overfitting to a given host's PKR. However, the possibility that reduced mimicry could be selected to avoid disruption of eIF2 function had not occurred to us; thank you for pointing it out!

      (21) The discussion of the 'positive selection' of sites is also interesting in this context. To what extent has the proposed positive selection been quantified? My understanding is that all of the EIF2α kinases are conserved and so demonstrate lower levels of residue change that might be expected by random mutagenesis i.e. variance is under negative selection. The relatively higher rate of variance in PKR orthologs compared to other EIF2α kinases could reflect some relaxation of these constraints, rather than positive selection. Greater tolerance of change may stem from PKR 's more sporadic function in the immune response (infrequent and intermittent presence of its activating stimuli) rather than the ceaseless control of homeostasis by the other EIF2α kinases. Also, induction of PKR during the immune response might compensate for mutations that reduce its activity. I believe that the entire clade of extant poxviruses is young relative to the divergence between their hosts. Accordingly, genetic variance in PKR predates these viruses. Although a change in PKR may become fixed if it affords an advantage during infection, such an advantage to the host would be countered by the much higher mutation rates of the virus. This would appear to diminish the opportunity for a specific mutation to dominate a host population and, thereby, to differentiate host species. Rather, pressure to elude control by a rapidly evolving viral factor would favour variation at sites where K3 binds. This speculation offers an alternative perspective to the current discussion that the variance in PKR orthologs stems from positive selection driven by viral infection.

      We appreciate this stimulating feedback for discussion. Three of the four eIF2α kinases (HRI, PERK, and GCN2) appear to be under purifying selection (Elde et al. 2009, PMID 19043403), in contrast to PKR. Residues under positive selection have been found throughout PKR, including the dsRNA binding domains, linker region, and the kinase domain. Importantly, the selection analyses from Elde et al. and Rothenburg et al. concluded that positive selection at these sites is more likely than relaxed selection. We agree that poxviruses are young, though we would guess that viral pseudosubstrate inhibition of PKR is ancient. Many viral proteins have been reported to directly interact with PKR, including herpes simplex virus US11, influenza A virus NS1A, hepatitis C virus NS5A, and human immunodeficiency virus Tat. The PKR kinase domain does contain residues under purifying selection that are conserved among all four eIF2α kinases, but it also contains residues under positive selection that interface with the natural substrate eIF2α. Our work suggests that PKR is genetically pliable across several sites in the kinase domain, and we are curious to know whether this pliability would hold at the same sites across the other three eIF2α kinases.

      (22) The manuscript is very well written but has a small number of typos; e.g. an aberrant 'e' ln 7 of the introduction, capitalise the R in ranavirus on the last line of the fourth paragraph of the discussion, and eIF2α (EIF2α?) is occasionally written as eIFα in the materials&methods.

      Thank you for bringing these typos to our attention! We’ve deleted the aberrant ‘e’ in the introduction, capitalized ‘Ranavirus’ in the discussion (line 265), and corrected ‘eIFα’ to ‘eIF2α’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional minor edits or revisions:

      (23) Paragraph 3 of the Introduction gives the impression that most of the previous work on the PKR-virus arms race is speculative. However, it is one of the best-described and most convincing examples of virus-host arms races. Can the authors edit the paragraph accordingly?

      Thank you for bringing this to our attention. We have revised the third paragraph and strengthened the description of the evolutionary arms race between PKR and viral pseudosubstrate antagonists.

      (24) Introduction: PKR has "two" double-stranded RNA binding domains. Can the authors update the text accordingly?

      We have updated the manuscript to clarify PKR has two dsRNA binding domains (lines 44-45).

      (25) The authors test here for one of the key functions of PKR: cell growth/translation arrest. Because of PKR pleiotropy, the manuscript may be edited accordingly: For example, statements such as "We found few genetic variants render the PKR kinase domain nonfunctional" are too speculative as they may retain other (not tested here) functions.

      This is a great suggestion; we have revised the manuscript to specify our definition of nonfunction in the context of our experimental screen (lines 86-92 and 106-109) and to acknowledge this limitation of our experimental screen (lines 304-307).

      (26) The authors should specify "vaccinia" K3 whenever appropriate.

      We appreciate this comment and have revised the manuscript to specify vaccinia K3 where appropriate (e.g. lines 62, 66, 70, 80, 108, and 226).

      (27) Ref for ACE2 diversification may include Frank et al 2022 PMID: 35892217.

      Thank you for pointing us to this paper; we have included it as a reference in the manuscript (line 277).

      (28) Positive selection of PKR as referred to by the authors corresponds to analyses performed in primates. As shown by several studies, the sites under positive selection may vary according to host orders. Can the authors specify this ("primate") in their manuscript? And/or shortly discuss this aspect.

      Thank you for raising this point. In the manuscript we performed our analysis using vertebrate sites under positive selection as identified in Rothenburg et al. 2009 PMID 19043413 (line 51 and figure legends). We performed the same analysis using sites under positive selection in primates (as identified by Elde et al. 2009 PMID 19043403) and again found a significant difference in PKR functional scores versus K3. We have revised the manuscript to clarify our use of vertebrate sites under positive selection (lines 80-81).

      (29) "We view deep mutational scanning experiments as a complementary approach to positive selection": The authors should edit this and acknowledge previous and similar work on other antiviral factors, in particular one of the first studies of this kind on MxA (Colon-Thillet et al 2019 PMID: 31574080), and TRIM5 (Tenthorey et al 2020 PMID: 32930662).

      Thank you for raising these two papers, which we now acknowledge in the revised manuscript (line 299).

      (30) We believe Figure S7 presents important results and should be placed in the main text.

      We appreciate this suggestion, and have moved the contents of the former supplementary Figure 7 to the main text, in Figure 6.

      (31) The title may specify "poxvirus".

      Thank you for the suggestion to specify the nature of our experiment; we have adjusted the title to: Systematic genetic characterization of the human PKR kinase domain highlights its functional malleability to escape a poxvirus substrate mimic (line 3).

      Reviewer #3 (Recommendations For The Authors):

      (32) No line numbers or page numbers are provided, which makes it difficult to comment.

      We sincerely apologize for this oversight and have included line numbers in our revised manuscript as well as the tracked changes document.

      (33) In the introduction, I recommend defining evolutionary arms races more clearly for a broad audience.

      Thank you for this suggestion. We have revised the manuscript in the first and third paragraphs to more clearly introduce readers to the concept of an evolutionary arms race.

      (34) The introduction could use a clearer statement of the question being considered and the gap in knowledge this paper is trying to address. Currently, the third paragraph includes many facts about PKR and the fourth paragraph jumps straight into the approach and results. Some elaboration here would convey the significance of the study more clearly. As is, the introduction reads a bit like "We wanted to do deep mutational scanning. PKR seemed like an ok protein to look at", rather than conveying a scientific question.

      This is a great suggestion to improve the introduction section. We have heavily revised the third and fourth paragraphs of the introduction to clarify the motivation, approach, and significance of our work.

      (35) Relatedly, did the authors have any hypotheses at the start of the experiment about what kinds of results they expected? For example, what parts of PKR would be most likely to generate escape mutants? Would resistant mutants be rare or common? This would help the reader to understand which results are expected vs. surprising.

      These are all great questions. We have revised the introduction of the manuscript to point out that previous studies have characterized a handful of PKR variants that evade vaccinia K3, and these variants were made at sites found to be under positive selection (lines 60-64).

      (36) A description of the different K3 variants and information about why they were chosen for study should also be added to the Introduction. It was not until Figure 5 that the reader was told that K3-H47R was the same as the 'enhanced' K3 allele you are testing.

      Thank you for bringing this to our attention; we have revised the introduction to clarify the experimental conditions (lines 65-67) and to specify K3-H47R as the enhanced allele earlier in the manuscript (line 100).

      (37) Does every PKR variant carry just a single point mutation? It would be nice to see data about the number and types of mutations in each PKR window added to Supplemental Figure 1.

      Thank you for the suggestion to improve this figure. Every PKR variant that we track carries a single point mutation that generates a nonsynonymous change. In our PacBio sequencing of the PKR variant library, we identified a few off-target variants or sequences with multiple variants; we identified the barcodes linked to those constructs and discarded those variants from our analysis. We have revised Supplemental Figure 1 to include the number and types of mutations made in each PKR window.

      (38) In terms of the paper's logical flow, personally, I would expect to begin by testing which variants break PKR's function (Figure 3) and then proceeding to see which variants allow for K3 escape (Figure 2). Consider swapping the order of these sections.

      Thank you for this suggestion, and we can appreciate how the flow of the manuscript may be improved by swapping Figures 2 and 3. We have decided to maintain the current order of the figures because we use Figure 3 to emphasize the distinction of PKR sites that are nonfunctional versus susceptible to vaccinia K3.

      (39) Figure 3A seems like a less-informative version of Figure 4A, recommend combining these two. Same comment with Figure 5A and Figure 6A.

      We appreciate this specific feedback for the figures. Though there are similarities between figure panels (e.g. 3A and 4A) we use them to emphasize different points in each figure. For example, in Figure 3 we emphasize the general lack of variants that impair PKR kinase activity, and in Figure 4 we distinguish kinase-impaired variants from K3-susceptible variants. For this reason, and given space constraints, we have chosen to maintain the figures separately. We did decide to move the former Figure 6 to the supplement.

      (40) In general, it felt like there was a lot of repetition/re-graphing of the same data in Figures 3-6. I recommend condensing some of this, and/or moving some of the panels to supplemental figures.

      Thank you for your suggestion; we have revised the manuscript and have moved Figure 6 to Supplemental Figure 7.

      (41) In contrast, Supplemental Figure 7 is helpful for understanding the distribution of the data. Recommend moving to the main text.

      This is a great recommendation, and we have moved Supplemental Figure 7 into Figure 6.

      (42) How do the authors interpret an enrichment of positively selected sites in K3-resistant variants, but not in K3-H47R-resistant variants? This seems important. Please explain.

      Thank you for this suggestion to improve the manuscript; we agree that this observation warranted further exploration. We found a strong correlation in PKR functional scores between K3 WT and K3-H47R, and, consistent with this, sites under positive selection that are resistant to K3 WT are also resistant to K3-H47R. The lack of enrichment at positively selected sites appears to be caused by a collapsed dynamic range between PKR wild-type-like and nonfunctional variants in the K3-H47R screen. We have revised the manuscript to clarify this point (lines 202-204).

      (43) Discussion: The authors compare and contrast between PKR and ACE2, but it would be worth mentioning other examples of genes involved in antiviral arms races wherein flexible, unstructured loops are functionally important and are hotspots of positive selection (e.g. MxA, NLRP1, etc).

      We greatly appreciate this suggestion to improve the discussion. We note this contrast between the PKR kinase domain and the flexible linkers of MxA and NLRP1 in the revised manuscript (lines 273-274).

      (44) Speculation section: What is the host range of the vaccinia virus? Is it likely to be a generalist amongst many species' PKRs (and if so, how variable are those PKRs)? Would be worth mentioning for context if you want to discuss this topic.

      Thank you for raising this question. Vaccinia virus is the best studied of the poxviruses; it was used as a vaccine to eradicate smallpox and serves as a model poxvirus. Vaccinia virus has a broad host range, and although the name vaccinia derives from the Latin word “vacca” for cow, the virus's origin and natural host remain uncertain (Smith 2007 https://doi.org/10.1007/978-3-7643-7557-7_1). There is some evidence to suggest that it may be native to rabbits, and it does appear to be a generalist, acting as a general inhibitor of vertebrate PKRs.

      (45) Many papers in this field discuss interactions between PKR and K3L, rather than K3. I understand that this is a gene vs. protein nomenclature issue, but consider matching the K3L literature to make this paper easier to find.

      Thank you for bringing this to our attention. We have revised the manuscript to specify that vaccinia K3 is expressed from the K3L gene in both the abstract (line 26) and the introduction (line 56) to help make this paper easier to find when searching for “K3L” literature.

      (46) Which PKR sequence was used as the wild-type background?

      This is a great question. We used the predominant allele circulating in the human population, represented by GenBank M85294.1:31-1686. We cite this sequence in the Methods (line 421) and have added it to the Results section as well (line 84).

      (47) Figure 1C: the black dashed line is difficult to see. Recommend changing the colors in 1A-1C.

      Thank you for this suggestion; we have changed the dashed lines from black to white to make them more distinguishable.

      (48) Figure 1D: Part of the point of this figure is to convey overlaps between sites under selection, K3 contact sites, and eIF2alpha contact sites, but at this scale, many of the triangles overlap. It is therefore impossible to tell if the same sites are contacted vs. nearby sites. Perhaps the zoomed-in panels showing each of the four windows in the subsequent figures are sufficient?

      Thank you for bringing this to our attention. We have scaled the triangles down to reduce their overlap in Figure 1D and list all sites of interest (predicted eIF2α and vaccinia contacts, conserved sites, and positive selection sites) in the Materials and Methods section “Predicted PKR complexes and substrate contacts”.

      (49) Figure 1E: under "1,293 Unique Combinations", there is a line between the PKR and K3 variants, which makes it look like they are expressed as a fusion protein. I believe these proteins were expressed from the same plasmid, but not as a fusion, so I recommend re-drawing. Then in the graph, the y-axis says "PKR abundance", but from the figure, it is not clear that this refers to relative abundance in a yeast pool. Perhaps "yeast growth" or similar would be clearer?

      Thank you for the specific feedback to improve Figure 1. We have made the suggested edits to clarify that PKR and vaccinia K3 are not fused but each is expressed from their own promoter. We have also changed the y-axis from “PKR Abundance” to “Yeast Growth”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Correct capitalization errors, ensuring the first letter of each sentence is capitalized.

      Thank you for your comment. We have corrected capitalization errors.

      (2) Ensure that all technical terms and abbreviations are introduced in full when first mentioned and consistently used throughout the text.

      Thank you for your comment. We have checked and corrected the issue.

      (3) Review the manuscript for grammatical errors and improve sentence structures to enhance readability.

      Thank you for your comment. We have checked and corrected the issue.

      (4) Ensure all figures referenced in the text, such as Fig. 3G, are appropriately discussed and integrated into the narrative.

      Thank you for your comment. We have discussed and integrated Fig. 3G into the narrative (Page 12, Lines 162-166).

      (5) Maintain consistent formatting, including first-line indentation and spacing before paragraphs, to improve the document's visual coherence.

      Thank you for your comment. We have checked and corrected the issue.

      (6) Provide additional explanations for the selection criteria of final model variables, particularly the rationale behind choosing the λ_1se criterion in the LASSO regression.

      Thank you for your comment. We have provided explanations for choosing the λ_1se criterion in the LASSO regression (Page 25, Lines 315-316; Page 27, Lines 363-364).

      (7) Conduct validation studies with cohorts from other high-altitude regions to assess the generalizability and robustness of the prediction models.

      Thank you for your comment. The lack of validation in cohorts from other high-altitude regions is a weakness of this study; in our follow-up study, we will conduct external validation with cohorts from additional high-altitude regions to assess the generalizability and robustness of our prediction models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Bockorny, Muthuswamy, and Huang et al. performed proteomics analysis of plasma extracellular vesicles (EVs) from pancreatic ductal adenocarcinoma (PDAC) patients and patients with benign pancreatic diseases (chronic pancreatitis and intraductal papillary mucinous neoplasm, IPMN) to develop a 7-EV protein signature that predicts PDAC. Moreover, the authors identified PSMB4, RUVBL2, and ANKAR as being associated with metastasis. These studies provide important insight into alterations of EVs during PDAC progression, and the data supporting the prediction of PDAC with EV protein signatures are solid. However, there are certain concerns regarding the rigor and novelty of the data analysis and interpretation, as well as the clinical implications, as detailed below.

      (1) Plasma EVs were characterized by transmission electron microscopy and nanoparticle tracking analysis to confirm their morphology and size. The authors should also include an analysis of putative EV markers (e.g., tetraspanins, syntenin, ALIX, etc.) to confirm that the analyzed particles are EVs.

      We thank the reviewer for this comment. In a previous study, our co-authors who developed the EVtrap method (PMID:32396726) used electron microscopy and nanoparticle tracking analysis (NTA), as well as quantification of typical EV protein markers such as CD9, to confirm that particles isolated using EVtrap had the typical characteristics of extracellular vesicles. As such, these experiments were not replicated here. We added the following statement to the manuscript:

      “Previous analyses using electron microscopy and nanoparticle tracking also confirmed that the vast majority of particles isolated by EVtrap had diameters between 100 and 200 nm, consistent with exosomes (PMID:32396726). In addition, EVtrap isolates demonstrate a higher abundance of CD9, a common exosome marker, compared with isolates from other traditional EV isolation methods such as size exclusion chromatography and ultracentrifugation (PMID:32396726).”

      (2) The authors identified multiple over-expressed proteins in PDAC based on their fold change and p-value; however, due to the heterogeneity of PDAC, it is necessary to show a heatmap displaying their abundance in all samples. A high fold change does not necessarily indicate consistently high abundance in all PDAC samples.

      We thank the reviewer for this suggestion. We have now included the heatmap in the new Supplementary Figure 3.

      (3) PSMB4, RUVBL2, and ANKAR were identified as being associated with metastasis. The authors state that they intended to distinguish early and late-stage cancer samples, but it is unclear why they chose to compare metastatic and non-metastatic samples, as the non-metastatic group also includes late-stage cancer samples. This sentence should be rephrased to more accurately reflect the sample types profiled.

      We thank the reviewer for pointing this out. We would like to clarify that the analyses shown in Figures 3B and 3C pertain to patients with metastatic vs non-metastatic disease, not early versus late stage. We edited the text to ensure this information is clear.

      (4) Non-metastatic and metastatic patients were separated based on global protein abundance. The samples within each group display significant heterogeneity, with some samples displaying similar patterns although they were classified into different groups (Figure 3A), and the samples within the same group, particularly the metastasis group, did not consistently exhibit similar patterns of protein abundance. The authors should clarify this point.

      We thank the reviewer for this comment. We did not anticipate that EV proteomic expression would show an identical pattern across all samples within each group. The purpose of the heatmap in Figure 3 is to show enrichment of expression patterns, but we acknowledge that not all samples from the same group have the same proteome pattern.

      We added this statement to the discussion section:

      “As expected, the EV proteomic profiles of PDAC patients exhibited significant heterogeneity. While the above-mentioned markers exhibited strong associations with disease states at the population level, their abundances in individual patients varied significantly. These observations highlight the need to develop multi-protein panels for pancreatic cancer diagnosis and prognosis.”

      (5) The authors performed the survival analysis on a set of EV proteins but did not specify the origin of these markers or how many markers were examined. The authors should show their abundances across different groups, such as different stages and metastasis status.

      We thank the reviewer for the comments. The goal of this experiment was not to identify EV proteins that performed similarly well for diagnosis and prognostication. In Figures 3A, 3B, and 3C, we identified EV proteins that had better performance for the diagnosis of metastatic disease; in these experiments, we compared patients with metastatic versus non-metastatic disease. In the experiment depicted in Figure 3D, the goal was to identify, among the markers identified in Figure 3A, the EV markers that performed better in prognosticating outcomes as measured by overall survival. We would like to further clarify that, based on our observations and those of others, EV profiles from cancer patients are highly heterogeneous, and we do not anticipate that any single marker measured in isolation will have sufficient test performance for cancer diagnosis or prognosis assessment. Rather, we anticipate that one panel of markers may yield better performance for diagnosis, while a different combination of EV markers may perform better for prognosis assessment.

      (6) The classification model yielded a 100% accuracy, which may refer to AUC, in their discovery cohort, but it decreased to 89% in the independent cohort. This suggests that the authors have encountered overfitting issues with their model, where it performed well on the discovery cohort but did not generalize well to the independent cohort. The authors should clarify this point. The AUC score of the 7-EV signature is 0.89 and is not equivalent to prediction accuracy. In order to demonstrate prediction accuracy, the authors should show the confusion matrix of training and testing data as well as other evaluation metrics, such as accuracy, precision, and recall.

      We thank the reviewer for providing these insightful comments. As you noted, the 7-biomarker signature machine learning model attained 100% accuracy within the internal Discovery Cohort, raising concerns about potential overfitting that could limit its performance in the external validation dataset. Although the drop in AUROC of 0.11 in the external validation cohort exceeds the typically reported range of ~0.06-0.09, the model still achieved an AUROC of 0.89 in an independent patient cohort. Moreover, the use of a different technology to measure protein abundance in the validation dataset underscores the model’s reproducibility and validity. We have provided the model metrics for both the internal and external validation cohorts; please see the updated Supplementary Figure 7, as well as the new Supplementary Figures 6 and 8. We also amended the discussion section of the manuscript to acknowledge that the validation cohort had a limited sample size and that proteins were measured using a different method; these factors likely contributed to the lower accuracy of predictions in the validation cohort.

      (7) The authors should include more details of their model and the process of selection of signatures to enhance the reproducibility and transparency of their methods.

      We thank the reviewer for their valuable comments. To enhance clarity, we have incorporated additional information regarding the method employed for biomarker signature identification into the Methods section on page 23. We note that Supplementary Table 7a provides the sensitivity, specificity, precision, and AUC for the 16 markers included in the external validation study. Additionally, Supplementary Table 7b presents the contingency table for the 7-biomarker signature, offering insights into model accuracy for both the Internal-Discovery and External Validation cohorts.

      Reviewer #2 (Public Review):

      The authors intended to identify a protein signature in extracellular vesicles of serum to distinguish pancreatic ductal adenocarcinoma from benign pancreatic diseases.

      A major strength of the work presented is the valuable profiling of a significant number of patient samples, with a rich cohort of patients with pancreatic cancer, benign pancreatic diseases, and healthy controls. However, despite the strong cohorts presented, the numbers of patient samples for benign pancreatic diseases as well as controls were very limited.

      Also, the method used to isolate vesicles, EVTrap, recognizes double bilayers, which means that it can detect cellular debris and apoptotic bodies, which are very common in the circulation of patients that are undergoing chemotherapy. It would be important to identify the patients that are therapy naïve and the ones that are not because of this possible bias.

      We thank the Reviewer for these comments. We want to point out that the experiments presented in Supplementary Figure 1 (Transmission electron microscopy images and Nanoparticle tracking analysis) confirm that the vesicles isolated with EVTrap are not cellular debris and apoptotic bodies. Rather, these structures are in the nano range expected for exosomes. This is further supported by the additional work from our co-author and collaborator describing the development of EVtrap and its performance in isolating exosomes when compared to other traditional methods such as ultracentrifugation and size exclusion chromatography (PMID:32396726).

      As per the Reviewer’s request, we have provided an additional heatmap indicating which patients are treatment naïve, to differentiate them from those who have received treatment (revised Figure 2C).

      Additionally, the transmission electron microscopy data reflect this heterogeneity of the samples, also with little identification of double bilayered vesicles. It would be important to identify some extracellular vesicles markers in those preparations to strengthen the quality of the samples analyzed.

      We appreciate the comment from the Reviewer and acknowledge the importance of identifying exosome markers in EVtrap isolates. These experiments have already been done and are reported in the original paper in which our co-authors described the development of this method. In that manuscript (PMID: 30080416), our collaborators demonstrated the detection of CD9, a well-known exosome marker, by Western blot in isolates obtained using EVtrap or ultracentrifugation, a traditional technique for isolating exosomes. This work showed that EVtrap yielded a much higher recovery rate of exosomes with lower contamination from soluble proteins. We did not repeat these already published experiments, but we amended our manuscript to reference these results.

      What is more, previously published work with this same methodology identifies around 2000 proteins per sample. It would be important to explain why this study shows a reduction of more than 50% in the number of proteins identified in the vesicles.

      We thank the Reviewer for pointing out this important detail. In the previous work in which our co-authors developed EVtrap, the blood samples were processed using a different protocol, with a shorter centrifugation (2,500g for 10 min) (PMID: 32396726). In the current work, we employed three centrifugation steps. As detailed in the Methods section of the manuscript, blood samples were first centrifuged at 1,300g for 15 min. The plasma was then carefully removed from the top, avoiding the cell pellet, and centrifuged again at 2,500g for 15 min. The plasma was again carefully removed from the top, avoiding the cell pellet, and centrifuged a third time at 2,500g for 15 min. This more extensive centrifugation process was intended to further increase the removal of platelets, apoptotic bodies, and other large particles and aggregates. Accordingly, we anticipate that the additional centrifugation steps decreased the contamination of our isolates but may also have decreased the amount of exosome protein recovered, hence the lower number of exosome proteins identified in our study compared with the original study from our co-authors (PMID: 32396726).

      One of the proteins that consistently appears in the analysis is KRT20. It would be important to proceed with the analysis by first filtering out likely proteomic contaminants, of which keratins are the most common.

      We thank the Reviewer for this comment. We would like to point out that we believe KRT20 is, in fact, cancer-related and not a contaminant. This is supported by our results presented in this manuscript showing enrichment of KRT20 in PDAC cases and lower expression in benign samples. If this protein were a contaminant, it would be expected to appear uniformly across all samples, and there would be no apparent reason for differential expression between malignant and benign cases, as all samples were processed following the same procedures. In addition, increased expression of KRT20 in PDAC tissues has also been reported by others. For instance, in a study by Schmitz-Winnenthal et al. (PMID: 16364723), the authors showed that cytokeratin 20 (KRT20) was expressed in 76% of PDAC patients and that KRT20 expression was associated with poor survival after surgical resection. Based on these observations, we believe that the KRT20 identified in our study is indeed a tumor-associated EV protein rather than a contaminant.

      Finally, none of the 7-extracellular vesicle protein signatures has been validated by other techniques, such as western blot, in extracellular vesicles isolated by other, standard, methods, such as size exclusion chromatography.

      A distinct technique was used for protein analysis, but not a different method for isolating these vesicles. Using one would strengthen the results and support the origin of the proteins.

      We appreciate the Reviewer’s comment. We would like to again emphasize that the goal of this manuscript was not to compare the performance of EVtrap with other traditional EV isolation approaches such as ultracentrifugation and size exclusion chromatography. The main goal of the study is to determine the proteomic profiles of EVs isolated from clinical samples and to provide this information to the research community for further studies. As the Reviewer points out, proteins in EVs are highly heterogeneous, which highlights the complexity of EV biology and the interpatient heterogeneity of pancreatic cancer. We do not anticipate that EV-based markers for pancreatic cancer diagnosis can be developed by a single team, but rather by a community of researchers. We hope the information presented in the current study will help other researchers identify additional candidates for validation in future work. Nonetheless, we edited the manuscript to discuss the limitation of not cross-validating protein detection using a different method.

      The conclusions that are reached do not fully meet the proposed aims of the identification of a protein signature in circulating extracellular vesicles that could improve early detection of the disease. The authors did not demonstrate the superiority of detection of these proteins in extracellular vesicles versus simply performing an ELISA, nor their superiority with respect to the current standard procedure for diagnosis.

      We would like to clarify to the Reviewer that the goal of this manuscript was not to prove the superiority of the EV signature biomarker in diagnosing pancreatic cancer compared with current standard of care (SOC) practice, i.e., CT scans, endoscopic ultrasound, and CA19-9. Proving such superiority would require a large, randomized phase III trial with several hundred patients. This was not the aim of our discovery EV proteomics study, and we double-checked our manuscript to ensure that no such claim was made. Rather, we aimed to develop a new pipeline for the discovery of EV biomarkers, and we believe we have shown that this approach successfully discovered a new class of biomarkers based on proteins carried by extracellular vesicles that are predominantly expressed in patients with pancreatic cancer. Future studies should continue to advance this field with the goal of improving on current standard of care diagnostic methods.

      The authors also suggest that profiling of circulating extracellular vesicles provides unique insights into systemic immune changes during pancreatic cancer development. How this is better than a regular hemogram is not clear.

      We would like to clarify that the overall goal of this study is to provide patient-relevant information for the research community to further investigate the biology of extracellular vesicles. By the statement 'unique insights into systemic immune changes', we referred to our discovery of EVs carrying proteins involved in immune responses. Previous studies have shown that EVs play important roles in cell-cell communication, and the discoveries from our study provide candidates for future studies on the cellular mechanisms underlying immune regulation during pancreatic cancer development.

      Finally, it would be important to determine how this signature compares with many others described in the literature that have the exact same aim. Why and how would this one be better?

      We would like to again clarify that comparing the diagnostic performance of the EV biomarkers discovered in the study against standard of care methods (CA19-9, ctDNA, CT scan) was beyond the scope of this discovery EV proteomics work. We reviewed the manuscript to ensure that no claims were made as far as superiority against point-of-care tests available in clinic.

      Reviewer #3 (Public Review):

      This work investigates the use of extracellular vesicles (EVs) in blood as a noninvasive 'liquid biopsy' to aid in the differentiation of patients with pancreatic cancer (PDAC) from those with benign pancreatic disease and healthy controls, an important clinical question where biopsies are frequently non-diagnostic. The use of extracellular vesicles as biomarkers of disease has been gaining interest in recent history, with a variety of published methods and techniques, looking at a variety of different compositions ('the molecular cargo') of EVs particularly in cancer diagnosis (Shah R, et al, N Engl J Med 2018; 379:958-966).

      This study adds to the growing body of evidence in using EVs for earlier detection of pancreatic cancer, identifying both new and known proteins of interest. Limitations in studying EVs, in general, include dealing with low concentrations in circulation and identifying the most relevant molecular cargo. This study provides validation of assaying EVs using the novel EVtrap method (Extracellular Vesicles Total Recovery And Purification), which the authors show to be more efficient than current standard techniques and potentially more scalable for larger clinical studies.

      The strength of this study is in its numbers: the authors worked with a cohort of 124 cases, 93 of which were PDAC samples, a cohort size considered large for an EV study (Jia, E. et al. BMC Cancer 22, 573 (2022)). The benign disease group (n=20, chronic pancreatitis and IPMNs) and healthy control group (n=11) were relatively small, but the authors were not only able to identify candidate biomarkers for diagnosis that clearly stood out in the PDAC cohort, but also to validate them in an independent cohort of 36 new subjects.

      Proteins they identified as associated with pancreatic cancer over benign disease included PDCD6IP, SERPINA12, and RUVBL2. They were even able to identify a set of EV proteins associated with metastasis and poorer prognosis, which include the proteins PSMB4, RUVBL2 and ANKAR, and CRP, RALB and CD55. Their 7-EV protein signature yielded an 89% prediction accuracy for the diagnosis of PDAC against a background of benign pancreatic diseases, which is compelling and comparable to other studies in the literature (Jia, E. et al. BMC Cancer 22, 573 (2022)).

      The main limitation of this study is its containment within a single institution; further studies are warranted to apply the authors' 7-EV protein PRAC panel to cases from other institutions in a larger cohort.

      We are very thankful to the Reviewer for the positive feedback. We are similarly optimistic that EV-based biomarkers will assist future researchers to develop better diagnostic assays for patients with pancreatic cancer, as well as other tumor types lacking accurate blood-based tests.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and we will review our primary sources to clarify the trait classifications. We will also reclassify the species according to the expertise of this reviewer and perform our analysis again. 

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so it is not clear whether this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, and they also create a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. However, none of these approaches leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested (for example, papers by Lee-Yaw demonstrate that SDMs rarely predict well; occupancy models often perform less well than SDMs and do not capture how things change over time; Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative rather than directly observable.

      Rather than employing extrapolative models, we relied on transparent techniques that have been used successfully in the core macroecology literature and that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models showing that range and phenology changes, respectively, are contrary to expectations arising from sampling differences. 100 km quadrats make for a reasonable “middle ground” in terms of the effects of sampling, and we will add a reference to the methods section to clarify this.

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We will replace the terms specialist and generalist with specific predictions based on traits.

      (4) Key references were overlooked or dismissed, such as the new edition of the Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We will review this text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers.

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers": some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of conservation concern (e.g., Duchenne et al. 2021, Ecology Letters). We will revise the discussion to acknowledge potential differences in outcomes.

      We used null models to test whether our results regarding range shifts were robust, and whether they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2).

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. A previous version of the figure had a second panel, and we failed to remove the reference to that panel when we simplified the figure.

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, the captions and results in particular contain many inconsistencies and lack important detail. Given the multitude of relationships that were tested (e.g., the influence of traits), the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We will revise the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them. We will carefully review all figures and captions, and we will make changes to improve the clarity of the text and the presentation of results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mACs) can be enhanced or inhibited by certain unsaturated fatty acids (FAs) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid enhances Gαs stimulation of membrane preparations of mAC isoform 3 (mAC3) 3-fold, but it does not act on mAC5. Oleic acid also enhanced the Gαs-stimulated activities of isoforms 2, 7, and 9, while mAC1 was slightly attenuated and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs lacking the TM domains. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Gαs-stimulated mAC1 and mAC4 activities were significantly attenuated by arachidonic acid in the 20-40 micromolar range, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While mAC5 activity was not affected by unsaturated FAs, neutral anandamide attenuated Gαs stimulation of mAC5 and mAC6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors showed that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, the authors showed that in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold, with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands, which act via AC membrane linkers to modify and control Gs-stimulated AC activity, either enhancing or inhibiting ACs in a family- and ligand-specific manner. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way, from the hypothesis that ACs could be receptors, to a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, resulting in a paper with a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. Is it because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCRs, which also have transmembrane domains. Maybe 1-2 additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, they have used only one dose of isoproterenol (2.5 uM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2 adrenergic receptors expressed in HEK cells. As far as I remember from the old pharmacological literature, this concentration is indeed submaximal in receptor binding assays but regarding AC activity and cAMP generation (which happen after signal amplification with a so-called receptor reserve), lower Iso amounts would be submaximal. When we measure cAMP, these are rather 10 to 100 nM but no more than 1 uM at which concentration response dependencies usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, their results do nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It is not clear from the methods section how free FAs were applied to membrane preparations or HEK293 cells. Were FAs solubilized in organic solvents, or introduced as micelles?

      The requested information has been added to the M&M section.

      Could the authors comment on what is known about the concentration of oleic acid and other non-saturated fatty acids in plasma membranes relative to those required to produce allosteric effects on cyclase activity?

      This info is now included in the last paragraph of the discussion.

      It would be worthwhile to test the effect of FAs on basal (not Gαs-stimulated) activity of mACs.

      This has been carried out with mAC isoforms 2, 3, 7, and 9, in which oleic acid enhances Gsα-stimulated activity. Due to the low levels of basal activity, interpretable data were not obtained.

      Do triglycerides esterified with oleic acid stimulate mAC3 and other sensitive isoforms?

      Experiments were done with triolein and 2-oleoyl-glycerol (the answer is no). The data are presented in Fig. 3 and in appendix Figs. 8, 9, and 14; the structural formulas in appendix 2 Fig. 4 were updated.

      Does the quantity plotted on the vertical axis of Figure 1, right panel represent "Fractional Stimulation by Oleic acid" rather than simply "Fold Stimulation"? Clearly, as shown in the two left-most panels, Gαs stimulates both mAC3 and mAC5. Rather, it seems that the ratio (oleic acid stimulation) / (Gαs stimulation) remains constant. This observation supports the statement in the discussion that "We suppose that in mAC3 the equilibrium of two differing ground states favors a Gαs-unresponsive state and the effector oleic acid concentration-dependently shifts this equilibrium to a Gαs-responsive state". It could also be said that the effect of oleic acid is additive, and in constant proportion to that of Gαs.

      This comment certainly is related to Fig. 2:

      The ratio would be (Gsα + oleic acid stimulation) / (Gsα-stimulation), i.e., fractional stimulation by addition of oleic acid is identical to fold stimulation.

      We have amended the legend to fig. 2C for clarification.

      The last sentence is wrong because oleic acid alone does not stimulate.

      It is stated on page 3, 2nd to last line that "The action of oleic acid on mAC3 was instantaneous...". Since the earliest time point is taken at 5 minutes, the claim that the action of the lipid is instantaneous cannot be made. Information about kinetics would be useful to have, since it is possible that the lipid must be released from a micelle and be incorporated into the AC membrane fraction before it is active.

      The first point is 3 min.

      We deleted the word “instantaneous” and added the correlation coefficients for both conditions in the legend to appendix 2; fig. 1 for clarification.

      The data spread in Figure 4 and other figures showing similar data is significant, to the extent that the computed value for EC50 may not be of high precision. Authors should cite the correlation coefficient for the overall fit and uncertainty for the EC50 value (in addition to significances by t-test of individual data points).

      This will not add valuable information. Pearson's correlation coefficients apply only to linear relationships.

      (cf. N.N. Kachouie, W. Deebani (2020) Association Factor for Identifying Linear and Nonlinear Correlations in Noisy Conditions. Entropy 22:440)

      The "switch" between relatively low potency and high efficacy in membrane preps to high potency and low efficacy in cells is remarkable. Could this have a methodological basis or is it reflective of the mechanism by which FAs access mACs in membrane preps vs. cell membranes, or perhaps some biochemical transformation of the lipid in cells?

      Honestly, we do not know.

      The authors should note that there is some precedence for this work:

      J Nakamura , N Okamura, S Usuki, S Bannai, Inhibition of adenylyl cyclase activity in brain membrane fractions by arachidonic acid and related unsaturated fatty acids. Arch Biochem Biophys. 2001 May 1;389(1):68-76. doi: 10.1006/abbi.2001.2315.

      The effects of FA deficiencies on AC and related activities have been noted:

      Alam SQ, Mannino SJ, Alam BS, McDonough K Effect of essential fatty acid deficiency on forskolin binding sites, adenylate cyclase, and cyclic AMP-dependent protein kinase activity, the levels of G proteins and ventricular function in rat heart. J Mol Cell Cardiol. 1995 Aug;27(8):1593-604. doi: 10.1016/s0022-2828(95)90491-3. PMID: 8523422

      The latter publications are supportive of, and provide context to, the authors' findings.

      Both references are mentioned and cited.

      Minor points:

      The significance of the coloring scheme in Figure 5C bar graph should be stated in the legend.

      Done.

      In the introduction, it is stated that "The protein displayed two similar catalytic domains (C1 and C2) and two dissimilar hexahelical membrane anchors (TM1 and TM2)". In both cases, the respective domains can be said to be similar in overall fold, but - certainly in the case of the catalytic domains - different in amino acid sequence in functionally important regions of the domain.

      Done: Changed wording.

      In the statement in the introduction that "The domain architecture, TM1-C1-TM2-C2, clearly indicated a pseudoheterodimeric protein composed of two concatenated bacterial precursor proteins", the authors refer to the fact that mammalian enzymes are pseudoheterodimers, whereas bacterial type III cyclases are dimers of identical subunits.

      Done.

      Reviewer #2 (Recommendations for the authors):

      The title need not state that a 'new class of receptors' has been identified. There is no direct evidence that the lipids bind to the enzymes, and the affinities can only be surmised from the EC50 graphs. To call a protein a receptor requires evidence to show that the binding is specific by showing that binding can be inhibited by a large excess of 'unlabelled' ligand. This could have been done by procuring labelled lipids for experimental verification.

      As is well known, lipids easily bind to proteins. In this study no purified proteins were used. Therefore, binding assays most likely would result in unreliable data.

      The paper would have benefitted from showing sequence alignments in the TM domains of the ACs discussed in the paper. Further, a phylogenetic tree of mammalian ACs would also reveal which enzymes from other species may be regulated similarly to those described in the paper. This would be important for researchers who use other model organisms to study cAMP signalling.

      Such data are in multiple papers accessible in the literature. Where deemed appropriate we inserted references.

      Figures 1A and 1B show data from only two experiments. A third experiment would have been useful in order to show the statistical significance of the data.

      At this stage more experiments would not have affected further experimental plans.

      Statements made in the text (for example, the last paragraph on page 6) state only the mean value and not the SDs. This would have been important to include even if the data is shown in the appendix. The same is true in the Legend of Figure 2. Why have the authors decided to use SEM and not SDs?

      The reason is specified in M&M.

      Concentrations of lipids used in biochemical assays are in the micromolar range. This suggests that we have moderate affinity binding, more in the range of an enzyme for a substrate rather than a receptor-ligand interaction.

      We happen to disagree. Clearly, the differential activities, enhancing or attenuating Gsα-stimulated mAC activities is most plausibly explained by mAC receptor properties. mACs have enzyme activities using fatty acids as substrates.

      The authors add lipids to cells and show changes in cAMP levels in their presence and absence. They also discuss how these extracellular lipids could be produced. Do you think this is necessary in vivo, though? Could the lipids present in membranes naturally act as regulators? Do specific lipid concentrations differ in different cell types, suggesting tissue-specific regulation of these mammalian ACs?

      These are things that could be discussed in the manuscript.

      The last paragraph of the discussion deals with these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The extra macrochaetae (emc) gene encodes the only Inhibitor of DNA binding protein (Id protein) in Drosophila. Its best-known function is to inhibit proneural genes during development. However, the emc mutants also display nonproneural phenotypes. In this manuscript, the authors examined four non-proneural phenotypes of the emc mutants and reported that they are all caused by inappropriate non-apoptotic caspase activity. These non-neuronal phenotypes are: reduced growth of imaginal discs, increased speed of the morphogenetic furrow, and failure to specify R7 photoreceptor neurons and cone cells during eye development. Double mutants between emc and either H99 (which deletes the three pro-apoptotic genes reaper, grim, and hid) or the initiator caspase dronc suppress these mutant phenotypes of emc suggesting that the cell death pathway and caspase activity are mediating these emc phenotypes. In previous work, the authors have shown that emc mutations elevate the expression of ex which activates the SHW pathway (aka the Hippo pathway). One known function of the SHW pathway is to inhibit Yorkie which controls the transcription of the inhibitor of apoptosis, Diap1. Consistently, in emc clones the levels of Diap1 protein are reduced which might explain why caspase activity is increased in emc clones giving rise to the four non-neural phenotypes of emc mutants.

      However, this increased caspase activity is not causing ectopic apoptosis, hence the authors propose that this is nonapoptotic caspase activity. In the last part of the manuscript, the authors ruled out that Wg, Dpp, and Hh signaling are the target of caspases, but instead identified Notch signaling as the target of caspases, specifically the Notch ligand Delta. Protein levels of Delta are increased in emc clones in an H99- and dronc-dependent manner. The authors conclude that caspase-dependent non-apoptotic signaling underlies multiple roles of emc that are independent of proneural bHLH proteins.

      Strengths:

      Overall, this is an interesting manuscript and the findings are intriguing. It adds to the growing number of non-apoptotic functions of apoptotic proteins and caspases in particular. The manuscript is well written and the data are usually convincingly presented.

      Weaknesses:

      (1) One major concern I have is the observation by the authors in Figure 3C in which protein levels of Diap1 are still reduced in emc H99 double mutant clones. If Diap1 is still reduced in these clones, shouldn't caspases still be derepressed? Given that emc H99 double mutants rescue all emc phenotypes examined, the observation that Diap1 levels are still reduced in emc H99 clones is inconsistent with the authors' model. The authors need to address this inconsistency.

      The effect of H99 emc clones on Diap1 protein levels is consistent with our conclusions.  The reviewer’s concern probably relates to previous work that shows that RHG proteins act by antagonizing DIAP1, so that Diap1 is epistatic to RHG (PMID:10481910), and that RHG proteins affect DIAP1 protein levels, and in particular that HID promotes DIAP1 ubiquitylation leading to its destruction (PMID:12021767).  First, epistasis means that in the absence of DIAP1, RHG levels do not affect cell survival.  DIAP1 protein is not absent in emc/emc eye clones, however, it is reduced.  It is not only possible but expected that RHG levels would affect survival when DIAP1 levels are only reduced.  Secondly, we did not see a difference in DIAP1 levels between H99/H99 clones and H99/+ cells within the same specimen, suggesting that rpr, grim and hid might not affect DIAP1 levels. It is possible that Hid protein only affects DIAP1 levels when overexpressed, as in the aforementioned paper (PMID:12021767), and that physiological RHG levels affect DIAP1 activity.  The H99 deficiency also eliminates Rpr and Grim, which may affect DIAP1 without ubiquitylating it. In our experiments, however, there are no cells completely wild type for the H99 region for comparison in the same specimen, so our results do not rule out the H99 deletion having a dominant effect on DIAP1 levels both inside and outside the clones.  What our data clearly showed is that emc affected DIAP1 levels independently of any potential RHG effect, and we hypothesized this was through diap1 transcription, because we showed previously that emc affects yki, a transcriptional regulator of the diap1 gene, but we have not demonstrated transcriptional regulation of diap1 directly in emc clones.  We modified the manuscript to better delineate these issues (lines 275-284).    

      (2) Are Diap1 protein levels reduced in all emc clones, including clones anterior to the furrow? This is difficult to see in Figure 3B. It is also recommended to look in emc mosaic wing discs.

      We now mention that DIAP1 levels were reduced only in emc clones posterior to the morphogenetic furrow, not in emc clones anterior to the morphogenetic furrow or in emc clones in wing imaginal discs (lines 284-5 and Figure 3 supplement 1).

      (3) The authors speculate that Delta may be a direct target of caspase cleavage (Figure 9B), but then rule it out for a good reason. However, I assume that the increased protein levels of Delta in emc clones (Figure 7) are the results of increased transcription. In that case, shouldn't caspases control the transcriptional machinery leading to Delta expression?

      Thank you for suggesting that caspases might control the transcription of Dl. We added this possibility to the manuscript (lines 499-500). At one time there was a Dl-LacZ transcriptional reporter, which would have made it straightforward to assess Dl transcription in emc clones, but this strain no longer seems to be available. We have not attempted in situ hybridization to Dl transcripts in mosaic discs.

      (4) How does caspase activity in emc clones cause reduced growth? Is this also mediated through Delta signaling?

      We do not know which caspase target is responsible for reduced growth in wing discs.

      (5) Figure 1M: Is there a similar result with emc dronc mosaics?

      The emc dronc clones do not show as dramatic a growth advantage in a Minute background.  This is consistent with the smaller effect of emc dronc in the non-Minute background also (Figure 1N).  We mention this in the revised paper (lines 232-3).     

      Reviewer #2 (Public Review):

      Id proteins are thought to function by binding and antagonizing basic helix-loop-helix (bHLH) transcription factors but new findings demonstrate roles for emc including in tissues where no proneural (Drosophila bHLH) genes are known to function. The authors propose a new mechanism for developmental regulation that entails restraining new/novel non-apoptotic functions of apoptotic caspases.

      Specifically, the data suggest that loss of emc leads to reduced expression of diap1 and increased apoptotic caspase activity, which does not induce apoptosis but elevates Delta expression to increase N activity and cause developmental defects. Indeed, many of the phenotypes of emc mutant clones can be rescued by a chromosomal deficiency that reduces caspase activation or by mutations in the initiator caspase Dronc. A related manuscript that shows that loss of emc results in increased da, linked previously to diap1 expression, provides supporting data. There is increasing appreciation that apoptotic caspases have non-apoptotic roles. This study adds to the emerging field and should be of interest to readers.

      The data, for the most part, support the conclusions but I do have concerns about some of the data and the interpretations that should be addressed.

      Reviewer #3 (Public Review):

      The work extends earlier studies on the Drosophila Id protein EMC to uncover a potential pathway that explains several tissue-scale developmental abnormalities in emc mutants. It also describes a non-apoptotic role for caspases in cell biology.

      Strengths:

      The work adds to an emerging new set of functions for caspases beyond their canonical roles as cell death mediators. This novelty is a major strength as well as its reliance on genetic-based in vivo study. The study will be of interest to those who are curious about caspases in general.

      Weaknesses:

      The manuscript relies on imaging experiments using genetic mosaic imaginal discs. It is for the most part a qualitative analysis, showing representative samples with a small number of mutant clones in each. Although the senior author has a long track record of using experiments like this to rigorously discover regulatory mechanisms in this system, it is straightforward in 2023 to use Fiji and other image analysis tools to measure fluorescence. Such measurements could be done for all replicate clones of a given genotype as well as for genetic control samples. These could be presented in plots that would not only provide quantitative and statistical measurements, but would also be more reader-friendly to those who are not fly people.

      We added quantification of anti-Delta and anti-Diap1 levels to the manuscript (Figures 3E and 7E). We agree that this facilitates statistical confirmation of the results and may be more accessible to non-experts. We do have concerns that these quantifications might be given too much weight. For example, we cannot measure the background level of anti-DIAP1 labeling by labeling diap1 null mutant cells, because such cells do not survive. Although we measured a ~20% reduction in emc clones in the eye disc, and none in the wing disc, both measurements could be underestimates if some of the labeling is non-specific, as is quite possible. We discuss this in the Methods (lines 166-9).

      Likewise, more details are needed to describe how clone areas were measured in Figure 1. Did they measure each clone and its twin spot, and then calculate the area ratio for each clone and its paired twin spot? This would be the correct way to analyze the data, yielding many independent measurements of the ratio. And doing so would obviate the need to log transform the data which is inexplicable unless they were averaging clones and twins within a disc and making replicates. More explanation is needed and if they indeed averaged, then they need to calculate the ratios pairwise for each clone and twin.

      We added details of clone size measurements and analysis to the Methods (lines 141-6). Although it might be useful to compare individual clones with their corresponding twin spots, the only rigorous way to associate individual clones with individual twin spots, or even to determine what constitutes one clone and one twin spot, is to use recombination rates low enough that significantly fewer than one recombination event occurs per disc. This would require many more dissections, and we did not do this. We now clarify in the manuscript that the analysis is indeed based on the ratio of the total area of clones to the total area of twin spots, with replicates, and that log-transformation was used to improve the normality of the ratio data for parametric significance testing, not because clones and twin spots were summed from each sample. We consulted a statistician about this approach.
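      To make the statistical point concrete, the sketch below shows why a log-transformed area ratio suits parametric testing. It is a minimal illustration, not the authors' analysis: it assumes one ratio per disc (total clone area over total twin-spot area), a log2 transform, and Welch's t statistic as the parametric test. The helper names and all numbers are hypothetical.

```python
import math
import statistics


def disc_log_ratio(clone_areas, twin_areas):
    """log2(total clone area / total twin-spot area) for one imaginal disc.

    Hypothetical helper: areas are summed per disc because individual clones
    cannot be reliably paired with individual twin spots.
    """
    return math.log2(sum(clone_areas) / sum(twin_areas))


def welch_t(a, b):
    """Welch's t statistic comparing per-disc log ratios of two genotypes."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)


# On the raw scale a twofold excess (2.0) and a twofold deficit (0.5) sit at
# unequal distances from 1 (+1.0 vs -0.5); on the log2 scale they are symmetric.
print(disc_log_ratio([1.5, 0.5], [1.0]))  # 1.0
print(disc_log_ratio([1.0], [2.0]))       # -1.0

# Hypothetical (clone areas, twin-spot areas) per disc for two genotypes.
genotype_a = [disc_log_ratio(c, t) for c, t in [([1.0], [4.0]), ([1.0], [3.0]), ([2.0], [5.0])]]
genotype_b = [disc_log_ratio(c, t) for c, t in [([3.0], [3.0]), ([4.0], [3.5]), ([2.5], [3.0])]]
print(round(welch_t(genotype_a, genotype_b), 2))
```

      The sketch reports only the t statistic; a p-value additionally needs the t distribution with Welch-Satterthwaite degrees of freedom, which `scipy.stats.ttest_ind(a, b, equal_var=False)` computes directly.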

      Reviewer #1 (Recommendations For The Authors):

      Lines 319/320: "Frizzled-3 RFP expression was not changed in in emc clones (Figure 4A)". This was actually not shown in Fig 4A (in fact this result was not shown at all). Fig 4A shows the result for emc nkd3 which the authors incorrectly assigned to Figure 4B (line 324).

      We apologize for labeling Figure 4A and 4B incorrectly.

      The title of Figure 6 is inaccurate. The title does not indicate what is shown in this figure. A more accurate title would be: Notch activity and function in emc mutant clones.

      We provided a new title for Figure 6. 

      Reviewer #2 (Recommendations For The Authors):

      There is no information on how reproducible the data is. How many discs were examined in each experiment and in how many technical or biological replicates? Can fluorescence signals be quantified within and outside the clones and presented to illustrate reproducibility and significance? This is especially needed for Fig 7, which shows key data that N ligand Delta is elevated in emc clones but dronc and H99 mutations rescue this phenotype. I can see that the Dl signal is brighter in the GFP- emc clone in Fig 7B but I can also see a brighter Dl signal in the small clone and perhaps also in the large clone in C. The difference between B and C could be simply disc-to-disc variation, which should be addressed with quantification and presentation of all data points.

      We added the number of samples to each figure legend.  We quantified the fluorescence signals for Figures 3 and 7.  Quantification shows that the difference between 7B and 7C is highly significant, not disc to disc variation.

      Fig 2B does not support the conclusion. It is supposed to show premature Sens expression and therefore abnormal morphogenetic furrow progression in emc clones. But the yellow arrow is pointing to GFP+ (wild type) cells and it is within this GFP+ region that most premature Sens expression is seen.

      We relocated the arrows in Figure 2B to point precisely to the premature differentiation. When the morphogenetic furrow is accelerated in emc mutant (GFP-) tissue, it does not stop when wild-type (GFP+) tissue is encountered again; it continues at a normal pace. Accordingly, emc+ regions that are anterior to emc- regions can also experience accelerated differentiation (please see lines 594-8).

      Fig 1 shows that while H99 deficiency restores the growth of emc clones to wild type level (Fig 1N), placing these in the Minute background made emc clones grow better than emc wild type but Minute neighbors (Fig 1M). The latter cells were nearly absent, suggesting elimination through cell competition. For the rest of the figures, some experiments are done in the Minute background (e.g., emc H99 clones in Fig 2D) while others are not in the Minute background (e.g., emc H99 clones in Fig 7D). Why the switch between backgrounds from experiment to experiment?

      Figure 2D shows emc H99 clones in a Minute background so that they can be compared with panels 2A-C, which show clones of other genotypes in a Minute background. These clones almost take over the eye disc. In Figure 7D, it was important to show the Dl expression pattern in a substantial wild-type region, which could only be shown using the non-Minute background. We have no indication that a Minute background changes the properties of the non-Minute clone, other than allowing its greater growth.

      The first 3 paragraphs of the Introduction are overly detailed and read more like a review article. These could be made more concise to focus on the founding data for this manuscript, which are the published findings that emc mutations elevate ex expression (line 129) and that ex mutants show elevated diap1 expression (line 125). These do not show up until the very end of the Introduction.

      We shortened the Introduction to focus more rapidly on the topics relevant to these experiments.

      In several places, the space between the end of the sentence and the citation is missing (e.g., lines 57, 68, and 75).

      The spacing of citations was fixed.

      Line 247. 'morphogenetic furrow that found each ommatidia...' should use a word besides 'found.'

      We corrected line 247.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors show that inhibiting caspases rescues the growth defect of emc clones. However, they did not find excessive TUNEL staining in emc clones that would explain why the clones would be so small (i.e., excessive cell death). How reliable was their TUNEL staining at detecting excessive apoptosis (only negative data were shown)? Could they induce excessive cell death using radiation or some other means to ensure the assay is robust? If death is not occurring in emc clones, a deficiency worth addressing is that they do not discuss or explore how the caspases then inhibit clone growth. Is it extended cell cycle times, or smaller cells? And that phenotype does not fit with their end model of Delta being the only mediator of emc, since Delta is not playing a significant role in tissue growth anterior to the furrow. One would assume that the commercial antibody against activated caspase would provide another readout for emc clones, and this would bolster their claim that excessive caspase activation occurs in the emc cells.

      We have added Dcp1 staining in Figure 2 supplement 3 to show that TUNEL staining is reliable.

      (2) Figure 3D has really large emc clones when GMR-Diap is present. But the large clones are anterior to the furrow where Diap would not be overexpressed. Is this just an unusual sample with a coincidentally big emc M+ clone? It speaks to my concerns about the qualitative nature of the data.

      We replaced Figure 3D with an example of smaller clones. Nowhere have we suggested that GMR-DIAP1 affects clone size.

      (3) Figure 9B is very speculative and not appropriate since the authors have zero data to support that cleavage mechanism. It is fit for the next paper if the idea is correct. The panel should be removed.

      We did not intend Figure 9B to imply that we think Dl itself is the relevant target of non-apoptotic caspases. Since we apparently gave that impression, we moved this panel to a supplemental figure. We still think it is worth showing that Dl does not contain predicted caspase sites expected to activate signaling.

      (4) Figure 9A could be made more clear. Their pathway represents the mutant cells in the mosaic disc. Why not also outline what you think is happening in the emc+ cells as well?

      It is difficult to make a comparable diagram for normal cells, because none of this pathway happens in normal cells.  We modified the figure legend to indicate this (lines 677-8).

      (5) The one emc ci clone they show spanning the furrow has a very non-continuous furrow advance phenotype. This is unlike the emc clones where the furrow advance is graded about the clone. And it resembles the SuH clones they show. This result and the synergistic effect on clone sizes they mention need more discussion and thought put into it. It argues ci is doing something with respect to emc action. loss of ci might not rescue size and furrow advance but actually, it makes it worse! This is interesting and might suggest an inhibitory role for ci in emc or a parallel role for ci in mediating growth and progression that is redundant with emc.

      We agree that aspects of the emc ci phenotype are not clear.  We discuss this in the revised manuscript (lines 373-5).  

      (6) Related to point 7, it is a weak argument for non-autonomy that graded furrow advance in emc clones is evidence for emc acting nonautonomously through Delta. Its weakness is combined with its lack of significance relative to the other findings. It should be deleted as should the SuH data.

      We agree that the evidence that emc affects morphogenetic furrow progression non-autonomously is not compelling and have revised the manuscript to soften this conclusion (lines 426-7). We do not want to remove this idea, because it does in fact have significance for other findings. Specifically, it supports the idea that the emc effect in the morphogenetic furrow is due to trans-activation by Delta, whereas the effect on R7 and cone cell differentiation is due to autonomous cis-inhibition. We think this is important to keep in the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest. I do have some concerns with the way that the project has been conceptualized, which I share below.

      Thank you for acknowledging the strengths and novelty of our study. We have now addressed the conceptual issues raised; please see below in the specific comments.

      (2) The authors should provide careful working definitions of what exactly they think is occurring in the brain following sensory deprivation. Characterizing these changes as 'large-scale neural reorganization' and 'compensatory adaptation' gives the impression that the authors believe that there is good evidence in support of significant structural changes in the pathways between brain areas - a viewpoint that is not broadly supported (see Makin and Krakauer, 2023). The authors report changes in connectivity that amount to differences in coordinated patterns of BOLD signal across voxels in the brain; accordingly, their data could just as easily (and more parsimoniously) be explained by the unmasking of connections to the auditory cortex that are present in typically hearing individuals, but which are more obvious via MR in the absence of auditory inputs.

      We thank the Reviewer for the suggestion to clarify and better support our stance regarding reorganization. We do believe that the adaptive changes in the auditory cortex in deafness represent genuine functional recruitment for non-auditory functions, even given the relatively limited large-scale anatomical connectivity changes. This is supported by animal work showing causal evidence for the involvement of deprived auditory cortices in non-auditory tasks, in a way that is not found in hearing controls (e.g., Lomber et al., 2010, Meredith et al., 2011, reviewed in Alencar et al., 2019; Lomber et al., 2020). Whether the word “reorganization” should be used has indeed been debated recently (Makin and Krakauer, 2023). Beyond terminology, we agree that the changes in recruitment seen in the brains of people with deafness or blindness are largely based on the typical anatomical connectivity present at birth. We also agree that, at the group level, there is poor evidence of large-scale anatomical connectivity differences in deprivation. However, we think there is ample evidence that the unmasking and, more importantly, the re-weighting of non-dominant inputs give rise to functional changes. This is supported by the relatively weaker reorganization found in late-onset compared with early-onset deprivation. If unmasking of existing connectivity, without any additional functional changes, were sufficient to elicit functional responses to atypical stimuli (e.g., non-visual stimuli in blindness and non-auditory stimuli in deafness), one would expect no difference in response patterns between early- and late-onset deprivation. Therefore, we believe that the fact that these changes build on innate, pre-existing inputs and integration describes the mechanism of reorganization; it is not a reason to avoid treating them as reorganization.
      Specifically, in the case of this manuscript, we report a change in the variability of FC from the auditory cortex, which is greater in deafness than in typically hearing controls. This is not an increase in response per se, but rather more divergent values of FC from the auditory cortex, which are harder to explain in terms of ‘unmasking’ alone, unless one assumes that unmasking is particularly variable. Our mechanistic explanation is that, in the absence of the fine-tuning and pruning of auditory-cortex connectivity that auditory input normally provides, more divergent connectivity strengths remain among deaf individuals. Thus, auditory input not only masks non-dominant inputs but also prunes or deactivates exuberant connectivity, generating a more consistently connected auditory system. We have added a shortened version of these clarifications to the Discussion (lines 351-372).

      (3) I found the argument that the deaf use a single modality to compensate for hearing loss, and that this might predict a more confined pattern of differential connectivity than had been previously observed in the blind to be poorly grounded. The authors themselves suggest throughout that hearing loss, per se, is likely to be driving the differences observed between deaf and typically-hearing individuals; accordingly, the suggestion that the modality in which intentional behavioral compensation takes place would have such a large-scale effect on observed patterns of connectivity seems out of line.

      Thank you for your critical insight regarding our rationale on modality use and its impact on connectivity patterns in the deaf compared to the blind. After some thought, we agree that the argument presented may not be sufficiently strong and could distract from the main findings of our study. Therefore, we have decided to remove this claim from our revised manuscript.

      (4) The analyses highlighting the areas observed to be differentially connected to the auditory cortex and areas observed to be more variable in their connectivity to the auditory cortex seem somewhat circular. If the authors propose hearing loss as a mechanism that drives this variability in connectivity, then it is reasonable to propose hypotheses about the directionality of these changes. One would anticipate this directionality to be common across participants and thus, these areas would emerge as the ones that are differently connected when compared to typically hearing folks.

      We are a little uncertain how to interpret this concern. If the question was about the logic leading to our statement that variability is driven by hearing loss, then yes, we were indeed proposing hearing loss as a mechanism that drives this variability in connectivity to the auditory cortex; we regret that this was unclear in the original manuscript. This logic parallels the proposal made with regard to the increased variability in FC in blindness: deprivation leads to more variable outcomes, due to the lack of developmental environmental constraints (Sen et al., 2022). Specifically, we first analyzed the differences in within-group variability between deaf and hearing individuals (Fig. 1A), followed by examining the variability ratio (Fig. 1B) in the same regions that demonstrated differences. The first analysis does not specify which group shows higher variability; therefore, the second analysis is essential to clarify the direction of the effect and identify which group, and in which regions, exhibits greater variability. We have clarified this in the revised manuscript (lines 125-127): “To determine which group has larger individual differences in these regions (Figure 1B), we computed the ratio of variability between the two groups (deaf/hearing) in the areas that showed a significant difference in variability (Figure 1A)”. Alternatively, this comment can be interpreted as predicting that any change in FC due to deafness would lead to greater variability. In this case, it is important to mention that while we would expect regions with higher variability to also show group differences between the deaf and the hearing (Figure 2), our analysis demonstrates that variability is present even in regions without significant group mean differences. Similarly, many areas that show a difference in FC between the groups do not show a change in variability (for example, the bilateral anterior insula and sensorimotor cortex).
      In fact, the correlation between the regions with higher FC variability (Figure 1A) and those showing FC group differences (Figure 2B) is significant but rather modest, as we now acknowledge in our revised manuscript (lines 324-328). Therefore, increased FC and increased variability of FC are not necessarily linked.
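      The two-step logic described above (first locate regions where within-group variability differs, then take a deaf/hearing ratio to read off its direction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: variability is taken here as the across-subject variance of auditory-cortex FC in a single region, and the helper name and FC values are hypothetical. The example also illustrates the point made in the reply that two groups can share the same mean FC while differing strongly in variability.

```python
import statistics


def variability_ratio(deaf_fc, hearing_fc):
    """Across-subject FC variance in the deaf group divided by that in the hearing group.

    Hypothetical helper: a value > 1 means larger individual differences in the
    deaf group; a value < 1 means larger individual differences in the hearing group.
    """
    return statistics.variance(deaf_fc) / statistics.variance(hearing_fc)


# Hypothetical per-subject FC values between auditory cortex and one region.
# Both groups have the same mean FC (0.28), so there is no group mean difference.
deaf = [0.10, 0.45, -0.05, 0.30, 0.60]
hearing = [0.20, 0.25, 0.30, 0.32, 0.33]

print(f"mean FC  deaf={statistics.mean(deaf):.2f}  hearing={statistics.mean(hearing):.2f}")
ratio = variability_ratio(deaf, hearing)
print(f"deaf/hearing variability ratio: {ratio:.1f}")
print("more variable group:", "deaf" if ratio > 1 else "hearing")
```

      Using a ratio rather than a difference makes the direction easy to read (above or below 1) and keeps the measure independent of the overall scale of FC values.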

      (5) While the authors describe collecting data on the etiology of hearing loss, hearing thresholds, device use, and rehabilitative strategies, these data do not appear in the manuscript, nor do they appear to have been included in models during data analysis. Since many of these factors might reasonably explain differences in connectivity to the auditory cortex, this seems like an omission.

      We thank the Reviewer for their comment regarding the inclusion of these variables in our manuscript. We have now included additional information in the main text and a supplementary table in the revised manuscript that elaborates further on the etiology of hearing loss and all individual information that characterizes our deaf sample. Although we initially intended to include individual factors (e.g., hearing threshold, duration of hearing aid use, and age of first use) in our models, this was not feasible for the following reasons: 1) for some subjects, we only have a level of hearing loss rather than specific values, which we could not use quantitatively as a nuisance variable (it was typical in such testing to assign the threshold of loss to a deafness level, such as “profound”, without more elaborate testing to identify the specific threshold), and 2) this information was either not collected for the hearing participants (e.g., hearing threshold) or does not apply to them (e.g., age of hearing aid use), which made it impossible to use the complete model with all these variables. Modeling the groups separately with different variables would also be inappropriate. Last, the distribution of the values and the need for a large sample to rigorously assess a difference in variability also precluded subdividing the group into subgroups based on these values.

      Therefore, we opted for a different way to control for the potential influence of these variables on FC variability in the deaf. We tested the correlation between FC from the auditory cortex and each of these parameters in the areas that showed increased FC variability in deafness (Figures 1A, B), to see whether they could account for the increased variability. This ROI analysis did not reveal any significant correlations (all p > .05, prior to correction for multiple comparisons; see Figures S4, S5, and S6 for scatter plots). The maximal variance explained in these ROIs by the hearing factors was r² = 0.096, whereas FC variability (Figure 1B) was increased at least twofold in the deaf. Therefore, these parameters do not seem to underlie the increased variability in deafness. To test whether these variables had a direct effect on FC variability elsewhere in the brain, we also directly computed the correlation between FC and each factor individually. At the whole-brain level, the results indicate a significant correlation between AC-FC and hearing threshold, as well as between AC-FC and the age of hearing aid use onset, but not the duration of hearing aid use (Figure S3). While these may be interesting on their own, and have been added to the revised manuscript, the regions that show significant correlations with hearing threshold and age of hearing aid use are not the same regions that exhibit FC variability in the deaf (Figures 1A, B).

      Overall, these findings suggest that although some of these factors may influence FC, they do not appear to be the driving factors behind FC variability. Finally, in terms of rehabilitative strategies, only one deaf subject reported having received long-term oral training from teachers. This participant started this training at age 2, as now described in the participants’ section. We thank the reviewer for raising this concern and allowing us to show that our findings do not stem from simple differences ascribed to auditory experience in our participants. 

      Reviewer #2 (Public Review):

      (1) The paper has two main merits. Firstly, it documents a new and important characteristic of the re-organization of the brains of the deaf, namely its variability. The search for a well-defined set of functions for the deprived auditory cortex of the deaf has been largely unsuccessful, with several task-based approaches failing to deliver unanimous results. Now, one can understand why this was the case: most likely there isn't one fixed, well-defined set of functions supported by an identical set of areas in every subject, but rather a variety of functions supported by various regions. In addition, the paper extends the authors' previous findings from blind subjects to the deaf population. It demonstrates that the heightened variability of connectivity in the deprived brain is not exclusive to blindness, but rather a general principle that applies to other forms of deprivation. On a more general level, this paper shows how sensory input is a driver of the brain's reproducible organization.

      We thank the Reviewer for their observations regarding the merits of our study. We appreciate the recognition of the novelty in documenting the variability of brain reorganization in deaf individuals. 

      (2) The method and the statistics are sound, the figures are clear, and the paper is well-written. The sample size is impressively large for this kind of study.

      We thank the Reviewer for their positive feedback on the methodology, statistical analysis, clarity of figures, and the overall composition of our paper. We are also grateful for the acknowledgment of our large sample size, which we believe significantly strengthens the statistical power and the generalizability of our findings.

      (3) The main weakness of the paper is not a weakness, but rather a suggestion on how to provide a stronger basis for the authors' claims and conclusions. I believe this paper could be strengthened by including in the analysis at least one of the already published deaf/hearing resting-state fMRI datasets (e.g. Andin and Holmer, Bonna et al., Ding et al.) to see if the effects hold across different deaf populations. The addition of a second dataset could strengthen the evidence and convincingly resolve the issue of whether delayed sign language acquisition causes an increase in individual differences in functional connectivity to/from Broca's area. Currently, the authors may not have enough statistical power to support their findings.

      We thank the Reviewer for their constructive suggestion to reinforce the robustness of our findings. While we acknowledge the potential value of incorporating additional datasets to strengthen our conclusions, the datasets mentioned (Andin and Holmer, Bonna et al., Ding et al.) are not publicly available, which limits our ability to include them in our analysis. Additionally, datasets that contain comparable groups of delayed and native deaf signers are exceptionally rare, further complicating the possibility of their inclusion. Furthermore, to discern individual differences within these groups effectively, a substantially larger sample size is necessary. As such, we were unfortunately unable to perform this additional analysis. This is a challenge we acknowledge in the revised manuscript (lines 442-445), especially when the group is divided into subcategories based on the level of language acquisition, which indeed reduces our statistical power. We have, however, now integrated individual task accuracy and reaction time as nuisance variables in the variability analyses; all the results are fully replicated when accounting for task difficulty. We also report that there was no group difference in task activation that could affect our findings.

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. That said, we are exploring collaborations and other avenues to access comparable datasets that might enable a more powerful analysis in future work. This feedback is very important for guiding our ongoing efforts to verify and extend our conclusions.

      (4) Secondly, the authors could more explicitly discuss the broad implications of their results for our understanding of how the architecture of the brain is determined by the genetic blueprint vs. by learning (page 9). There is currently a wave of strong evidence favoring a more "nativist" view of brain architecture; for example, face- and object-sensitive regions seem to be in place practically from birth (see e.g. Kosakowski et al., Current Biology, 2022). The current results show the role played by experience.

      We thank the Reviewer for highlighting the need to elaborate on the broader implications of our findings in relation to the ongoing debate of nature vs. nurture. We agree that this discussion is crucial and have expanded our manuscript to address this point more explicitly. We now incorporate a more detailed discussion of how our results contribute to understanding the significant role of experience in shaping individual neural connectivity patterns, particularly in sensory-deprived populations (lines 360-372).

      Reviewer #3 (Public Review):

      Summary:

      (1) This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      -  The manuscript is well written.

      -  The methods are clearly described and appropriate.

      -  Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes.

      -  The results are interesting and novel.

      We thank the Reviewer for their positive and detailed feedback. Their acknowledgment of the clarity of our methods and the novelty of our results is greatly appreciated.

      Weaknesses:

      (2) Analyses were conducted for task-based data rather than resting-state data. It was unclear whether groups differed in task performance. If congenitally deaf individuals found the task more difficult this could lead to changes in FC.

      We thank the Reviewer for their observation regarding possible task performance differences between deaf and hearing participants and their potential effect on the results. Indeed, there was a difference in task accuracy between these groups. To account for this variation and ensure that our findings on functional connectivity were not confounded by task performance, we have now included individual task accuracy and reaction time as nuisance variables in our analyses. This approach allowed us to control for any performance differences. The results presented in the revised manuscript include these two nuisance variables (accuracy and reaction time) and fully align with our original conclusions: increased variability in deafness is found both in the entire deaf group and when equating language experience by comparing the hearing participants and native signers. The correlation between variability and group differences also remains significant, although its strength is slightly reduced, a moderate effect we acknowledge in the revised manuscript (see comment #5). The differences between the delayed signers and native signers are also retained (Figure 3) and now align better with language-sensitive regions, as previously predicted. Including the task-difficulty predictors also revealed an additional finding in this analysis, a significant cluster in the right aIFG. Therefore, the inclusion of these predictors reaffirms the robustness of our conclusions about FC variability in the deaf population.

      We note that, although we would like to replicate these findings in an additional resting-state cohort if we had access to such data, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks in terms of FC individual differences. We have also addressed this point in our manuscript (lines 442-451).

      (3) No differences in overall activation between groups were reported. Activation differences between groups could lead to differences in FC. For example, lower activation may be associated with more noise in the data, which could translate to reduced FC.

      We thank the reviewer for noting the potential implications of overall activation differences on FC. In our analysis of the activation for words, we found no significant clusters showing a group difference between the deaf and hearing participants (p < .05, cluster-corrected for multiple comparisons) - we also added this information to the revised manuscript (lines 542-544). This suggests that the differences in FC observed are not confounded by variations in overall brain activation between the groups under these conditions.

      (4) Figure 2B shows higher FC for congenitally deaf individuals than normal-hearing individuals in the insula, supplementary motor area, and cingulate. These regions are all associated with task effort. If congenitally deaf individuals found the task harder (lower performance), then activation in these regions could be higher, in turn, leading to FC. A study using resting-state data could possibly have provided a clearer picture.

      We thank the Reviewer for pointing out the potential impact of task difficulty on FC differences observed in our study. As addressed in our response to comment #2, task accuracy and reaction times were incorporated as nuisance variables in our analysis. Further, these areas showed no difference in activation between the groups (see response to comment #3 above). Notably, the referred regions still showed higher FC in congenitally deaf individuals even when controlling for these performance differences. Additionally, these findings are consistent with results from studies using resting-state data in deaf populations, further validating our observations. Specifically, using resting-state data, Andin & Holmer (2022) have shown higher FC in deaf (compared to hearing) individuals from auditory regions to the cingulate cortex, insular cortex, cuneus and precuneus, supramarginal gyrus, supplementary motor area, and cerebellum. Moreover, Ding et al. (2016) have shown higher FC in deaf individuals between the STG and the anterior insula and dorsal anterior cingulate cortex. This suggests that the observed FC differences are likely reflective of genuine neuroplastic adaptations rather than mere artifacts of task difficulty. Although we wish we could augment our study with resting-state data analyzed similarly, we could not at present acquire or access such a dataset. We acknowledge this limitation of our study (lines 442-451) in the revised manuscript and intend to confirm that similar results will be found with resting-state data in the future.

      (5) The correlation between the FC map and the FC variability map is 0.3. While significant using permutation testing, the correlation is low, and it is not clear how great the overlap is.

      We acknowledge that the correlation coefficient of 0.3, while statistically significant, indicates a moderate overlap. It's also worth noting that, using our new models that include task performance as a nuisance variable, this value has decreased somewhat, to 0.24 (which is still highly significant). It is important to note that the visual overlap between the maps is not a good estimate of the correlation, which was performed on the unthresholded maps, to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This correlation is meant to suggest a trend rather than a strong link, but especially due to its consistency with the findings in blindness, we believe this observation merits further investigation and discussion. As such, we kept it in the revised manuscript while moderating our claims about its strength.

      Reviewer #1 (Recommendations For The Authors):

      (1) Page 4: "Does auditory cortex FC variability..." FC is not yet defined.

      Corrected, thanks.

      (2) Page 4: "It showed lower variability..." What showed this?

      Clarified, thanks.

      (3) Page 11: "highlining the importance" should read "highlighting the importance".

      Corrected, thanks.

      (4) Page 11: Do you really mean to suggest functional connectivity does not vary as a function of task? This would not seem well supported.

      We do not suggest that FC doesn’t vary as a function of task, and have revised this section (lines 447-451). 

      (5) Page 12: "there should not to be" should read "there should not be".

      Corrected, thanks.

      (6) Page 12: "and their majority" should read "and the majority".

      Corrected, thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major

      (1) Although this is a lot of work, I nonetheless have another suggestion on how to test whether your results are strong and robust. Perhaps you could analyze your data using an ROI/graph-theory approach. I am not an expert in graph theory analysis, but surely there is a simple and elegant statistic that captures the variability of edge strength within a population. This approach could not only validate your results with an independent analysis and give the audience more confidence in their robustness, but it could also provide an estimate of the effect size you found. That is, it could express in hard numbers how much more variable the connections from auditory cortex ROIs are, relative to the rest of the brain, in the deaf population compared with the hearing population.

      We thank the Reviewer for suggesting the use of graph theory as a method to further validate our findings. While we see the potential value in this approach, we believe it may be beyond the scope of the current paper and merits a full exploration of its own, which we hope to pursue in the future. However, we understand the importance of showing that the connectivity of the auditory cortex ROI is unique compared with the rest of the brain. To bolster our results, we therefore conducted an additional analysis using control regions of interest (ROIs). Specifically, we calculated the inter-individual variability using all ROIs from the CONN Atlas (except auditory and language regions) as control seed regions for the FC. We showed that the variability of connectivity from the auditory cortex is uniquely increased in deafness compared with these control ROIs (Figure S1). This additional analysis supports the specificity of our findings to the auditory cortex in the deaf population. We aim to integrate more analytic approaches, including graph theory methods, in our future work.

      Minor

      (1) Some citations display the initial of the author in addition to the last name, unless there is something I don't know about the citation system, the initial shouldn't be there.

      This is due to the citation style we're using (APA 7th edition, as suggested by eLife), which requires including the first author's initials in all in-text citations when citing multiple authors with the same last name.  

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors provide behavioral data and results for overall neural activation.

      Thanks. We have added these to the revised manuscript. Specifically, we report that there was no difference in the activation for words (p < .05, cluster-corrected for multiple comparisons) between the deaf and hearing participants. Further, we report the behavioral averages for accuracy and reaction time for each group, and have now used these individual values explicitly as nuisance variables in the revised analyses.

      (2) For the correlation between FC and FC variability, it seemed a bit odd that the permuted data were treated additionally (through Gaussian smoothing). I understand the general logic (i.e., to reintroduce smoothness), but this approach provides more smoothing to the permutation than the original data. It is hard to know what this does to the statistical distribution. I recommend using a different approach or at least also reporting the p-value for non-smoothed permutation data.

      In response to this suggestion and to ensure transparency in our results, we have now included also the p-value for the non-smoothed permutation data in our revised manuscript (still highly significant; p < .0001). Thanks for this proposal.
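      As a minimal illustrative sketch (not the authors' pipeline), a non-smoothed permutation test of the correlation between two unthresholded maps could look as follows, assuming each map has been flattened to one value per voxel in the same voxel order. The helper names are hypothetical, and the one-sided test and the (k + 1)/(n + 1) p-value estimator are our assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of voxel values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(map_a, map_b, n_perm=10000, seed=0):
    """One-sided permutation test of r(map_a, map_b), shuffling map_b voxels.

    No smoothing is applied to the permuted maps (the variant the reviewer asked
    to see reported). Returns (observed r, p-value).
    """
    rng = random.Random(seed)
    observed = pearson_r(map_a, map_b)
    shuffled = list(map_b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # in-place voxel-wise permutation
        if pearson_r(map_a, shuffled) >= observed:
            exceed += 1
    # Counting the observed map itself keeps the p-value strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

      Voxel-wise shuffling ignores spatial autocorrelation, so this null distribution is typically narrower than one that preserves map smoothness; re-smoothing the permuted maps is one way to restore it, which is the trade-off discussed in this exchange.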

      (3) For the map comparison, a plot with different colors, showing the FC map, the FC variability map, and one map for the overlap on the same brain may be helpful.

      We thank the Reviewer for their suggestion to visualize the overlap between the maps. However, we performed the correlation analysis using the unthresholded maps, as mentioned in the methods section of our manuscript, specifically to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This is why the maps displayed in the figures, which are thresholded for significance, may not appear to match perfectly, and may actually obscure the correlation across the brain. This methodological detail is crucial for interpreting the relationship and overlap between these maps accurately but also explains why the visualization of the overlap is, unfortunately, not very informative.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary

      The authors asked if parabrachial CGRP neurons were only necessary for a threat alarm to promote freezing or were necessary for a threat alarm to promote a wider range of defensive behaviors, most prominently flight.

      Major Strengths of Methods and Results

      The authors performed careful single-unit recording and applied rigorous methodologies to optogenetically tag CGRP neurons within the PBN. Careful analyses show that single units and the wider CGRP neuron population increase firing to a range of unconditioned stimuli. The optogenetic stimulation of experiment 2 was comparatively simpler but achieved its aim of determining the consequence of activating CGRP neurons in the absence of other stimuli. Experiment 3 used a very clever behavioral approach to reveal a setting in which both cue-evoked freezing and flight could be observed. This was done by having the unconditioned stimulus be a "robot" traveling along a circular path at a given speed. Subsequent cue presentation elicited mild flight in controls, and optogenetic activation of CGRP neurons significantly boosted this flight response. This demonstrated for the first time that CGRP neuron activation does more than promote freezing. The authors conclude by demonstrating that bidirectional modulation of CGRP neuron activity bidirectionally affects freezing in a traditional fear conditioning setting and affects both freezing and flight in a setting in which the robot served as the unconditioned stimulus. Altogether, this is a very strong set of experiments that greatly expand the role of parabrachial CGRP neurons in threat alarm.

      We would like to sincerely thank the reviewer for the positive and insightful comments on our work. We greatly appreciate the acknowledgment of our new behavioral approach, which allowed us to observe a dynamic spectrum of defensive behaviors in animals. Our use of the robot-based paradigm, which enables the observation of both freezing and flight, has been instrumental in expanding our understanding of how parabrachial CGRP neurons modulate diverse threat responses. We are pleased that the reviewer found this methodological innovation to be a valuable contribution to the field.

      Weaknesses

      In all of their conditioning studies the authors did not include a control cue, for example, a sound presented the same number of times but unrelated to US (shock or robot) presentation. This does not detract from their behavioral findings. However, it means the authors do not know whether the observed behavior is a consequence of pairing, or whether it would be observed in response to any cue played in the setting. This is particularly important for the experiments using the robot US.

      We appreciate the reviewer’s insightful comment regarding the absence of a control cue in our conditioning studies. First, we would like to mention that, in response to Reviewer 3, we have updated how we present our flight data by following methods from previously published papers (Fadok et al., 2017; Borkar et al., 2024). Instead of counting flight responses, we calculated flight scores as the ratio of the velocity during the CS to the average velocity in the 7 s before the CS on the conditioning day (or 10 s for the retention test). This method better captures both the speed and duration of fleeing during the CS. With this updated approach, we observed a significant difference in flight scores between the ChR2 and control groups, even during conditioning, which may partly address the reviewer’s concern about whether the observed behavior is a consequence of CS-US pairing.

      However, we agree with the reviewer that including an unpaired group would provide stronger evidence, and in response, we conducted an additional experiment with an unpaired group. In this unpaired group, the CS was presented the same number of times, but the robot US was delivered randomly within the inter-trial interval. The unpaired group did not exhibit any notable conditioned freezing or flight responses. We believe that this additional experiment, now reflected in Figure 3, further strengthens our conclusion that the fleeing behavior is driven by associative learning between the CS and US, rather than a reaction to the cue itself.

      The authors make claims about the contribution of CGRP neurons to freezing and fleeing behavior, however, all of the optogenetic manipulations are centered on the US presentation period. Presently, the experiments show a role for these neurons in processing aversive outcomes but show little role for these neurons in cue responding or behavior organizing. Claims of contributions to behavior should be substantiated by manipulations targeting the cue period.

      We appreciate the reviewer’s constructive comments. We would like to emphasize that our primary objective in this study was to investigate whether activating parabrachial CGRP neurons—thereby increasing the general alarm signal—would elicit different defensive behaviors beyond passive freezing. To this end, we focused on manipulating CGRP neurons during the US period rather than the cue period.

      Previous studies have shown that CGRP neurons relay US signals, and direct activation of CGRP neurons has been used as the US to successfully induce conditioned freezing responses to the CS during retention tests (Han et al., 2015; Bowen et al., 2020). In our experiments, we also observed that CGRP neurons responded exclusively to the US during conditioning with the robot (Figure 1F), and stimulating these neurons in the absence of any external stimuli elicited strong freezing responses (Figure 2B). These findings, collectively, suggest that activation of CGRP neurons during the CS period would predominantly result in freezing behavior.

      Therefore, we manipulated the activity of CGRP neurons during the US period to examine whether adjusting the perceived threat level through these neurons would result in diverse defensive behaviors when paired with the chasing robot. We observed that enhancing CGRP neuron activity while animals were chased by the robot at 70 cm/s made them react as if chased at a higher speed (90 cm/s), leading to increased fleeing behaviors. While this may not fully address the role of these neurons in cue responding or behavior organizing, we found that silencing CGRP neurons with tetanus toxin (TetTox) abolished fleeing behavior even when animals were chased at high speeds (90 cm/s), which usually elicits fleeing without CGRP manipulation (Figure 5). This supports the conclusion that CGRP neurons are necessary for processing fleeing responses.

      In summary, manipulating CGRP neurons during the US period was essential for effectively investigating their role in adjusting defensive responses, thereby expanding our understanding of their function within the general alarm system. We hope this clarifies our experimental design and addresses the concern the reviewer has raised.

      Appraisal

      The authors achieved their aims and have revealed a much greater role for parabrachial CGRP neurons in threat alarm.

      Discussion

      Understanding neural circuits for threat requires us (as a field) to examine diverse threat settings and behavioral outcomes. A commendable and rigorous aspect of this manuscript was the authors' decision to use a new behavioral paradigm and measure multiple behavioral outcomes. Indeed, this manuscript would not have been nearly as impactful had they not done so. This novel behavior was combined with excellent recording and optogenetic manipulations, a standard the field should aspire to. Studies like this are the only way that we as a field will map complete neural circuits for threat.

      We sincerely thank the reviewer for their positive and encouraging comments. We are grateful for the acknowledgment of our efforts in employing a novel behavioral paradigm to study diverse defensive behaviors. We are pleased that our work contributes to advancing the understanding of neural circuits involved in threat responses.

      Reviewer #3 (Public Review):

      Strengths:

      The study used optogenetics together with in vivo electrophysiology to monitor CGRP neuron activity in response to various aversive stimuli, including robot chasing, to determine whether they encode noxious stimuli differentially. The study used an interesting conditioning paradigm to investigate the role of CGRP neurons in the PBN in both freezing and flight behaviors.

      Weakness:

      The major weakness of this study is that the chasing robot threat conditioning model elicits weak unconditioned and conditioned flight responses, making it difficult to interpret the robustness of the findings. Furthermore, the conclusion that the CGRP neurons are capable of inducing flight is not substantiated by the data. No manipulations are made to influence the flight behavior of the mouse. Instead, the manipulations are designed to alter the intensity of the unconditioned stimulus.

      We sincerely thank the reviewer for the thoughtful and constructive comments on our manuscript. In response to this feedback, we revisited our analysis of the flight responses and compared our methods with those used in previous studies examining similar behaviors.

      We reviewed a study investigating sex differences in defensive behavior in rats (Gruene et al., 2015). In that study, the CS was presented for 30 s, and active defensive behavior, referred to as 'darting', was quantified as 'Dart rate (darts/min)'. This was calculated by doubling the number of darts counted during the 30-s CS presentation to extrapolate to a per-minute rate. The highest average dart rate observed was approximately 1.5. Other relevant studies using mice quantified active defensive behavior by calculating a flight score: the ratio of the average speed during each CS to the average speed during the 10-s pre-CS period (Fadok et al., 2017; Borkar et al., 2024). This method captures multiple aspects of flight behavior during CS presentation, including overall velocity, number of bouts, and duration of fleeing. Moreover, it accounts for each animal’s individual velocity prior to the CS, reflecting how fast the animals were fleeing relative to their baseline activity.
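      As a minimal illustrative sketch (not the authors' analysis code), the two measures described above can be written as follows, assuming per-frame speeds sampled at a fixed rate for the CS and for the pre-CS baseline window (7 s on the conditioning day or 10 s at retention, as described in this response). The function names and toy values are hypothetical.

```python
from statistics import mean


def flight_score(speeds_cs, speeds_pre_cs):
    """Mean speed during the CS divided by mean speed in the pre-CS baseline window.

    Both arguments are per-frame speeds (e.g. cm/s) sampled at the same rate.
    A score above 1 means the animal moved faster during the CS than at baseline.
    """
    baseline = mean(speeds_pre_cs)
    if baseline == 0:
        raise ValueError("baseline speed is zero; flight score is undefined")
    return mean(speeds_cs) / baseline


def dart_rate_per_min(n_darts, cs_duration_s=30.0):
    """Extrapolate darts counted during one CS to a per-minute rate (x2 for a 30-s CS)."""
    return n_darts * 60.0 / cs_duration_s


print(flight_score([10, 12, 14], [4, 4, 4]))  # 3.0
print(dart_rate_per_min(3))                   # 6.0
```

      Dividing by each animal's own baseline is what lets the flight score express speed relative to prior activity rather than absolute speed, whereas the dart rate is a count-based measure that ignores how fast each dart was.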

      In our original analysis, we quantified flight responses by counting rapid fleeing movements, defined as movements exceeding 8 cm/s. This approach was consistent with our previous study using the same robot paradigm to observe unique patterns of defensive behavior related to sex differences (Pyeon et al., 2023). Based on our earlier findings, where this approach effectively identified significant differences in defensive behaviors, we believed that this method was appropriate for capturing conditioned flight behavior within our specific experimental context. However, prompted by the reviewer's insightful comments, we recognized that our initial method might not fully capture the robustness of the flight responses. Therefore, we re-analyzed our data using the flight score method described by Fadok and colleagues, which provides a more sensitive measure of fleeing during the CS.

      Re-analyzing our data revealed a more robust flight response than previously reported, demonstrating that additional CGRP neuron stimulation promoted flight behavior in animals during conditioning, which addresses the concern that the data did not substantiate the role of CGRP neurons in inducing flight. In addition, we would like to emphasize the findings from our final experiment, where silencing CGRP neurons, even under high-threat conditions (90 cm/s), prevented animals from exhibiting flight responses. This demonstrates that CGRP neurons are necessary for flight responses.

      We have updated all flight data in the manuscript and revised the relevant figures and text accordingly. We appreciate the opportunity to enhance our analysis. The reviewer's insightful observation led us to adopt a better method for quantifying flight behavior, which substantiates our conclusion about the role of CGRP neurons in modulating defensive responses.

      Borkar, C.D., Stelly, C.E., Fu, X., Dorofeikova, M., Le, Q.-S.E., Vutukuri, R., et al. (2024). Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625(7996), 743-749.

      Bowen, A.J., Chen, J.Y., Huang, Y.W., Baertsch, N.A., Park, S., and Palmiter, R.D. (2020). Dissociable control of unconditioned responses and associative fear learning by parabrachial CGRP neurons. Elife 9, e59799.

      Fadok, J.P., Krabbe, S., Markovic, M., Courtin, J., Xu, C., Massi, L., et al. (2017). A competitive inhibitory circuit for selection of active and passive fear responses. Nature 542(7639), 96-100.

      Gruene, T.M., Flick, K., Stefano, A., Shea, S.D., and Shansky, R.M. (2015). Sexually divergent expression of active and passive conditioned fear responses in rats. Elife 4, e11352.

      Han, S., Soleiman, M.T., Soden, M.E., Zweifel, L.S., and Palmiter, R.D. (2015). Elucidating an affective pain circuit that creates a threat memory. Cell 162(2), 363-374.

      Pyeon, G.H., Lee, J., Jo, Y.S., and Choi, J.-S. (2023). Conditioned flight response in female rats to naturalistic threat is estrous-cycle dependent. Scientific Reports 13(1), 20988.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript from So et al. describes what is suggested to be an improved protocol for single-nuclei RNA sequencing (snRNA-seq) of adipose tissue. The authors provide evidence that modifications to the existing protocols result in better RNA quality and nuclei integrity than previously observed, with ultimately greater coverage of the transcriptome upon sequencing. Using the modified protocol, the authors compare the cellular landscape of murine inguinal and perigonadal white adipose tissue (WAT) depots harvested from animals fed a standard chow diet (lean mice) or those fed a high-fat diet (mice with obesity). 

      Strengths: 

      Overall, the manuscript is well-written, and the data are clearly presented. The strengths of the manuscript rest in the description of an improved protocol for snRNA-seq analysis. This should be valuable for the growing number of investigators in the field of adipose tissue biology that are utilizing snRNA-seq technology, as well as those other fields attempting similar experiments with tissues possessing high levels of RNAse activity. 

      Moreover, the study makes some notable observations that provide the foundation for future investigation. One observation is the correlation between nuclei size and cell size, allowing for the transcriptomes of relatively hypertrophic adipocytes in perigonadal WAT to be examined. Another notable observation is the identification of an adipocyte subcluster (Ad6) that appears "stressed" or dysfunctional and likely localizes to crown-like inflammatory structures where proinflammatory immune cells reside. 

      Weaknesses:  

      Analogous studies have been reported in the literature, including a notable study from Savari et al. (Cell Metabolism). This somewhat diminishes the novelty of some of the biological findings presented here. Moreover, a direct comparison of the transcriptomic data derived from the new vs. existing protocols (i.e. fully executed side by side) was not presented. As such, the true benefit of the protocol modifications cannot be fully understood. 

      We agree with the reviewer’s comment on the limitations of our study. Following the reviewer's suggestion, we performed a new analysis by integrating our data with those from the study by Emont et al. Please refer to the Recommendation for authors section below for further details.

      Reviewer #2 (Public Review):

      Summary: 

      In the present manuscript So et al utilize single-nucleus RNA sequencing to characterize cell populations in lean and obese adipose tissues. 

      Strengths: 

      The authors utilize a modified nuclear isolation protocol incorporating VRC that results in higher-quality sequencing reads compared with previous studies. 

      Weaknesses:  

      The use of VRC to enhance snRNA-seq has been previously published in other tissues. The snRNA-seq data sets presented in this manuscript, when compared with numerous previously published single-cell analyses of adipose tissue, do not represent a significant scientific advance. 

      Figure 1-3: The snRNA-seq data obtained by the authors using their enhanced protocol does not represent a significant improvement in cell profiling for the majority of the highlighted cell types including APCs, macrophages, and lymphocytes. These cell populations have been extensively characterized by cytoplasmic scRNA-seq which can achieve sufficient sequencing depth, and thus this study does not contribute meaningful additional insight into these cell types. The authors note an increase in the number of rare endothelial cell types recovered, however this is not translated into any kind of functional analysis of these populations. 

      We acknowledge the reviewer's comments on the limitations of our study, particularly the lack of extension of our snRNA-seq data into functional studies of new biological processes. However, this manuscript has been submitted as a Tools and Resources article. As an article of this type, we provide detailed information on our snRNA-seq methods and present a valuable resource of high-quality mouse adipose tissue snRNA-seq data. In addition, we demonstrate that our improved method offers novel biological insights, including the identification of subpopulations of adipocytes categorized by size and functionality. We believe this study offers powerful tools and significant value to the research community.

      Figure 4: The authors did not provide any evidence that the relative fluorescent brightness of GFP and mCherry is a direct measure of nuclear size, and nuclear size is only moderately correlated with cell size. Thus, sorting nuclei based on GFP/mCherry brightness is not a great proxy for adipocyte diameter. Furthermore, no meaningful insights are provided about the functional significance of the reported transcriptional differences between small and large adipocyte nuclei. 

      To address the reviewer's point, we analyzed the Pearson correlation coefficient for nucleus size vs. adipocyte size and found R = 0.85, indicating a strong positive correlation. In addition, we performed a new experiment to determine the correlation between nuclear GFP intensity and adipocyte nucleus size, finding a strong correlation with R = 0.91. These results suggest that nuclear GFP intensity can be a strong proxy for adipocyte size. Furthermore, we performed gene ontology analysis on genes differentially regulated between large and small adipocyte nuclei. We found that large adipocytes promote processes involved in insulin response, vascularization, and DNA repair, while inhibiting processes related to cell migration, metabolism, and the cytoskeleton. We have added these new data as Figures 4E, S6E, S6G, and S6H (page 11).

      Figures 5-6: The Ad6 population is highly transcriptionally analogous to the mAd3 population from Emont et al., and is thus not a novel finding. Furthermore, in the present data set, the authors conclude that Ad6 are likely stressed/dying hypertrophic adipocytes with a global loss of gene expression, which is a well-documented finding (more pronounced in eWAT than in iWAT) for which the snRNA-seq reported in the present manuscript does not provide any novel scientific insight. 

      As the reviewer pointed out, a new analysis integrating our data with the previous study found that Ad6 from our study is comparable to mAd3 from Emont et al. in gene expression profiles. However, significant discrepancies in population size and in changes in response to obesity were observed, likely due to differences in technical robustness. The dysfunctional cellular state of this population, with compromised RNA content, may have hindered its accurate capture in the previous study, while our protocol enabled precise detection. This underscores the importance of our improved snRNA-seq protocol for accurately understanding adipocyte population dynamics. We have revised the manuscript to include new data in Figure S7 (page 14).

      Reviewer #3 (Public Review): 

      Summary:  

      The authors aimed to improve single-nucleus RNA sequencing (snRNA-seq) to address current limitations and challenges with nuclei and RNA isolation quality. They successfully developed a protocol that enhances RNA preservation and yields high-quality snRNA-seq data from multiple tissues, including a challenging model of adipose tissue. They then applied this method to eWAT and iWAT from mice fed either a normal or high-fat diet, exploring depot-specific cellular dynamics and gene expression changes during obesity. Their analysis included subclustering of SVF cells and revealed that obesity promotes a transition in APCs from an early to a committed state and induces a pro-inflammatory phenotype in immune cells, particularly in eWAT. In addition to SVF cells, they discovered six adipocyte subpopulations characterized by a gradient of unique gene expression signatures. Interestingly, a novel subpopulation, termed Ad6, comprised stressed and dying adipocytes with reduced transcriptional activity, primarily found in eWAT of mice on a high-fat diet. Overall, the methodology is sound, the writing is clear, and the conclusions drawn are supported by the data presented. Further research based on these findings could pave the way for potential novel interventions in obesity and metabolic disorders, or for similar studies in other tissues or conditions. 

      Strengths:  

      • The authors developed a robust snRNA-seq technique that preserves the integrity of the nucleus and RNA across various tissue types, overcoming the challenges of existing methods. 

      • They identified adipocyte subpopulations that follow adaptive or pathological trajectories during obesity. 

      • The study reveals depot-specific differences in adipose tissues, which could have implications for targeted therapies. 

      Weaknesses: 

      • The adipose tissues were collected after 10 weeks of high-fat diet treatment, lacking the intermediate time points for identifying early markers or cell populations during the transition from healthy to pathological adipose tissue. 

      We agree with the reviewer regarding this limitation of our study. To address the reviewer’s comment, we revised the manuscript to acknowledge it in the Discussion section (page 17).  

      • The expansion of the Ad6 subpopulation in obese iWAT and gWAT is interesting. The authors claim that Ad6 exhibited a substantial increase in eWAT and a moderate rise in iWAT (Figure 4C). However, this adipocyte subpopulation remains the most altered in iWAT upon obesity. Could the authors elaborate on why there is a scarcity of adipocytes positive for the ROS reporter and B2M in obese iWAT?

      We observed an increase in the levels of H2DCFA reporter and B2M protein fluorescence in adipocytes from iWAT of HFD-fed mice, although this increase was much less compared to eWAT, as shown in Figure 6B (left panel). These increases in iWAT were not sufficient for most cells to exceed the cutoff values used to determine H2DCFA and B2M positivity in adipocytes during quantitative analysis. We have revised the manuscript to clarify these results (page 13).

      • While the study provides extensive data on mouse models, the potential translation of these findings to human obesity remains uncertain. 

      To address the reviewer’s point, we expanded our discussion on the differences in adipocyte heterogeneity between mice and humans. We attempted to identify human adipocyte subclusters that resemble the metabolically unhealthy Ad6 adipocytes found in mice in our study; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible cell subtypes across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency of adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested points to address: 

      (1) The authors suggest that their improved protocol for maintaining RNA/nucleus integrity results in a more comprehensive analysis of adipose tissue heterogeneity. The authors compare the quality of their snRNA-seq data to those generated in prior studies (e.g., Savari et al.). What is not clear is whether additional heterogeneity/clusters can be observed due directly to the protocol modifications. A direct head-to-head comparison of the protocols executed in parallel would of course be ideal; however, integrating their new dataset with the corresponding data from Savari et al. could help address this question and help readers understand the benefits of this new protocol vs. existing protocols. 

      The data from Savari et al. are of significantly lower quality, likely because they were generated using earlier versions of the 10X Genomics system, and this study lacks iWAT data. To address the reviewer’s point, we instead integrated our data with those from another study, by Emont et al. (2022), which used comparable tissue types and experimental systems. The integrated analysis confirmed the improved representation of all cell types present in adipose tissues in our study, with higher quality metrics such as increased Unique Molecular Identifiers (UMIs) and number of genes per nucleus. These results indicate that our protocol offers significant advantages in generating a more accurate representation of each cell type and its gene expression profile. New data are included in Figure S2 (page 7).

      (2) The exact frequency of the Ad6 population in eWAT of mice maintained on HFD is a little unclear. From the snRNA-seq data, it appears that roughly 47% of the adipocytes are in this "stressed state." In Figure 6, it appears that greater than 75% of the adipocytes express B2M (Ad6 marker) and greater than 75% of adipocytes are suggested to be devoid of measurable PPARg expression. The latter seems quite high as PPARg expression is essential to maintain the adipocyte phenotype. Is there evidence of de-differentiation amongst them (i.e. acquisition of progenitor cell markers)? Presenting separate UMAPs for the chow vs. HFD state may help visualize the frequency of each adipocyte population in the two states. Inclusion of the stromal/progenitor cells in the visualization may help understand if cells are de-differentiating in obesity as previously postulated by the authors. Related to Point # 1 above, is this population observed in prior studies and at a similar frequency?

      To address the reviewer’s point, we analyzed the expression of adipocyte progenitor cell (APC) markers, such as Pdgfra, in the Ad6 population. We did not detect significant expression of APC markers, suggesting that Ad6 does not represent dedifferentiating adipocytes. Instead, these are likely stressed and dying cells characterized by an aberrant transcriptional state with a global decline in gene expression.

      When integrating our data with the datasets by Emont et al., we observed an adipocyte population in the previous study, mAd3, comparable to Ad6 in our study, with similar marker gene expression and lower transcript abundance. However, the population size of mAd3 was much smaller than that of Ad6 in our data, and mAd3 did not show consistent population changes during obesity. This discrepancy may be due to differences in technical robustness: the dysfunctional cellular state of this population, with its severely compromised RNA content, may have made it difficult to capture accurately using the standard protocol in the previous study, while our protocol enabled robust and precise detection. We added new data in Figures S6I and S7 (page 14) and revised the Discussion (page 17).

      Additional points  

      (1) The authors should be cautious in describing subpopulations as "increasing" or "decreasing" in obesity as the data are presented as proportions of a parent population. A given cell population may be "relatively increased." 

      To address the reviewer's point, we revised the manuscript to clarify the "relative" changes in cell populations during obesity in the relevant sections (pages 8, 9, 10, 11, and 15).

      (2) The authors should also be cautious in ascribing "function" to adipocyte populations based solely on their expression signatures. Statements such as those in the abstract, "...providing novel insights into the mechanisms orchestrating adipose tissue remodeling during obesity..." should probably be toned down as no such mechanism is truly demonstrated. 

      To address the reviewer's point, we revised the manuscript by removing or replacing the indicated terms or phrases with more suitable wording in the appropriate sections (pages 2, 10, 12, and 14).

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors might consider expanding a discussion on the potential implications of their findings, especially the newly identified adipocyte subpopulations and depot-specific differences for human studies. 

      To address the reviewer’s point, we attempted to identify human adipocyte subclusters that resembled our dysfunctional Ad6 adipocytes in mice; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible cell subtypes across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency of adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      (2) typo: "To generate diet-induced obesity models". 

      We revised the manuscript to correct it.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors examined the hypothesis that plasma ApoM, which carries sphingosine-1-phosphate (S1P) and activates vascular S1P receptors to inhibit vascular leakage, is modulated by SGLT2 inhibitors (SGLTi) during endotoxemia. They also propose that this mechanism is mediated by SGLTi regulation of LRP2/megalin in the kidney and that this mechanism is critical for endotoxin-induced vascular leak and myocardial dysfunction. The hypothesis is novel and potentially exciting. However, the authors' experiments lack critical controls and rigor in multiple aspects, and overall do not support the conclusions.

      Thank you for these comments. We have now directly tested this hypothesis, which we believe remains an innovative explanation of how SGLT2i can reduce vascular leak, using proximal tubule-specific inducible megalin/Lrp2 knockout mice.

      Reviewer #2 (Public Review):

      Apolipoprotein M (ApoM) is a plasma carrier for the vascular protective lipid mediator sphingosine 1-phosphate (S1P). The plasma levels of S1P and its chaperones ApoM and albumin rapidly decline in patients with severe sepsis, but the mechanisms for such reductions and their consequences for cardiovascular health remain elusive. In this study, Ripoll and colleagues demonstrate that the sodium-glucose co-transporter inhibitor dapagliflozin (Dapa) can preserve serum ApoM levels as well as cardiac function after LPS treatment of mice with diet-induced obesity. They further provide data to suggest that Dapa preserves serum ApoM by increasing megalin-mediated reabsorption of ApoM in renal proximal tubules and that ApoM improves vascular integrity in LPS-treated mice. These observations put forward a potential therapeutic approach to sustain vascular protective S1P signaling that could be relevant to other conditions of systemic inflammation where plasma levels of S1P decrease. However, although the authors are careful with their statements, the study falls short of directly implicating megalin in ApoM reabsorption and ApoM/S1P depletion in LPS-induced cardiac dysfunction and the protective effects of Dapa.

      The observations reported in this study are exciting and potentially of broad interest. The paper is well written and concise, and the statements made are mostly supported by the data presented. However, the mechanism proposed and implied is mostly based on circumstantial evidence, and the paper could be substantially improved by directly addressing the role of megalin in ApoM reabsorption and in serum ApoM and S1P levels, and the importance of ApoM for the preservation of cardiac function during endotoxemia. Some observations that are not necessarily in line with the proposed model should also be discussed.

      The authors show that Dapa preserves serum ApoM and cardiac function in LPS-treated obese mice. However, the evidence they provide to suggest that ApoM may be implicated in the protective effect of Dapa on cardiac function is indirect. Direct evidence could be sought by addressing the effect of Dapa on cardiac function in LPS-treated ApoM-deficient and littermate control mice (with DIO if necessary).

      The authors also suggest that higher ApoM levels in mice treated with Dapa and LPS reflect increased megalin-mediated ApoM reabsorption and that this preserves S1PR signaling. This could be addressed more directly by assessing the clearance of labelled ApoM, by addressing the impact of megalin inhibition or deficiency on ApoM clearance in this context, and by measuring S1P as well as ApoM in serum samples.

      Methods: More details should be provided in the manuscript on how ApoM-deficient and transgenic mice were generated, on sex and strain background, and on whether or not littermate controls were used. For intravital microscopy, more precision is needed on how vessel borders were outlined and whether this was done with or without regard for FITC-dextran. Please also specify the type of vessel chosen and the considerations made with regard to blood flow and patency of the vessels analyzed. For statistical analyses, data from each mouse should be pooled before performing statistical comparisons. The criteria used for the choice of test should be outlined, as different statistical tests are used for similar datasets. For all data, please be consistent in the use of post-tests and in the presentation of comparisons. In other words, if the authors choose to only display test results for groups that are significantly different, this should be done in all cases. And if comparisons are made between all groups, this should be done in all cases for similar sets of data.

      Thank you for these comments. We have now tested the direct role of Lrp2 with respect to SGLT2i in vivo and in vitro, and our study now shows that Lrp2 is required for the effect of dapagliflozin on ApoM. ApoM-deficient and transgenic mice were previously described and published by our group (PMID: 37034289) and others (PMID: 24318881), and littermate controls were used throughout our manuscript. We agree that the effect on cardiac function is likely indirect in these models, and as yet we do not have the tools in the LPS model to separate potential endothelial-protective from cardiac effects. In addition, since the ApoM knockout has multiple abnormalities, including hypertension, secondary cardiac hypertrophy, and an adipose/browning phenotype, all of which may influence its response to Dapa in terms of cardiac function, these studies will be challenging to perform and will require additional models that are beyond the scope of this manuscript.

      For intravital microscopy, vessel borders were outlined blindly without regard for FITC-dextran. We believe it is important to show multiple blood vessels per mouse since, as the reviewer points out, there is considerable vessel heterogeneity. These measurements were performed in a collaborator’s laboratory; data analysis was blinded, and the collaborator was unaware of the study hypothesis at the time the measurements were performed and analyzed. They have previously reported that this is a valid method for assessing cremaster vessel permeability (PMID: 26839042).

      We have updated our Methods section and the figure legends to clearly indicate the statistical analyses we used. For two-group comparisons we used Student’s t-test; for multiple groups, one-way ANOVA with Sidak's correction for multiple comparisons was used throughout the paper when the data were normally distributed, and the Kruskal-Wallis test was used when the data were not normally distributed.

      Reviewer #3 (Public Review):

      The authors have performed well-designed experiments that elucidate the protective role of Dapa in the LPS sepsis model. This model shows that Dapa works, in part, by increasing expression of the receptor LRP2 in the kidney, which maintains circulating ApoM levels. ApoM binds to S1P, which then interacts with the S1P receptor, stimulating cardiac function and epithelial and endothelial barrier function, thereby maintaining intravascular volume and cardiac output in the setting of severe inflammation. The authors used many experimental models, including transgenic mice, as well as several rigorous and reproducible techniques to measure the relevant parameters of cardiac, renal, vascular, and immune function. Furthermore, they employ a useful inhibitor of S1P function to show pharmacologically the essential role of this agonist in most, but not all, of the benefits of Dapa. A strength of the paper is the identification of the pathway responsible for the cardioprotective effects of SGLT2is, which may yield additional therapeutic targets. There are some weaknesses in the paper, such as studying only male mice and not providing a power analysis to justify the number of animals used throughout the experiments. Overall, the paper should have a significant impact on the scientific community because SGLT2i drugs are likely to find many uses in inflammatory and metabolic diseases. This paper provides support for an important mechanism by which they work in conditions of severe sepsis and hemodynamic compromise.

      Thank you for these comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Concern of Reviewers 1 and 2 regarding transcriptional changes in endothelial cells (ECs) in culture.

      We have now addressed this concern by FACS-sorting ECs (Fig. 7A revised) and comparing our data with previous studies (S. Fig. 1C). Our major claim was the epigenetic repression of EC genes, including those involved in BBB formation and angiogenesis, during later development. To further strengthen our claim, we knocked out HDAC2 during the later stages of development to prevent this epigenetic repression. As shown in the first version of the manuscript, this knockout results in enhanced angiogenesis and a leaky BBB.

      In the revised version, we FACS-sorted CD31+ ECs from E-17.5 WT and HDAC2 ECKO mice, followed by ultra-low-input mRNA sequencing. Confirming epigenetic repression via HDAC2, the HDAC2-deleted ECs showed high expression of BBB genes such as ZO-1, OCLN, MFSD2A, and GLUT1, and activation of the Wnt signaling pathway, as indicated by the upregulation of Wnt target genes such as Axin2 and APCDD1. Additionally, consistent with the increased angiogenesis phenotype observed, angiogenesis-related genes such as VEGFA, FLT1, and ENG were upregulated.

      Since the transcriptomics of brain ECs during developmental stages has already been published in Hupe et al., 2017, we did not attempt to replicate this. However, we compared our differentially regulated genes from E-13.5 versus adult stages with the transcriptome changes during development reported by Hupe et al., 2017. We found a significant overlap in important genes such as CLDN5, LEF1, ZIC3, and MFSD2A (S. Fig. 1C).

      As pointed out by the reviewer, culture-induced changes cannot be ruled out from our data. We have included a statement in the manuscript: "Even though we used similar culture conditions for both embryonic and adult cortical ECs, culture-induced changes have been reported previously and should be considered as a varying factor when interpreting our results."

      Reviewer-1 Comment 2- An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of the knockdown.

      We have now provided the protein expression data for HDAC2 and EZH2 in the revised manuscript Supplementary Figure- 2A.

      Reviewer-1 Comment 3- Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

      We appreciate your positive comment. The in vivo HDAC2 knockout experiment serves as a validation of our in vitro findings, demonstrating that the epigenetic regulator HDAC2 can control the expression of endothelial cell (EC) genes involved in angiogenesis, blood-brain barrier (BBB) formation, and maturation. To investigate the mechanism underlying the HDAC2 ECKO phenotype, we performed mRNA sequencing on HDAC2 ECKO E-17.5 ECs and found that vascular and BBB maturation is hindered when the epigenetic repression of BBB, angiogenesis, and Wnt target genes is prevented (Fig. 7A). As a result, HDAC2 ECKO mice showed increased angiogenesis and BBB leakage. This strengthens our hypothesis that HDAC2-mediated epigenetic repression is critical for BBB and vascular maturation.

      Reviewer 2 Comment-2 The use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      We appreciate the reviewer's comment. While whole-genome methods like ChIP-seq offer comprehensive and high-throughput data, ChIP-qPCR assays remain valuable tools due to their sensitivity, specificity, and suitability for validation and targeted analysis. Our ChIP analysis identifies crucial roles for two epigenetic enzymes, HDAC2 and PRC2, in CNS endothelial cells (ECs). In vivo data presented in Figure 4 further support this finding through observed phenotypic differences. We concur that a comprehensive analysis of HDAC2 and PRC2 target genes in ECs is essential; this analysis is currently underway and, given the extensive nature of the data, will be the subject of a separate publication.

      Reviewer 2 Comment-3 Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

      We appreciate the comments from the reviewers. Our results provide valuable insights into the specific epigenetic mechanisms that regulate BBB genes. It is important to recognize that different cell types possess distinct, stage-specific epigenetic landscapes and regulatory mechanisms. Rather than having broad roles across diverse cell types, it is more likely that HDAC2 (even though there are several other classes and subtypes of HDACs) and the Polycomb complex exhibit specific functions within the context of EC gene expression or BBB integrity.

      Moreover, the significance of our findings is enhanced by the fact that epigenetic modifications are often reversible by epigenetic regulators. This makes them promising targets for BBB modulation. Targeting epigenetic regulators can have a widespread impact, as these mechanisms regulate numerous genes that collectively have the potential to promote vascular repair.

      A practical advantage is that FDA-approved HDAC2 inhibitors, as well as PRC2 inhibitors (such as those mentioned in clinical trials NCT03211988 and NCT02601950), are already available. This facilitates drug repurposing and could expedite clinical translation.

    1. Author response:

      Reviewer #1 (Public Review):

      Padilha et al. aimed to find prospective metabolite biomarkers in serum of children aged 6-59 months that were indicative of neurodevelopmental outcomes. The authors leveraged data and samples from the cross-sectional Brazilian National Survey on Child Nutrition (ENANI-2019), and an untargeted multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) approach was used to measure metabolites in serum samples (n=5004), which were identified via a large library of standards. After correlating the metabolite levels against the developmental quotient (DQ), or the degree to which age-appropriate developmental milestones were achieved as evaluated by the Survey of Well-being of Young Children, serum concentrations of phenylacetylglutamine (PAG), cresol sulfate (CS), hippuric acid (HA), and trimethylamine-N-oxide (TMAO) were significantly negatively associated with DQ. Examination of the covariates revealed that the negative associations of PAG, HA, TMAO, and valine (Val) with DQ were specific to younger children (-1 SD or 19 months old), whereas creatinine (Crtn) and methylhistidine (MeHis) had significant associations with DQ that changed direction with age (negative at -1 SD or 19 months old, and positive at +1 SD or 49 months old). Further, mediation analysis demonstrated that PAG was a significant mediator of the relationship of delivery mode, child's diet quality, and child fiber intake with DQ. HA and TMAO were additional significant mediators of the relationship of child fiber intake with DQ.

      Strengths of this study include the large cohort size and study design allowing for sampling at multiple time points, along with neurodevelopmental assessment and a relatively detailed collection of potential confounding factors, including diet. The untargeted metabolomics approach was also robust and comprehensive, allowing for level 1 identification of a wide breadth of potential biomarkers. Given their methodology, the authors should be able to achieve their aim of identifying candidate serum biomarkers of neurodevelopment in early childhood. The results of this work would be of broad interest to researchers who are interested in understanding the biological underpinnings of development and in tracking development in pediatric populations, as it provides insight into putative mechanisms and targets from a relevant human cohort that can be probed in future studies. Such putative mechanisms and targets are currently lacking in the field due to challenges in conducting these kinds of studies, so this work is important.

      However, in the manuscript's current state, the presentation and analysis of data impede the reader from fully understanding and interpreting the study's findings.

      Particularly, the handling of confounding variables is incomplete. There is a different set of confounders listed in Table 1 versus Supplementary Table 1 versus Methods section Covariates versus Figure 4. For example, Region is listed in Supplementary Table 1 but not in Table 1, and Mode of Delivery is listed in Table 1 but not in Supplementary Table 1. Many factors are listed in Figure 4 that aren't mentioned anywhere else in the paper, such as gestational age at birth or maternal pre-pregnancy obesity.

      We thank the reviewer for their comment. We would like to clarify that the tables initially had different variables because they serve different purposes. Table 1 aims to characterize the sample on variables directly related to the children’s and mothers' features and their nutritional status. Supplementary File 1 (previously named Supplementary Table 1) summarizes the sociodemographic distribution of the developmental quotient. Neither table concerned the metabolite-DQ relationships and their potential covariates; they only provide context for subsequent analyses by characterizing the sample and the outcome. Instead, the covariates included in the regression models were selected using the Directed Acyclic Graph presented in Figure 1.

      However, to avoid this potential confusion, we have included the same variables in Table 1 and Supplementary File 1 (page 38), and we discuss the selection of model covariates in Figure 4 in more detail here in the letter and in the manuscript.

      The authors use the directed acyclic graph (DAG) in Figure 4 to justify the further investigation of certain covariates over others. However, the omission of the microbiome from the DAG, especially considering that most of the study findings were microbially derived metabolite biomarkers, appears to be a fundamental flaw. The authors propose that sanitation and micronutrients have no effect on the host metabolome, yet both have been shown in the literature to affect microbiome composition, which can in turn affect the host metabolome.

      Thank you for your comment. We appreciate the concerns about the use of the DAG and the absence of the microbiome from it. These have already been addressed in our reply #1 to the editor, which is pasted below for convenience:

      Thank you for the comment and suggestions. It is important to highlight that there are no data on microbiome composition, and we apologize if we gave the impression that such data were available. The main goal of this national survey was to provide qualified and up-to-date evidence on child nutrition in order to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool-derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is now stated more explicitly as a study limitation in the revised manuscript (page 17, lines 463-467):

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs of collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data were considered a priority because blood specimens had already been collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations, we had to perform the analysis in a subset of our sample, which is still representative and large enough to test our hypothesis with adequate statistical power (more details below).

      We would argue that there is no evidence that the microbiome, or factors affecting microbiome composition, are confounders in the association between the serum metabolome and child development. First, it is worth revisiting the properties of a confounder: in brief, the epidemiological literature defines confounding as an alternative explanation for a given conclusion, which makes it one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism, and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, reporting that these bioactive circulating metabolites are co-metabolized by commensal gut microbiota and may play a role in neurodevelopment and cognition, as mediated by environmental exposures early in life.

      In fact, the literature on the association between the microbiome and infant development is very limited. We searched using the terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). Its authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criterion for including confounders in the directed acyclic graph (DAG) was based on systematic reviews or meta-analyses, not on single isolated studies.

      In summary, there are no microbiome data in ENANI-2019, and even if such data were available, we are confident that, given the current state of the literature, there would be no evidence to include this construct in the DAG, because the procedure recommends including only variables associated with both the exposure and the outcome. Please find more details on the DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome; rather, these constructs were not included in the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”
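      The idea of a "minimally sufficient adjustment set" can be checked mechanically. As an illustration only, the standard-library sketch below tests whether a candidate set satisfies Pearl's backdoor criterion, using the moralisation test for d-separation. The graph is a deliberately tiny, hypothetical stand-in (age and diet quality as common causes of the metabolite and DQ, plus a collider); it is not the DAG in Figure 1, and dedicated tools such as DAGitty exist for full-size graphs.

```python
from itertools import combinations


def parents_of(dag):
    """Invert a DAG given as {node: set(children)} into {node: set(parents)}."""
    parents = {node: set() for node in dag}
    for node, children in dag.items():
        for child in children:
            parents.setdefault(child, set()).add(node)
    return parents


def ancestors(dag, nodes):
    """Return the given nodes together with all of their ancestors."""
    parents = parents_of(dag)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants(dag, node):
    """Return every node reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for c in dag.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def d_separated(dag, x, y, z):
    """x and y are d-separated by z iff they are disconnected in the moral
    graph of the ancestral set of {x, y} | z after deleting z."""
    keep = ancestors(dag, {x, y} | set(z))
    parents = parents_of(dag)
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents.get(child, set()) & keep
        for p in ps:  # drop edge directions
            adj[p].add(child)
            adj[child].add(p)
        for a, b in combinations(ps, 2):  # "marry" parents sharing a child
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(z)
    seen, stack = {x}, [x]
    while stack:
        for nb in adj[stack.pop()]:
            if nb == y:
                return False
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return True


def satisfies_backdoor(dag, exposure, outcome, adjustment):
    """Backdoor criterion: adjust for no descendant of the exposure, and block
    every path that enters the exposure through an incoming edge."""
    if set(adjustment) & descendants(dag, exposure):
        return False  # would adjust for a mediator or a collider downstream of exposure
    no_exit = {n: (set() if n == exposure else set(cs)) for n, cs in dag.items()}
    return d_separated(no_exit, exposure, outcome, adjustment)


# Hypothetical toy DAG (not Figure 1): node -> set of children.
dag = {
    "age": {"metabolite", "DQ"},
    "diet_quality": {"metabolite", "DQ"},
    "metabolite": {"DQ", "collider"},
    "DQ": {"collider"},
    "collider": set(),
}
print(satisfies_backdoor(dag, "metabolite", "DQ", {"age", "diet_quality"}))
print(satisfies_backdoor(dag, "metabolite", "DQ", {"age"}))
```

      In this toy graph, adjusting for both common causes is sufficient, adjusting for only one leaves an open backdoor path, and adding the collider is rejected, mirroring the text's point about avoiding colliders and variables on the metabolite-DQ causal pathway.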

      Additionally, the authors emphasized as part of the study selection criteria the following, "Due to the costs involved in the metabolome analysis, it was necessary to further reduce the sample size. Then, samples were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions related to iron metabolism, such as anemia and nutrient deficiencies. The selection process aimed to represent diverse health statuses, including those with no conditions, with specific deficiencies, or with combinations of conditions. Ultimately, through a randomized process that ensured a balanced representation across these groups, a total of 5,004 children were selected for the final sample (Figure 1)."

      Therefore, the reader assumes anemia and nutrient deficiencies to be important covariates, yet the data on the final distribution of these covariates in the study cohort are not presented, nor are these covariates examined further.

      Thank you for the comments. We apologize for the misunderstanding and have amended the text to make our rationale clearer in the revised manuscript.

      We believed the original text stated clearly enough that the sampling process aimed to maintain the representativeness of the original sample. This sampling process considered anemia and nutritional deficiencies, among other variables. However, we did not aim to include all relevant covariates of the DQ-metabolome relationship; these were chosen using the DAG, as described in the manuscript and in other sections of this letter. Therefore, we would like to emphasize that our description of the sampling process does not assume that anemia and nutritional deficiencies are important covariates for the DQ-metabolome relationship.

      We have rewritten this passage (page 11, lines 279-285):

      “Due to the costs involved in the metabolome analysis, it was necessary to reduce the sample size to the equivalent of 57% of the ENANI-2019 participants with stored blood specimens. Therefore, the infants were stratified by age group (6 to 11, 12 to 23, and 24 to 59 months) and by health conditions such as anemia and micronutrient deficiencies. The selection process aimed to represent the diverse health statuses of the original sample. Ultimately, 5,004 children were selected for the final sample through a random sampling process that ensured a balanced representation across these groups (Figure 2).”

      The inclusion of specific covariates in Table 1, Supplementary Table 1, the statistical models, and the mediation analysis is thus currently biased as it is not well justified.

      We appreciate the reviewer's comment. However, a clearer and more specific argument would have helped us to address it; we have tried to do so based on our interpretation.

      Please refer to our response to item #1 above regarding the variables in the tables and figures. The covariates in the statistical models were selected using the DAG, a cutting-edge procedure that aims to avoid bias and overfitting, which are common problems when confounders are adjusted for without a clear rationale. We elaborate on the advantages of using the DAG in our response to item #6 and on page 9 of the manuscript. The statistical models we use follow best practices in the field for dealing with a large number of collinear predictors and a continuous outcome (see our response to the editor’s 4th comment). Finally, the mediation analyses were done to explore a few potential explanations for our results from the PLSR and multiple regression analyses. We only ran mediation analyses for plausible mechanisms for which the variables of interest were available in our data. Please see our response to reviewer 3’s item #1 for a more detailed explanation of the mediation analysis.

      Finally, it is unclear what the partial-least squares regression adds to the paper, other than to discard potentially interesting metabolites found by the initial correlation analysis.

      Thank you for the question. As explained in response to the editor’s item #4, PLS-based analyses are among the most commonly used analyses for parsing metabolomic data (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015). This procedure is especially appropriate for cases in which there are multiple collinear predictor variables as it allows us to compare the predictive value of all the variables without relying on corrections for multiple testing. Testing each metabolite in separate correlations corrected for multiple comparisons is less appropriate because the correlated nature of the metabolites means the comparisons are not truly independent and would cause the corrections (which usually assume independence) to be overly strict. As such, we only rely on the correlations as an initial, general assessment that gives context to subsequent, more specific analyses. Given that our goal is to select the most predictive metabolites, discarding the less predictive metabolites is precisely what we aim to achieve. As explained above and in response to the editor’s item #4, the PLSR allows us to reach that goal without introducing bias in our estimates or losing statistical power.  
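      To make the VIP ranking concrete, the sketch below fits a single-response PLS regression with the NIPALS algorithm and computes variable importance in projection (VIP) scores with NumPy. It is a minimal illustration on simulated data, assuming autoscaled predictors; it is not the authors' pipeline, whose software and preprocessing are not described here. Because the squared VIP scores average to 1 across predictors, VIP > 1 is a common (if arbitrary) rule of thumb for flagging important variables.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Fit single-response PLS (NIPALS) on autoscaled X; return weights, scores, y-loadings."""
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale each predictor
    y = y - y.mean()
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)        # unit-length weight vector
        t = X @ w                     # component scores
        tt = t @ t
        loading = X.T @ t / tt        # X-loadings for deflation
        q[a] = y @ t / tt             # regression of y on the scores
        X = X - np.outer(t, loading)  # deflate X
        y = y - q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja/||w_a||)^2 / sum_a SS_a), SS_a = y-variance explained."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w2 @ ss) / ss.sum())


# Simulated example: only the first of five predictors drives the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=200)
W, T, q = pls1_nipals(X, y, n_components=3)
vip = vip_scores(W, T, q)
print(np.round(vip, 2))
```

      All predictors are ranked within one model, which is why PLS-based screening sidesteps the per-test multiple-comparison corrections discussed above; the VIP ranking is typically fairly stable across nearby component counts.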

      Reviewer #2 (Public Review):

      A strength of the work lies in the number of children Padilha et al. were able to assess (5,004 children aged 6-59 months) and in the extensive screening that the Authors performed for each participant. This type of large-scale study is uncommon in low-to-middle-income countries such as Brazil.

      The Authors employ several approaches to narrow down the number of potentially causally associated metabolites.

      Could the Authors justify on what basis the minimum dietary diversity score was dichotomized? Were sensitivity analyses undertaken to assess the effect of this dichotomization on associations reported by the article? Consumption of each food group may have a differential effect that is obscured by this dichotomization.

      Thank you for the observation. We would like to emphasize that the child's diet quality was assessed using the minimum dietary diversity (MDD) indicator proposed by the WHO (World Health Organization & United Nations Children’s Fund (UNICEF), 2021). This guideline proposes the cutoff used in the present study. We understand the reviewer’s suggestion to use the consumption of healthy food groups as an evaluation of diet quality, but we chose to follow the WHO proposal to assess dietary diversity. This indicator is widely accepted and used as a marker and provides comparability and consistency with other published studies.

      Could the Authors specify the statistical power associated with each analysis?

      We are not aware of power calculation procedures for PLS-based analyses. However, given our large sample size, we do not believe power was an issue in these analyses. For our regression analyses, which typically have 4 predictors, we had 95% power to detect an f² of 0.003, and an r of 0.05 in a two-sided correlation test, at an alpha level of 0.05.

      New text, page 11, lines 296-298:

      “Given the size of our sample, statistical power is not an issue in our analyses. Considering an alpha of 0.05 for a two-sided test, a sample size of 5000 has 95% power to detect a correlation of r = 0.05 and an effect of f² = 0.003 in a multiple regression model with 4 predictors.”
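      A quick way to sanity-check the correlation figure above is the Fisher z approximation, sketched below with only the Python standard library. This is our illustration, not the authors' calculation (the software they used is not stated), and it covers only the correlation test, not the f² calculation for the regression models. Under this approximation, n = 5,004 gives power of roughly 0.94 for r = 0.05, and the smallest correlation detectable with 95% power is about r = 0.051, close to the figures quoted; exact methods may differ slightly.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z transform)."""
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # Fisher z of r divided by its standard error
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)


def min_detectable_r(n, power=0.95, alpha=0.05):
    """Smallest |r| detectable with the requested power under the same approximation."""
    needed = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return tanh(needed / sqrt(n - 3))


n = 5004  # size of the metabolome subsample
print(f"power for r = 0.05: {power_correlation(0.05, n):.3f}")
print(f"r detectable with 95% power: {min_detectable_r(n):.4f}")
```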

      Could the Authors describe in detail which metric they used to measure how predictive PLSR models are, and how they determined what the "optimal" number of components were?

      We chose the model with the fewest components that maximized R² and minimized the root mean squared error of prediction (RMSEP). In the training data, the model with 4 components had a lower R² but also a lower RMSEP; we therefore chose the model with 3 components, which had a higher R² than the 4-component model and a lower RMSEP than the 2-component model. The number of components did not meaningfully change the rank order of the metabolites on the VIP index.

      New text, page 8, lines 220-224:

      “To better assess the predictiveness of each metabolite in a single model, a PLSR was conducted. PLS-based analyses are the most commonly used analyses when determining the predictiveness of a large number of variables as they avoid issues with collinearity, sample size, and corrections for multiple-testing (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015).”

      New text, page 12, lines 312-314:

      “In PLSR analysis, the training data suggested that three components best predicted the data (the model with three components had the highest R², and the root mean square error of prediction (RMSEP) was only slightly lower with four components). In comparison, the test data showed a slightly more predictive model with four components (Figure 3—figure supplement 2).”
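      The component choice described above balances R² against RMSEP. The standard-library sketch below shows how both metrics are computed from held-out predictions, together with one simple, hypothetical selection rule: take the fewest components whose RMSEP is within a relative tolerance of the best. The tolerance and the RMSEP values are illustrative assumptions, not the study's rule or results.

```python
from math import sqrt


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on held-out data."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


def choose_n_components(rmsep_by_k, rel_tol=0.02):
    """Fewest components whose RMSEP is within rel_tol of the lowest RMSEP (hypothetical rule)."""
    best = min(rmsep_by_k.values())
    return min(k for k, err in rmsep_by_k.items() if err <= best * (1 + rel_tol))


# Illustrative RMSEP values per number of components (not the study's numbers).
example = {1: 1.20, 2: 1.05, 3: 1.00, 4: 0.99}
print(choose_n_components(example))  # 4 components is only ~1% better than 3, so 3 is chosen
```

      A tolerance-based rule like this favours parsimony when extra components buy only a marginal drop in prediction error.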

      The Authors use directed acyclic graphs (DAG) to identify confounding variables of the association between metabolites and DQ. Could the dataset generated by the Authors have been used instead? Not all confounding variables identified in the literature may be relevant to the dataset generated by the Authors.

      Thank you for the question. Most likely not: confounders should be identified from the literature rather than from the current dataset. DAGs have been widely used in epidemiology as a valid tool for justifying the choice of confounding factors in regression models, because they provide a clear visualization of the assumed causal relationships and clarify the complex relationships between exposure and outcome. In addition, DAGs make the authors' reasoning transparent by acknowledging factors reported as important but not included or collected in the study. This has already been discussed in reply #1 to the editor, which is pasted below for convenience.

      Thank you for the comment and suggestions. It is important to highlight that there are no data on microbiome composition, and we apologize if we gave the impression that such data were available. The main goal of this national survey was to provide qualified and up-to-date evidence on child nutrition in order to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool-derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is now stated more explicitly as a study limitation in the revised manuscript (page 17, lines 463-467):

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs of collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data were considered a priority because blood specimens had already been collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations, we had to perform the analysis in a subset of our sample, which is still representative and large enough to test our hypothesis with adequate statistical power (more details below).

      We would argue that there is no evidence that the microbiome, or factors affecting microbiome composition, are confounders in the association between the serum metabolome and child development. First, it is worth revisiting the properties of a confounder: in brief, the epidemiological literature defines confounding as an alternative explanation for a given conclusion, which makes it one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism, and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, reporting that these bioactive circulating metabolites are co-metabolized by commensal gut microbiota and may play a role in neurodevelopment and cognition, as mediated by environmental exposures early in life.

      In fact, the literature on the association between the microbiome and infant development is very limited. We searched using the terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). Its authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criterion for including confounders in the directed acyclic graph (DAG) was based on systematic reviews or meta-analyses, not on single isolated studies.

      In summary, there are no microbiome data in ENANI-2019, and even if such data were available, we are confident that, given the current state of the literature, there would be no evidence to include this construct in the DAG, because the procedure recommends including only variables associated with both the exposure and the outcome. Please find more details on the DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome; rather, these constructs were not included in the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Were the systematic reviews or meta-analyses used in the DAG performed by the Authors, or were they based on previous studies? If so, more information about the methodology employed and the studies included should be provided by the Authors.

      Thank you for the question. The reviews or meta-analyses used in the DAG have been conducted by other authors in the field. This has been laid out more clearly in our methods section.

      New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Approximately 72% of children included in the analyses lived in households with a monthly income above the Brazilian minimum wage. The cohort is also biased towards households with a higher level of education. Both of these measures correlate with the developmental quotient. Could the Authors discuss how this may have affected their results and how generalizable they are?

      Thank you for your comment. This has already been discussed in reply #6 to the editor, which is pasted below for convenience.

      Thank you for highlighting this point. ENANI-2019 is a population-based household survey with national coverage and representativeness for macroregions, sex, and one-year age groups (< 1; 1-1.99; 2-2.99; 3-3.99; 4-5). Furthermore, income quartiles of the census sector were used in the sampling. The study included 12,524 households, 14,588 children, and 8,829 infants with blood drawn.

      Due to the costs involved in metabolome analysis, it was necessary to further reduce the sample size to around 5,000 children, equivalent to 57% of the ENANI-2019 participants with stored blood specimens. To avoid a biased sample and to preserve representativeness and generalizability, the 5,004 selected children were drawn from the 8,829 available samples so as to keep the original distribution by age group (6 to 11 months, 12 to 23 months, and 24 to 59 months) and by some health conditions related to iron metabolism, e.g., anemia and nutrient deficiencies. Children were then randomly selected to constitute the final sample, which aimed to represent all children with blood drawn. Hence, our efforts were directed at preserving the original characteristics and representativeness of the sample.
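      The selection described above resembles stratified random sampling with proportional allocation. The standard-library sketch below illustrates that idea; the stratum key (age group combined with an anemia flag), the largest-remainder rounding, the seed, and the toy population are all assumptions for illustration, since the exact allocation procedure used in ENANI-2019 is not described here.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to their sizes (largest-remainder rounding)."""
    population = sum(stratum_sizes.values())
    quotas = {h: n_total * size / population for h, size in stratum_sizes.items()}
    alloc = {h: int(q) for h, q in quotas.items()}  # round every quota down first
    shortfall = n_total - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional remainders.
    for h in sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


def stratified_sample(records, stratum_of, n_total, seed=2019):
    """Draw a simple random sample inside each stratum, sized by proportional allocation."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    alloc = proportional_allocation({h: len(m) for h, m in strata.items()}, n_total)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = []
    for h, members in strata.items():
        sample.extend(rng.sample(members, alloc[h]))
    return sample


# Hypothetical population: 9,000 children with an age group and an anemia flag.
children = [
    {"id": i, "age_group": ("6-11", "12-23", "24-59")[i % 3], "anemic": i % 5 == 0}
    for i in range(9000)
]
sample = stratified_sample(children, lambda c: (c["age_group"], c["anemic"]), n_total=5000)
print(len(sample))
```

      Proportional allocation keeps each stratum's share of the subsample equal to its share of the source sample, which is the property the reply relies on when arguing that the subsample remains representative.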

      The ENANI-2019 study does not appear to present a bias towards higher socioeconomic status. Evidence from two major Brazilian population-based household surveys supports this claim. The 2017-18 Household Budget Survey (POF) reported an average monthly household income of 5,426.70 reais, while the Continuous National Household Sample Survey (PNAD) reported that in 2019, the nominal monthly per capita household income was 1,438.67 reais. In comparison, ENANI-2019 recorded a household income of 2,144.16 reais and a per capita income of 609.07 reais in infants with blood drawn, and 2,099.14 reais and 594.74 reais, respectively, in the serum metabolome analysis sample.

      In terms of maternal education, the 2019 PNAD-Education survey indicated that 48.8% of individuals aged 25 or older had at least 11 years of schooling. Applying the same metric to ENANI-2019, we found that 56.26% of the mothers (aged ≥25 years) of infants with blood drawn had 11 or more years of education, compared with 51.66% in the metabolome analysis sample. Although these figures are slightly higher, they remain within a reasonable range for population studies.

      It is well known that higher income and maternal education levels can influence child health outcomes, and acknowledging this, ENANI-2019 employed rigorous sampling methods to minimize selection biases. This included stratified and complex sampling designs to ensure that underrepresented groups were adequately included, reducing the risk of skewed conclusions. Therefore, the evidence strongly suggests that the ENANI-2019 sample is broadly representative of the Brazilian population in terms of both socioeconomic status and educational attainment.

      Further to this, could the Authors describe how inequalities in access to care in the Brazilian population may have affected their results? Could they have included a measure of this possible discrepancy in their analyses?

      Thank you for the concern.

      We are not in a position to answer this question, because our study focused on gathering data on infant nutritional status and collected very limited information on access to care, which does not allow us to formulate hypotheses. Moreover, this national survey used sampling procedures designed to make the sample representative of the 15 million Brazilian infants under 5 years of age. The sample is therefore balanced across socioeconomic strata, and we have no evidence suggesting that inequalities in access to health care played a role.

      The Authors state that the results of their study may be used to track children at risk for developmental delays. Could they discuss the potential for influencing policies and guidelines to address delayed development due to malnutrition and/or limited access to certain essential foods?

      The point raised by the reviewer is very relevant. Recognizing that dietary and microbial derived metabolites involved in the gut-brain axis could be related to children's risk of developmental delays is the first step to bringing this topic to the public policy agenda. We believe the results can contribute to the literature, which should be used to accumulate evidence to overcome knowledge gaps and support the formulation and redirection of public policies aimed at full child growth and development; the promotion of adequate and healthy nutrition and food security; the encouragement, support, and protection of breastfeeding; and the prevention and control of micronutrient deficiencies.  

      Reviewer #3 (Public Review):

      The ENANI-2019 study provides valuable insights into child nutrition, development, and metabolomics in Brazil, highlighting both challenges and opportunities for improving child health outcomes through targeted interventions and further research.

      Readers might consider the following questions:

      (1) Should investigators study the families through direct observation of diet and other factors to look for a connection between food taken in and gut microbiome and child development?

      As mentioned before, ENANI-2019 did not collect stool-derived microbiome data. However, child dietary intake data from 24-hour recalls are available and can be further explored in other studies.

      (2) Can an examination of the mother's gut microbiome influence the child's microbiome? Can the mother or caregiver's microbiome influence early childhood development?

      The questions raised by the reviewer are interesting and have been explored by other authors. However, we do not have microbiota data from either the child or the mother/caregiver.

      (3) Is developmental quotient enough to study early childhood development? Is it comprehensive enough?

      Yes, we are confident it is comprehensive enough.

      According to the World Health Organization, the term Early Childhood Development (ECD) refers to cognitive, physical, language, motor, social, and emotional development between 0 and 8 years of age. The SWYC milestones assess the cognitive, language/communication, and motor domains. Therefore, the instrument has enough content validity to represent ECD.

      The SWYC is recommended for screening ECD by the American Society of Pediatrics. Furthermore, we assessed the internal consistency of the SWYC milestones questionnaire using ENANI-2019 data and Cronbach's alpha. The findings indicated satisfactory reliability (0.965; 95% CI: 0.963–0.968).

      The SWYC is a screening instrument that indicates whether ECD falls outside the expected range. If any of the above-mentioned domains is not achieved as expected, the child may be at risk of ECD delay. Therefore, DQ<1 indicates that a child has not reached the expected ECD for their age group. We cannot say that children with DQ≥1 have full ECD, since we did not assess the socio-emotional domains. However, DQ can track the risk of ECD delay.
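      For readers less familiar with the internal-consistency figure reported above, Cronbach's alpha compares the summed item variances with the variance of the total score: alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), for k items. The standard-library sketch below implements that formula; the toy scores are invented and unrelated to the SWYC data, and the confidence interval the authors report would require an additional step (for example, bootstrapping) that is not shown here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores (one per respondent)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_var = sum(variance(item) for item in items)  # sample variances (n - 1)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Toy data: 2 items answered by 4 respondents (illustrative only).
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

      Items that move together inflate the variance of the total relative to the summed item variances, pushing alpha towards 1; perfectly identical items give exactly 1.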

      References

      Blekherman, G., Laubenbacher, R., Cortes, D. F., Mendes, P., Torti, F. M., Akman, S., ... & Shulaev, V. (2011). Bioinformatics tools for cancer metabolomics. Metabolomics, 7, 329-343.

      Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Analytica chimica acta, 879, 10-23.

      Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and intelligent laboratory systems, 58(2), 109-130.

      Luiz RR, Struchiner CJ. Inferência causal em epidemiologia: o modelo de respostas potenciais [Causal inference in epidemiology: the potential outcomes model] [online]. Rio de Janeiro: Editora FIOCRUZ, 2002. 112 p. ISBN 85-7541-010-5. Available from SciELO Books http://books.scielo.org.

      Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15(3):413-419, 1986.

      Freitas-Costa NC, Andrade PG, Normando P, et al. Association of development quotient with nutritional status of vitamins B6, B12, and folate in 6–59-month-old children: Results from the Brazilian National Survey on Child Nutrition (ENANI-2019). The American journal of clinical nutrition 2023;118(1):162-73. doi: https://doi.org/10.1016/j.ajcnut.2023.04.026

      Sheldrick RC, Schlichting LE, Berger B, et al. Establishing New Norms for Developmental Milestones. Pediatrics 2019;144(6) doi: 10.1542/peds.2019-0374 [published Online First: 2019/11/16]

      Drachler Mde L, Marshall T, de Carvalho Leite JC. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test. Paediatric and perinatal epidemiology 2007;21(2):138-53. doi: 10.1111/j.1365-3016.2007.00787.x [published Online First: 2007/02/17]

      VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol 34, 211–219 (2019). https://doi.org/10.1007/s10654-019-00494-6

      Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. 1991.

      Yan R, Liu X, Xue R, Duan X, Li L, He X, Cui F, Zhao J. Association between internet exclusion and depressive symptoms among older adults: panel data analysis of five longitudinal cohort studies. EClinicalMedicine 2024;75. doi: 10.1016/j.eclinm.2024.102767.

      Zhong Y, Lu H, Jiang Y, Rong M, Zhang X, Liabsuetrakul T. Effect of homemade peanut oil consumption during pregnancy on low birth weight and preterm birth outcomes: a cohort study in Southwestern China. Glob Health Action. 2024 Dec 31;17(1):2336312.

      Aristizábal LYG, Rocha PRH, Confortin SC, et al. Association between neonatal near miss and infant development: the Ribeirão Preto and São Luís birth cohorts (BRISA). BMC Pediatr. 2023;23(1):125. Published 2023 Mar 18. doi:10.1186/s12887-023-03897-3

      Al-Haddad BJS, Jacobsson B, Chabra S, et al. Long-term risk of neuropsychiatric disease after exposure to infection in utero. JAMA Psychiatry. 2019;76(6):594-602. doi:10.1001/jamapsychiatry.2019.0029

      Chan, A.Y.L., Gao, L., Hsieh, M.HC. et al. Maternal diabetes and risk of attention-deficit/hyperactivity disorder in offspring in a multinational cohort of 3.6 million mother–child pairs. Nat Med 30, 1416–1423 (2024).

      Hernan MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

      Greenland S, Pearl J, Robins JM. Confounding and collapsibility in causal inference. Statist Sci. 14 (1) 29 - 46 1999. https://doi.org/10.1214/ss/1009211805

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary: 

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact outcompeted (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on the creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors depends greatly on whether the clone can grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones shielded from competition. Other studies show that RasV12, scrib cells can also outcompete the surrounding tissue in mammalian cells; for example, Kohashi et al. (2021) showed that cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc-overexpressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results. 

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hsFLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone). 

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4; we include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1, where we showed that fmi-null clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fmi-null clones or twin spots.

      Regarding the quantification of apoptosis, we did not initially provide one, partly because we observe a very clear visual difference between groups (Fig. 4A-K), and partly because it is challenging to devise a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fmi-null UAS-Myc clones, we observe vastly more cell death within the clones and modest death in nearby wild-type cells, and consequently a much higher ratio of cell death inside vs. outside the clone. Because of the somewhat arbitrary nature of any quantification, and the dramatic difference, we initially chose not to provide one. However, given the request, we chose an arbitrary distance from the clone boundary within which to consider dying cells and counted them for each condition. We view this as a very soft quantification, but we nevertheless report it in the revised manuscript in a way that captures the phenomenon. 

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N). 

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason the clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to the increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown the same results. It is therefore well established that increased cell proliferation transforms Myc clones into supercompetitors and that, in the absence of cell competition, Myc-overexpressing discs instead produce wings larger than usual. 

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here. 

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike control WT clones, which have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to the rate of cell division. 

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. On a linear scale, a ratio of 1 means A and B are equal, while a twofold excess of A (2A/B) gives 2 and a twofold excess of B (A/2B) gives 0.5. The further the ratio departs from 1, the starker this asymmetry becomes, so a linear scale is deceiving to the eye, especially when decreased ratios are shown. On a log(ratio) scale, 0 means equal amounts, and increases and decreases of the same fold change deviate equally from 0.
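      To make the symmetry argument concrete, here is a minimal Python sketch comparing linear and log ratios for reciprocal fold changes. It assumes base-2 logarithms for readability; the response does not state which base was used, and any base gives the same symmetry around 0. The function name and the example values are illustrative only, not data from the manuscript.

```python
import math


def log2_ratio(a, b):
    """Log2 of the ratio a/b: 0 when a == b, and +x / -x for reciprocal fold changes."""
    return math.log2(a / b)


# Reciprocal fold changes: linear ratios are lopsided around 1, log ratios are symmetric around 0
for a, b in [(1, 1), (2, 1), (1, 2), (4, 1), (1, 4)]:
    print(f"A={a}, B={b}: linear ratio={a / b:.2f}, log2 ratio={log2_ratio(a, b):+.1f}")
```

      A twofold increase and a twofold decrease sit at 2.00 and 0.50 on the linear scale (distances of 1 and 0.5 from equality), but at +1 and -1 on the log scale.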

      Statistically, analyzing either a standardized number of discs for all conditions or a variable number not determined beforehand does not bias the p-value, as long as the n is not manipulated by p-hacking techniques, such as increasing the sample size until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs owing to the inherent, though modest, variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader can see the n values at a glance.

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We have updated all panels, replacing Cas3 with Dcp-1. 

      As described above, cell death induced by Myc overexpression is well established, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces the ability of “would-be” winners to eliminate competitors, are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite this, they behave as winners and eliminate neighboring WT cells. As mentioned in our response to comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones. 

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development? 

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We include this information and the relevant reference (Brown et al, 2014) in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary: 

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to investigate the mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the size of the project and the required experiments are beyond the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition. 

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners: loss of Fmi has no effect on its own, or in losers confronting winners in competition. In the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners during competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryonic stage, one cannot create whole-disc tumors mutant for fmi. To approximate this condition, we introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, essentially the whole disc consists of fmi-null tumor. This shows that whole fmi-null tumors overgrow similarly to control tumors, confirming that the lack of Fmi only affects clonal tumors. We provide these results in the updated manuscript (Figure 1 Suppl 2 C-D).

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we considered worthwhile to examine. We performed immunostaining for Fmi in clones to determine whether its levels change during competition. Fmi is expressed ubiquitously at apical plasma membranes throughout the disc, and this was unchanged by competition, including inside >>Myc clones and at the clone boundary, where competition is actively happening. We provide these results as a new supplementary figure (Figure 5 Suppl 1) in the updated manuscript.

      Reviewer 3:

      Summary: 

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #2 (Public Review):

      M. El Amri et al., investigated the functions of Marcks and Marcks like 1 during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss of function with morpholino knock-down and CRISPR knock-out combining rescue experiments in developing spinal cord in embryo and regeneration in tadpole stage.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/out of only one side of the embryo) allowed the authors to assess gene function by directly comparing one side (e.g. mutated SC) to the other (e.g. wild-type SC on the other side). For the assays in regenerating SC, the authors microinjected CRISPR reagents into 1-cell stage embryos. When the embryos (F0 crispants) grew to tadpoles (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin; neural progenitors: sox2, sox3; proliferation: EdU, PH3), which are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and by pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they treated the loss-of-function animals with pharmacological reagents. They used S1P (PLD activator), FIPI (PLD inhibitor), NMI (PIP2 synthesis activator) and ISA-2011B (PIP2 synthesis inhibitor). The authors found that the activator treatments rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed that PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important to assess gene functions in loss-of-function assays; therefore, the conclusions are solid. In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection, showing that the injected side of the SC had more neurite outgrowth and proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors linked the phenotype to functional recovery by behavioral testing, which clearly showed that crispants with SC injury swam a shorter distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in developing embryos and regenerating SC. They confirmed that protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they made for Marcks and Marcksl1. Although the expression patterns during embryogenesis are mostly known from previous work, the data provide appropriate information to readers about expression and show the efficiency of the knock-out as well.

      MARCKS family genes have been known to be expressed in the nervous system. However, few studies focus on their function in nerves. This research introduces these genes as new players during SC development and regeneration. These findings could attract broader interest from researchers working on nervous system disease models and in the medical field. Although it is a typical requirement for loss-of-function assays in Xenopus laevis, I believe that the efficient knock-out of four genes by CRISPR/Cas9 reflects the authors' dedication to designing, testing and validating the gRNAs, and is exemplary.

      Weaknesses:

      (1) Why did the authors choose Marcks and Marcksl1? The authors mentioned that these genes were identified with a recent proteomic analysis of comparing SC regenerative tadpole and non-regenerative froglet (Line (L) 54-57). However, although it seems the proteomic analysis was their own dataset, the authors did not mention any details to select promising genes for the functional assays (this article). In the proteomic analysis, there must be other candidate genes that might be more likely factors related to SC development and regeneration based on previous studies, but it was unclear what the criteria to select Marcks and Marcksl1 was.

      To highlight the rationale for selecting these proteins, we reworded the sentence as follows: “A recent proteomic screen … after SCI identified a number of proteins that are highly upregulated at the tadpole stage but downregulated in froglets (Kshirsagar, 2020). These proteins included Marcks and Marcksl1, which had previously been implicated in the regeneration of other tissues (El Amri et al., 2018) suggesting a potential role for these proteins also in spinal cord regeneration.”

      (2) Gene knock-out experiments with F0 crispants

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, it cannot guarantee the same phenotypes practically, due to, for example, different injection timing, different strains of Xenopus laevis, etc. Although the authors mentioned the concerns of mosaicism by themselves (L180-181, L289-292) and immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      To address this issue, we state explicitly in line 208-212: “We also confirmed by immunohistochemistry that co-injection of marcks.L/S and marcksl1.L/S sgRNA, which is predicted to edit all four homeologs (henceforth denoted as 4M CRISPR) drastically reduced immunostaining for Marcks and Marcksl1 protein on the injected side (Fig. S6 B-G), indicating that protein levels are reduced in gene-edited embryos.”

      (3) Limitations of pharmacological compound rescue

      In the Methods, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug can target different molecules when used at different concentrations (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to eliminate the possibility of side effects and off-target effects by testing only a few compounds.

      As explained in the responses to reviewer 1, we have completely rewritten and toned down our presentation of the pharmacological result and explicitly mention in our discussion now the possibility of side effects.

    1. Author response:

      We agree with reviewer #1 to remove the mGluR6b data. It is indeed a weakness and is too preliminary. We will gladly remove it from the revised version.

      We will address the issue of the bulk responses (depicted in Figures 5 and 6) by showing the significance data, arguing that although we cannot prove that prey detection is increased for lower intensities, the bulk effect is significant, so prey detection is effectively stronger.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to ambiguous stimuli replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account direction selectivity of neurons in the population was able to replicate the change in perceptual bias.

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

      While we have not used the term "direction aware", we think the reviewer refers generally to the capability of our model to use a cell's direction selectivity in the decoding. In accordance with the reviewer's interpretation, we did indeed mean that the decoder assumes that a neuron has not only a preferred frequency but also a preferred direction of change in frequency (ascending/descending), which is what we use to demonstrate that decoding in this way aligns with the human percept. We have adapted the text in several places to clarify this, in particular by substantially expanding the description in the Methods.

      Reviewer #2 (Public Review):

      The authors aim to better understand the neural responses to Shepard tones in auditory cortex. This is an interesting question as Shepard tones can evoke an ambiguous pitch that is manipulated by a preceding adapting stimulus, therefore it nicely disentangles pitch perception from simple stimulus acoustics.

      The authors use a combination of computational modelling, ferret A1 recordings of single neurons, and human EEG measurements.

      Their results provide new insights into neural correlates of these stimuli. However, the manuscript submitted is poorly organized, to the point where it is nearly impossible to review. We have provided Major Concerns below. We will only be able to understand and critique the manuscript fully after these issues have been addressed to improve the readability of the manuscript. Therefore, we have not yet reviewed the Discussion section.

      Major concerns

      Organization/presentation

      The manuscript is disorganized and therefore difficult to follow. The biggest issue is that in many figures, the figure subpanels often do not correspond to the legend, the main body, or both. Subpanels described in the text are missing in several cases.

      We have gone linearly through the text and checked that all figure subpanels are referred to in the text and the legend. As far as we can tell, this was already the case for all panels, with the exception of two subpanels of Fig. 5.

      Many figure axes are unlabelled.

      We have carefully checked the axes of all panels and all but two (Fig. 5D) were labeled. As is customary, certain panels inherit the axis label from a neighboring panel, if the label is the same, e.g. subpanels in Fig. 6F or Fig. 5E, which helps to declutter the figure. We hope that with this clarification, the reviewer can understand the labels of each panel.

      There is an inconsistent style of in-text citation between figures and the main text. The manuscript contains typos and grammatical errors. My suggestions for edits below therefore should not be taken as an exhaustive list. I ask the authors to consider the following only a "first pass" review, and I will hopefully be able to think more deeply about the science in the second round of revisions after the manuscript is better organized.

      While we are puzzled by the severity of the issues that R2 indicates (see above; R3 describes the manuscript as "well written", and R1 does not comment negatively on the writing), we have carefully gone through all specific issues mentioned by R2 and the other reviewers. We hope that the revised version of the paper, with all corrections and clarifications made, will resolve any remaining issues.

      Frequency and pitch

      The terms "frequency" and "pitch" seem to be used interchangeably at times, which can lead to major misconceptions in a manuscript on Shepard tones. It is possible that the authors confuse these concepts themselves at times (e.g. Fig 5), although this would be surprising given their expertise in this field. Please check through every use of "frequency" and "pitch" in this manuscript and make sure you are using the right term in the right place. In many places, "frequency" should actually be "fundamental frequency" to avoid misunderstanding.

      Thanks for pointing this out. We have checked every occurrence and modified where necessary.

      Insufficient detail or lack of clarity in descriptions

      There seems to be insufficient information provided to evaluate parts of these analysis, most critically the final pitch-direction decoder (Fig 6), which is a major finding. Please clarify.

      Thanks for pointing this out. We have extended the description of the pitch-direction decoder and highlighted its role for interpreting the results.

      Reviewer #3 (Public Review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      Following this excellent suggestion, we have added more brief method sketches to the Results section, hopefully addressing this concern.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Thanks for your encouraging evaluation of the work.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

      Thanks for your encouraging evaluation of our work. The suggestion of an interventional study has also been on our minds; however, this appears rather difficult, as it would require a specific subset of cells to be inhibited. The most suitable approach would likely be 2p imaging with holographic inhibition (using ArchT, for example) of a subset of cells that prefer one direction of pitch change, which should then bias the percept/behavior in the opposite direction.

      Reviewer #1 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) What is the timescale used to compute direction selectivity in neural tuning? How does it compare to the timing of the Shepard tones? The basic idea of up versus down pitch is clear, the intuition for the role of direction tuning and its relation to stimulus dynamics could be laid out more clearly. Are the authors proposing that there are two "special" populations of A1 neurons that are treated differently to produce the biased percept? Or is there something specific about the dynamics of the Shepard stimuli and how direction selective neurons respond to them specifically? It would help if the authors could clarify if this result links to broader concepts of dynamic pitch coding in general or if the example reported here is specific (or idiosyncratic) to Shepard tones.

      We propose that the findings here are not specific to Shepard tones. On the contrary, only basic properties of auditory cortex neurons, i.e. frequency preference, frequency-direction (i.e. ascending or descending) preference, and local adaptation in the tuning curve, suffice. Each of these properties has been demonstrated many times before, and we only verified them in the lead-up to the results in Fig. 6. While the same effects should be observable with pure tones, the lack of ambiguity in the perceived direction of a frequency step for pure-tone pairs would make them less noticeable there. Regarding the timescale of the directional selectivity, we relied on the sequencing of tones in our paradigm, i.e. 150 ms spacing. The SSTRFs were discretized at 50 ms and include only the bins during the stimulus, not during the pause. The directional tuning, i.e. differences in the SSTRF above and below the preferred pitchclass for stimuli before the last stimulus, typically extended only one stimulus back in time. We have now clarified this in more detail, in particular in the added Methods section on the directional decoder.

      (2) (p. 9) "weighted by each cell's directionality index ... (see Methods for details)" The direction-selective decoder is interesting and appears critical to the study. However, the details of its implementation are difficult to locate. Maybe Fig. 6A contains the key concepts? It would help greatly if the authors could describe it in parallel with the other decoders in the Methods.

      We have expanded the description of the decoder in the Methods as the reviewer suggests.

      LESSER CONCERNS

      p. 1. (L 24) "distances between the pitch representations...." It's not obvious what "distances" means without reading the main paper. Can some other term or extra context be provided?

      We have added a brief description here.

      p. 2. (L 26) "Shepard tones" Can the authors provide a citation when they first introduce this class of stimuli?

      Citation has been added.

      p. 3 (L 4) "direction selective cells" Please define or provide context for what has a direction. Selective to pitch changes in time?

      Yes, selective to pitch changes in time is what is meant. We have further clarified this in the text.

      p. 4 (L 9-19). This paragraph seems like it belongs in the Introduction?

      Given the concerns raised by R2 about the organization of the manuscript we prefer to keep this 'road-map' in the manuscript, as a guidance for the reader.

      p. 4 (L 32) "majority of cells" One might imagine that the overlap of the bias band and the frequency tuning curve of individual neurons might vary substantially. Was there some criterion about the degree of overlap for including single units in the analysis? Does overlap matter?

      We are not certain which analysis the reviewer is referring to. Generally, cells were not excluded based on the overlap between a particular Bias band and their (Shepard) tuning curve. There are several reasons for this: the Bias was located in 4 different, overlapping Shepard tone regions, and all sounds were Shepard tones. Therefore, the (Shepard) tuning curve of every cell overlapped with one or more of the Biases. For the decoding analysis, all cells were included, as both the presence and the absence of a response contribute to the decoding. If the reviewer is referring only to the analysis of whether a cell adapts, then the same argument applies, i.e. this was an average over all Bias sequences, so every responding cell was driven by the Bias, and it was therefore possible to assess whether it adapted its response across positions within the Bias. We acknowledge that the limited randomness of the Bias sequences, in combination with the specific tuning of the cells, could in a few cases create response patterns over time that are not indicative of the actual behavior under repeated stimulation; however, since the results are rather clear, with 91% of cells adapting, we do not think this would significantly change the conclusions.

      p. 5 (L 17) "desynchronization ... behaving conditions" The logic here is not clear. Is less desynchronization expected during behavior? Typically, increased attention is associated with greater desynchronization.

      Yes, we reformulated the sentence to: While this difference could be partly explained by desynchronization which is typically associated with active behavior or attention [30], general response adaptation to repeated stimuli is also typical in behaving humans [31].

      p. 7 (L 5) "separation" is this a separation in time?

      Yes, added.

      p. 7 (L 33) "local adaptation" The idea of feedforward adaptation biasing encoding has been proposed before, and it might be worth citing previous work. This includes work from Nelken specifically related to SSA. Also, this model seems similar to the one described in Lopez Espejo et al (PLoS CB 2019).

      Thanks for pointing this out. We think, however, that neither of these publications suggested this very narrow way of biasing, which we consider biologically implausible. We have therefore not added either of these citations.

      p. 11 (L. 17) The cartoon in Fig. 6G may provide some intuition, but it is quite difficult to interpret. Is there a way to indicate which neuron "votes" for which percept?

      This is an excellent idea, and we have added now the purported perceptual relation of each cell in the diagram.

      p. 12 (L. 8). "classically assumed" This statement could benefit from a citation. Or maybe "classically" is not the right word?

      We have changed 'classically' to 'typically', and now cite classical works from Deutsch and Repp. We think this description makes sense, as the whole concept of bistable percepts has been interpreted as being equidistant (in added or subtracted semitone steps) from the first tone, see e.g. Repp 1997, Fig.2.

      p. 12 (L. 12) "...previous studies" of Shepard tone percepts? Of physiology?

      We have modified it to "Relation to previous studies of Shepard tone percepts and their underlying physiology", since this section deals with both.

      p. 12 (L. 25) "compatible with cellular mechanisms..." This paragraph seems key to the study and to Major Concern 1, above. What are the dynamics of the task stimuli? How do they compare with the dynamics of neural FM tuning and previously reported studies of bias? And can the authors be more explicit in their interpretation - should direction selective neurons respond preferentially to the Shepard tone stimuli themselves? And/or is there a conceptual framework where the same neurons inform downstream percepts of both FM sweeps and both normal (unbiased) and biased Shepard tones?

      The reviewer raises a number of different questions, which we address below:

      - Dynamics of the task stimuli in relation to previously reported cellular biasing: The timescales tested in the studies mentioned are similar to what we used in our bias, e.g. Ye et al 2010 used FM sweeps that lasted for up to 200ms, which is quite comparable to our SOA of 150ms.

      - Preferred responses to Shepard tones: no, we do not think that there should be preferred responses to Shepard tones, but rather that responses to Shepard tones can be thought of as the combined responses to the constituent tones.

      - Conceptual framework where the same neurons inform about FM sweeps and both normal (unbiased) and biased Shepard tones: Our perspective on this question is as follows. To our knowledge, the classical approach to population decoding in the auditory system, i.e. weighting based on preferred frequency, has not been directly demonstrated to be read out inside the brain, and certainly not demonstrated to be read out in only this way in all areas of the brain that receive input from the auditory cortex. Rather, it has gained its credibility by being linked directly to animal performance or by matching the presented stimuli. However, these approaches were usually geared towards a representation that can be estimated based on constituent frequencies. Additional response properties of neurons, such as directional selectivity, have been documented and analyzed before, but have not been used to explain the percept. We agree that our use of this cellular response preference in the decoding implicitly assumes that the brain could utilize it as well; however, this seems just as likely (or unlikely) as the use of the preferred frequency of a neuron. Therefore, we do not think that this decoding is any more speculative than the classical decoding. In both cases, subsequent neurons would have to implicitly 'know' the preference of the input neuron and weight its input accordingly.

      We have added all the above considerations to the discussion in an abbreviated form.
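      To make the classical readout discussed in this exchange concrete, the sketch below decodes a pitchclass from population activity by letting each neuron vote for its preferred pitchclass, weighted by its response. It is a minimal illustration under stated assumptions, not the authors' decoder: the circular (population-vector) averaging, the 12-semitone period, the function name `decode_pitchclass`, and the toy responses are all hypothetical. A direction-selective readout would additionally use each neuron's preferred direction of pitch change, which this sketch does not model.

```python
import numpy as np


def decode_pitchclass(rates, preferred_pc, n_classes=12):
    """Hypothetical preferred-pitchclass readout (population vector).

    rates: response of each neuron (1-D array-like).
    preferred_pc: preferred pitchclass of each neuron, in semitones.
    Each neuron votes for its preferred pitchclass with a weight equal to its
    response. Votes are summed as vectors on a circle because pitchclass wraps
    around every octave (period n_classes semitones).
    Returns the decoded pitchclass in [0, n_classes).
    """
    angles = 2 * np.pi * np.asarray(preferred_pc, dtype=float) / n_classes
    vector = np.sum(np.asarray(rates, dtype=float) * np.exp(1j * angles))
    decoded = np.angle(vector) * n_classes / (2 * np.pi)
    return decoded % n_classes  # map the angle back into [0, n_classes)
```

      Because the vote is circular, two equally active neurons preferring pitchclasses 11 and 0 decode to 11.5 rather than to the linear mean of 5.5, which is the behavior one wants for an octave-periodic variable.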

      p. 15 (L. 15). Is there a citation for the drive system?

      There is no publication, but an old repository, where the files are available, which we cite now: https://code.google.com/archive/p/edds-array-drive/

      p. 16 (L. 24) "position in an octave" It is implied but not explicitly stated that the Shepard tones don't contain the fundamental frequency. Can the authors clarify the relationship between the neural tuning band and the bands of the stimulus. Did a single stimulus band typically fall in a neuron's frequency tuning curve? If not 1, how many?

      Yes, it is correct that the concept of fundamental frequency does not cleanly apply to Shepard tones, because they are composed of octave-spaced pure tones, and the lowest of these lies outside both the hearing range of the animal and the spectral amplitude envelope. Therefore, one or more constituent tones of a Shepard tone can fall into the tuning curve of a neuron and contribute to driving the neuron (or to inhibiting it, if they fall within an inhibitory region of the tuning curve). The number of constituent tones that fall within the tuning curve depends on the tuning width of the neuron. The distribution of tuning widths to Shepard tones is shown in Fig. S1E, which indicates that many neurons had rather narrow tuning (close to the center), but many were also tuned widely, indicating that they would be stimulated by multiple constituent tones of the Shepard tone. As the tuning bandwidth (Q30: 30dB above threshold) of most cortical neurons in the ferret auditory cortex (see e.g. Bizley et al. Cerebral Cortex, 2005, Fig.12) is below 1, typically no more than one tone fell into the tuning curve of a neuron. However, we also observed multimodal tuning curves with respect to Shepard tones, which suggests that some neurons were stimulated by two or more constituent tones (again consistent with the existence of more broadly tuned neurons; see the same citation). We have added this information partly to the manuscript in the caption of Fig. S1E.
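      Because a Shepard tone is built from octave-spaced pure tones under a fixed spectral envelope, a minimal synthesis sketch may help readers build an intuition for the stimuli. The Gaussian envelope over log-frequency, its centre and width, the 20 Hz base frequency, the number of octaves, the duration, and the sample rate are illustrative assumptions, not the parameters used in the study, and `shepard_tone` is a hypothetical helper.

```python
import numpy as np


def shepard_tone(pitchclass, duration=0.1, fs=44100, base_freq=20.0,
                 n_octaves=10, env_center_oct=5.0, env_width_oct=1.5):
    """Synthesize one Shepard tone (illustrative parameters, not the study's).

    pitchclass: position within the octave, in semitones (0 <= pitchclass < 12).
    Components sit at base_freq * 2**(k + pitchclass/12), k = 0..n_octaves-1,
    so neighbouring components are exactly one octave apart. Each component is
    weighted by a Gaussian envelope over log2-frequency (octaves above
    base_freq); the envelope stays fixed while the components shift under it,
    so the lowest and highest components are nearly silent.
    Returns (waveform normalized to peak 1, array of component frequencies).
    """
    t = np.arange(int(round(duration * fs))) / fs
    signal = np.zeros_like(t)
    freqs = []
    for k in range(n_octaves):
        octave_pos = k + pitchclass / 12.0  # log2(freq / base_freq)
        freq = base_freq * 2.0 ** octave_pos
        if freq >= fs / 2:  # skip components at or above the Nyquist frequency
            continue
        amp = np.exp(-0.5 * ((octave_pos - env_center_oct) / env_width_oct) ** 2)
        signal += amp * np.sin(2 * np.pi * freq * t)
        freqs.append(freq)
    signal /= np.max(np.abs(signal))  # normalize peak amplitude to 1
    return signal, np.array(freqs)
```

      A tritone pair such as `shepard_tone(0)` followed by `shepard_tone(6)` is the ambiguous case: every component of the second tone lies exactly half an octave above one component of the first and half an octave below the next, so neither an upward nor a downward step is favoured by the acoustics alone.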

      p. 17 (L. 32). "Fig 4" Correct figure ref? This figure appears to be a schematic rather than one displaying data.

      Thanks for pointing this out, changed to Fig. 5.

      p. 18 (L. 25). "assign a pitchclass" Can the authors refer to a figure illustrating this process?

      Added.

      p. 19 (L. 17). Is mu the correct symbol?

      Thanks. We changed it to phi_i, as in the formula above.

      p. 19 (L 19). "convolution" in time? Frequency?

      Thanks for pointing this out, the term convolution was incorrect in this context. We have replaced it by "weighted average" and also adapted and simplified the formula.

      p. 19 (L 25) "SSTRF" this term is introduced before it is defined. Also it appears that "SSTRF" and "STRF" are sometimes interchanged.

      Apologies, we have added the definition, and also checked its usage in each location.

      p. 23 (Fig 2) There is a mismatch between panel labels in the figure and in the legend. Bottom right panel (B3), what does time refer to here?

      Thanks for pointing these out, both fixed.

      p. 24 (L 23) "shifts them away" away from what?

      We have expanded the sentence to: "After the bias, the decoded pitchclass is shifted from their actual pitchclass away from the biased pitchclass range ... "

      p. 25 (L 7) "individual properties" properties of individual subjects?

      Thanks for pointing this out, the corresponding sentence has been clarified and citations added.

      p. 26 (L 20) What is plotted in panel D? The average for all cells? What is n?

      Yes, this is an average over cells, the number of cells has now been added to each panel.

      p. 28 (L 3) How to apply the terms "right" "right" "middle" to the panel is not clear. Generally, this figure is quite dense and difficult to interpret.

      We have changed the caption of Panel A and replaced the location terms with the symbols, which helps to relate them directly to the figure. We have considered different ways of adding or removing content to make the figure less dense, but none of them seemed to help. For lack of better options, we have left it in its current form.

      MINOR/TYPOS

      p. 3 (L 1) "Stimulus Specific Adaptation" Capitalization seems unnecessary

      Changed.

      p. 4 (L 14) "Siple"

      Corrected.

      p. 9 (L 10) "an quantitatively"

      Corrected

      p. 9 (L 20) "directional ... direction ... directly ... directional" This is a bit confusing, as 'direct' seems to mean several different things in its different usages.

      We have gone through these sentences, and we think the terms are now more clearly used, especially since the term 'direction' occurs in several different forms, as it relates to different aspects (cells/percept/hypothesis). Unfortunately, some repetition is necessary to maintain clarity.

      Reviewer #2 (Recommendations For The Authors):

      Detailed critique

      Stimuli

      It would be very useful if the authors could provide demos of their stimuli on a website. Many readers will not be familiar with Shepard tones and the perceptual result of the acoustical descriptions are not intuitive. I ended up coding the stimuli myself to get some intuition for them.

      We have created some sample tones and sequences and uploaded them with the revision as supplementary documents.

      Abstract

      P1 L27 'pitch and...selective cells' - The authors haven't provided sufficient controls to demonstrate that these are "pitch cells" or "selective" to pitch direction. They have only shown that they are sensitive to these properties in their stimuli. Controls would need to be included to ensure that the cells aren't simply responding to one frequency component in the complex sound, for example. This is not really critical to the overall findings, but the claim about pitch "selectivity" is not accurate.

      Fair point. We have removed the word 'selective' in both occurrences.

      Introduction

      P2 L14-17: I do not follow the phonetic example provided. The authors state that the second syllable of /alga/ and /arda/ are physically identical, but how is this possible that ga = da? The acoustics are clearly different. More explanation is needed, or a correction.

      Apologies for the slightly misleading description, it has now been corrected to be in line with the original reference.

      P2,L26-27: Should the two uses of "frequency" be "F0" and "pitch" here? The tones are not separated in frequency by half an octave, but "separated in [F0]" by half an octave, correct? Their frequency ranges are largely overlapping. And the second 'frequency', which refers to the percept, should presumably be "pitch".

      Indeed. This is now corrected.

      P3 L2-6: Unclear at this point in the manuscript what is the difference between the 3 percepts mentioned: perceived pitch-change direction, Shepard tone pitches, and "their respective differences". (It becomes clear later, but clarification is needed here).

      We have tried a few reformulations, however, it tends to overload the introduction with details. We believe it is preferable to present the gist of the results here, and present the complete details later in the MS.

      P3 L6-7 What does it mean that the MEG and single unit results "align in direction and dynamics"? These are very different signals, so clarification is needed.

      We have phrased the corresponding sentence more clearly.

      Results

      Throughout: Choose one of 'pitch class', 'pitchclass', or 'pitch-class' and use it consistently.

      Done.

      P4L12 - would be helpful at this point to define 'repulsive effect'

      We have added another sentence to clarify this term.

      P4, L14 "simple"

      Done

      P4, L12 - not clear here what "repulsive influence" means

      See above.

      P4, L17 - alternative to which explanation? Please clarify. In general, this paragraph is difficult to interpret because we do not yet have the details needed to understand the terms used and the results described. In my opinion, it would be better to omit this summary of the results at the very beginning, and instead reveal the findings as they come, when they can be fully explained to the Reader.

      We agree, but we also believe that a rather general description here is useful for providing a roadmap to the results. However, we have added a half-sentence to clarify what is meant by alternative.

      P4 L30 - text says that cells adapt in their onset, sustained and offset responses, but only data for onset responses are shown (I think - clarification needed for fig 2A2). Supp figure shows only 1 example cell of sustained and offset, and in fact there is no effect of adaptation in the sustained response shown there.

      Regarding the effect of adaptation and whether it can be discerned from the supplementary figure: the responses shown are for 10 repetitions of one particular Bias sequence. Since the response of the cell depends on its tuning and on the specific sequence of Shepard tones in this Bias, it is not possible to assess adaptation for a given cell from this figure alone. We assess the level of adaptation by averaging the responses over all Biases per cell (similar to what is shown in Fig. 2A2) and then fitting an exponential to this average, separately for each response type. The step direction of the exponential, relative to the spontaneous rate, is then used to assess the kind of adaptation. The vast majority of cells show adaptation. We have added this information to the Methods of the manuscript.
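      The classification procedure described in this reply (average a cell's responses over all Bias sequences, fit an exponential per response type, and read the direction of the exponential step relative to the spontaneous rate) can be sketched as follows. This is a hypothetical illustration: the model form r(n) = r_inf + (r_0 - r_inf)·exp(-n/tau), the use of SciPy's `curve_fit`, the initial guesses, and the names `exp_model` and `classify_adaptation` are assumptions, and the statistical test used in the manuscript is not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(n, r_inf, r_0, tau):
    """Assumed response at Bias position n, relaxing from r_0 towards r_inf."""
    return r_inf + (r_0 - r_inf) * np.exp(-n / tau)


def classify_adaptation(mean_response, spontaneous_rate):
    """Fit the exponential to one cell's Bias-averaged response and classify it.

    mean_response: response at each position of the Bias sequence, already
    averaged over all Bias sequences for one cell and one response type
    (onset, sustained or offset).
    Returns ("adapting" or "non-adapting", fitted (r_inf, r_0, tau)).
    """
    y = np.asarray(mean_response, dtype=float)
    n = np.arange(len(y), dtype=float)
    p0 = [y[-1], y[0], max(len(y) / 4.0, 1.0)]  # crude initial guess
    params, _ = curve_fit(exp_model, n, y, p0=p0,
                          bounds=([-np.inf, -np.inf, 1e-3],
                                  [np.inf, np.inf, np.inf]))
    r_inf, r_0, _tau = params
    # Adapting: the steady state lies closer to spontaneous activity than the
    # initial response, i.e. the step points back towards baseline. This also
    # covers suppressed responses that recover towards baseline from below.
    adapting = abs(r_inf - spontaneous_rate) < abs(r_0 - spontaneous_rate)
    return ("adapting" if adapting else "non-adapting"), params
```

      Judging the step relative to the spontaneous rate, rather than simply checking whether the fitted curve decreases, means that both excitatory and suppressive responses count as adapting when they decay back towards baseline.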

      P4, L32 - please state the statistical test and criterion (alpha) used to determine that 91% of cells decreased their responses throughout the Bias sequence. Was this specifically for onset responses?

      Thanks for pointing this out, test and p-value added. Adaptation was observed for onset, sustained and offset responses, in all cases with the vast majority showing an adapting behavior, although the onset responses were adapting the most.

      P4 L36 - "response strength is reduced locally". What does "locally" mean here? Nearby frequencies?

      We have added a sentence here to clarify this question.

      Figure 1 - this appears to be the wrong version of the figure, as it doesn't match the caption or results text. It's not possible to assess this figure until these things are fixed. Figure 1A schematic of definition of f(diff) does not correspond to legend definition.

      As far as we can tell, it is all correct, only the resolution of the figure appears to be rather low. This has been improved now.

      Fig 2 A2 - is this also onset responses only?

      Yes, added to the caption.

      Fig 2 A3 - add y-axis label. The authors are comparing a very wide octave band (5.5 octaves) to a much narrower band (0.5 octaves). Could this matter? Is there something special about the cut-off of 2.5 octaves in the 2 bands, or was this an arbitrary choice?

      Interesting question. Essentially, our stimulus design left us with only this choice, i.e. comparing the internal region of the bias with its boundary region, i.e. the test tones. The internal region corresponds to the bias, which is 5 st wide, so its range is given here as 2.5 st relative to its center, while the test tones are at the boundary, 3 st from the center. The axis for the bias was mislabelled and has now been corrected. The y-axis label is shared with the panel to the left, but has now been added to avoid any confusion.

      Fig 2A4 - does not refer to ferret single unit data, as stated in the text (p5L8). Nor does supp Fig2, as stated. Also, the figure caption does not match the figure.

      Apologies, this was an error in the code that led to this mislabelling. We have corrected the labels, which also added back the recovery from the Bias sequence in the new Panel A4.

      P5 l9 - Figure 3 is not understandable at this point in the text, and should not be referred to here. There is a lot going on in Fig 3, and it isn't clear what you are referring to.

      Removed.

      P5 L12 - by Fig 2 B1, I assume you mean A4? Also, F2B1 shows only 1 subject, not 2.

      Yes, mislabeled by mistake, and corrected now.

      Fig2B2 -What is the y-axis?

      Same as in the panel to its left, added for clarity.

      Stimuli: why are tones presented at a faster rate to ferrets than to humans?

      The main reason is that the response analysis in MEG requires more spacing in time than the neuronal analysis in the ferret brain.

      P5 L6 - there is no Fig 5 D2? I don't think it is a good idea to get the reader to skip so far ahead in the figures at this stage anyway, even if such a figure existed. It is confusing to jump around the manuscript

      Changed to 'see below'

      P5 L8 - There is no Figure 2A4, so I don't know whether this time constant is accurate.

      This was in reference to a panel that had been removed before, but we have added it back now.

      P5 L16: "in humans appears to be more substantial (40%) than for the average single units under awake conditions". One cannot directly compare the magnitude of effects in MEG and single-unit signals in this way and assume the difference is due to behavioural state. You are comparing different measures of neural activity, averaged over vastly different numbers of neurons, and recorded from different species listening to different stimuli (presentation rates).

      Yes, that's why the next sentence is: "However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.", and all statements in the preceding sentences are phrased as 'appears' and 'may'. We think we have formulated this comparison with an appropriate level of uncertainty. Further, the main message here is that adaptation is taking place in both active and passive conditions.

      P5 L25 - I do not see any evidence regarding tuning widths in Fig S2, as stated in the text.

      Corrected to Fig. S1.

      P5 l26 - Do not skip ahead to Fig 5 here. We aren't ready to process that yet.

      OK, reference removed.

      P5 l27 - Do you mean because it could be tuning to pitch chroma, not height?

      Yes, that is a possible interpretation, although it could also arise from a combination of excitatory and inhibitory contributions across multiple octaves.

      P5 l33 - remove speculation about active vs passive for reasons given above.

      Removed.

      P6L2-6 'In the present...5 semitone step' - This is an incorrect interpretation of the minimal distance hypothesis in the context of the Shepard tone ambiguity. The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low. Each constituent frequency of a single tone can therefore be perceived either as a harmonic of some lower fundamental frequency or as an independent tone. The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high). The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect. The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept.

      The reviewer refers here to a “minimal distance hypothesis”, which, without a literature reference, is hard for us to interpret fully. Nevertheless, we respond to the individual points below:

      - "The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low." This statement appears to rest on a misconception: because the constituents are spaced in octaves (rather than being multiples/harmonics of a lowest frequency), Shepard tones cannot be interpreted as ordinary harmonic tones. It is correct that the lowest constituent of a Shepard tone is not audible, owing to the envelope and to the fact that its frequency could in principle be arbitrarily low. Hence, an F0 is not well defined for a Shepard tone. The closest equivalent would be the lowest constituent tone that is both in the audible range and within the non-zero part of the amplitude envelope. But again, since the envelope fades out the highest and lowest constituent tones, it is not straightforward to call the lowest one the F0 (it might be much quieter than the next higher constituent).

      - "The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high)." This may relate to some known psychophysics, but we are unable to interpret it with certainty.

      - "The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect." We are unsure how the reviewer reaches this conclusion.

      - "The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept." Again, in the absence of a reference to the MDH, we are unsure of the implied rationale. We agree that this is a possible interpretation of distance; however, we believe that our interpretation of distance (i.e. distances between constituent tones) is also a possible interpretation.

      Fig 4: Given that it comes before Figure 3 in the results text, these should be switched in order in the paper.

      Switched.

      PCA decoder: The methods (p18) state that the PCA uses the first 3 dimensions, and that pitch classes are calculated from the closest 4 stimuli. The results (P6), however, state that the first 2 principal components are used, and classes are computed from the average of 10 adjacent points. Which is correct, or am I missing something?

      Thanks for pointing this out; we have made this more concrete in the Methods: "The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle." Regarding the number of stimuli that were averaged, this may be a slight misunderstanding: each Shepard tone was decoded/projected without averaging. However, to then assign an estimated pitch class, we first had to establish an axis (here going around the circle) in which each position was associated with a pitch class. This was done by stepping in 0.5 semitone steps and finding the location in decoded space that corresponded to the median of the Shepard tones within ±0.25 st. To increase the resolution, this circular 'axis' of 24 points was then linearly interpolated to a resolution of 0.05 st. We have updated the text in the Methods accordingly. The mention of averaging over 10 points in the Results was correct, as there were 240 tones across all Bias stimuli and 24 bins in the pitch circle (240 / 24 = 10). The mention of averaging over 4 tones in the Methods was a typo.
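
      The axis construction described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the median is assumed to be taken coordinate-wise in the two-dimensional pitch plane, and a projected tone is assumed to be assigned the pitch class of the nearest point on the interpolated axis. Only the 0.5 st anchor spacing, the ±0.25 st window and the 0.05 st interpolation resolution come from the response.

```python
import numpy as np

def circ_diff(a, b, period=12.0):
    """Signed circular difference a - b (semitones), wrapped into [-period/2, period/2)."""
    return (np.asarray(a, dtype=float) - b + period / 2) % period - period / 2

def build_pitch_axis(proj, pitch, step=0.5, half_width=0.25, fine=0.05, period=12.0):
    """Hypothetical helper: build a circular pitch-class axis in the 2-D decoding plane.

    proj  : (n, 2) tones projected onto the two PCs spanning the pitch circle
    pitch : (n,) pitch class of each tone in semitones
    Returns (grid, axis): pitch classes of the fine grid and matching (m, 2) points.
    """
    proj = np.asarray(proj, dtype=float)
    centers = np.arange(int(round(period / step))) * step     # 24 anchors, 0.5 st apart
    anchors = np.empty((centers.size, 2))
    for i, c in enumerate(centers):
        # Tones within +/-0.25 st of this anchor, measured around the circle
        near = np.abs(circ_diff(pitch, c, period)) <= half_width + 1e-9
        if not near.any():
            raise ValueError(f"no tones within {half_width} st of {c} st")
        anchors[i] = np.median(proj[near], axis=0)             # coordinate-wise median
    grid = np.arange(int(round(period / fine))) * fine         # 240 points, 0.05 st apart
    # Linear interpolation of each coordinate, wrapping from 11.5 st back to 0 st
    axis = np.column_stack(
        [np.interp(grid, centers, anchors[:, k], period=period) for k in range(2)]
    )
    return grid, axis

def decode_pitch(point, grid, axis):
    """Assumed read-out: pitch class of the axis point nearest to a projected tone."""
    dist = np.linalg.norm(axis - np.asarray(point, dtype=float), axis=1)
    return grid[np.argmin(dist)]
```

      For example, with synthetic tones placed on a unit circle (`pitch = np.arange(240) * 0.05`, coordinates `cos`/`sin` of `2 * np.pi * pitch / 12`), a projected point at `(0, 1)` should decode to a pitch class close to 3 st. The `period` argument of `np.interp` is what closes the axis into a circle.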

      Fig 3A: axes of pink plane should be PC not PCA

      Done.

      Fig 3B: the circularity in the distribution of these points is indeed interesting! But what do the authors make of the gap in the circle between semitones 6-7? Is this showing an inherent bias in the way the ambiguous tone is represented?

      While we cannot be certain, we think that this reflects inhomogeneous sampling from the overall set of neural tuning preferences, and that if we had recorded more/all neurons, the circle would be complete and uniformly sampled (which it already nearly is, see Fig. 4C, which used to be Fig. 3C).

      Fig 3B (lesser note): It'd be preferable to replace the tint (bright vs. dark) differentiation of the triangles with filled vs. unfilled, because such a subtle change in tint is not easily differentiable from a change in hue (indicating a different variable in this plot) with this particular colour palette.

      We have experimented with this suggestion, and it didn't seem to improve the clarity. However, we have changed the outline of the test-pair triangles to white, which now visually separates them better.

      P6 l32 - Please indicate if cross-validation was used in this decoder, and if so, what sort. Ideally, the authors would test on a held-out data set, or at least take a leave-one-out approach. Otherwise, the classifier may be overfit to the data, and overfitting would explain the exceptional performance (r=.995) of the classifier.

      Cross-validation was not used, as the purpose of the decoder here is to create a standard against which to compare the biased responses to the ambiguous pair, which were not used for training the decoder. We agree that a cross-validated decoder (which would only apply to the local average used to establish the pitch class circle) would yield a somewhat lower correlation; however, this is less relevant for the main question, i.e. the influence of the Bias sequence on the neural representation of the ambiguous pair. We have added this information to the corresponding section.

      Fig 3D: I understood that these pitch classifications shown by the triangles were carried out on the final ambiguous pair of stimuli. I thought these were always presented at the edges of the range of other stimuli, so I do not follow how they have so many different pitch class values on the x-axis here.

      There were 4 Biases, centered at 0, 3, 6 or 9 semitones and each covering [-2.5, 2.5] st relative to its center. The test tones, placed just beyond the edges of each Bias range (3 st from its center), therefore coincide with the centers of the other Biases; e.g. for the Bias centered at 3 st, the ambiguous pair is a 0-6 or 6-0 step. Hence there are 4 locations for the ambiguous tones on the x-axis of Fig. 4D (previously Fig. 3D).
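
      The stimulus arithmetic behind the four x-axis locations can be checked with a short sketch. The function name is hypothetical; the Bias centers, the ±2.5 st Bias range and the 3 st test-tone offset are those stated in the response, and all values are pitch classes in semitones wrapped onto the 12 st circle.

```python
def bias_layout(center, half_range=2.5, test_offset=3.0, period=12.0):
    """Return (bias_range, test_tones) for one Bias, wrapped onto the pitch circle."""
    bias_range = ((center - half_range) % period, (center + half_range) % period)
    # The ambiguous pair sits 3 st below and above the Bias center, i.e. 0.5 st
    # outside the Bias range, and the two tones are a tritone (6 st) apart.
    test_tones = ((center - test_offset) % period, (center + test_offset) % period)
    return bias_range, test_tones

layouts = {c: bias_layout(c) for c in (0, 3, 6, 9)}
```

      Across the four Bias centers, the test tones fall only on 0, 3, 6 and 9 st, and each pair is a tritone apart; for the Bias centered at 3 st, the pair is (0, 6).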

      Figure 4: This demonstration of the ambiguity of Shepard pairs may be misleading. The actual musical interval is never ambiguous, as this figure suggests. Only the ascending vs descending percept is ambiguous. Therefore the predictions of the ferret A1 decoding (Fig 3D) and the model in Fig 5 are inconsistent with perception in two ways. One (which the authors mention) is the direction of the bias shift (up vs down). Another (not mentioned here) is that one never experiences a shift in the Shepard tone at a fraction of a semitone - the musical note stays the same, and changes only in pitch height, not pitch chroma.

      We are unsure of the reviewer’s direction with this question. In particular, the second point is not clear to us: "...one (who?) never (in this experiment? in real life?) experiences a bias shift in the Shepard tone at a fraction of a semitone" (why is this relevant in the current experiment?). Pitch chroma would actually be a possible replacement for pitch class, but the previous Shepard tone literature has referred to it as pitch class.

      P7 l12 - omit one 'consequently'

      Changed to 'Therefore'.

      P7 l24 - I encourage the authors to not use "local" and "global" without making it clear what space they refer to. One tends to automatically think of frequency space in the auditory system, but I think here they mean f0 space? What is a "cell close to the location of the bias"? Cells reside in the brain. The bias is in f0 space. The use of "local" and "global" throughout the manuscript is too vague.

      Agreed, the reference here was actually to the cell's preferred pitch class, not its physical location (which one might arguably be able to disambiguate, given the context). We have changed the wording, and also checked the use of global/local throughout the manuscript. The main use of 'global/local' is now in reference to the range of adaptation, and is properly introduced on first mention.

      P7 L26 -there is no Fig 5D1. Do you mean the left panel of 5D?

      Thanks. Changed.

      FigS3 is referred to a lot on p7-8. Should this be moved to the main text?

      The main reason why we kept it in the supplement is that it is based on a more static model, which is intended to illustrate the consequences of different encoding schemes. In order to not confuse the reader about these two models, we prefer to keep it in the supplement, which - for an online journal - makes little difference since the reader can just jump ahead to this figure in the same way as any other figure.

      Fig 5C, D - label x-axis.

      Added.

      Fig 5E - axis labels needed. I don't know what is plotted on x and y, and cannot see red and green lines in left plot.

      Thanks for noticing this, colors corrected, axes labeled.

      Page 8 L3-15 - If I follow this correctly, I think the authors are confusing pitch and frequency here in a way that is fundamental to their model. They seem to equate tonotopic frequency tuning to pitch tuning, leading to confused implications of frequency adaptation on the F0 representation of complex sounds like Shepard tones. To my knowledge, the authors do not examine pure tone frequency tuning in their neurons in this study. Please clarify how you propose that frequency tuning like that shown in Fig 5A relates to representation of the F0 of Shepard tones. Or...are the authors suggesting these neural effects have little to do with pitch processing and instead are just the result of frequency tuning for a single harmonic of the Shepard tones?

      We agree that this is not trivial to describe well while keeping the text uncluttered, in particular because a neuron's tuning to stimulus frequency often contributes to its tuning to pitch class, more or less directly: for some narrowly tuned cells, the Shepard tuning simply reflects their tuning to a single octave range of the constituent tones (see Fig. S1). For more broadly tuned cells, multiple constituent tones contribute to the overall Shepard tuning, and their contributions can be additive, subtractive, or more complex. Our approach assumes that we can estimate the Shepard tuning directly in order to evaluate its consequences for the percept. While this may seem artificial, as Shepard tones do not typically occur in nature, the same argument could be made against pure tones, on which classical tuning curves and the associated decoding approaches are often based. Relating the Shepard tuning to the classical tuning would be an interesting study in itself, although it would arguably relate the tuning to one artificial stimulus to the tuning to another. Regarding the terminology of pitch, pitch class and frequency: the term pitch class is commonly used in the Shepard tone literature and, as we state at the beginning of the Results, "the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study". We agree that the term pitch, which describes the perceptual convergence/construction of a tone height from a range of possible physical stimuli, needs to be separated from frequency, which is one contributor to, or basis for, the perception of a pitch. However, we think that the term pitch can, despite its perceptual origin, also be associated with neural responses, in order to investigate the neural origin of the pitch percept.
      At the same time, the present study does not target pitch encoding per se, as this would require a variety of stimuli that lead to consistent pitch percepts. Therefore, pitch (class) is mainly used here as a term to describe the neural responses to Shepard tones, in line with the previous literature and with the fact that Shepard tones are composite stimuli that lead to a pitch percept. The last sentence has been added to the manuscript for clarity.

      P7-9: I wasn't left with a clear idea of how the model works from this text. I assume you have layers of neurons tuned to frequency or f0 (based on the real data?), which are connected in some way to produce some sort of output when you input a sound? More detail is needed here. How is the dynamic adaptation implemented?

      The detailed description of the model can be found in the Methods section. We have revised the corresponding paragraph and tried to clarify the model by adding a high-level description to the Results, with a reference to the corresponding figure (Fig. 5A).

      Fig6A: Figure caption can't be correct. In any case, these equations cannot be understood unless you define the terms in them.

      We have clarified the description in the caption.

      Fig 6/directionality analysis: Assuming that the "F" in the STRFs here is Shepard tone f0, and not simple frequency?

      We have changed the formula in the caption and the axis labels now.

      Fig 6C - y-axis values

      In the submission, these values were omitted on purpose, as the result has an arbitrary scale: only whether it is larger or smaller than 0 counts for the evaluation of the decoded directionality (at the current level of granularity). An interesting refinement would be to relate the decoded values to the animals' performance. We have now scaled the values arbitrarily to fit within [-1, 1], but we would like to emphasize that only their relative scale matters here, not their absolute scale.
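
      One way to perform such a rescaling is sketched below, assuming division by the largest absolute value (the response does not specify the method, and the function name is hypothetical). Because the divisor is positive, the sign of every value, and hence the decoded direction, is unchanged; only the absolute scale is lost.

```python
import numpy as np

def scale_to_unit(values):
    """Rescale directionality values into [-1, 1] by their largest magnitude.

    Division by a positive constant preserves each value's sign and the
    ratios between values; an all-zero input is returned unchanged.
    """
    v = np.asarray(values, dtype=float)
    m = np.max(np.abs(v))
    return v if m == 0 else v / m
```

      For instance, `scale_to_unit([-4.0, 2.0, 0.5])` gives `[-1.0, 0.5, 0.125]`: the signs and the ratios between the values are kept.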

      Fig 6E - can't both be abscissa (caption). I might be missing something here, but I don't see the "two stripes" in the data that are described in the caption.

      Thank you. The typo is fixed. The stripes are most clearly visible in the right panel of Fig. 6E, red and blue, diagonally from top left to bottom right.

      Fig 6G -I have no idea what this figure is illustrating.

      This panel is described in the text as follows: "The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same frequency location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone."

      I might be just confused or losing steam at this point, but I do not follow what has been done or the results in Fig 6 and the accompanying text very well at all. Can this be explained more clearly? Perhaps the authors could show spike rate responses of an example up-direction and down-direction neuron? Explain how the decoder works, not just the results of it.

      We agree that we are presenting something new here. However, it is conceptually not very different from decoding based on preferred frequencies. We have attempted to provide two illustrations: of how the decoder works (Fig. 6A), and of how it then leads to the percept, using prototypical examples of cellular SSTRFs (Fig. 6G). We have added a complete but accessible description to the Methods section. Showing firing rates of individual neurons would unfortunately not be very telling, given the usual variability of neural responses and the fact that our paradigm used many conditions rather than many repetitions, so this variability cannot be averaged out at the single-neuron level.

      Discussion - I do not feel I can adequately critique the author's interpretation of the results until I understand their results and methods better. I will therefore save my critique of the discussion section for the next round of revisions after they have addressed the above issues of disorganization and clarity in the manuscript.

      We hope that the updated version of the manuscript provides the reviewer now with this possibility.

      Methods

      P15L7 - gender of human subjects? Age distribution? Age of ferrets?

      We have added this information.

      P16L21 - What is the justification for randomizing the phase of the constituent frequencies?

      The purpose of the randomization was to prevent idiosyncratic phase relationships for particular Shepard tones, which would depend in an orderly fashion on the included base-frequencies if non-randomized, and could have contributed to shaping the percept for each Shepard tone in a way that was only partly determined by the pitch class of the Shepard tone. Added to the section.
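
      To illustrate what randomizing the constituent phases means in practice, here is a minimal Shepard-tone synthesis sketch. Only the independent random phase per constituent reflects the response above; the function name, base frequency, number of octaves, Gaussian envelope over log-frequency (center and width), duration and sampling rate are all hypothetical choices for illustration, not the study's parameters.

```python
import numpy as np

def shepard_tone(pitch_class, dur=0.1, fs=48000, f_base=27.5, n_octaves=10,
                 env_center=960.0, env_sigma_oct=1.0, rng=None):
    """Synthesize one Shepard tone whose constituents have random phases.

    All numeric defaults are illustrative assumptions, not the study's values.
    pitch_class : semitones above the (hypothetical) base frequency f_base
    rng         : numpy Generator; pass a seeded one for reproducible phases
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(round(dur * fs))) / fs
    # Octave-spaced constituents: f_k = f_base * 2**(pitch_class/12) * 2**k
    freqs = f_base * 2.0 ** (pitch_class / 12.0) * 2.0 ** np.arange(n_octaves)
    freqs = freqs[freqs < fs / 2]  # drop constituents above Nyquist
    # Gaussian envelope over log2 frequency fades out the lowest and highest constituents
    amps = np.exp(-0.5 * (np.log2(freqs / env_center) / env_sigma_oct) ** 2)
    # One independent, uniformly distributed phase per constituent
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    x = (amps[:, None] * np.sin(2.0 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    return x / np.max(np.abs(x))  # normalize peak amplitude to 1
```

      With all phases fixed (e.g. set to 0), the phase relationships, and thus the waveform shape, would be determined in an orderly way by the constituent frequencies of each pitch class; drawing the phases independently removes that systematic dependence.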

      P17L6 - what are the 2 randomizations? What is being randomized?

      Pitch classes and position in the Bias sequence. Added to the section.

      P16 Shepard Tuning section - What were the durations of the tones and the time between tones within a trial?

      Thanks, added!

      Equations - several undefined terms in the equations throughout the manuscript.

      Thanks. We have gone through the manuscript and all equations and have introduced additional definitions where they had been missing.

      Reviewer #3 (Recommendations For The Authors):

      P3L10: "passive" and "active" conditions come totally out of the blue. Need introducing first. (Or cut. If adaptation is always seen, why mention the two conditions if the difference is not relevant here?)

      We have added a sentence to the preceding paragraph, which should clarify this. We mention these conditions because otherwise a possible counter-argument could be made that adaptation does not occur in the active condition, which was not tested in ferrets (but presents an interesting avenue for future research).

      P3L14 "siple" typo

      Corrected.

      P4L1 "behaving humans" you should elaborate just a little here on what sort of behavior the participants engaged in.

      Thanks for pointing this out. We have clarified this by adding an additional sentence directly thereafter.

      P4 adaptation: I wonder whether it would be useful to describe the Bias condition a bit more here before going into the observations. The reader cannot know what to expect unless they jump ahead to get a sense of what the Bias looks like in the sense of how many stimuli are in it, and how similar they are to each other. Observations such as "the average response strength decreases as a function of the position in the Bias sequence" are entirely expected if the Bias is made up of highly repetitive material, but less expected if it is not. I appreciate that it can be awkward to have Methods after Results, but with a format like that, the broad brushstroke Methods should really be incorporated into the Results and only the tedious details should be reserved for the Methods to avoid readers having to jump back and forth.

      Agreed, we have inserted a corresponding description before going into the details of the results.

      Related to this (perhaps): Bottom of P4, top of P5: "significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias" ... I am at a loss as to what the red and blue symbols in Fig 2 A3 really show, and I wonder whether the "at the edges" to "within the Bias" comparison were to make sense if at this stage I had been told more about the composition of the Bias sequence. Do the ambiguous ('target') tones also occur within the Bias? As I am unclear about what is compared against what I am also not sure how sound that comparison is.

      We have added an extended description of the Bias to the beginning of this section of the manuscript. For your reference: the Shepard tones that made up the ambiguous tones were not part of the Bias sequence, as they are located at 3st distance from the center of the Bias (above and below), while the Bias has a range of only +/- 2.5st.

      Fig 2: A4 B1 B2 labels should be B1 B2 B3

      Corrected.

      Fig 2 A2, A3: consider adjusting y-axis range to have less empty space above the data. In A3 in particular, the "interesting bit" is quite compressed.

      Done, while still matching the axes of A2 and A3 for better comparability.

      I am under the strong impression that the human data only made it into Fig 2 and that the data from Fig 3 onwards are animal data only. That is of course fine (MEG may not give responses that are differentiated enough to perform the sort of analyses shown in the later figures), but I do think that somewhere this should be explicitly stated.

      Yes, the reviewer's observation is correct. The decoding analyses could not be conducted on the human MEG data and were therefore not pursued further. The MEG data are included in the paper to demonstrate that local adaptation, which is a key contributor to the two decoding models, is present even in humans and under active conditions. We now state this explicitly when starting the decoding analysis.

      P5L2 "bias" not capitalized. Be consistent.

      All changed to capitalized.

      P5L8 reference to Fig 2 A4: something is amiss here. From legend of Fig 2 it seems clear that panel A4 label is mislabeled B1. Maybe some panels are missing to show recovery rates?

      Apologies for this residual text from a previous version of the manuscript. We have gone through all references and corrected them.

      P6L7 comma after "decoding".

      Changed.

      Fig 3, I like this analysis. What would be useful / needed here though is a little bit more information about how the data were preprocessed and pooled over animals. Did you do the PCA separately for each animal, then combine, or pool all units into a big matrix that went into the PCA? What about repeat, presentations? Was every trial a row in the matrix, or was there some averaging over repeats? (In fact, were there repeats??)

      Thanks for bringing up these relevant aspects, which were partly insufficiently detailed in the manuscript. Briefly, cells were pooled across animals and we only used cells that could meaningfully contribute to the decoding analysis, i.e. had auditory responses and different responses to different Shepard tones. Regarding the responses, as stated in the Methods, "Each stimulus was repeated 10 times", and we computed average responses across these repetitions. Single trials were not analyzed separately. We have added this information in the Methods, and refer to it in the Results.
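
      The preprocessing described here (cells pooled across animals, responses averaged over the 10 repetitions, then projected onto the leading principal components) can be sketched as below. The array layout, the function name and the choice to compute PCA via an SVD of the centered stimulus-by-cell matrix are assumptions for illustration; the original analysis may center or normalize the responses differently.

```python
import numpy as np

def pooled_pca_projection(responses, n_components=3):
    """Project trial-averaged, pooled population responses onto leading PCs.

    responses : (n_cells, n_stimuli, n_repeats) firing rates, with cells
                already pooled across animals (hypothetical layout)
    Returns (projection, components): (n_stimuli, n_components) coordinates
    and (n_components, n_cells) principal axes in cell space.
    """
    mean_resp = np.asarray(responses, dtype=float).mean(axis=2)  # average over repeats
    X = mean_resp.T                                # rows = stimuli, columns = cells
    X = X - X.mean(axis=0, keepdims=True)          # center each cell's responses
    # Rows of vt are orthonormal principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:n_components]
    return X @ components.T, components
```

      Each stimulus then becomes a point in a three-dimensional space; per the Methods excerpt quoted earlier in these responses, only the two dimensions spanning the pitch circle are used for pitch-class decoding.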

      Also, there doesn't appear to be a preselection of units. We would not necessarily expect all cortical neurons to have a meaningful "best pitch" as they may be coding for things other than pitch. Intuitively I suspect that, perhaps, the PCA may take care of that by simply not assigning much weight to units that don't contribute much to explained variance? In any event I think it should be possible, and would be of some interest, to pull out of this dataset some descriptive statistics on what proportion of units actually "care about pitch" in that they have a lot (or at least significantly more than zero) of response variance explained by pitch. Would it make sense to show a distribution of %VE by pitch? Would it make sense to only perform the analysis in Fig 3 on units that meet some criterion? Doing so is unlikely to change the conclusion, but I think it may be useful for other scientists who may want to build on this work to get a sense of how much VE_pitch to expect.

      We fully agree with the reviewer, which is why this information is already presented in Supplementary Fig. 1, which details the tuning properties of the recorded neurons. Overall, we recorded from 1467 neurons across all ferrets, of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and on whether they showed differential responses to different Shepard tones. The thresholds for auditory responsiveness and for tuning to Shepard tones were not very critical: setting them low led to quantitatively the same result, albeit with more noise. Setting them very high reduced the set of cells included in the analysis and eventually made the results less stable, as the remaining cells did not cover the entire range of preferences to Shepard tones. We agree that the PCA-based preprocessing would also automatically exclude many of the cells that were already excluded by the more concrete criteria beforehand. We have added further information on this issue in the Methods section under the heading 'Unit selection'.

      P9 "tones This" missing period.

      Changed.

      P10L17 comma after "analysis"

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Some critical comments are provided below:

      (1) The data quality still needs to be improved. There are many outliers in the experimental data shown in some figures, e.g. Figure 2D-G. The presence of these outliers makes the results unreliable. The author should thoroughly review the data analysis in the manuscript. In addition, a couple of western blot bands, such as IL-1β in Figure 3C, are not clear enough, please provide clearer western blot results again to support the conclusion.

      Following our comparative analysis, we have determined that these outlying data points do not affect our conclusions. Moreover, our experimental design included six mice per group, and all mouse samples were tested.

      (2) As shown in Figure 1G-I, foot thickness and IL-1β content in foot tissues of the Aged+Abx group were significantly reduced, but there was no difference in serum uric acid level. In addition, the Abx-untreated group should be included at all ages.

      Thank you for your comment. We have included these data in Supplemental Material 4.

      (3) Since FMT (Figure 4) and butyrate supplementation (Figure 8) have different effects on uric acid synthesis enzyme and excretion, different mechanisms may lie behind these two interventions. Transplantation with significantly enriched single strains from young mice, such as Bifidobacterium and Akkermansia, is the more reliable approach to reveal the underlying mechanism between gut microbiota and gout.

      Thank you for your comment. Due to the involvement of multiple bacterial genera in gout and hyperuricemia, and the practical challenge of testing all strains, our focus shifted to the functional implications and metabolism of the microbiota. Experimental validation confirmed that butyrate exerts a dual-therapeutic effect in mitigating gout and hyperuricemia.

      (4) In Figure 2F, the results showed the IL-1β, IL-6, and TNF-α content in serum, which was inconsistent with the authors' manuscript description (Line 171).

      Thank you for your comment. The modifications to the results have been implemented.

      (5) Figures 2F-H duplicate Supplementary Figures S1B-D. The authors should prepare the article more carefully to avoid such mistakes.

      Thank you for your comment. We have corrected it in the manuscript.

      (6) In lines 202-206, the authors stated that serum uric acid levels were elevated in the Young+Old or Young+Aged groups, but Figure 4A shows no difference.

      Thank you for your comment. We have corrected it in the manuscript.

      (7) Please visualize the results in Table 2 in a more intuitive manner.

      The results have been presented in Table 2 with a more intuitive visual format. The detailed information is presented in Supplement 4.

      (8) The heatmap in Figure 7A cannot strongly support the conclusion "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group". The authors should re-present the visual results and provide a reasonable explanation. In addition, please provide the ordinate unit of Supplementary Figure 7A-H.

      Thank you for your comment. Figure 7A and Supplementary Figure 7A-H together illustrate that "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group", and the units of the short-chain fatty acid measurements have been annotated in the manuscript.

      (9) Uncropped, full-length original western blots should be provided.

      Thank you for your comment. We have made relevant notes in the paper.

      Reviewer #1 (Recommendations For The Authors):

      Gout, a prevalent form of arthritis among the elderly, exhibits an intricate relationship with age and gut microbiota. The authors found that gut microbiota plays a crucial role in determining susceptibility to age-related gout. They observed that age-related gut microbiota regulated the activation of the NLRP3 inflammasome pathway and modulated uric acid metabolism. "Younger" microbiota had a positive impact on the gut microbiota structure of old or aged mice, enhancing butanoate metabolism and butyric acid content. Finally, they found that butyric acid exerts a dual effect, inhibiting inflammation in acute gout and reducing serum uric acid levels. These insights emphasize the potential of the "young" gut microbiome in mitigating senile gout. The whole study was interesting, but there were some minor errors in the writing of the paper. The authors should carefully check the spelling of words in the text and the case consistency of the group names.

      Questions:

      (1) Lines 118, 142, and elsewhere: write "24 months" in the same format as before.

      Thank you for your comment. We have corrected it in the manuscript.

      (2) Line 123: "Old and Aged group" should be plural ("groups").

      Thank you for your suggestion. We have corrected it in the manuscript.

      (3) Why does line 133 mention the use of ABX? Please add a brief explanation.

      Thank you for your suggestion. We used ABX to establish the link between gut microbiota, age, and gout.

      (4) Lines 172-175: the description of TNF does not match the corresponding result figure. This may be a figure placement error; please correct it.

      Thank you for your careful review. The error has been corrected and the accurate result has been inserted into the original manuscript.

      (5) Lines 183-185 and 193-195: the wording "Pro-Caspase-1 and Pro-IL activate" is redundant.

      Thank you for your careful review. We have corrected the error at the original location.

      (6) Line 400: the text should not say "increased".

      Thank you for your careful review. We have corrected the error at the original location.

      (7) "ns" needs to be added in the legend to indicate that there is no significant difference.

      Thank you for your careful review. We have corrected the error at the original location.

      (8) Lines 1080-1084, "Old or Aged control group and the old or aged group": group names should use consistent capitalization.

      Thank you for your suggestion. We have made the correct modification to the group names.

      (9) Lines 1072-1073, "Representative western blot images of foot tissue NLRP3 pathways proteins": please add band density quantification.

      Thank you for your suggestion. We have corrected the error on lines 1072-1073 of the article.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) In Figures 1G-H, the Aged+PBS group with antibiotic treatment shows a significant reduction in foot swelling and IL-1β compared to the Young+PBS and Old+PBS groups. The authors state that age-related changes in the gut microbiota exacerbate gout. However, why does only the Aged+PBS group improve with antibiotic treatment? It seems that butyrate alone cannot explain this phenomenon.

      We used antibiotic treatment to establish the relationship between gut microbiota, age, and gout, administering antibiotics directly to each age group. We found that after depleting the gut microbiota and then stimulating with MSU, the age-dependent trend in inflammatory factors disappeared.

      (2) In Figure 2, the fecal transplantation from young mice improved the infiltration of inflammatory cells and inflammatory cytokines in the Old and Aged groups. However, in Supplementary Figure 1A, there is no improvement observed in the percentage of foot swelling. Is it appropriate to conclude that inflammation was improved even though foot swelling was not suppressed?

      Although we did not observe changes in foot swelling, inflammatory cell infiltration in the tissue sections and inflammatory factor levels did change. We rely on a comprehensive assessment of multiple indicators, rather than any single measure, to determine whether the inflammatory condition has improved or worsened.

      (3) In line #249, the authors state that "the fecal microbiota from mice in the young group promotes uric acid elimination, inhibits reabsorption, and may contribute to the integrity of the intestinal barrier structure." However, Supplementary Figure 3F-H shows no significant alterations in Occludin and ZO-1 mRNA expression levels among all groups. Therefore, it is difficult to conclude that the fecal microbiota from the young group promotes the integrity of the intestinal barrier structure. A functional barrier assay, such as oral administration of FITC-dextran, would be necessary to verify the authors' conclusion.

      In Supplementary Figure 3F-H, the mRNA expression of Occludin and ZO-1 tended to increase, although the differences were not significant. However, after elderly mice were transplanted with the gut microbiota of young mice, JAMA mRNA expression increased significantly. Owing to the scarcity of old mice, we were unable to perform oral administration of FITC-dextran. Instead, we added immunohistochemical staining of ZO-1 and Occludin to support our interpretation.

      (4) In Figure 4, when comparing the Young+PBS group with the Old+PBS or Aged+PBS groups, there are hardly any differences in the proteins involved in uric acid synthesis (ADA, GDA, XOD) or the genes involved in uric acid transport (URAT1, GLUT9, OAT1, OAT3, ABCG2). Since no changes in uric acid synthesis or transport pathways are observed with aging, it is questionable to conclude that fecal transplantation from young mice improves these pathways and lowers blood uric acid levels.

      In our calculations, each age group's own control group served as the reference, rather than the young mice. We have now also compared the data across mice of different ages; the results are provided in Supplementary Material 4.

      (5) In line 276, the authors describe "the Young+Old and Young+Aged groups tended to be closer to the Old+PBS and Aged+PBS groups, and the Old+Young and Aged+Young groups tended to be closer to the Young+PBS group (Figure 5D)". Please conduct a statistical analysis.

      (6) In line 298, the authors hypothesize that butyrate might be the key molecule responsible for controlling gout, as Bifidobacterium and Akkermansia were abundant in the Young group, and the butyrate pathway was prominent. However, neither Bifidobacterium nor Akkermansia are butyrate-producing bacteria. Thus, the conclusion appears to be biased toward butyrate, raising questions about this interpretation.

      Upon comparison, we identified other bacterial genera that produce butyrate, such as Lachnoclostridium. Additionally, previous reports (PMID: 38126785, 26420851) have indicated that Bifidobacteria combined with other genera can enhance butyrate production. Meanwhile, Akkermansia, particularly the species Akkermansia muciniphila, has been found in preclinical studies to confer several beneficial traits. These include promoting the growth of butyrate-producing bacteria through acetate production, which reduces loss of the colonic bilayer and consequently reduces inflammation (PMID: 35468952). Based on the predicted microbiome functions, we observed that Butanoate_metabolism was enhanced in the microbiota of young mice and of elderly mice that received young-mouse microbiota. Given that Lachnoclostridium can produce butyrate, and that Bifidobacteria and Akkermansia can promote butyrate production by the gut microbiota, we speculated that butyrate might play a role in gout and hyperuricemia.

      (7) In Supplementary Figure 7, acetic acid and propionic acid also show the same behavior as butyric acid. It is possible that these metabolites may also affect the development of gout.

      Thank you for your suggestion. Indeed, Figure 7 shows a trend for acetic and propionic acids similar to that for butyric acid. However, both the predicted microbial function data and the non-targeted metabolomic data show an enhancement of Butanoate_metabolism in young mice and in elderly mice receiving young-mouse gut microbiota transplants. We therefore prioritized butyrate as the subject of our study. Owing to the scarcity of elderly mice, we are unable to conduct follow-up experiments with acetic and propionic acids, which is one of the limitations of this study. We will address this in future research.

      (8) In Figure 6, the secondary bile acid biosynthesis pathway was also changed. However, there is little mention of secondary bile acid in the discussion section. Please carefully discuss other possibilities besides butyrate.

      Thank you for your suggestion. We have incorporated a discussion about secondary bile acids into the relevant section of our manuscript.

      (9) In line #330, the authors state, 'the metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway (Figure 6A-D).' However, there does not appear to be much difference in the butanoate metabolism pathway. Specifically, in Figure 6C, the butanoate metabolism pathway in the Old group does not differ from that in the Young group. Please explain in more detail whether the butanoate metabolism pathway is relevant in the Old group.

      The metabolites showing differential abundance between the groups were enriched in the butanoate metabolism pathway; however, non-targeted metabolomics does not reveal the degree of this enrichment.

      (10) In Figure 7, the authors measured the levels of short-chain fatty acids in the Young and Aged groups. They found butyrate in the feces of mice in the Young group was higher than that in the Aged group. However, I wonder whether the Old group also had low levels of butyrate or not.

      In the experiment, we selected three representative groups to verify the hypothesis that butyrate may play a significant role in gout and hyperuricemia. Subsequently, we found that supplementing 18-month-old and 24-month-old mice with butyrate indeed reduced blood uric acid levels and alleviated gout symptoms. Since 18-month-old mice are difficult to obtain, we only conducted microbiome sequencing and non-targeted metabolomic analysis.

      Minor issues:

      (11) In line 74, what does MSU stand for? Please describe the abbreviation.

      In line 74, MSU refers to monosodium urate crystals.

      (12) In line 136, please insert a space between "IL-1β" and "and".

      Thank you for your suggestion. We have corrected this error in the article.

      (13) In line 570, please describe the method of butyrate administration and also correct the grammatical errors.

      Thank you for your suggestion. We have corrected this error in the article.

      (14) Change the title of x axis in Figure 2F-H, "Serum ~" to "Peritoneal fluid ~", according to the legend.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (15) In line 302, "succinates" should be "butyric acid or butyrate".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed the results of IL-1β levels in foot tissues in Figure 1C and Figure 1H, and serum IL-1β, IL-6, and TNF-α levels in Figure 2F-H. Could the authors also provide the results of IL-6 and TNF-α in foot tissue in Figure 1?

      Thank you for your suggestion. We have added the results of IL-6 and TNF-α in foot tissue to Supplementary Material 4.

      (2) There are some errors in the reference citation format, such as missing page numbers.

      Thank you for your careful review. We have revised the references in our manuscript.

      (3) There are too many writing errors in the manuscript, which greatly affect the understanding of the text. The manuscript must be carefully revised to improve its readability. It's recommended that a professional English writer or native speaker proofread the paper before submission. Some errors, but not limited to these errors, are listed below.

      a. Line 107: The abbreviation for "short-chain fatty acid" should be SCFA, not SFCA.

      Thank you for your careful review. We have corrected this error in the manuscript.

      b. Line 136: There is a missing space between "IL-1β" and "and".

      Thank you for your careful review. We have corrected this error in the manuscript.

      c. Line 145, the phrase "on gout on gout", and line 471, "that transplantation" are repeated.

      Thank you for your careful review. We have corrected this error in the manuscript.

      d. Line 152: "Age+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected this error in the manuscript.

      e. In Figure 1e, "Aded+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected the error in Figure 1e.

      f. Line 152: The phrase "by via" is repeated.

      Thank you for your suggestion. We have deleted the phrase "by via" in line 152.

      g. "16S rDNA" in line 92 is inconsistent with the "16S rRNA" in line 652.

      Thank you for your suggestion. We have revised the error in the manuscript to maintain consistency in professional terminology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum circulates freely within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut, where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This contraction-related blocking happens early after infection (15 min). Bacterial clearance, on the other hand, is carried out by the IMD pathway, possibly through antimicrobial peptide production, whereas this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior bacterial sequestration in larval defense.

      This important work substantially advances our understanding of the process of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is solid and would benefit from more rigorous experiments.

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We have linked the adult phenotype to the larval model to explore the ROS/TrpA1/Dh31 axis in both contexts.  As highlighted in the discussion, however, there are key behavioral differences between larvae and adult flies. Unlike larvae, which remain in the food environment, adult flies have the ability to move away. This difference could impact the relevance of gut muscle contraction and bacterial clearance mechanisms between the two stages. Specifically, in larvae, the rapid ejection of gut contents due to muscle contraction poses a unique risk: larvae may inadvertently re-ingest the expelled material within minutes, which could influence their immune defenses. We have clarified this distinction and our hypothesis in the final section of the discussion, as it emphasizes the adaptive nature of this mechanism in larvae.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      To address this, we have provided new data (Movie 5), in which larvae were fed a lower dose of Bt-GFP at 1.3 × 10^10 CFU/mL. In this video, we observe that when larvae ingest fewer bacteria, no blockage occurs, and the bacteria are able to reach the posterior midgut. As the bacterial load is lower, the fluorescence signal is weaker, but the movie clearly shows the excretion of bacteria. Importantly, under these conditions, no larval death was observed. These findings suggest that below a certain bacterial threshold, the pathogenicity is insufficient to: (1) trigger the blockage response, and (2) kill the larvae. In such cases, bacteria are likely eliminated through normal peristaltic movements rather than through the blockage mechanism described in our study.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      As mentioned in our previous response, we hypothesize that the larvae’s ability to resist low concentrations of pathogenic bacteria is likely due to being below the threshold of virulence. At lower bacterial doses, the pathogenic load is insufficient to trigger the blockage mechanism or cause larval death. In these cases, it is probable that classical peristaltic movements of the gut efficiently eliminate the bacteria, preventing them from colonizing the posterior midgut or causing significant harm. Thus, the larvae rely on standard gut motility and immune mechanisms, rather than the blockage response, to clear lower doses of bacteria.

      Why is this model only applied to high-dose infections? 

      The reason this model primarily applies to high-dose infections is that lower concentrations of pathogenic bacteria do not trigger the blockage mechanism. As we mentioned in the manuscript, for low bacterial concentrations, where the GFP signal remains detectable, wild-type larvae are still able to resist live bacteria in the posterior part of the intestine.

      Regarding the bacterial doses used in our experiments, it's important to clarify that we calculate the bacterial load based on colony-forming units (CFU). In our setup, there are approximately 5 × 10^4 CFU per midgut. For each experiment, we prepare 500 µl of contaminated medium containing 4 × 10^10 CFU. Fifty larvae are placed into this 500 µl of medium, meaning each larva ingests around 5 × 10^4 CFU within one hour of feeding.
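      As a quick check of these figures, the short calculation below converts the inoculum into a concentration and estimates the fraction actually ingested. It assumes the 4 × 10^10 CFU are evenly suspended in the 500 µl (0.5 mL) of medium and that all 50 larvae ingest the stated 5 × 10^4 CFU each; it is an illustrative sketch, not an additional measurement.

```latex
\begin{aligned}
C_{\text{medium}} &= \frac{4 \times 10^{10}\ \text{CFU}}{0.5\ \text{mL}} = 8 \times 10^{10}\ \text{CFU/mL} \\
N_{\text{ingested}} &\approx 50 \times \left(5 \times 10^{4}\ \text{CFU}\right) = 2.5 \times 10^{6}\ \text{CFU} \\
\frac{N_{\text{ingested}}}{N_{\text{inoculum}}} &\approx \frac{2.5 \times 10^{6}}{4 \times 10^{10}} \approx 6 \times 10^{-5}
\end{aligned}
```

      Under these assumptions, and if the same preparation applies to Bt, the standard feeding medium is roughly six times more concentrated than the 1.3 × 10^10 CFU/mL used for Movie 5, and the larvae ingest only a small fraction of the bacteria present in the medium.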

      This leads us to two key points:

      (1) Continuous feeding might trigger the blockage response even at lower doses, as extended exposure to bacteria could lead to higher accumulation within the gut.

      (2) Other defense mechanisms, such as the production of reactive oxygen species (ROS) or classical peristaltic movements, could be sufficient to eliminate lower bacterial doses (around 10^3 CFU or below).

      We also refer to the newly provided Movie 5, where larvae fed with Bt-GFP at 1.3 × 10^10 CFU/mL show no blockage at low ingestion levels and successfully eliminate the bacteria.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the number of bacteria begins to decrease 4 to 6 hours after infection. We have corrected this in the text.

      What happened during this period? 

      During the 4 to 6-hour period, several defense mechanisms are activated. ROS play a bacteriostatic and bacteriolytic role, helping to control bacterial growth. Concurrently, the IMD pathway is activated, leading to the transcription, translation, and secretion of antimicrobial peptides. These AMPs exert both bacteriostatic and bacteriolytic effects, contributing to the eventual clearance of the pathogenic bacteria.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We have provided new data (Supplementary Figure 6) that include RT-qPCR analysis of the whole larval gut in wild-type, TrpA1-, and Dh31- genetic backgrounds after feeding with Lp, Ecc15, Bt, or yeast only. We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences between the genotypes tested.

      Additionally, we included new imaging data (Supplementary Figure 11) from AMP reporter larvae (Dpt-Cherry) fed with fluorescent Lp or Bt. In larvae infected with Bt, which is blocked in the anterior part of the gut, the dpt gene is predominantly induced in this region, indicating strong IMD pathway activity in response to Bt infection. Conversely, in larvae fed with Lp-GFP, the Dpt-Cherry reporter shows weak expression in the anterior midgut, and is barely detectable in the posterior midgut where Lp-GFP establishes itself. This aligns with previous findings by Bosco-Drayon et al. (2012), which demonstrated low AMP expression in the posterior midgut due to the presence of negative regulators of the IMD pathway, such as amidases and Pirk.

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      Based on our new data (Supplementary Figure 11), we observe that Dpt-RFP expression is primarily localized in the anterior midgut, and likely at the beginning of the acidic region, in larvae infected with Bt, Ecc, and Lp.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      We observe that bacteria are not evenly distributed along the gut in wild-type larvae either, as seen with Lp. This suggests that transit through the anterior part of the gut may be relatively fast due to active peristalsis, which would make this region function as a "checkpoint" for bacteria that are not supposed to be blocked. Indeed, we confirmed that peristalsis is active during our intoxication experiments, which could explain the rapid movement of bacteria through the anterior midgut.

      In contrast, bacteria tend to remain longer in the posterior midgut, which corresponds to the absorptive functions of intestinal cells in this region. This would explain why we observe more bacteria in the posterior midgut for Lp in control larvae and for Ecc15 and Bt in the TrpA1- and Dh31- mutants. Although a few bacteria are still found in the anterior midgut, they are consistently in much lower numbers compared to the posterior, as shown in Figures 1A and 3A of our manuscript.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We investigated whether the ROS/TrpA1/Dh31 axis influences AMP expression by performing RT-qPCR on the whole gut of larvae in wild-type, TrpA1-, and Dh31- genetic backgrounds. Larvae were fed with Lp, Ecc, Bt, or yeast (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the different genotypes.

      Additionally, we provide imaging data from AMP reporter larvae (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These results further confirm that the ROS/TrpA1/Dh31 axis does not significantly affect AMP expression in our experimental conditions.

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key-driven force for the blocking phenotype and killing phenotype? 

      We agree that the TARM structures are a fascinating aspect of this study and acknowledge the interest in their potential role in the blocking and killing phenotypes. While we are keen to explore the specific contributions of these structures during bacterial intoxication, the current genetic tools available for manipulating TARMs target both TARM T1 and T2 simultaneously, as demonstrated by Bataillé et al., 2020 (Fig. 2). Of note, these muscles are essential for proper gut positioning in larvae, and their absence leads to significant defects in food intake and transit, which would confound the results of our intoxication experiments (see Fig. 6 from Bataillé et al., 2020).

      Therefore, while TARMs are likely involved in these processes, the current limitations in selectively targeting them prevent us from definitively testing their role in bacterial blocking and killing at this stage. We hope to address this in future studies as more refined genetic tools become available.

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      To determine whether the ROS/TrpA1/Dh31 axis is required for the formation of TARM structures, we examined larval guts from control, TrpA1-, and Dh31- mutant backgrounds. Our new data (Supplementary Figure 8) show that the TARM T2 structures are still present in the mutants, indicating that the formation of these structures does not depend on the ROS/TrpA1/Dh31 axis.

      Reviewer #2 (Public Review):

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter, and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. The use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

      Based on our findings, we hypothesize that ROS play a crucial role in this discrimination process, with uracil release by pathogenic or opportunistic bacteria potentially serving as a key signal.

      To test whether uracil could trigger this discrimination, we conducted experiments where Lp was supplemented with uracil. However, our results show that uracil supplementation alone was not sufficient to induce the blockage response (new data: Supplementary Figure 5). This suggests that while uracil may be a factor in bacterial discrimination, it is likely not the sole trigger, and additional bacterial factors or signals may be required to activate the blockage mechanism. 

      The use of only two pathogens and one symbiotic species may not be sufficient to draw a conclusion on the difference in treatment between pathogenic and symbiotic species.

      To address this concern, we performed additional intoxication experiments using Escherichia coli OP50, a bacterium considered innocuous and commonly used as a standard food source for C. elegans in laboratory settings. The results, presented in our updated data (new data: Fig 1B), show that E. coli OP50, despite being, like Ecc, a Gram-negative member of the Enterobacterales, does not trigger the blockage response. This further supports our conclusion that the gut’s discriminatory mechanism is specific to pathogenic bacteria, and not merely based on bacterial phylogeny.

      We can also wonder how the process of sphincter contraction is affected by the procedure used in this study, where larvae are starved. Does the sphincter contraction occur in continuous feeding conditions? Since larvae are continuously feeding, is this process physiologically relevant?

      In our intoxication protocol, the larvae are exposed to contaminated food for 1 hour, during which the blockage ratio is quantified. Since this period involves continuous feeding with the contaminated food, we do not consider the larvae starved during the quantification process. Our observations show differences in the blockage response depending on the bacterial contaminant and the genetic background of the host. Additionally, we were able to trigger the blocking phenomenon using exogenous hCGRP.

      Regarding the experimental setup for movie observations, it is true that larvae are immobilized on tape in a humid chamber, which is not a fully physiological context. However, in the new movie we provide (Movie 3), co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green) shows that both are initially blocked, followed by the posterior release of Dextran once the bacterial clearance begins.

      Furthermore, to address the question of continuous exposure, we extended the exposure period to 20 hours instead of 1 hour. Even after prolonged exposure, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This supports the physiological relevance of the sphincter contraction and its ability to function under continuous feeding conditions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We related the adult phenotype to the one we describe in larvae as the basis for a candidate approach targeting the ROS/TrpA1/Dh31 axis. As already mentioned in the discussion, larvae stay in the food whereas adult flies can move away. If larvae ejected their gut content, they could re-ingest it within minutes. We have clarified this idea in the last part of the discussion.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      We provide a video with Bt-GFP at 1.3×10^10 CFU/mL (new data: Movie 5). When larvae eat less, there is no blockage and bacteria can reach the posterior midgut. Note that the fluorescence is weak because of the low number of bacteria ingested. The movie shows excretion of the bacteria, and the larvae do not die. Together, these results suggest that below a given threshold, the virulence of the bacteria is too weak to (i) trigger a blockage and (ii) kill the larva. The bacteria are likely eliminated through classical peristalsis.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      These doses may be below the virulence threshold. See our response just above.

      Why is this model only applied to high-dose infections? 

      As mentioned in the manuscript, lower concentrations do not trigger the blockage, and at lower concentrations where the GFP signal is still detectable, wild-type animals withstand the presence of live bacteria in the posterior part of the intestine.

      Regarding the doses, the number of CFU actually ingested should be considered. There are around 5×10^4 CFU per midgut. In our experimental procedure, we calculate the amount of bacteria for 500 µl of contaminated medium (i.e., 4×10^10 CFU per 500 µl of medium), and around 50 larvae are then deposited in these 500 µl of contaminated medium. Under this condition, one larva ingests about 5×10^4 CFU. Moreover, larvae are fed for only 1 h.

      Therefore, (i) continuous feeding may also trigger the blockage even at lower doses, and (ii) other defense mechanisms (such as ROS) or peristalsis may be sufficient to eliminate lower doses (i.e., 10^3 CFU or below). See the new Movie 5, which we provide with Bt-GFP at 1.3×10^10 CFU/mL.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the bacterial load decreases after 4 to 6 hours. We have corrected this in the text.

      What happened during this period? 

      During this period, several processes take place: ROS activity (bacteriostatic and bacteriolytic), IMD activation, and AMP transcription, translation and secretion, followed by the bacteriostatic and bacteriolytic activity of the AMPs.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We provide new larval whole-gut RT-qPCR data from wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored 3 different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (Dpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11), showing that when Bt is blocked in the anterior part of the intestine, the dpt gene is mainly induced in this area. Note that in the larva infected with Lp-GFP, the Dpt-Cherry reporter is weakly expressed in the anterior midgut. In the posterior midgut, where Lp-GFP is established, Dpt-Cherry is barely detectable. This is in line with the previous observation by Bosco-Drayon et al. (2012) of low AMP expression in the posterior midgut due to the expression of IMD negative regulators such as amidases and pirk. In the larva infected with Bt-GFP, note the obvious expression of Dpt-Cherry in the anterior midgut, colocalizing with the bacteria (new data: SUPP11).

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      In control animals fed Bt, Ecc or Lp, we see Dpt-RFP in the anterior midgut and likely at the beginning of the acidic region. See the new SUPP11 images provided in response to the previous remark.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      The same is true for Lp in wild-type animals, which is not evenly distributed either. It is as if the transit time in the anterior part were very short due to peristalsis, which would fit with a checkpoint area for bacteria that are not supposed to be blocked. Indeed, peristalsis is active during our intoxications. The bacteria then stay longer in the posterior part, consistent with the absorptive function of the intestinal cells in this area. With Lp in controls, or with Ecc and Bt in TrpA1- and Dh31- mutants, a few bacteria are always present in the anterior midgut, but always far fewer than in the posterior midgut. See our Figures 1A and 3A.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We provide larval whole-gut RT-qPCR data from wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored 3 different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11).

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key-driven force for the blocking phenotype and killing phenotype? 

      Indeed, we would like to explore the roles of these structures and their putative requirement during bacterial intoxication using driver lines developed by the team that studied these muscles in vivo. However, the genetic tools currently available target TARMs T1 and T2 at the same time (see Fig. 2 from Bataillé et al., 2020). Moreover, these TARMs are primarily crucial for the correct positioning of the gut within the larva, and their absence leads to global food intake and transit defects that would bias the outcomes of our intoxication protocol (see Fig. 6 from Bataillé et al., 2020).

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      We provide images of larval guts from ctrl, TrpA1 and Dh31 mutants demonstrating the presence of the TARMs T2 structures despite the mutations (new data: SUPP8). In addition, we provide representative movies of peristalsis in intestines of Dh31 mutants fed or not with Ecc to illustrate that muscular activity is not abolished (new data: Movie 9 and Movie 10).

      Minor points:

      (1) Why not use the Pros-Gal4/UAS-Dh31 strain in Figure 3B in addition to hCGRP?

      We opted for exogenous hCGRP addition because it allowed us precise timing control over Dh31 activation. Overexpression of Dh31 from embryogenesis or early larval stages could have significant and unintended effects on intestinal physiology, potentially confounding the results. While temporal control using TubG80ts could be an alternative, our focus was on identifying the specific cells responsible for the phenomenon.

      To achieve this, we perturbed Dh31 production via RNAi, specifically targeting a limited number of enteroendocrine cells (EECs) using the DJ752-Gal4 driver, as described by Lajeunesse et al., 2010. Our new data (Supplementary Figure 4) demonstrate that Dh31 expression in this subset of cells is indeed necessary for the blockage phenomenon.

      (2) Section title (line 287) refers to mortality, but no mortality data is in the figure.

      We agree that the title referenced mortality, whereas no mortality data was presented in this section. We have updated the title to better reflect the data discussed in this part of the manuscript.

      (3) It may be better to combine ROS-related contents in the same figure.

      While it is technically feasible to consolidate the ROS-related content into one figure, doing so would require splitting essential data, such as the Gal4 controls for the RNAi assays and parts of the survival phenotype data. We believe that the current structure of the study, which first explores the molecular aspects of the phenomenon and then demonstrates its relevance to the animal’s survival, provides a clearer and more logical flow. For these reasons, we prefer to maintain the current figure layout.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendation

      (1) Other wild-type backgrounds should be added (including the w Drosdel background of the AMP14 deficient flies) to check the robustness of the phenotype.

      To address the concern regarding the robustness of the phenotype across different wild-type backgrounds, we have tested additional genetic backgrounds, including w1, the isogenized w1118 and Oregon animals.

      The results (new data: Figure 1C) demonstrate that Lp is able to transit freely to the posterior part of the intestine in all backgrounds, while Ecc and Bt are blocked in the anterior part. These findings confirm the robustness of the phenotype across different wild-type strains.

      (2) Although we recognize that this may be limited by the number of GFP-expressing species, other commensal and pathogenic bacteria should be tested in this assay (e.g. E. faecalis and Acetobacter).

      We performed new intoxication experiments using Escherichia coli OP50, a well-established innocuous bacterial strain. The data, presented in Figure 1B (new data), show that E. coli OP50, despite being, like Ecc, a Gram-negative bacterium of the order Enterobacterales, does not trigger the blockage response. This further supports our hypothesis that the blockage phenomenon is specific to pathogenic bacteria and not simply related to bacterial taxonomy.

      (3) It is important to test whether sphincter closure also occurs in continuous feeding conditions. This does not mean repeating all the experiments but just shows that this mechanism can take place in conditions where larvae are kept in a vial with food.

      While the movies we provide involve larvae immobilized on tape in a humid chamber, which is not a fully physiological context, we now provide new data (Movie 3) showing that, after co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green), both substances are initially blocked in the anterior midgut. Later, the dextran is released posteriorly once bacterial clearance has begun.

      Additionally, we extended the feeding period in our experiments from 1 hour to 20 hours to simulate more continuous exposure to contaminated food. Even under these prolonged conditions, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This confirms that the sphincter mechanism can function in continuous feeding conditions as well.

      (4) What are the molecular determinants discriminating innocuous from pathogenic bacteria? Addressing this point will increase the impact of the article. The fact that Relish mutants have normal valve constriction suggests that peptidoglycan recognition is not involved. Is there a sensing of pathogen virulence factors? 

      Our data suggest that uracil could be a key molecular determinant in discriminating between innocuous and pathogenic bacteria, as previously described by the W-J Lee team in several studies on adult Drosophila. However, in our experiments, exogenous uracil addition using the blue dye protocol (Keita et al., 2017) did not induce any significant changes in the larvae. Similarly, uracil supplementation in adult flies failed to trigger the Ecc expulsion and gut contraction phenotype, as reported by Benguettat et al., 2018. 

      To further investigate this, we tested the addition of uracil during Lp-GFP intoxication. In these experiments, we did not observe any blockage of Lp (new data: Supplementary Figure 5). These results suggest that uracil might not be the sole trigger for the blockage response, or we may not be providing uracil exogenously in the most effective way. Alternatively, there could be other pathogen-specific virulence factors that contribute to this discrimination mechanism.

      To address this question, the authors should infect larvae with Ecc15 evf- mutants or Ecc15 lacking uracil production. 

      Thank you for your suggestion to use Ecc15 evf- mutants or Ecc15 lacking uracil production to explore the role of uracil in bacterial discrimination. While we have provided some data using uracil supplementation (new data: Supplementary Figure 5), we agree that testing mutants like PyrE would be an important next step. Unfortunately, we currently lack access to fluorescent PyrE or Ecc15 evf- mutants.

      We are planning to address this by developing a new protocol involving fluorescent beads alongside bacteria. This approach will allow us to test several bacterial strains in parallel and better define the size threshold of the valve. We do not have the relevant data yet, but this will be a key focus of our future work.

      Similarly, does feeding heat-killed Ecc15 or Bt induce sequestration in the anterior midgut (larvae may be fed dextran-FITC at the same time to track bacteria)?

      Unfortunately, in our attempts to test heat-killed or ethanol-killed fluorescent Ecc15 for these experiments, we encountered an issue: while we were able to efficiently kill the bacteria, we lost the GFP signal required to track their position in the gut. This made it challenging to assess whether sequestration in the anterior midgut occurs with non-viable bacteria.

      Is uracil or Bt toxin feeding sufficient to induce valve closure? 

      As previously mentioned, uracil is a strong candidate for bacterial discrimination, and we have tested its role by adding exogenous uracil during Lp-GFP intoxication. However, in these experiments, Lp was not blocked (new data: Supplementary Figure 5). This suggests that uracil alone may not be sufficient to induce valve closure, or it may not be the only factor involved. It is also possible that our method of exogenous uracil supplementation may not be effectively mimicking the endogenous conditions.

      Regarding Bt, we used vegetative cells without Cry toxins in our experiments. Cry toxins are only produced during sporulation and accumulate as parasporal crystals alongside the spore. In addition, the Bt strain we used, 4D22, lacks the plasmids encoding Cry toxins. As a result, there were no Cry toxins in the Bt-GFP vegetative cells used in our assays. This has been clarified in the Materials and Methods section of the manuscript.

      Would Bleomycin induce the same phenotype? 

      Indeed, Bleomycin, as well as paraquat, has been shown to damage the gut and trigger intestinal cell proliferation in adult Drosophila through mechanisms involving TrpA1. Testing whether Bleomycin induces a similar phenotype in larvae would indeed be interesting.

      However, one challenge we face in our intoxication protocol is that larvae tend to stop feeding when chemicals are added to their food mixture. We encountered similar difficulties in our DTT experiments, which were challenging to set up for this reason. Consequently, we aim to avoid approaches that might impair the general feeding activity of the larvae, as it can significantly affect the outcomes of our experiments.

      Could this process of sphincter closure be more related to food poisoning?

      If gut damage were the primary trigger for sphincter closure, we would indeed expect the blockage phenomenon to occur later following bacterial exposure. However, in our experiments, we observe the blockage occurring early after bacterial contact, suggesting that damage may not be the main trigger for this response.

      That said, we have not yet tested bacterial mutants lacking toxins, nor have we tested a direct damaging agent such as Bleomycin, as proposed. These would be valuable future experiments to explore the potential role of gut damage more thoroughly in this process.

      (5) Is Imd activation normal in trpA1 and DH31 mutants? The authors could use a diptericin reporter gene to check if Diptericin is affected by a lack of valve closure in trpA1.

      To address this, we performed RT-qPCR on whole larval guts from wt, TrpA1^1 and Dh31^KG09001 genetic backgrounds. Larvae were fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the genotypes.

      Additionally, we provide imaging data from AMP reporter animals (pDpt-Cherry) in a wild-type background, fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These images also support the conclusion that Diptericin expression is not significantly affected by a lack of valve closure in trpA1 and Dh31 mutants.

      (6) Are the 2-6 DH31 positive cells the same cells described by Zaidman et al., Developmental and Comparative Immunology 36 (2012) 638-647.

      The cells identified as hemocytes in the midgut junctions by Zaidman et al. are likely the same cells we describe in our study, as they are located in the same region and are Dh31 positive. We have added a reference to this paper and included lines in the manuscript acknowledging this connection.

      Although confirming whether these cells are Hml+, Dh31+, and TrpA1+ would clarify their exact identity, this falls outside the scope of our current study. However, the possibility that these cells play a role in physical barrier immunity and also possess a hemocyte identity is indeed intriguing, and we hope future research will explore this further.

      Minor points

      (1) The mutations should be appropriately labelled with the allele name.

      This has been fixed in the main text, the figure legends and the figures.

      (2) Line 230-231: the sentence is unclear to me.

      We simplified the sentence and do not refer to the expulsion in larvae.

      (3) Discussion: although the discussion is already a bit long, it would be interesting to see if this process is likely to happen/has been described in other insects (mosquito, Bactrocera, ...).

      We reviewed the available literature but were unable to find specific examples describing the blockage phenomenon in other insects. Most studies we found focused on symbiotic bacteria rather than pathogenic or opportunistic bacteria. However, as mentioned in our manuscript, the anterior localization of opportunistic or pathogenic bacteria has been observed in Drosophila by independent research groups.

      (4) Line 546: add the Caudal Won-Jae Lee paper to state the posterior midgut is less microbicidal.

      We added the reference at the right place, mentioning as well that it concerns adults. 

      (5) Figure 6: indicate what the cells shown by the arrow are.

      The sentence ‘the arrows point to TARMs’ is present in the legend of Fig. 6.

      (6) Does the sphincter closure depend on hemocytes?

      As mentioned above, the cells we identify as TrpA1+ in the midgut junction may be the same cells described by Zaidman et al., 2012, and earlier by Lajeunesse et al., 2010. Inactivating hemocytes using the Hml-Gal4 driver may also affect these Dh31+ cells, as they share similarities with hemocytes, as pointed out by Zaidman et al. However, distinguishing between hemocytes and Dh31+/TrpA1+ cells would require a genetic intersectional approach, which is beyond the scope of our current study.

      Nevertheless, the possibility that these cells play a dual role in immunity (through blockage) and share characteristics with hemocytes while functioning as enteroendocrine cells (EECs) is quite intriguing and deserves further exploration in future studies.

    1. Author response:

      Reviewer #1 (Public review):

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicate that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence to read: “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx”.

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We will also add the content about other neuron types in our revised manuscript “Additionally, there is considerable diversity among glutamatergic, glycinergic, and peptidergic neurons in planarians. Many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system.”

      The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater-than-normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We use 120 nm resolution to segment neurons along the three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we will include a discussion on the limitations of TLSM in reconstructing neural networks, particularly when it comes to resolving fibers within densely packed regions of the nerve tracts.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations. 

      We have removed the statement "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." We changed this statement into “These results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is not likely from the octopaminergic, GABAergic, dopaminergic and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, we would like to add the possibility that the non-linear dynamics may be from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs. 

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution. 

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution, and we will add this information to the revised manuscript.

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated. 

      A few major issues with the claims: 

      (4) Lines 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015).

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We will revise the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We will also clarify that the primary objective of our study was not to distinguish all muscle fiber types but rather to demonstrate the application of our 3D tissue reconstruction method to traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow a more thorough investigation of muscle fiber diversity. We will ensure that all citations are properly revised and updated in the next version.

      (5) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution. 

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We will revise the relevant sections of the manuscript to clarify this dynamic process more accurately.

      (6) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching. 

      We will remove this statement from the revised version. Instead, we will focus on describing our observations of the connections between glial cells and muscle fibers.

      (7) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons. 

      We understand that this approach is insufficient to support the conclusion, and we will revise the manuscript to state the limitations of our data more clearly. We will describe these observations as preliminary and note that further experiments are required.

      (8) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented. 

      We used two setups for imaging cells and neuronal projections. For cellular-resolution imaging, we used a 1× air objective with a numerical aperture (NA) of 0.25 and a working distance of 60 mm (OLYMPUS MV PLAPO), with a voxel size of 0.8×0.8×2.5 µm³. This configuration gave a resolution of 2×2×5 µm³, corresponding to an effective resolution of 0.5×0.5×1.25 µm³ after 4× isotropic expansion. For sub-cellular imaging, we used a 10×0.6 SV MP water immersion objective (OLYMPUS) with 0.8 NA and a working distance of 8 mm, with a voxel size of 0.26×0.26×0.8 µm³. This configuration gave a resolution of 0.5×0.5×1.6 µm³ and an effective resolution of 0.12×0.12×0.4 µm³ with 4.5× isotropic expansion. The higher resolution of the sub-cellular setup allows us to observe finer structures and trace neural connections.
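      The sample-scale resolution quoted for the cellular setup follows from dividing each axis of the optical resolution by the linear expansion factor, assuming ideal isotropic expansion. A minimal Python sketch of that arithmetic (the function name is illustrative, not part of the authors' pipeline):

```python
def effective_resolution(resolution_um, expansion_factor):
    """Per-axis resolution referred back to the unexpanded sample.

    Assumes ideal isotropic expansion, so every axis is divided by the same
    linear expansion factor.
    """
    if expansion_factor <= 0:
        raise ValueError("expansion factor must be positive")
    return tuple(r / expansion_factor for r in resolution_um)


# Cellular-resolution setup: 2 x 2 x 5 um optical resolution, 4x expansion.
cellular = effective_resolution((2.0, 2.0, 5.0), 4.0)
print(cellular)  # (0.5, 0.5, 1.25)
```

      Real gels rarely expand perfectly uniformly, so this division gives a nominal value rather than a measured one.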

      Regarding your question about cell boundaries, we will revise the manuscript to specify that the boundaries we identified are those of each nucleus, rather than entire cells. This distinction will be made clear in the revised version.

      Reviewer #3 (Public review): 

      Weaknesses: 

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing. 

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677); instead, they cite a paper (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein).

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling. We will also add the content about other neuron types in our revised version.

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planarian tail (the tail is proportionately longer in large vs. small planarians) can explain this proportional decrease as animals grow. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this.

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript.

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this. 

      We acknowledge that some of our conclusions were overclaimed given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this phrasing was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We will revise this section to better describe the dynamic changes observed during regeneration.

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image. 

      Thank you for raising this important point. We will include a ground truth comparison of our automated muscle fiber counting with manual counts in the supplementary figures. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap. We are revisiting the segmentation parameters to improve the accuracy of detecting circular fibers, and we will provide an updated version of Figure 4I in the revised manuscript.
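      A ground-truth comparison of this kind could be summarized with a per-image error and a correlation across images. The sketch below is a hypothetical helper, not the authors' workflow. Reporting both metrics matters because a segmentation that systematically misses one fiber class, such as circular fibers, can still correlate perfectly with manual counts.

```python
import math


def count_agreement(manual, automated):
    """Compare automated fiber counts with manual (ground-truth) counts.

    Both inputs are per-image counts in the same image order. Returns the
    mean absolute percentage error and the Pearson correlation coefficient.
    """
    if len(manual) != len(automated) or len(manual) < 2:
        raise ValueError("need two equal-length lists with at least two images")
    if any(m == 0 for m in manual):
        raise ValueError("manual counts must be non-zero for percentage error")
    n = len(manual)
    mape = 100.0 * sum(abs(a - m) / m for m, a in zip(manual, automated)) / n

    mean_m = sum(manual) / n
    mean_a = sum(automated) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(manual, automated))
    ss_m = math.sqrt(sum((m - mean_m) ** 2 for m in manual))
    ss_a = math.sqrt(sum((a - mean_a) ** 2 for a in automated))
    if ss_m == 0 or ss_a == 0:
        raise ValueError("counts must vary across images to compute a correlation")
    return mape, cov / (ss_m * ss_a)
```

      For instance, an automated count that is always half the manual count gives a 50% error but a correlation of 1, which is why correlation alone would not reveal the missing circular fibers.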

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria. 

      Thank you for bringing this to our attention. What we intended to convey was the increase in neuron number during homeostasis. We will revise the abstract to correct this wording and instead describe the increase in neuron number resulting from progenitor cell differentiation during homeostasis.

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us? 

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.
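      The response does not specify how the lobe angle is defined. Assuming, for illustration only, that it is measured at a midline vertex between rays toward the two lobe tips, the quantity reduces to the angle between two vectors. A minimal Python sketch (the landmarks and function name are hypothetical):

```python
import math


def lobe_angle_deg(vertex, tip_left, tip_right):
    """Angle in degrees at `vertex` between the rays toward the two lobe tips.

    Points are (x, y) coordinates in the same units. The landmark choice
    (a midline vertex plus one tip per lobe) is an illustrative assumption.
    """
    v1 = (tip_left[0] - vertex[0], tip_left[1] - vertex[1])
    v2 = (tip_right[0] - vertex[0], tip_right[1] - vertex[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("lobe tips must differ from the vertex")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

      Because the angle depends only on vector directions, it is insensitive to overall brain size, which is what makes it usable across animals of different lengths.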

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy. 

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. This combination offers several key advantages over traditional confocal microscopy. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes. We believe this distinction is significant and represents an advance over previous methods. We will clarify this point in the manuscript to better distinguish our approach from standard techniques.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this article, the authors described mouse models of Becker muscular dystrophy (BMD); they created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Addressing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Because dystrophin-deficient skeletal muscle in mdx mice shows hypertrophy with growth, we did not initially examine factors associated with muscle atrophy in BMD mice. As the reviewer suggested, examining the association between type IIa fiber reduction and muscle atrophy is important, and the results should help resolve the cause of type IIa fiber reduction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the cross-sectional areas (CSA) of muscles and compare them with the changes in the proportion of type IIa fibers.

      (2) Evaluate the expression levels of Murf1 and Atrogin1 as markers of muscle atrophy using RT-PCR.
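      Relative Murf1 and Atrogin1 expression from quantitative RT-PCR is commonly reported with the 2^(-ΔΔCt) method. A minimal sketch, assuming near-100% amplification efficiency and a stable reference gene (the reference gene and the function name are assumptions, not details from the response):

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt (Livak) method.

    Assumes roughly 100% amplification efficiency for both the target and the
    reference gene, so each cycle corresponds to a twofold difference.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample     # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                 # compare with control group
    return 2.0 ** (-dd_ct)
```

      For example, a target Ct of 24 against a reference Ct of 18 in a BMD sample, versus 26 against 18 in a wild-type control (made-up values), gives ΔΔCt = -2 and a 4-fold increase.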

      Reviewer #2 (Public review):

      Summary

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, corresponding to deletions observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors' experiments are inadequate to support their hypothesis that the decline of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fiber types is needed and could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary changes around muscle fibers in BMD mice. In the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with type IIb fibers. However, this delayed recovery of type IIa and IIx fibers cannot explain why the fiber reduction in BMD mice is limited to type IIa fibers. We therefore considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and capillary changes in BMD mice remains unclear because we lack information about capillaries around type IIx and IIb fibers. The reviewer correctly pointed out that capillaries around fiber types other than type IIa were not evaluated, and addressing this will be very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate changes in capillary formation around fiber types other than type IIa (i.e., type IIx and IIb fibers).

      (2) Evaluate the endothelial area around fiber types other than type IIa.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches as currently presented, to strengthen support for the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies, as is often done in research on rare tumors.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case: SiNET6 was not included in the subtype analysis because too few neuroendocrine cells were detected, and it was not assigned to any subtype, as noted in the text and as can be seen from its exclusion from Figure 2, where subtypes are defined. We apologize that our manuscript may have given the wrong impression about the classification of SiNET6 (it is labeled in a misleading manner in Fig. 4a). In the revised manuscript, we will correct the labeling in Fig. 4a and clarify that SiNET6 is not assigned to any subtype. We will further acknowledge the limitation of using two platforms and present the arguments in favor of the existence of two SiNET subtypes.

      (2) Results: Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (e.g., revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in the Discussion]).

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we will note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We will clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We agree with this comment and will add the need for additional validation for this finding in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete.

      Strengths: 

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic. 

      Weaknesses: 

      (1) The simulations performed are not sufficiently long (Figure 2A) to accurately comment on phase separation behavior. The simulations do not appear to have converged well, indicating that the system has not reached a steady state, rendering the analysis of the trajectories unreliable.

      We have extended the simulations for an additional 500 ns, to 1500 ns. The last 500 ns show reasonably good convergence (see Figure 2A).

      (2) The majority of the data presented shows no significant alteration with changes in salt concentration. However, the authors have based conclusions and made significant comments regarding salt activities. The absence of error bars in the data representation raises questions about its reliability. Additionally, the manuscript lacks sufficient scientific details of the calculations.  

      We have now included error bars. With the error bars, the salt dependences of all the calculated properties (with the exception of Rg) show a clear trend. Additionally, we have expanded the descriptions of our calculations (p. 15-16).
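      The reply does not state how the error bars were computed. One common approach for time-correlated MD data is block averaging over the analysis window (for example, the last 500 ns); the sketch below illustrates that technique and is not necessarily the authors' procedure. The function name and default block count are assumptions.

```python
import math


def block_standard_error(values, n_blocks=5):
    """Mean and standard error of a time series from non-overlapping blocks.

    If each block is longer than the correlation time, the block means are
    approximately independent, so their spread gives a fair error estimate.
    Frames left over after an even split into blocks are dropped.
    """
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    block_len = len(values) // n_blocks
    if block_len == 0:
        raise ValueError("series too short for the requested number of blocks")
    means = [
        sum(values[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    grand_mean = sum(means) / n_blocks
    var = sum((m - grand_mean) ** 2 for m in means) / (n_blocks - 1)
    return grand_mean, math.sqrt(var / n_blocks)
```

      If the estimated error keeps growing as blocks get longer, the window is probably not yet converged.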

      (3) In Figures 2B and 2C, the changes in the radius of gyration and the number of contacts do not display significant variations with changes in salt concentration. The change in the radius of gyration with salt concentration is less than 1 Å, and the number of contacts does not change by at least 1. The authors' conclusions based on these minor changes seem unfounded. 

      The variation of ~1 Å in the calculated Rg is comparable to the variation in the experimental Rg. As for the number of contacts, note that this property is presented on a per-residue basis, so a value of 1 means that each residue picks up one additional contact, or each protein chain gains a total of 131 contacts, when the salt concentration is increased from 50 to 1000 mM.
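      The per-residue to per-chain conversion in this reply is a single multiplication. A sketch, taking the 131-residue chain length from the reply and the 8-chain system size from Reviewer #2's description of the simulations; the function name is illustrative:

```python
RESIDUES_PER_CHAIN = 131  # A1-LCD chain length implied by the reply
CHAINS = 8                # chains per simulation box, per Reviewer #2


def contact_gain(per_residue_delta, residues=RESIDUES_PER_CHAIN, chains=CHAINS):
    """Scale a per-residue change in contacts to per-chain and whole-system sums."""
    per_chain = per_residue_delta * residues
    return per_chain, per_chain * chains


per_chain, total = contact_gain(1.0)
print(per_chain, total)  # 131.0 1048.0
```

      The whole-system figure is a sum of per-residue counts, so a contact between two residues contributes to both of their counts.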

      Reviewer #2 (Public Review): 

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: 

      (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context.

      (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. 

      (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.). Accordingly, this manuscript should be revised to address the following. In particular, the discussion in the manuscript should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      (1) The ability to use atomistic models to address the questions at hand is a strength of the present work. However, presumably because of the computational cost of such models, the "phase-separated" "condensates" in this manuscript are extremely small (only 8 chains). An inspection of Fig.1 indicates that while the high-salt configuration (snapshot, bottom right) is more compact and droplet-like than the low-salt configuration (top right), it is not clear whether the 50 mM NaCl configuration corresponds to a dilute or homogeneous phase (without phase separation) or just to a condensate with a lower protein concentration, because the chains are still highly associated. One may argue that they become two droplets touching each other (the chains are not fully dispersed throughout the simulation box, unlike in typical coarse-grained simulations of biomolecular phase separation). While it may not be unfair to argue from this observation that the condensed phase is less stable at low salt, this raises critical questions about the adequacy of the approach as a stand-alone source of theoretical information. Accordingly, an informative discussion of the limitation of the authors' approach and comparisons with results from complementary approaches such as analytical theories and coarse-grained molecular dynamics will be instructive, even imperative, especially since such results exist in the literature (please see below).

      We now discuss the limitations of our all-atom simulations and also other approaches (p. 13; see below).

      (2) The aforementioned limitation is reflected by the authors' choice of using Dmax as a sort of phase separation order parameter. However, no evidence was shown to indicate that Dmax exhibits a two-state-like distribution expected of phase separation. It is also not clear whether a Dmax value corresponding to the linear dimension of the simulation box was ever encountered in the authors' simulated trajectories such that the chains can be reliably considered to be essentially fully dispersed, as would be expected for the dilute phase. Moreover, as the authors have noted in the second paragraph of the Results, the variation of Dmax with simulation time does not show a monotonic rank order with salt concentration. The authors' explanation is equivalent to stipulating that the simulation system has not fully equilibrated, inevitably casting doubt on at least some of the conclusions drawn from the simulation data. 

      First, with the extended simulations, the Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). Second, as we now state (p. 13), our low-salt simulations mimic a homogeneous solution whereas our high-salt simulations mimic the dense phase of a phase-separated system. The intermediate-salt simulations also mimic the dense phase but at a somewhat lower concentration (hence the intermediate Dmax value).

      (3) With these limitations, is it realistic to estimate possible differences in salt concentration between the dilute and condensed phases in the present work? These features, including tie-lines, were shown to be amenable to analytical theory and coarse-grained molecular dynamics simulation (please see below).  

      The differences in salt effects that we report do not represent those between two phases. Rather, as explained in the preceding reply, they represent differences between a homogeneous solution at low salt and the dense phase at higher salt. We also acknowledge salt effects calculated by analytical theory and coarse-grained simulations (p. 13).

      (4) In the comparison in Fig.2B between experimental and simulated radius of gyration as a function of [NaCl], there is an outlier among the simulated radii of gyration at [NaCl] ~ 250 mM. An explanation should be offered.  

      After we extended the simulations and analyzed the last 500 ns, the Rg data no longer show an outlier, although they still fluctuate somewhat from one salt concentration to another.

      (5) The phenomenon of no phase separation at zero and low salt and phase separation at higher salt has been observed for the IDP Caprin1 and several of its mutants [Wong et al., J Am Chem Soc 142, 2471-2489 (2020) [https://pubs.acs.org/doi/full/10.1021/jacs.9b12208], see especially Fig.9 of this reference]. This work should be included in the discussion and added to Table 1. 

      We have now added Caprin1 to Table 1 (new ref 26) and discuss this paper (p. 13).

      (6) The authors stated in the Introduction that "A unifying understanding of how salt affects the phase separation of IDPs is still lacking". While it is definitely true that much remains to be learned about salt effects on IDP phase separation, the advances that have already been made in this area are more substantial than this narrative conveys. For instance, an analytical theory termed rG-RPA was put forth in 2020 to provide a uniform (unified) treatment of salt, pH, and sequence-charge-pattern effects on polyampholytes and polyelectrolytes (corresponding to the authors' low net charge and high net charge cases). This theory offers a means to predict salt-IDP tie-lines and a comprehensive account of salt effects on polyelectrolytes, resulting in a lack of phase separation at extremely low salt and subsequent salt-enhanced phase separation (similar to the case the authors studied here) and in some cases re-entrant phase separation or dissolution [Lin et al., J Chem Phys 152, 045102 (2020) [https://doi.org/10.1063/1.5139661]]. This work is highly relevant and already provides a conceptual framework for the authors' atomistic results and subsequent discussion. As such, it should definitely be a part of the authors' discussion. 

      We now cite this paper (new ref 34) in Introduction (p. 4). We also discuss its results for Caprin1 (new ref 18; p. 13).

      (7) Bridging interactions by small ions resulting in effective attractive interactions among polyelectrolytes leading to their phase separation have been demonstrated computationally by Orkoulas et al., Phys Rev Lett 90, 048303 (2003) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.90.048303]. This result should also be included in the discussion. 

      We now cite this paper (new ref 41; p. 11).

      (8) More recently, the salt-dependent phase separations of Caprin1, its RtoK variants and phosphorylated variant (see item #5 above) were modeled (and rationalized) quite comprehensively using rG-RPA, field-theoretic simulation, and coarse-grained molecular dynamics [Lin et al., arXiv:2401.04873 [https://arxiv.org/abs/2401.04873]], providing additional data supporting a conceptual perspective put forth in Lin et al. J Chem Phys 2020 (e.g., salt-IDP tie-lines, bridging interactions, re-entrance behaviors etc.) as well as in the authors' current manuscript. It will be very helpful to the readers of eLife to include this preprint in the authors' discussion, perhaps at the authors' discretion in the manner in which other preprints are referenced and discussed in the current version of the manuscript. 

      We now cite this paper (new ref 18) and discuss it along with new ref 26 in Discussion (p. 13).

      Reviewer #3 (Public Review): 

      Summary: 

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation. 

      Strengths: 

      Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior. 

      Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions. 

      Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation. 

      Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience. 

      Weaknesses: 

      Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. A comparative analysis would strengthen the manuscript's conclusions and broaden its impact.

      Perhaps we did not impress on the reviewer how expensive the all-atom MD simulations on A1-LCD were: the systems each contained half a million atoms and the simulations took many months to complete. That said, we agree with the reviewer that, ideally, a comparative study on a protein showing the typical screening class of salt dependence would have made our work more complete. However, we are confident of the conclusions for several reasons. First, the three salt effects – charge neutralization, bridging, and strengthening of pi-types of interactions – revealed by the all-atom simulations are physically sound and well-supported by other studies. Second, these effects led us to develop a unified picture for the salt dependence of homotypic phase separation, in the form of a predictor for the classes of salt dependence based on amino-acid composition. This predictor works well for nearly 30 proteins. Third, recent studies using analytical theory and coarse-grained simulations (new ref 18) also strongly support our conclusions.

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figure 1, the color scheme should be updated and the figure remade, as the current set of color choices makes it very difficult to distinguish the magenta spheres.  

      We have increased the sizes of ions in Figure 1 to make them distinguishable.

      (2) Within the framework of atomistic simulations, the influence of salt concentration alteration on protein conformational plasticity is worth investigating. This could be correlated (with proper details) with the effect of salt-concentration-modulated protein aggregation behavior. 

      We now use RMSF to measure conformational plasticity, which shows a clear salt-dependent trend with a 27% reduction in fluctuations from 50 mM to 1000 mM NaCl (new Fig. S1).

      (3) The authors should mention the protein concentrations employed in the simulations and whether these are consistent with experimentally used concentrations.  

      We have mentioned the initial concentration (3.5 mM). We now further state that this concentration is maintained in the low-salt simulations, indicating absence of phase separation, but is increased to 23 mM in the high-salt simulations, indicating phase separation. The latter value is consistent with the measured concentrations in the dense phase (last two paragraphs of p. 5).
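      As a rough aid for relating chain count to concentration, the minimal Python sketch below applies the standard relation c = N/(N_A V). It assumes the eight chains and the 3.5 mM and 23 mM values quoted above; the function names are hypothetical, and this is not the analysis code used to estimate the dense-phase concentration.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)
NM3_PER_LITRE = 1e24      # 1 L = 1 dm^3 = (1e8 nm)^3


def volume_nm3_for(n_chains: int, conc_mM: float) -> float:
    """Volume (nm^3) that holds n_chains at the given millimolar concentration."""
    volume_litres = n_chains / (AVOGADRO * conc_mM * 1e-3)
    return volume_litres * NM3_PER_LITRE


def conc_mM_for(n_chains: int, volume_nm3: float) -> float:
    """Millimolar concentration of n_chains confined to volume_nm3."""
    volume_litres = volume_nm3 / NM3_PER_LITRE
    return n_chains / (AVOGADRO * volume_litres) * 1e3


initial_volume = volume_nm3_for(8, 3.5)   # whole simulation box, ~3,800 nm^3
dense_volume = volume_nm3_for(8, 23.0)    # region occupied at high salt, ~580 nm^3
compaction = initial_volume / dense_volume
```

      With these numbers, the eight chains occupy roughly 3,800 nm^3 at 3.5 mM but only about 580 nm^3 at 23 mM, i.e., the volume they occupy shrinks about 6.6-fold.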

      (4) It would be useful to test the salt effect for at least two extreme salt concentrations at various protein concentrations, consistent with experimental protein concentration ranges.  

      In simulation studies of short peptides (ref 37), we have shown that the initial concentration does not affect the final concentration in the dense phase, as expected for phase-separation systems. We expect that the same will be true for the A1-LCD system at intermediate and high salt where phase separation occurs. Though this expectation could be tested by simulations at a different initial protein concentration, such simulations would be expensive but unlikely to yield new physical insight.

      (5) Importantly, the simulations do not appear to have converged well enough (Figure 2A). The authors should extend the simulation trajectories to ensure the system has reached a steady state.  

      We extended the simulations for an additional 500 ns, and they now appear to have converged. In Figure 2A the Dmax values now converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). 

      (6) The authors mention "phase separation" in the title, but with only a 1 μs simulation trajectory, it is not possible to simulate a phenomenon like phase separation accurately. Since atomistic simulations cannot realistically capture phase separation on this timescale, a coarse-grained approach is more suitable. To properly explore salt effects in the context of phase separation, long timescale simulation trajectories should be considered. Otherwise, the data remain unreliable. 

      Our all-atom simulations revealed rich salt effects that might have been missed in coarse-grained simulations. It is true that coarse-grained models allow simulation of the phase separation process, but as we have recently demonstrated (refs 36 and 37), all-atom simulations on the μs timescale are also able to capture the spontaneous phase separation of peptides and small IDPs. A1-LCD is much larger than those systems, so we had to use a relatively small number of chains (8 chains here vs 64 used in ref 37 and 16 used in ref 37). Still, we observe condensation into a dense phase at high salt. We discuss the pros and cons of all-atom vs. coarse-grained simulations on p. 13.

      (7) In Figure 5E, the plot does not show that g(r) has reached 1. If it does, the authors should show the full curve. The same issue remains with supplementary figures 1, 2, 3, etc.  

      We now show the approach to 1 in the insets of Figs. S2, S3, S4, and 5E.

      (8) None of the data is represented with error bars. The authors should include error bars in their data representations. 

      We have now included error bars in all graphs that report average values.

      (9) The authors state that "the net charge of the system reduces to only +8 at 1000 mM NaCl (Figure 3C)" but do not explain how this was calculated. 

      We now add this explanation in methods (p. 16).

      (10) The authors mention "similar to the role played by ATP molecules in driving phase separation of positively charged IDPs." However, ATP can inhibit aggregation, and its induction of phase separation is concentration-dependent. Given ATP's large aromatic moiety, its comparison to simple ions is not straightforward. This comparison is best avoided. 

      In this context we are comparing the bridging capability of ATP molecules in driving phase separation of positively charged IDPs in ref 36 to the bridging capability of the ions here. In ref 36 the authors show ATP bridging interactions between protein chains similar to what we show here with ions.

      (11) Many calculations are vaguely represented. The process for calculating the number of bridging ions, for example, is not well documented. The authors should provide sufficient details to allow for the reproducibility of the data. 

      We have now expanded the methods section to include more detailed information on calculations done.

      Reviewer #3 (Recommendations For The Authors): 

      Include error bars or standard deviations for all results averaged over four replicates, particularly for the number of ions and contacts per residue. This would provide a clearer picture of the data's reliability and variability. 

      We have now included error bars in all graphs that report averaged values.

      Strengthen the support for the conclusion that "each Arg sidechain often coordinates two Cl- ions, multiple backbone carbonyls often coordinate a single Na+ ion." While Fig. 3A clearly demonstrates Arg-Cl- coordination, the Na+ coordination claim for a 131-residue protein requires further clarification. Consider including the integration profile of radial distribution functions for Na+ ions to bolster this assertion. 

      We now report the number of Na+ ions that coordinate with multiple backbone carbonyls (p. 7) as well as the number of Na+ ions that bridge between A1-LCD chains via coordination with multiple backbone carbonyls (p. 9). Please note that Figure 4A right panel displays an example of Na+ coordinating with multiple backbone carbonyls.
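      The RDF integration suggested by the reviewer can be sketched as follows. This minimal Python example (hypothetical function name; not the code used in our analysis) computes the running coordination number n(r) = 4πρ ∫_0^r g(s) s^2 ds with the cumulative trapezoid rule, assuming g(r) is sampled on an ascending radial grid and ρ is the bulk number density of the ion species. It yields an average count per reference site, whereas the numbers reported above count individual Na+ ions bound to multiple carbonyls.

```python
import numpy as np


def running_coordination_number(r, g, rho):
    """Running coordination number n(r) = 4*pi*rho * int_0^r g(s) s^2 ds.

    r   : 1-D array of radial grid points (e.g., nm), ascending, starting at 0
    g   : radial distribution function values at r
    rho : bulk number density of the partner species (same length unit, cubed)
    Returns an array of the same length as r, with n(r[0]) = 0.
    """
    integrand = 4.0 * np.pi * rho * g * r**2
    # Cumulative trapezoid rule: area of each interval, then running sum
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)
    return np.concatenate(([0.0], np.cumsum(steps)))
```

      The first-shell coordination number is conventionally read off n(r) at the first minimum of g(r).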

      Address the following typographical errors in the main text: (i) Page 11, line 25: "distinct classes of sat dependence" should be "distinct classes of salt dependence"; (ii) Page 14, line 9: "for Cl- and 3.0 and 5.4 A" should be "for Cl- and 3.0 and 5.4 Å"; (iii) Page 14, line 18: "As a control, PRDFs for water were also calculated" should be "As a control, RDFs for water were also calculated" (assuming PRDF was meant to be RDF). 

      We have now corrected these typos.

      Consider expanding the study to include simulations of the full-length protein to provide a more comprehensive comparison between the truncated A1-LCD and the complete protein's behavior in various salt concentrations. 

      As we explained above, even with eight chains of A1-LCD, which has 131 residues, the systems already contain half a million atoms each and the all-atom simulations took many months to complete. Full-length A1 has 314 residues so a multi-chain system would be too large to be feasible for all-atom simulations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rpb4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ or stable-isotope-labeling-based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      We agree that a multiplexed Qlinker approach would be very useful. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      We agree that multiplexed Qlinkers would open the door to exciting avenues of investigation such as studying conformational state populations.  We plan to conduct the suggested experiments when multiplexed Qlinkers are available.
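      For the titration experiment the reviewer proposes, the expected ratio can be sketched under the simplest assumptions: 1:1 binding with ligand in large excess (so free ligand is approximately total ligand), and a Qlinker product whose yield is proportional to the population of the one state in which it forms. The minimal Python sketch below illustrates this; the function names and all Kd and concentration values are hypothetical, and real data may deviate because of crosslinker-induced perturbation or unequal residue reactivity.

```python
def fraction_bound(ligand: float, kd: float) -> float:
    """Fraction of protein in the ligand-bound state, theta = [L] / (Kd + [L]).

    Assumes 1:1 binding and ligand in large excess (free ~ total ligand);
    ligand and kd must be given in the same units.
    """
    return ligand / (kd + ligand)


def expected_reporter_ratio(ligand_a: float, ligand_b: float, kd: float,
                            bound_specific: bool = True) -> float:
    """Expected condition-A / condition-B ratio for a state-specific link.

    Assumes the link's yield is proportional to the population of the state
    in which it forms (bound if bound_specific, otherwise unbound).
    For a bound-specific link, ligand_b must be > 0 to avoid division by zero.
    """
    theta_a = fraction_bound(ligand_a, kd)
    theta_b = fraction_bound(ligand_b, kd)
    if bound_specific:
        return theta_a / theta_b
    return (1.0 - theta_a) / (1.0 - theta_b)


# Hypothetical example: 10 uM vs 1 uM ligand, Kd = 1 uM
bound_ratio = expected_reporter_ratio(10.0, 1.0, kd=1.0)                          # 20/11
unbound_ratio = expected_reporter_ratio(10.0, 1.0, kd=1.0, bound_specific=False)  # 2/11
```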

      Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together, the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      As the goal of the experiments is to maximize identification of crosslinked peptides, which tend to have higher charge states, we targeted ions with charge states of 3+ or higher in our MS acquisition settings for CLMS and ignored ions with 2+ charge states, which correspond to many of the normal (i.e., not crosslinked) peptides that are identified by MS. As a result, normal peptides are less likely to be identified by the MS procedure used in our CLMS experiments than by MS settings typically used to identify normal peptides. Our settings may also fail to identify some mono-modified peptides. As with most other CLMS methods, the number of identified crosslinked peptide spectra is usually less than 1% of the total acquired spectra, and we normally expect crosslinked species to make up approximately 1% of the total peptides. 

      We added information about the number of crosslinked and monolinked peptides identified in the pol I benchmarking experiments (line 173). The numbers of crosslinks and monolinks identified in the pol II +/- α-amanitin experiment, the TBP/TFIIA/TFIIB experiment and the pol II experiment +/- Rpb4/7 are also provided.

      (2) Of all the potential lysines that can be modified, how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate the modification efficiency of the isobaric crosslinker in a denatured sample (as an upper limit, since here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits known structural changes. What is missing here is a bit of what changes were expected overall, which ones the authors would have expected to pick up with their method, and why they have not been picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that, for example when they mention the missed site that they expected to see in the initial pol II experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      Regarding the Pol II complex experiment described in Figures 4 and 5, out of the 277 lysine residues in the complex, 207 were identified as monolinked residues (74.7%), and 817 crosslinked pairs out of 38,226 potential pairs (2.1%) were observed. The ability of CLMS to detect proximity/reactivity changes may be impacted by several factors including 1) the (low) abundance of crosslinked peptides in complex mixtures, 2) the presence of crosslinkable residues in close proximity with appropriate orientation, and 3) the ability to generate crosslinked peptides by enzymatic digestion that are amenable to MS analysis (i.e., the peptides have appropriate m/z’s and charge states, the peptides ionize well, the peptides produce sufficient fragment ions during MS2 analysis to allow confident identification). Future efforts to enrich crosslinked peptides prior to MS analysis may improve sensitivity.
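      The percentages above follow from simple counting. The short Python check below reproduces them, treating every unordered pair of the 277 lysines as a potential crosslink. Because this denominator ignores distance constraints, the fraction of geometrically reachable pairs that were observed would be higher than 2.1%.

```python
from math import comb

n_lys = 277              # lysine residues in the Pol II complex
monolinked = 207         # lysines identified as monolinked
crosslinked_pairs = 817  # Lys-Lys pairs observed as crosslinks

possible_pairs = comb(n_lys, 2)  # unordered Lys-Lys pairs, 277 * 276 / 2
mono_pct = 100 * monolinked / n_lys
xl_pct = 100 * crosslinked_pairs / possible_pairs
print(possible_pairs, round(mono_pct, 1), round(xl_pct, 1))  # 38226 74.7 2.1
```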

      It is very difficult to estimate the modification efficiency of Qlinker (or many other crosslinkers) based on peptide identification results. One major reason for this is that trypsin is not able to cleave after a crosslinker-modified lysine residue.  As a result, the peptides generated after the modification reaction have different lengths, compositions, charge states, and ionization efficiencies compared to unmodified peptides. These differences make it very difficult to estimate the modification efficiencies based on the presence/absence of certain peptide ions, and/or the intensities of the modified and unmodified versions of a peptide. Also, 2+ ions which correspond to many normal (i.e., unmodified) peptides were excluded by our MS acquisition settings.

      It is also very difficult to predict which structural changes are expected and which crosslinked peptides and/or modified peptides can be observed by MS.  This is especially true when the experiment involves proteins containing unstructured regions such as the experiments involving Pol II, and TBP, TFIIA and TFIIB. Since we are at the early stages of using qCLMS to study structural changes, we are not sure which changes we can expect to observe by qCLMS. Additional applications of Qlinker-CLMS are needed to better understand the types of structural changes that can be studied using the approach.

      We hope that our discussions of some of the limitations of CLMS for detecting conformational/reactivity changes provide the reader with an understanding of the sensitivity that can be expected with the approach.  At the end of the paragraph about the pol II α-amanitin experiment we say, “Unfortunately, no Q2linker-modified peptides were identified near the site where α-amanitin binds. This experiment also highlights one of the limitations of residue-specific, quantitative CLMS methods in general. Reactive residues must be available near the region of interest, and the modified peptides must be identifiable by mass spectrometry.” In the section about Rpb4/7-induced structural changes in pol II we describe the under-sampling issue. And in the last paragraph we reiterate these limitations and say, “This implies that this strategy, like all MS-based strategies, can only be used for interpretation of positively identified crosslinks or monolinks. Sensitivity and under sampling are common problems for MS analysis of complex samples.”

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

      We agree with the reviewer that it will be very helpful to establish gold-standard datasets for CLMS. As we further develop and promote this technology, we will try to establish a standardized qCLMS benchmark.

      Reviewer #1 (Recommendations for the authors):

      Only a very minor point:

      I may have missed it but it's not really clear how many independent experiments were used for the benchmarking quantitation and mixing experiments for Figure 1. What is the reproducibility across experiments on average and on a per-peptide basis?

      Otherwise, I think the approach would really benefit from at least "Q5linkers" or even "Q10linkers", if possible. And then conduct detailed quantitative studies, either using dilution series or maybe investigating the kinetics of complex formation.

      We used a sample of BSA crosslinked peptides to optimize the MS settings, establish the MS acquisition strategies and test the quantification schemes. The data in Figure 1 are based on one experiment, which used ~150 µg of purified pol I complexes from a 6 L culture. We added this information to the Figure 1 legend. We also provide information about the reproducibility of peptide quantification by plotting the observed and expected ratios for each monolinked and crosslinked peptide identified in all of the runs in Figure S3.
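      The per-peptide comparison of observed and expected ratios can be sketched in a few lines of Python. This minimal example assumes each quantified monolink or crosslink yields one summed reporter intensity per Q2linker channel and that the two samples were mixed at a known ratio; the peptide identifiers and intensities are hypothetical, and this is not our quantification pipeline.

```python
import math
from statistics import median


def log2_ratio_errors(reporters: dict[str, tuple[float, float]],
                      expected_ratio: float) -> dict[str, float]:
    """Deviation of each peptide's observed log2(A/B) from the expected log2 mixing ratio."""
    expected = math.log2(expected_ratio)
    return {pid: math.log2(a / b) - expected for pid, (a, b) in reporters.items()}


# Hypothetical reporter intensities (channel A, channel B) for a 4:1 mix
reporters = {
    "xl_001": (4100.0, 1000.0),
    "mono_002": (3600.0, 900.0),
    "xl_003": (8000.0, 2100.0),
}
errors = log2_ratio_errors(reporters, expected_ratio=4.0)
median_abs_error = median(abs(e) for e in errors.values())  # summary of accuracy
```

      Working in log2 space makes over- and under-estimates of the same fold size symmetric around zero.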

      We agree with the reviewer that the Qlinker approach would be even more attractive if multiplex Qlinker reagents were designed. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Reviewer #2 (Recommendations for the authors):

      In addition to the public review I have the following recommendations/questions:

      (1) The first part of the results section where the synthesis of the crosslinker is explained is excellent for mass spec specialists, but problematic for general readers - either more info should be provided (e.g. b1+ ions - most readers will have no idea why that is) - or potentially it could be simplified here and the details shifted to Materials and Methods for the expert reader. The same is true below for the length of spacer arms.

      However - in general this level of detail is great - but can impact the ease of understanding for the more mass spec affine but not expert reader.

      We have added the following sentence to assist the general reader: A b1+ ion is an ion with a charge state of +1 corresponding to the first N-terminal amino acid residue after breakage of the first peptide bond (lines 126-128).

      (2) The Calmodulin experiment (lines 239 to 257) - it is a very nice result that they see the change in the crosslinked peptide between residues K78-K95, but the monolinks are not just detected as described in the text but actually go 2 fold up. This would have been actually a bit expected if the residues are now too far away to be still crosslinked that the monolinks increase. In this case, this counteraction of monolinks to crosslinked sites can also be potentially used as a "selection criteria" for interesting sites that change. Is that a possible interpretation or do the authors think that upregulation of the monolinks is a coincidence and should not be interpreted?

      We agree with the reviewer that both monolinks and crosslinks can be used as potential indicators for some changes. However, it is much more difficult to interpret the abundance information from monolinks because, unlike crosslinks, there is little associated structural/proximity information with monolinks. Because it is difficult to understand the reason(s) for changes in monolink abundance, we concentrate on changes in crosslink abundances, which provide proximity/structural information about the crosslinked residues.

      (3) Lines 267 to 274: a small thing but the structural information provided is quite dense I have to say. Maybe simplify or accompany with some supplemental figures?

      We agree that the structural information is a bit dense especially for readers who are not familiar with the pol II system.  We added a reference to Figure 3c (line 177) to help the reader follow the structural information. 

      As qCLMS is still a relatively new approach for studying conformational changes, the utility of the approach for studying different types of conformational changes is still unclear. Thus, one of the goals of the experiments is to demonstrate the types of conformational changes that can be detected by Q2linkers.  We hope that the detailed descriptions will help structural biologists understand the types of conformational changes that can be detected using Qlinkers.

      (4) Line 280: explain maybe why the sample was fractionated by SCX (I guess to separate the different complexes?).

      SCX was used to reduce the complexity of the peptide mixtures. Because the samples are complex and crosslinked peptides are of low abundance compared with non-crosslinked peptides, SCX is used to separate the peptides according to their positive charge. Larger peptides and peptides with higher charge states, such as crosslinked peptides, tend to elute at higher salt concentrations during SCX chromatography. The use of SCX to fractionate complex peptide mixtures is described in the “General crosslinking protocol and workflow optimization” section of the Methods, and we added a sentence to explain why the sample was fractionated by SCX (lines 278-279).

      (5) Lines 354 to 357: "This suggests that the inability to identify most of these crosslinked peptides in both experiments is mainly due to under-sampling during mass spectrometry analysis of the complex samples, rather than the absence of the crosslinked peptides in one of the experiments."

      This is an extremely important point for the interpretation of missing values - have the authors tried to also collect the mass spec data with DIA, which is better at recovering the same peptide signals across different samples? I realize that these are isobaric samples so DIA measurements per se are not useful as the quantification is done on the reporter channels in the MS2, but it would at least give a better idea if the missing signals were simply not picked up for MS2 as claimed by the authors or the modified peptides are just not present. Another possibility is for the authors to at least try to use a "match between runs" function as can be done in MaxQuant. One of the strengths of the method is that it is quantitative and two states are analyzed together, but as can be seen in this experiment, one might want to compare more than two states. In such cases, the under-sampling issue (if that is indeed the cause) makes interpretation of many sites hard (due to missing values) and it would be interesting if, for example, an analysis approach with a "match between runs" function could recover some of the missing values.

      We agree that undersampling/missing values is an important issue that needs to be addressed more thoroughly. This also highlights the importance of qCLMS, as conclusions about structural changes based on the presence/absence of certain crosslinked species in database search results may be misleading if the absence of a species is due to under-sampling. We have not tried to collect the data with DIA since we would lose the quantitative information. It would be interesting to see if match between runs can recover some of the missing values. While this could provide evidence to support the under-sampling hypothesis, it would not recover the quantitative information.

      We recommend performing label swap experiments and focusing downstream analysis on the crosslinks/monolinks that are identified in both experiments. Future development of multiplexed Qlinker reagents should help to alleviate under-sampling issues. See response to Reviewer #1.

      (6) Lines 375 to 393 (the whole paragraph): extremely detailed and not easy to follow. Is that level of detail necessary to drive home that point or could it be visualized in enough detail to help follow the text?

      We agree that the paragraph is quite detailed, but we feel that this level of detail is necessary to describe the types of conformational changes that can be detected by the quantitative crosslinking data, and also to illustrate the challenges of interpreting the structural basis for some crosslink abundance changes even when high-resolution structural data exist.

      To make it easier to follow, we added a sentence to the legend of Figure 5b. “In the holo-pol II structure (right), Switch 5 bending pulls Rpb1:D1442 away from K15, breaking the salt bridge that is formed in the core pol II structure (left). The increase in the abundances of the Rpb1:15-Rpb6:76 and Rpb1:15-Rpb6:72 crosslinks in holo-pol II is likely due to the salt bridge between K15 and D1442 in core pol II, which impedes the NHS ester-based reaction between the epsilon amino group of K15 and the crosslinker.”

      (7) Final paragraph in the results section - lines 397 and 398: "All of the intralinks involving Rpb4 are more abundant in holo-pol II as expected." If I understand that experiment correctly the intralinks with Rpb4 should not be present at all as Rpb4 has been deleted. Is that due to interference between the 126 and 127 channels in MS2? If so, then this also sets a bit of the upper limit of quantitative differences that can be seen. The authors should at least comment on that "limitation".

      Yes, we shouldn’t detect any Rpb4 peptides in the sample derived from the Rpb4 knockout strain. The signal from Rpb4 peptides in the ∆Rpb4 sample is likely due to co-eluting ions. To clarify, we changed the text to:

      All of the intralinks involving Rpb4 are more abundant in the holo-pol II sample (even though we don’t expect any reporter ion signal from Rpb4 peptides derived from the ∆Rpb4 pol II sample, we still observed reporter ion signals from the channel corresponding to the ∆Rpb4 sample, potentially due to the presence of low abundance, co-eluting ions) (lines 395-399).
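
      The reviewer's point about an upper limit on measurable differences can be illustrated with a simple model of reporter-ion interference. The Python sketch below assumes that co-isolated background ions add a fixed absolute signal to each reporter channel. Even when the true signal in one channel is zero (as expected for Rpb4 peptides from the ∆Rpb4 sample), the observed ratio stays finite, so large true ratios are compressed. The function name and all signal values are hypothetical.

```python
def observed_ratio(true_a, true_b, interference_a, interference_b):
    """Reporter-ion ratio A/B when co-isolated ions add signal to each channel.

    true_a, true_b: reporter signal from the peptide of interest in channels A and B
    interference_a, interference_b: signal contributed by co-eluting, co-isolated ions
    """
    denominator = true_b + interference_b
    if denominator == 0:
        return float("inf")  # no interference and no true signal in channel B
    return (true_a + interference_a) / denominator


# Hypothetical numbers: peptide present only in channel A, 5% background in each channel
print(observed_ratio(100.0, 0.0, 5.0, 5.0))   # 21.0: finite instead of infinite
print(observed_ratio(100.0, 0.0, 0.0, 0.0))   # inf: the ideal, interference-free case
print(observed_ratio(100.0, 50.0, 0.0, 0.0))  # 2.0: true ratio recovered without interference
```

      In this model, the ceiling on the observable ratio is set by the interference level relative to the peptide signal, which is one way to frame the limitation the reviewer raises.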

      (8) Materials and Methods - line 690: I am probably missing something but why were two different mass additions to lysine added to the search (I would have expected only one for the crosslinker)?

      The 297 Da modification is for monolinked peptides, in which one end of the crosslinker is hydrolyzed, adding an 18 Da water molecule. The 279 Da modification is for crosslinks and sometimes for looplinks (crosslinks involving two lysine residues on the same tryptic peptide).
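
      The relationship between the two search modifications is simple mass bookkeeping. A crosslink (or looplink) uses both reactive ends of the linker, whereas a monolink carries one hydrolyzed end, which adds the mass of one water molecule. Using the nominal values given above:

```latex
\Delta m_{\text{monolink}} = \Delta m_{\text{crosslink}} + m_{\mathrm{H_2O}} \approx 279\ \text{Da} + 18\ \text{Da} = 297\ \text{Da}
```

      Exact monoisotopic values would be needed for the actual search settings. The nominal masses here only show how the two entries relate.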

    1. Author response:

      Review #1:

      Also, they observed no difference in the binding free energy of phosphatidylserine with wild-type TREM2-Ig and mutant TREM2-Ig, which is somewhat inconsistent with previous experimental studies reported in Journal of Biological Chemistry 293 (2018), Alzheimer's and Dementia 17, 475-488 (2021), and Cell 160, 1061-1071 (2015).

      We directly note this contrast with experimental findings in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss of function in the R47H variant extends beyond decreased binding affinities and also impacts binding patterns. As stated in our manuscript: ‘Our observations for both sTREM2 and TREM2 indicate that R47H-induced dysfunction may result not only from diminished ligand binding but also an impaired ability to discriminate between different ligands in the brain, proposing a novel mechanism for loss-of-function.’

      Although the authors made significant efforts to run a number of simulations for multiple models, nearly 17 microseconds in total, none of the simulations has been repeated independently even a couple of times, which makes me uncomfortable considering this finding technically reliable. Most of the important conclusions that the authors claimed, including the results opposite to previous research, are based on a single run, which raises the question of whether this observation can be reproduced if the simulation is repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, this must be carefully considered before drawing conclusions from a single run.

      The reviewer raises an interesting point regarding the repetition of individual simulations, a consideration we carefully evaluated during the design of this study. However, we believe our approach—running multiple independent models of the same system—offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      In our study, we demonstrate that within the 150 ns timescale of our protein/ligand (PL) simulations, the relatively small ligands are able to move from their initial docking positions to a specific binding site. While ideally, replicates of these independent models would further strengthen the findings, this was not computationally feasible given the unprecedented total duration of our simulations. Importantly, our conclusions are seldom based on the results of a single protein/PL simulation.

      Moreover, the ergodic hypothesis suggests that over sufficiently long timescales, simulations will explore all accessible states. Additionally, we have performed several replicate simulations of our WT and R47H Ig-like domain models in solution, specifically to investigate CDR2 loop dynamics.

      In this case, since the system involves only the protein and lacks the independent replicates seen in the protein/PL simulations, these runs were chosen to effectively capture the stochastic nature of CDR2 loop movement.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why is only one mutation, R47H, considered in the study? Several other mutations have been reported to disrupt tethering between these CDRs, such as T66M. Although T66M is not associated with AD, I would expect the stalk domain's protective mechanism not to differ among diseases. Therefore, it would be interesting to see whether the findings also hold for T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example changes in secondary structure and in residue-wise interloop interaction patterns. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation. While R47H is the most common and prolific mutation in the literature, an extensive catalog of other mutations is important to explore. We are currently preparing two separate publications that will delve into these gaps in more detail, as addressing them was beyond the scope of the present study.

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.
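
      A moving-block bootstrap is one standard way to follow the reviewer's suggestion: it puts a confidence interval on a difference between two autocorrelated MD observables (for example, a per-frame interaction energy in the WT and R47H systems). Resampling contiguous blocks of frames, rather than individual frames, keeps the short-range correlation within each block. The Python sketch below is illustrative only. The block length is an assumption that should exceed the observable's autocorrelation time, and the function names are hypothetical, not taken from any MD analysis package.

```python
import random
from statistics import mean


def block_bootstrap_means(series, block_len, n_boot=1000, seed=0):
    """Moving-block bootstrap distribution of the mean of one time series.

    Contiguous blocks of `block_len` frames are drawn with replacement and
    concatenated (then trimmed) to the original length before averaging.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)  # every valid block start index
    n_blocks = -(-n // block_len)      # ceil(n / block_len)
    means = []
    for _ in range(n_boot):
        sample = []
        for _ in range(n_blocks):
            s = rng.choice(starts)
            sample.extend(series[s:s + block_len])
        means.append(mean(sample[:n]))
    return means


def bootstrap_diff_ci(a, b, block_len, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile confidence interval for mean(a) - mean(b)."""
    ma = block_bootstrap_means(a, block_len, n_boot, seed)
    mb = block_bootstrap_means(b, block_len, n_boot, seed + 1)
    diffs = sorted(x - y for x, y in zip(ma, mb))
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-frame observables for two systems (e.g., WT vs mutant)
wt = [5.0 + 0.1 * (i % 3) for i in range(300)]
mut = [0.1 * (i % 3) for i in range(300)]
print(bootstrap_diff_ci(wt, mut, block_len=20))
```

      An interval that excludes zero would suggest a difference beyond what frame-to-frame correlation alone could produce. The result still depends on the chosen block length.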

      We are currently working to address this comment to strengthen the validity of our results and statistical conclusions in the revised manuscript.  

      Review #2:

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We will adjust and refocus how we reference this evidence from Kober et al. in our revised manuscript. 

      In line with these findings, our energy calculations show that sTREM2 exhibits weaker binding affinities for phospholipids than TREM2, although the differences are not statistically significant. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work on TREM2 to date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper. Hence, we are currently working toward a manuscript that will be the first biologically relevant model of TREM2 in a membrane and will challenge the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

      Weaknesses

      - Inclusion of statistical analysis is missing in several places in the text.

      - Functional analysis beyond coregulator binding is needed.

      We added additional statistical analyses as recommended (Source Data 1, a Microsoft Excel spreadsheet).

      Related to functional analysis, we cite studies from our previous publication (Hughes et al. Nature Communications 2014 5:3571) where we demonstrated that the covalent inhibitor ligands (GW9662 and T0070907) do not block the activity of other ligands using a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes. Our study here expands on this finding and other published studies showing the structural mechanism for the lack of blocking activity by the covalent inhibitors.

      Reviewer #2 (Public Review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand, remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering the covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence presented is strong and convincing.

      Weaknesses:

      However, the co-crystal studies were performed by soaking non-covalent ligands into LBD pre-crystallized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward a transcriptionally repressive conformation which reduces orthosteric binding of non-covalent ligands, if the sequence was reversed (i.e., soaking a covalent inhibitor into LBD pre-crystallized with a non-covalent ligand), would a similar conclusion be drawn? Additional discussion will broaden the implications of the conclusion.

      This is an interesting point, which we now expand upon in a new (third) paragraph of the discussion in our revised manuscript:

      “In our previous study, we observed synthetic and natural/endogenous ligand co-binding via co-crystallography where preformed crystals of PPARγ LBD bound to unsaturated fatty acids (UFAs) were soaked with a synthetic ligand, which pushed the bound UFA to an alternate site within the orthosteric ligand-binding pocket (ref. 8). In the scenario of synthetic ligand cobinding with a covalent inhibitor, it is possible that soaking a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand may prove to be difficult. The covalent inhibitor would need to flow through solvent channels within the crystal lattice, which may not be a problem. However, upon reaching the entrance surface to the orthosteric ligand-binding pocket, it may be difficult for the covalent inhibitor to gain access to the region of the orthosteric pocket required for covalent modification as the larger non-covalent ligand could block access. This potential order-of-addition problem may not arise in studies in solution or in cells, where the non-covalent ligand can more freely exchange in and out of the orthosteric pocket and over time the covalent reaction would reach full occupancy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - IC50 or EC50 values are not reported for the coregulator interaction assays, R² for fit should also be reported where Ki and IC50s are disclosed.

      We now report fitting statistics and IC50/EC50 values when possible in Figure 2B and Source Data 1 along with R² values for the fit. We note that some data do not show complete or robust enough binding curves to faithfully fit to a dose response equation.
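
      The Python sketch below illustrates what the reported fit statistics mean. It defines one common four-parameter logistic (4PL) dose-response model and the coefficient of determination (R²) computed from observed and fitted values. The parameter estimates themselves would come from a nonlinear least-squares fit (for example, SciPy's `scipy.optimize.curve_fit`). The parameterization, function names and example numbers here are assumptions, not necessarily those used for Figure 2B or Source Data 1.

```python
def four_pl(x, bottom, top, ic50, hill):
    """One common 4PL parameterization: equals `top` at x = 0, tends to `bottom`
    as x grows (for hill > 0), and is halfway between them at x = ic50."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)


def r_squared(y_obs, y_fit):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_obs) / len(y_obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    ss_tot = sum((o - mean_y) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical titration (concentrations in µM) and TR-FRET-like responses
conc = [0.01, 0.1, 1.0, 10.0, 100.0]
response = [98.0, 91.0, 52.0, 9.0, 2.0]
fitted = [four_pl(c, bottom=1.0, top=100.0, ic50=1.0, hill=1.0) for c in conc]
print(round(r_squared(response, fitted), 3))
```

      When a titration does not reach a plateau, `bottom` or `top` is poorly constrained by the data, which is consistent with the note above that some curves could not be fit reliably.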

      -  Reporter gene or qPCR should be performed for the combinations of covalent and noncovalent ligands to show how these molecules impact transcriptional activities rather than just coregulator binding profiles.

      We previously performed PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes to demonstrate that cotreatment of a covalent inhibitor (GW9662 or T0070907) with a non-covalent ligand does not block activity of the non-covalent ligand and showed cobinding-induced activation relative to DMSO control (Hughes et al., 2014 Nature Communications). We did not specifically mention this in our original manuscript, but we now call this out in the first paragraph of the results section.

      - Inclusion of a structure figure to show the different helix 12 orientations should be included in the introduction. Likewise, how the overall structure of the LBD changes as a result of the cobinding in the discussion or a summary model would be helpful.

      Our revised manuscript includes a structure figure called out in the introduction describing the active and repressive helix 12 PPARγ LBD conformations (new Figure 1). There are no major changes to the overall structure of the LBD compared to the active conformation that crystallized, so we did not include a summary model figure but we do refer readers to our previous paper (Shang and Kojetin, Structure 2021 29(9):940-950) in the penultimate paragraph of the discussion. We also added the following sentence to the crystallography results section related to the overall LBD changes:

      “The structures show high structural similarity to the transcriptionally active LBD conformation with rmsd values ranging from 0.77–1.03Å (Supplementary Table S2)”

      A typo in paragraph 3 of the discussion says "long-live" when it should probably say "long-lived."

      We corrected this typo.

      Reviewer #2 (Recommendations For The Authors):

      It's interesting that ligand-specific binding mode of non-covalent ligands was observed. Would modifications of the chemical structure of a covalent inhibitor alter the allosteric binding behavior of non-covalent ligands in a predictive manner? If so, how can such SAR be used to guide the design of covalent inhibitors to more broadly and effectively inhibit agonists of various chemical structures? Discussion on this topic could be valuable.

      This is an interesting point, which we now discuss in the penultimate and last paragraphs of the discussion:

      “Another way to test this structural model could be through the use of covalent PPARγ inverse agonist analogs with graded activity (ref. 23), where one might posit that covalent inverse agonist analogs that shift the LBD conformational ensemble towards a fully repressive LBD conformation may better inhibit synthetic ligand cobinding.”

      “It may be possible to use the crystal structures we obtained to guide structure-informed design of covalent inhibitors that would physically block cobinding of a synthetic ligand. This could be the potential mechanism of a newer generation covalent antagonist inhibitor we developed, SR16832, that more completely inhibits alternate site ligand binding of an analog of MRL20, rosiglitazone and the UFA docosahexaenoic acid (DHA) (ref. 21) and thus may be a better choice for the field to use as a covalent ligand inhibitor of PPARγ.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nano domain organisation of proteins localised within the plasma-membrane has emerged as a way of organising proteins involved in signal pathways. Here, the authors addressed how a non-surface localised signal (viral infection) was resisted by PM localised signalling proteins and the effect of nano domain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM where most previous work has focused on the other way round, PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium dependent protein kinase which is important for inhibiting viral spread. The authors then go on to show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, which are a group of scaffold proteins important in nano domain organisation. The authors conclude that there is an interdependence between CPK3 and remorins to control their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and point mutation non functioning lines further added to that.

      Weaknesses:

      My main concerns with the work are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM localisation and its non-PM localised form look similar and with no PM stain or marker construct used to support this. The sptPALM data conclusions are nice and fit the narrative. However, no raw data or movie is shown, only representative tracks. Therefore, the data quality cannot be verified and in addition, the reporting of number of single particle events visualised per experiment is absent, only number of cells imaged is reported. Therefore, it is impossible for the reader to appreciate the number of single molecule behaviours obtained and hence the quality of the data.

      (2) Secondly, remorins are involved in a lot of nanodomain controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solve this but more evidence would strengthen it in my opinion. How do we not know that during viral infection the entire PM protein dynamics and organisation are altered? Or that CPK3 and REM are at very distant ends of a signalling cascade. Negative control experiments are required here utilising other PM localised proteins which have no role during viral infection. In addition, if the interaction is specific, the transiently expressed CPK3-CA construct (shown to form nano domains) should be expressed with REM1.2-mEOS to show the alterations in single particle behaviour occur due to specific activations of CPK3 and REM1.2 in the absence of PlAMV viral infection and it is not an artefact of whole PM changes in dynamics during viral infection.

      In addition, displaying more information throughout the manuscript (such as raw particle tracking movies and numbers of tracks measured) on the already generated data would strengthen the manuscript further.

      Overall, I think this work has the potential to be a very strong manuscript but additional reporting of methods and data are required and additional lines of evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

      Reviewer #2 (Public Review):

      Summary:

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent.

      Strengths:

      The paper contains novel, important information.

      Weaknesses:

      The interpretation of some experimental data is not justified, and the proposed model is not fully based on the available data.

      Reviewer #3 (Public Review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium dependent protein kinase (CPK3) plays in plant defence against viruses. The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs which have roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nano-domains. This stabilisation however, depends on one of its phosphorylation substrates a Remorin scaffold protein REM1-2. However, when REM1-2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These contrary responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nano-domains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1-2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nano-domains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1-2 inhibit the cell-to-cell spread of viruses.

      Strengths:

      The paper provides compelling evidence about the roles of CPK3 and REM1.2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly single-particle tracking.

      Weaknesses:

      There is a lack of evidence for the downstream pathways, specifically whether CPK3's role in cytoskeletal organisation contributes to the plant's defence against viral propagation. Also, there is limited discussion about the localisation of the nanodomains and whether they overlap with plasmodesmata (PD); since plant viruses use PD to move from cell to cell, this seems an obvious avenue to investigate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Viral spread work in CPK mutants with time courses is beautiful!

      Regarding my public points on my issues with the imaging:

      - Figure 2A shows 'PM' localisation of CPK3 and 'non-PM' imaging of CPK3-G2A. The images are near identical both showing cell outlines and cytoplasmic strands. Here a PM marker (such as Lti6B) tagged with a different fluorophore or PM stain should be used in conjunction with surface views (such as in Figure 2C) to show it really is at the PM and the G2A line is not.

      Impaired membrane localization of CPK3-G2A is documented in Mehlmer et al., 2010 using microsomal fractionation. Although the main purpose of Figure 2A is to show correct expression of the constructs in the lines used for PlAMV propagation (Figure 2B), we replaced the images with wider-view pictures that are more representative of the subcellular localization of CPK3 and CPK3-G2A.

      - Regarding Figure 2C, this is extremely noisy, and PM heterogeneity is barely observable over the noise from the system (looking at the edges of the surface imaged). You mention that low resolution was an issue. I notice from the methods that you took confocal images on a Zeiss 880 with Airyscan. These images must be confocal, but if imaged with Airyscan the PM heterogeneity would be much clearer (see work from John Runions' lab).

      Indeed, these are tangential-view images obtained on a Zeiss 880 with Airyscan. Based on tessellation analysis (Figure 2H-J), CPK3 is rather homogeneously distributed and forms nanodomains (ND) of around 70 nm in diameter. Objects of such size cannot be resolved using pixel-reassignment methods such as Airyscan. Note also that the AtREMs in our study are less heterogeneously distributed than what has been described in the literature for StREM1.3.

      - Regarding all sptPALM data: at least one example raw data image and video is required; otherwise the data can't be assessed. The work of Alex Martiniere (sptPALM) or Alex Jonson (TIRF) both show raw data so the reader can appreciate the quality of the data. In addition, the number of events (particles tracked) has to be shown in the figure legend, not just the number of cells; otherwise, was one track performed per cell? Or 10,000? Obviously, where N sits in this range gives the reader more or less confidence in the data.

      We agree. We added example videos of the sptPALM experiments to the supplementary data, and we also indicated the number of tracked particles in the figure legends.

      - On a slight technical aside, how do you know the cells being imaged for sptPALM with PlAMV are actually infected with the virus? In Fig 2C you use a GFP-tagged version, but in sptPALM you use an untagged one. I think a sentence in the methods on this would help clarify.

      PlAMV-GFP was used for the sptPALM experiments, and infection of the imaged cells was assessed during the PALM experiments. This is now specified in the corresponding figures and methods.

      - I also have a concern over some of the representative images showing the same things in different figures. Your clustering data in 3F look very convincing. However, in Figure 2H the mock and PlAMV-GFP look very similar. How is Figure 3F so different for the same experiment, especially considering the scale bars are the same for both figures? The same goes for CPK3-mRFP1.2 in Fig 2C and 3A: the same thing is being imaged, at the same scale (scale bars the same size), but the images are extremely different.

      Figure 2 data were generated using CPK3 stably expressed in A. thaliana, while Figure 3 data were obtained upon transient over-expression of CPK3 in N. benthamiana. We do not have a clear explanation for such a difference in CPK3 PM behavior; it could lie in a different PM composition or actin organization between those two species. This point is now addressed in the discussion.

      - Lines 193 & 194 - you state that CPK3-CA is reminiscent of CPK3 upon PlAMV expression. I don't agree: while CPK3-CA is less mobile (2D), the MSD shows it is in between CPK3 and CPK3 + PlAMV. Therefore, can't the opposite also be true, that overall the behaviour of CPK3-CA is reminiscent of WT CPK3? I think this needs rewording.

      We agree and have reworded that part.
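      For readers less familiar with sptPALM metrics, the MSD curves and diffusion coefficients (D) compared in this exchange are typically derived per trajectory roughly as follows. This is a minimal Python sketch assuming free 2D Brownian motion (MSD = 4Dt), positions already in µm and a fixed frame interval; the function names, the four-lag fit and the omission of a localisation-error offset are illustrative assumptions, not the authors' analysis pipeline.

```python
import numpy as np


def msd(track: np.ndarray, max_lag: int) -> np.ndarray:
    """Time-averaged mean square displacement of one 2D trajectory.

    track: array of shape (n_frames, 2) holding x, y positions (e.g. in µm).
    Returns the MSD for lags of 1..max_lag frames.
    """
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]  # all displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return out


def diffusion_coefficient(track: np.ndarray, frame_interval_s: float, n_fit: int = 4) -> float:
    """Estimate D (µm^2/s) assuming free 2D diffusion, MSD(t) = 4 D t.

    Least-squares fit through the origin on the first n_fit lags. Real sptPALM
    analyses usually also fit a localisation-error offset; this sketch does not.
    """
    lags = np.arange(1, n_fit + 1) * frame_interval_s
    m = msd(track, n_fit)
    slope = np.sum(lags * m) / np.sum(lags ** 2)  # slope of MSD versus time
    return float(slope / 4.0)


# Usage: a synthetic Brownian track with a known D should be recovered closely.
rng = np.random.default_rng(0)
true_d, dt = 0.05, 0.02  # µm^2/s and s (arbitrary illustrative values)
steps = rng.normal(0.0, np.sqrt(2 * true_d * dt), size=(20000, 2))
track = np.cumsum(steps, axis=0)
print(diffusion_coefficient(track, dt))  # close to 0.05
```

      A less mobile population (lower D) shows a shallower MSD curve; a curve that bends downward at longer lags is the usual signature of confinement, which is why the MSD shape, and not only D, is discussed above.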

      - Line 651 - what numerical aperture are you using for the lens during confocal microscopy? This is fundamentally important information directly related to the reproducibility of the work. You report it for the sptPALM.

      The numerical aperture is now indicated in the methods.

      Regarding my bigger point about specific interactions between CPK3 and remorin during viral infection, the following need doing to strengthen your claim. I am not suggesting you do all of these, but at least two would significantly enhance the paper.

      (1) Image an unrelated PM protein (such as Lti6b) during viral infection using sptPALM and demonstrate that its behaviour is not altered. This would show that the effects on remorin behaviour are specific to CPK3 and not a wholesale alteration of PM dynamics due to viral infection.

      (2) Two-colour SPT imaging of CPK3 and REM1.2. You show the effect of each protein's absence on the other (knockouts), but your only interaction data are from a kinase assay (where CPK1 and 2 also interact, even though they are not localised at the same place) and colocalisation data (see below). A two-colour SPT imaging experiment showing interaction and clustering of CPK3 and REM1.2 with each other, and the change in their behaviours when virally infected and simultaneously imaged, would address all of my concerns.

      - On another note, the co-localisation data (fig 5 sup 4) need additional analysis. I would expect most PM proteins to show the results you show, as the data are very noisy. To improve this, I would zoom in to fill the field of view and then determine the correlation, both as is and with one image rotated 90 degrees (as described in Jarsch et al., Plant Cell).

      (3) In the absence of viral infection but in the presence of CPK3-CA, is REM1.2 behaviour in the PM (by sptPALM) altered? If so, then the interaction is specific, the changes in remorin dynamics are not due to wholesale PM changes during viral infection, and the manuscript is substantially strengthened.

      (4) Building on from (3), if you have a CPK3 carrying both the CA and G2A mutations, it would be constitutively active and non-PM localised and, as such, should not affect remorin behaviour if your model is true. This would strengthen the case significantly, but I appreciate that it is highly artificial and would need to be done transiently.

      Regarding the first point, since the role of PM proteins in potexvirus infection has barely been assessed, picking an unrelated PM protein might be tricky. The data obtained with mEOS3.2-REM1.2 expressed in the cpk3 null mutant point towards a specific role of CPK3 in PlAMV-induced REM1.2 diffusion rather than a general alteration of PM protein behavior.

      Regarding the second point, we already reported the in vivo interaction between AtCPK3CA and AtREM1.2/AtREM1.3 by BiFC in N. benthamiana (Perraki et al., 2018), and AtCPK3 was shown to co-IP with AtREM1.2 (Abel et al., 2021). While we agree on the relevance of dual-colour sptPALM with CPK3 and REM1.2, it is so far technically challenging and we would not be able to implement it in a timely manner. For the colocalization, although the whole cell is displayed in the figure, the analysis was performed on ROIs filling the field of analysis.

      We agree with the relevance of adding the colocalization analysis of randomized images (mTagBFP2 channel rotated 90 degrees), this is now added to Figure 5 – supplement figure 5.
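      The rotation control discussed above can be illustrated with a short sketch: compute Pearson's correlation between the two channels in a square ROI, then recompute it after rotating one channel by 90 degrees, which breaks any true spatial correspondence and gives a chance-level reference. This is a minimal Python/NumPy example assuming two background-subtracted, equally sized square ROIs; the helper names are hypothetical, and it is not the exact analysis of Jarsch et al. or of the figure supplement.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient between two images of equal shape."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))


def coloc_with_rotation_control(ch1: np.ndarray, ch2: np.ndarray) -> tuple[float, float]:
    """Return (observed r, r after rotating ch2 by 90 degrees).

    Both ROIs must be square and of equal size so that the rotated image
    still overlays ch1 pixel for pixel.
    """
    if ch1.shape != ch2.shape or ch1.shape[0] != ch1.shape[1]:
        raise ValueError("expected two square ROIs of equal size")
    return pearson(ch1, ch2), pearson(ch1, np.rot90(ch2))


# Usage with synthetic data: ch2 is correlated with ch1 plus independent noise.
rng = np.random.default_rng(1)
ch1 = rng.random((64, 64))
ch2 = ch1 + 0.5 * rng.random((64, 64))
observed, rotated = coloc_with_rotation_control(ch1, ch2)
print(round(observed, 2), round(rotated, 2))  # high observed r, rotated r near 0
```

      If the observed coefficient is not clearly above the rotated one, the apparent colocalisation is no better than chance overlap of two densely labelled membranes, which is the concern raised by Reviewer #1.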

      Finally, for the third and fourth points, sptPALM analysis of REM1.2 in the presence of CPK3-CA and CPK3-CA-G2A was performed (Figure 5 - figure supplement 4). The results suggest a specific role of CPK3-CA in REM1.2 diffusion.

      Minor points:

      Line 59 - form, I think you mean from.

      Line 63 - Reference needed after latter.

      Line 68 - Reference required after viral infection.

      Line 85 - Propose not proposed.

      Line 156 - "allowed us to", not "allows to".

      Line 204 - add we previously 'demonstrated'

      Lines 622 and 623 - You say the lines were obtained from Thomas Ott. This is very odd phrasing considering he is an author. I appreciate citing the work producing the lines, but maybe reword this.

      These points were corrected, thank you.

      Reviewer #2 (Recommendations For The Authors):

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent. The paper contains novel, important information that can undoubtedly be published in eLife. However, I have some concerns that should be addressed before it can be accepted for publication.

      Major concerns

      When the authors say that CPK3 plays a role in viral propagation, it should be clarified what is meant by 'propagation', - replication of the viral genome, its cell-to-cell transport, or long-distance transport via the phloem. By default the readers will tend to assume the former meaning. In my opinion, the term 'propagation' is misleading and should be avoided.

      We purposely chose the term “propagation” because it encompasses both replication and cell-to-cell movement. Nevertheless, we previously showed that group 1 StREM1.3 doesn’t alter PVX replication (Raffaele et al., 2009 The Plant Cell). As we do not investigate the role of AtREM1.2 or AtCPK3 in the replication of the viral PlAMV genome in this paper, we cannot state that these proteins are strictly involved in cell-to-cell movement of the virus.

      The authors show that viral infection is associated with decreased diffusion of CPK3 and increased diffusion of REM1.2 in the PM. However, it remains unclear whether these changes are related to partial resistance to viral infection involving CPK3 and REM1.2, or whether they are simply a consequence of viral infection that may lead to altered PM properties and altered dynamics of PM-associated proteins. Therefore, the model presented in Fig. 6 appears to be entirely speculative, as it postulates that changes in CPK3 and REM1.2 dynamics are the cause of suppressed virus 'propagation'. In addition, the model implies that a decrease in CPK3 mobility leads to activation of its kinase activity. This view is not supported by experimental data (see my next comment). The model should be deleted (both as the figure and its description in the Discussion) or substantially reworked so that it finally relies on existing data.

      For the first point, the results obtained from the additional experiments proposed by reviewer #1 support the hypothesis of a direct impact of CPK3 on REM1.2 diffusion (Figure 5 - figure supplement 4).

      We agree with the second point and reworked the model to remove the link between CPK3 activation and its increased diffusion.

      The statement that 'changes in CPK3 dynamics upon PlAMV infection are linked to its activation' (line 194) is based on flawed logic, and the conclusion in this section of Results ('changes in CPK3 dynamics upon PlAMV infection are linked to its activation') is incorrect, as it is not supported by experimental data. In fact, the authors show that CPK3 dynamics and clustering upon viral infection are somewhat reminiscent of the behavior of a CPK3 deletion mutant, which is a constitutively active protein kinase. However, this partial similarity cannot be taken as evidence that CPK3 dynamics upon PlAMV infection are related to its activation. Furthermore, the authors emphasize the similarity of the mutant and CPK3 in infected cells without taking into account a drastic difference in their localization (Fig. 3A, middle and right panels), showing that the reduced dynamics of the compared proteins may have different causes. I suggest removing the section 'CPK3 activation leads to its confinement in PM ND' from the paper, as the results included in this section are not directly related to the other data presented.

      The lateral PM organization of PM-bound CPKs in their native or constitutively active form, as well as the role of lipids in this phenomenon, had never been shown before. We believe that this section contains relevant information for the community. We kept the section but reworded it to temper the correlation drawn between CPK3 PM organization upon viral infection and its activation.

      Line 270 - 'group 1 REMs might play a role in CPK3 domain stabilization upon viral infection'. This is an overstatement. The size of the CPK3-containing NDs may have no correlation with their stability.

      We reworded the sentence.

      Minor points

      Line 204 - "we previously that": a verb is missing.

      Line 234 and hereafter - "the D" sounds strange. Suggest using "the diffusion coefficient".

      This was reworded.

      Reviewer #3 (Recommendations For The Authors):

      The authors have previously demonstrated that there was an increase in REM1.2 localisation to plasmodesmata under viral challenge. It would be useful to see if there was any co-localisation of REM1.2 and CPK3 with plasmodesmata in response to PlAMV and how this is affected in the mutants. This could be carried out relatively simply using aniline blue.

      These experiments were added to the supplementary data (Figure 2 – figure supplement 2 and Figure 4 – figure supplement 4); no enrichment of CPK3 or REM1.2 at plasmodesmata was observed upon PlAMV infection.

      Fig 3 supplementary figure 2 would be better incorporated into the main body of Figure 3 as this underpins discussion on the involvement of lipids such as sterols in the formation of nanodomains.

      We moved Figure 3 – Supplementary figure 2 to the main body of Figure 3.

      Minor corrections:

      Whilst the paper is generally well written there are a number of grammatical errors:

      Line 1 & 2: Title doesn't quite read correctly, suggest a rewording for clarity.

      L31: Insert "a" after only

      L33: Replace "are playing" with "play"

      L34: Begin sentence "Viruses are intracellular pathogens and as such the role..."

      L59: replace "form" with "from"

      L63: Insert "was demonstrated" after REM1.2)

      L85: Replace "proposed" with "propose"

      L86: replace "encouraging to explore" with "which will encourage further exploration of "

      L129: replace "we'll focus on" with "we concentrated on"

      L131: insert "an" before ATP

      L138: change "among" to "amongst"

      L156: change "allows to analyse" to "allows the analysis of"

      L204: Insert "showed" after previously.

      L232: "root seedlings" should this be the roots of seedlings?

      L235: insert "to" after "as"

      L280: insert "a" after "only"

      L281: change "to play" to "as playing"; change CA to superscript

      L307: Insert "was" after "transcription"

      L320: change "display" to "displaying"

      L321: change "form" to "forms"

      L340: "hampering" should come before viral

      L365: insert "us" after "allow"

      Thank you, these were corrected.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions. 

      We thank the reviewer for their thoughtful comments. To clarify, the grid-world setup was used as a didactic tool/testbed to understand the interaction between Pavlovian and instrumental systems (lines 80-81) [Dayan et al., 2006], specifically in the context of safe exploration and learning. It helps us delineate the Pavlovian contributions during learning, which is key to understanding the safety-efficiency dilemma we highlight. This approach generates a hypothesis about outcome-uncertainty-based arbitration between these systems, which we then test in the approach-withdrawal VR experiment, based on foundational studies of Pavlovian biases [Guitart-Masip et al., 2012, Cavanagh et al., 2013].

      Although the VR task does not explicitly involve rewards, it provides a specific test of our hypothesis regarding flexible Pavlovian fear bias, similar to how others have tested flexible Pavlovian reward bias without involving punishments (e.g., Dorfman & Gershman, 2019). Both the simulation and VR experiment models are derived from the same theoretical framework and maintain an algebraic mapping, differing only in task-specific adaptations (e.g., in action sets, and in temporal difference learning for multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task). This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. Therefore, we respectfully disagree that the two setups are completely unrelated and that both models include components merely labelled as Pavlovian.

      We will rephrase parts of the manuscript, particularly the Methods and Discussion, to prevent the main message from being misconveyed and to clarify that our main focus is on Pavlovian fear bias in safe exploration and learning (as also summarised by reviewers #2 and #3), rather than on its role in complex navigational decisions. We also acknowledge the need for future work to capture more sophisticated safe behaviours, such as escapes and sophisticated planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020], and we will highlight these as avenues for future research.
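      The claim that the two models differ mainly in their learning rule can be made concrete. Below is a minimal Python sketch of the textbook Rescorla-Wagner update (single-step choices) and the Q-learning temporal-difference update (multi-step grid world). On a terminal transition the TD target reduces to the immediate outcome, and the two updates coincide. These are generic forms with our own parameter names (alpha, gamma), not the manuscript's exact equations.

```python
import numpy as np


def rescorla_wagner(v: float, outcome: float, alpha: float) -> float:
    """Single-step update: move the prediction v towards the observed outcome."""
    return v + alpha * (outcome - v)


def q_learning_update(q: np.ndarray, s: int, a: int, r: float, s_next: int,
                      alpha: float, gamma: float, terminal: bool = False) -> np.ndarray:
    """One temporal-difference (Q-learning) update, returned on a copy of the Q-table.

    q has shape (n_states, n_actions). For non-terminal transitions the target
    bootstraps from the best action value in the next state.
    """
    target = r if terminal else r + gamma * np.max(q[s_next])
    q = q.copy()
    q[s, a] += alpha * (target - q[s, a])
    return q


# On a terminal transition, TD learning and Rescorla-Wagner give the same new value.
q0 = np.zeros((2, 2))
q1 = q_learning_update(q0, s=0, a=1, r=-1.0, s_next=1, alpha=0.3, gamma=0.9, terminal=True)
print(q1[0, 1], rescorla_wagner(0.0, -1.0, 0.3))  # -0.3 -0.3
```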

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thank you for this comment. We acknowledge that our paper does not compare the Pavlovian fear system to a purely instrumental system with varying punishment sensitivity. Instead, our model assumes the coexistence of these two systems and demonstrates the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone. In light of the reviewer’s comment, we will soften our claims regarding the necessity of the Pavlovian system, despite its known existence.

      We also encourage the reviewer to consider the Pavlovian system as a biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system (e.g., the limbic loop) are well known (see Supplementary Fig. 16).

      Additionally, we point out that varying reward sensitivities while keeping punishment sensitivity constant allows our PAL agent to differentiate from an instrumental agent that combines reward and punishment into a single feedback signal. As highlighted in lines 136-140 and the T-maze experiment (Fig. 3 A, B, C), the Pavlovian system maintains fear responses even under high reward conditions, guiding withdrawal behaviour when necessary (e.g., ω = 0.9 or 1), which is not possible with a purely instrumental model if the punishment sensitivities are fixed. This is a fundamental point.

      We will revise our discussion and results sections to reflect these clarifications.
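      To make the arbitration debated here concrete, the following minimal Python sketch shows one common way to combine the two systems. A fixed weight ω (omega) mixes instrumental action values with a Pavlovian punishment prediction for the block each action leads into, and a softmax turns the mixture into choice probabilities. The functional form, the softmax inverse temperature beta and the sign convention are illustrative assumptions rather than the manuscript's PAL equations. The sketch does reproduce the qualitative point in the response: with ω = 1, a large instrumental value cannot override the Pavlovian withdrawal preference, whereas a purely instrumental agent (ω = 0) follows the reward.

```python
import numpy as np


def softmax(x: np.ndarray, beta: float) -> np.ndarray:
    """Numerically stable softmax with inverse temperature beta."""
    z = beta * (x - np.max(x))
    e = np.exp(z)
    return e / e.sum()


def action_probs(q_values, pav_punish, omega: float, beta: float = 5.0) -> np.ndarray:
    """Choice probabilities from a fixed-weight Pavlovian-instrumental mixture.

    q_values:   instrumental values of each action in the current state.
    pav_punish: Pavlovian punishment prediction (>= 0) for the state each
                action leads into; higher means more feared.
    omega:      weight of the Pavlovian system, in [0, 1].
    """
    q = np.asarray(q_values, dtype=float)
    p = np.asarray(pav_punish, dtype=float)
    preference = (1.0 - omega) * q - omega * p  # fear lowers an action's preference
    return softmax(preference, beta)


# Action 0 is highly rewarded but leads into a feared block; action 1 is safe.
q_values, pav_punish = [100.0, 0.0], [1.0, 0.0]
print(action_probs(q_values, pav_punish, omega=0.0))  # almost always action 0
print(action_probs(q_values, pav_punish, omega=1.0))  # prefers the safe action 1
```

      Because the Pavlovian term is learned and stored separately, it is not overwritten when the instrumental values grow, which is the property the response contrasts with a single combined feedback signal.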

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thank you; we respectfully disagree with the statement that the models used in our experimental setup are dissimilar to the ones used in the first setup. Due to differences in the nature of the task setups, the action sets differ, but the model equations and the theory are the same and align closely, as described in our response above. The only additional difference is the use of a baseline bias in the human experiments and the RLDDM model, where we also model reaction times with drift rates, which is not a behaviour often simulated in grid-world simulations. We will improve our Methods section to ensure that model similarity is highlighted.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      We thank reviewer #1 for acknowledging the relevance of our models in advancing the field. We would like to further highlight that, to the best of our knowledge, this is the first time reaction times in Pavlovian-Instrumental arbitration tasks have been modelled using RLDDM, which adds a novel dimension to our approach.

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      We acknowledge the dissimilarity between the task setups (grid-world vs. approach-withdrawal). However, we believe these setups are computationally similar and may be biologically related, as suggested by prior work like Dayan et al. [2006], which integrates Go-No Go and grid-world tasks. Just as that work bridged findings in the appetitive domain, we aim to integrate our findings in the aversive domain. We will provide a more integrated interpretation in the discussion section of the revised manuscript.

      Dayan, P., Niv, Y., Seymour, B., and Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural networks, 19(8):1153–1160.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Thank you for your feedback. As mentioned above, we invite the reviewer to consider Pavlovian fear systems as one way in which the brain might implement punishment sensitivity. Secondly, such a system provides a separate punishment memory that cannot be overwritten by higher rewards (see also Elfwing and Seymour, 2017, and Wang et al., 2021).

      Elfwing, S., & Seymour, B. (2017, September). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 140-147). IEEE. 

      Wang, J., Elfwing, S., & Uchibe, E. (2021). Modular deep reinforcement learning from reward and punishment for robot navigation. Neural Networks, 135, 115-126.

      Simulation setups such as the grid worlds used here are common test beds for algorithms in reinforcement learning [Sutton and Barto, 2018].

      Any experimental setup faces the choice between a constrained experiment designed to test and model a specific effect and a less constrained exploratory experiment that is more difficult to model. Here we chose the former, building upon previous foundational experiments on Pavlovian bias in humans [Guitart-Masip et al., 2012, Cavanagh et al., 2013]. The condition where withdrawal from a jellyfish leads to a sting, though less realistic, was included to balance the four cue-outcome conditions. Overall, the task was designed to isolate the effect we wanted to test, Pavlovian fear bias in choices and reaction times, to the best of our ability. In a free-operant task, it is quite likely that other components not included in our model could compete for control.

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality, but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      We agree that safe behaviours, such as escapes, involve more sophisticated computations. We do not propose Pavlovian fear bias as the sole computation for safe behavior, but rather as one of many possible contributors. Given the known existence of a Pavlovian withdrawal bias, we simply study its possible contribution. We will include in our discussion that such behaviours likely occupy different parts of the threat-imminence continuum [Mobbs et al., 2020].

      Dean Mobbs, Drew B Headley, Weilun Ding, and Peter Dayan. Space, time, and fear: survival computations along defensive circuits. Trends in cognitive sciences, 24(3):228–241, 2020.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      We thank the reviewer for their comment. We selected the action space to build on existing models [Guitart-Masip et al., 2012, Cavanagh et al., 2013] that capture Pavlovian biases and we also wanted to minimize participant movement for EEG data collection. Unfortunately, despite restricting movement to just the arm, the EEG data was still too noisy to lead to any substantial results. We will explore more free-operant paradigms in future works.

      On the issue of the difference between VR and lab-based tasks, we note the reviewer's point. Note, however, that desktop monitor-based tasks lack sensorimotor congruency between the action and the outcome. Second, it is also arguable that the background context is important in fear conditioning, as it may help set the tone of the fear system and make aversive components easier to distinguish.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      We thank the reviewer for their thoughtful input. We do not claim that our model is the best fit for all naturalistic VR tasks, as these require multiple systems across the threat-imminence continuum [Mobbs et al., 2020] and are beyond the scope of the current work. However, we believe our findings on outcome-uncertainty-based arbitration of Pavlovian bias could inform future studies and may be relevant for testing differences in patients with mental disorders, as noted by reviewer #2. More generally, most well-controlled laboratory tasks must bridge a sizeable gap before they apply to real-life naturalistic behaviour, although the principle of using carefully designed tasks to isolate individual factors is well established.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We thank the reviewer for their comments and ideas. In our Discussion (lines 257-264), we discuss other works that identify similar safety-efficiency dilemmas in different models. Here, we simply focus on the safety-efficiency trade-off arising from the interactions between Pavlovian and instrumental systems. It is important to note that the computational argument for the modular system with separate rewards and punishments explicitly protects (up to a point, of course) against large rewards leading to death, because the Pavlovian fear response is not overwritten by successful avoidance in recent experience. Note also that in animals, reward utility curves are typically convex. We will clarify this in the discussion section.

      We completely agree that in certain scenarios, pruning decision trees could be more effective, especially with a model-based instrumental agent. Here, we use a model-free instrumental agent, which yields a simpler model, a feature appreciated by some readers such as reviewer #2. Future work could incorporate model-based methods.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We thank the reviewer for bringing this to our attention. We will discuss Tzovara et al. (2018) in the Discussion of our revised manuscript.
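      For readers less familiar with the associability term under discussion, the following is a minimal sketch of the hybrid Rescorla-Wagner/Pearce-Hall update commonly used to model fear-conditioned SCR. The function name `hybrid_rw_ph`, the parameters `kappa` (learning-rate scale) and `eta` (associability update weight), and the initial values are illustrative assumptions, not the authors' fitted model.

```python
def hybrid_rw_ph(outcomes, kappa=0.5, eta=0.3, alpha0=1.0, v0=0.0):
    """Hypothetical hybrid Rescorla-Wagner/Pearce-Hall learner (illustration only).

    Returns the value estimate and the associability *before* each trial.
    """
    v, alpha = v0, alpha0
    values, associabilities = [], []
    for r in outcomes:
        values.append(v)
        associabilities.append(alpha)
        delta = r - v                                 # prediction error
        v = v + kappa * alpha * delta                 # RW update scaled by associability
        alpha = eta * abs(delta) + (1 - eta) * alpha  # PH: associability tracks recent |delta|
    return values, associabilities


# 20 reinforced trials followed by three omissions (a reversal).
values, assoc = hybrid_rw_ph([1.0] * 20 + [0.0] * 3)
```

      In this sketch, associability decays as the outcome becomes predictable and rises again after the reversal, which is why it is used as a proxy for uncertainty; a formal account of uncertainty, as the reviewer notes, would instead represent it explicitly.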

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      We thank reviewer #2 for their positive feedback and thoughtful recommendations. We will ensure that, in our revision, we clarify the explanations in the few instances where they may not be sufficiently detailed, as noted.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that, otherwise, later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 
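      The uncertainty-based arbitration described above can be sketched as a convex combination of instrumental and Pavlovian action preferences whose Pavlovian weight grows with outcome uncertainty. This is a hypothetical illustration only: the function `arbitrate`, the clipping of uncertainty to [0, 1], the linear mapping to the weight `omega`, and the softmax inverse temperature `beta` are our assumptions, not the authors' published parameterization.

```python
import math


def arbitrate(q_instr, p_pav, uncertainty, omega_max=1.0, beta=3.0):
    """Hypothetical uncertainty-weighted mix of instrumental and Pavlovian preferences.

    q_instr: instrumental action values; p_pav: Pavlovian action biases.
    Returns softmax action probabilities.
    """
    # Assumed mapping: the Pavlovian weight rises linearly with (clipped) uncertainty.
    omega = omega_max * min(max(uncertainty, 0.0), 1.0)
    prefs = [(1 - omega) * q + omega * p for q, p in zip(q_instr, p_pav)]
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(beta * (x - m)) for x in prefs]
    z = sum(exps)
    return [e / z for e in exps]


# Actions: [approach, withdraw]. Approach is instrumentally better; withdraw is Pavlovian-biased.
q_instr = [1.0, 0.0]
p_pav = [-1.0, 1.0]
low_u = arbitrate(q_instr, p_pav, uncertainty=0.1)   # approach favoured
high_u = arbitrate(q_instr, p_pav, uncertainty=0.9)  # withdrawal favoured
```

      In this sketch, a fixed `omega` would let a sufficiently large instrumental value always dominate; letting `omega` track uncertainty shifts control toward withdrawal when reward estimates are unreliable.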

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      We thank reviewer #3 for their thoughtful feedback and useful recommendations, which we will take into account while revising the manuscript.

      We acknowledge the complexity of specifying Pavlovian bias in the grid world and appreciate the opportunity to elaborate on how this bias is modelled. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesized to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].

      Additionally, we explored the possibility of learning the action bias on the fly by tracking additional punishment Q-values instead of pre-training, which produced similar cumulative pain and step plots. While this approach is redundant, and likely not how the brain operates, it demonstrates an alternative algorithm.

      We thank the reviewer for pointing out these potentially unrealistic elements, and we will revise the manuscript to clarify and incorporate these explanations and improve the model descriptions.

      Eun Joo Kim, Omer Horovitz, Blake A Pellman, Lancy Mimi Tan, Qiuling Li, Gal Richter-Levin, and Jeansok J Kim. Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats. Proceedings of the National Academy of Sciences, 110(36):14795–14800, 2013.

      William Menegas, Korleki Akiti, Ryunosuke Amo, Naoshige Uchida, and Mitsuko Watabe-Uchida. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nature Neuroscience, 21(10):1421–1430, 2018.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development. 

      Weaknesses: 

      The conclusions of this study are in most cases supported by the data, but some aspects of the data analysis need to be better clarified and elaborated. Some conclusions should be stated more precisely and in line with the data observed. 

      We appreciate the reviewer's recognition of the impact of our study.  We will address the concerns about data analysis and statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review): 

      Summary: 

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin and THSD7A, are important for this process, possibly by activating the Rho-family GTPase CDC42. 

      Strengths: 

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance. 

      Weaknesses: 

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study.  Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A.  We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review): 

      Summary: 

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism. 

      Strengths: 

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel. 

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function. 

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes. 

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia. 

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings. 

      Weaknesses: 

      (1) A better characterization of the nature of the small EV population is missing: 

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations. 

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a Coomassie gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent 4 bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor: 

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Intervals of 2-5 seconds should be used to validate this. It would also be important to determine the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, the resolution of typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy. 

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10-30 seconds, based on our frame rate.  Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy to get an accurate calculation of this number.  Nonetheless, we will review our live imaging data for this experiment to determine if this calculation is possible. Again, we will be limited by the frame rate we used to capture the images, so we could possibly be missing secretion events taking place between the 10 second time intervals.  Regardless, for the secretion events that we visualized, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript.  A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging.  We will clarify this in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful. 

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A.  Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013).  We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence.

      With regard to how we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 μm of dendrite length. Since neurons cannot be imaged as whole cells, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results to filopodia per cell area, as there were no significant differences in cell area between conditions and experiments. We plan to include a new supplementary figure showing the data in Figure 2 plotted as filopodia per cell to show that this quantification gives the same results.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats. 

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, and the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions.  We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were usually unable to detect THSD7A using these same conditions for the mouse melanoma B16F1 samples, but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns. Based on our THSD7A trafficking data, we believe that in control cells, most of the THSD7A is trafficked and secreted via small EVs. As can be seen in Figure 7A, the band for THSD7A in the shScr cell lysate is relatively light and also shows a double band similar to Figure 6E (both HT1080 samples).

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands.  If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant.  Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A: 

      a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D (label D is missing in the Figure)? It is not clear whether this is observed in neurons. A Western blot of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8. 

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet.  In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells? 

      The images for Figure 7E were taken with high resolution on a confocal microscope.  Insets for Figure 7E were zoomed in so that readers could see the tiny structures.  Zoom 1 in Figure 7E shows areas of extracellular deposition. In these areas, we can see small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more export of THSD7A into small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.  Quantification of internal THSD7A localization is much more straightforward in this experimental regime.  Indeed, in Figure 7F we assessed internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

    1. Author response:

      We thank the reviewers for their thoughtful feedback and valuable comments. We plan to fully address their concerns by including the following experiments and analyses:

      Reviewer 1 suggested exploring data scaling trends for encoding models, as successful scaling would justify larger datasets for language ECoG studies. To estimate scaling effects, we will develop encoding models on subsets of our data.

      Reviewer 2 expressed uncertainty about the baseline for model-brain correlation and recommended adding control LLMs with randomly initialized weights. In response, we will generate embeddings using untrained LLMs to establish a more robust baseline for encoding results.

      Reviewer 2 also proposed incorporating control regressors such as word frequency and phonetic features of speech. We will re-run our modeling analysis using control regressors for word frequency, 8 syntactic features (e.g., part of speech, dependency, prefix/suffix), and 3 phonetic features (e.g., phonemes, place/manner of articulation) to assess how much these features contribute to encoding performance.

      Reviewer 3 raised concerns that the “plateau in maximal encoding performance” was actually a decline for the largest models. We will add significance tests in Figure 2B to clarify this issue.

      Reviewer 3 also noted that in Supplementary Figure 1A, the decline in encoding performance was more pronounced when using PCA to reduce embedding dimensionality, in contrast to the trend observed when using ridge regression. To address this, we will attempt to replicate the observed scaling trends in Figure 2B using PCA combined with OLS.
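      As an illustration of the two dimensionality-handling strategies being compared, the following NumPy sketch fits a closed-form ridge encoding model and a PCA-plus-OLS encoding model to the same data and scores each by held-out Pearson correlation. Synthetic random features stand in for LLM embeddings and a noisy linear readout stands in for an electrode's response; the ridge penalty `lam`, the component count `k`, and all function names are hypothetical choices, not the authors' pipeline.

```python
import numpy as np


def pearson_r(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def ridge_encoding(X_tr, y_tr, X_te, lam=1.0):
    """Closed-form ridge regression on centred features (hypothetical helper)."""
    mu_x, mu_y = X_tr.mean(0), y_tr.mean()
    Xc = X_tr - mu_x
    d = Xc.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ w + mu_y


def pca_ols_encoding(X_tr, y_tr, X_te, k):
    """Project onto the top-k principal components, then fit OLS (hypothetical helper)."""
    mu_x, mu_y = X_tr.mean(0), y_tr.mean()
    Xc = X_tr - mu_x
    # Right singular vectors of the centred training matrix are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:k].T  # (d, k) projection onto the top-k components
    w, *_ = np.linalg.lstsq(Xc @ V, y_tr - mu_y, rcond=None)
    return (X_te - mu_x) @ V @ w + mu_y


# Synthetic stand-ins: random "embeddings" and a noisy linear "electrode" response.
rng = np.random.default_rng(0)
n_tr, n_te, d = 400, 200, 50
X = rng.standard_normal((n_tr + n_te, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n_tr + n_te)
X_tr, X_te, y_tr, y_te = X[:n_tr], X[n_tr:], y[:n_tr], y[n_tr:]

r_ridge = pearson_r(ridge_encoding(X_tr, y_tr, X_te), y_te)
r_pca5 = pearson_r(pca_ols_encoding(X_tr, y_tr, X_te, k=5), y_te)
r_pca50 = pearson_r(pca_ols_encoding(X_tr, y_tr, X_te, k=50), y_te)
```

      When the signal is spread across many embedding dimensions, as in this synthetic example, keeping only a few principal components discards predictive variance, whereas ridge shrinks all dimensions without dropping any; this is one possible reason the two approaches can yield different scaling trends.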

      Additionally, we will provide a point-by-point response and revise the manuscript with updated analyses and figures in the near future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Colomb et al. have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The HpARI proteins bind to the alarmin interleukin 33 and, depending on the family member, exhibit differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing the binding of DNA by members of this family through a complement control protein (CCP1) domain. Moreover, they identify two members of the family that bind via this domain in a non-specific manner to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules, whereas the third family member does not bind and consequently has a shorter half-life and may function via diffusion. 

      Strengths: 

      A strength of the work is the multifaceted approach to examining and testing their hypotheses, using a well-established and well-defined family of immunomodulatory molecules using multiple approaches including an in vivo setting. 

      Weaknesses: 

      There are a few weaknesses of the approach. Perhaps some discussion and speculation as to how these three family members might operate in concert during Heligmosomoides polygyrus bakeri infection would help place the biology of these molecules in context for the reader, e.g. when and where they are produced. 

      We agree that the roles of these proteins during infection require further study and are not fully elucidated here. We have added further discussion to the manuscript on their potential roles during infection (track changes manuscript, lines 277 – 283).

      Reviewer #2 (Public Review): 

      Summary: 

      Colomb et al. investigated here the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 have suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds upon their previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested the CCP1 domain of HpARIs for binding to heparan sulfate, an important component of the extracellular matrix, and found that HpARI1 and HpARI2 bind heparan sulfate but HpARI3 cannot, which is related to their half-lives in vivo. 

      Strengths: 

      The authors use a comprehensive multidisciplinary approach to assess the binding and their effects in vivo, coupled with molecular modeling. 

      Weaknesses: 

      (1) Figure 1C should include Western. 

      We apologise for this oversight, and now include an uncropped western blot image as a Figure 1, Figure Supplement 1.

      (2) Figure 1E: Why does HpARI1 stop binding DNA at 50%? 

      It is currently unclear why HpARI1 does not bind all of the DNA in the EMSA assay; however, this was a consistent finding. With our revised findings we can now state definitively that HpARI1 has a lower affinity for HS than HpARI2, and in each of our assays (EMSA (Fig 1D-E), size exclusion chromatography (Fig 4A), HS-bead pull-down (Fig 4B), lung cell surface binding (Fig 4C) and ITC (Fig 4D)) HpARI1 always shows a weaker response than HpARI2. We hypothesise that HpARI1 binds more weakly to DNA/HS to allow it to diffuse further from the site of deposition, but we have yet to demonstrate this during infection. We add further discussion of this point (track changes manuscript, lines 262 – 266).

      (3) ITC binding experiment with HpARI1? Also, the ITC results from HpARI2 do not seem to saturate, thus it is difficult to really determine the affinity. 

      We have now included HpARI1-HS ITC, and re-ran the HpARI2 experiment to saturation (Fig 4D-E).

      (4) It would be helpful to add docking results from HpARI1. 

      We have now included HpARI1-HS docking, in Figure 5B.

      (5) Some conclusions are speculative and need to remain in the Discussion. e.g.: a) That HpARI3 may be able to diffuse farther 

      We have rewritten these points to remove the speculation on localisation from the abstract (lines 18-19) and introduction (line 78).

      b) That DNA/HS may trap HpARI1/2 at the infection site. 

      Likewise, these points have been rewritten in the abstract and introduction as above, and we have made it clearer that this is a model that we are proposing in the discussion (line 277-283).

      Reviewer #1 (Recommendations For The Authors): 

      The paper is well-written and the data well-presented. I have one small comment that the authors may like to consider. In the discussion, second paragraph, line 17, perhaps, "evolved" rather than "developed". 

      Thank you for this suggestion, we have made this change (line 248).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      To hopefully contribute to more strongly support the conclusions drawn by the authors, I am including a series of concerns regarding the manuscript, as well as some suggestions that could be useful to address these issues:

      (1) The main results of this study derive from the use of auxin-inducible degron (AID)-tagged proteins. Despite the great advantages of the AID strategy to conditionally deplete proteins, the AID tag can affect the normal function of a protein. In fact, some of the AID-labeled DDC components generated in this work are shown to be hypomorphic. Hence, the manuscript would have benefited from the additional confirmation of some of the observations using a different way to eliminate the proteins (e.g., temperature-sensitive mutants).

      Most ts mutants are also hypomorphic; hence we do not see much advantage in their use. The addition of the AID to these proteins alone does not interfere with the ability to sustain checkpoint arrest, as demonstrated in Figure S1. Instead, we found that by overexpressing Rad9-AID we could demonstrate that inactivating Rad9 after 15 h had the same effect as inactivating Ddc2, significantly strengthening our finding that the DDC checkpoint becomes dispensable while the SAC takes over.

      (2) In cells depleted of Rad53-AID, the deletion of CHK1 stimulates an earlier release from a mitotic arrest induced by two DSBs (Figures 2D and 3C). Likewise, the authors claim that a faster escape from the cell cycle block can also be observed when upstream factors such as Ddc2, Rad9, or Rad24 are depleted in the absence of CHK1 (Figures 2A-C and Figures 3D-F). However, this earlier release from the cell cycle arrest, if at all, is only slightly noticeable in a Rad9-AID background (Figures 2B and 3E). In this sense, it is also worth pointing out that Rad9-AID chk1Δ (Figure 3E) and Rad24-AID chk1Δ (Figure 3F) cells were only evaluated up to 7 h, while in all other instances, cells were followed for 9 h, which hinders a fair assessment of the differences in the release from the cell cycle arrest.

      As noted above, we have now been able to examine Rad9 over the longer time frame.

      (3) Although only 25% of the cells depleted for Dun1 remained in G2/M arrest 7 h following the induction of two DSBs, it is shocking that Rad53 was nonetheless still phosphorylated after the cells had escaped the cell cycle blockage (Figure 4A).

      This persistence of Rad53 phosphorylation is also seen with the inactivation of Mad2, allowing escape in spite of continued Rad53 phosphorylation.

      (4) Generation of Rad9-AID2 and Rad24-AID2 strains did not fully restore the function of these proteins, since most cells had adapted 24 h after induction of two DSBs (Figure S1C). Nonetheless, Rad9-AID2 and Rad24-AID2 are still likely more stable than their AID counterparts, and hence the authors could have instead used the AID2 proteins for the experiments in Figure 2 to better evaluate the role of Rad9 and Rad24 in the maintenance of the DDC-dependent arrest.

      We note again that we have found a way to study Rad9 up to 24 h. 

      (5) Deletion of BFA1 has been shown to promote the escape from a cell cycle arrest triggered by telomere uncapping (Wang et al. 2000, Hu et al. 2001, Valerio-Santiago et al. 2013). Likewise, while cells carrying the cdc5-T238A allele cannot adapt to a checkpoint arrest induced by one irreparable DSB, BFA1 deletion rescues the adaptation defect of this mutant CDC5 allele (Rawal et al., 2016). The authors show how, using AID-degrons of Bfa1 and Bub2, that only Bub2, but not Bfa1, is required to maintain a prolonged cell cycle arrest after the induction of two DSBs. To reinforce this point, and as shown for mad2Δ cells (Figure S6A), the authors could perform a complete time course using both the Bfa1-AID and a bfa1Δ mutant to demonstrate that they do indeed show the same behavior in terms of the adaptation to a two DSB-induced cell cycle arrest.

      We thank the reviewer for noting these other instances where bfa1Δ promoted an escape from arrest. We tested a 2-DSB bfa1Δ strain, and the data have been added to Figure S9E-F. We did not observe a difference in the percentage of cells escaping arrest between the 2-DSB bfa1Δ and the 2-DSB BFA1-AID strains.

      (6) Bypass or adaptation of a checkpoint-induced cell cycle arrest in S. cerevisiae often leads to cells entering a new cell cycle without doing cytokinesis and, hence, to the accumulation of rebudded cells. However, the experiments shown in the manuscript only account for G1 or budded cells with either one or two nuclei. Do any of the mutants show cytokinesis problems and subsequent rebudding of the cells? If so, this should have been also noted and quantified in the corresponding assays.

      In the cases we have studied, we have not seen instances where cells re-bud without completing mitosis (at least as assessed by the formation of budded cells with two distinct DAPI-staining masses). In the morphological assays we have done, we score the continuation of the cell cycle by the appearance of multiple buds, G1 cells, and small-budded cells. In our adaptation assays, cells that escaped G2/M arrest formed microcolonies, indicating no short-term deficiency in cell division.

      (7) The location of the DSB relative to the centromere of a chromosome seems to be a factor that determines the capacity of the SAC to sustain a prolonged cell cycle arrest. The authors discuss the possibility that the DSB could somehow affect the structure of the kinetochore. Did they evaluate whether Mad1 or Mad2 were more actively recruited to kinetochores in those strains that more strongly trigger the SAC after induction of the DSBs?

      We have not attempted to follow Mad1/2 recruitment. ChIP-seq could be used to monitor Mad1/2 localization at the 16 centromeres in response to DSBs and the spread of γ-H2AX across the centromere. Our previous data showed that γ-H2AX could spread across the centromere region and could create a change that would be detected by Mad1/2. This change does not, however, affect the mitotic behavior of a strain in which the H2A genes have been modified to the possibly phosphomimetic H2A-S129E allele.

      (8) The authors could speculate in the discussion about the reasons that could explain why the DDC is required for the maintenance of checkpoint arrest at early stages but then becomes dispensable for the preservation of a prolonged DNA DSB-induced cell cycle arrest, which is instead sustained at later stages by the SAC.

      Our suggestion is that cells would otherwise have adapted, but modification of the centromere region engages the SAC.

      Finally, some minor issues are:

      (1) The lines in the graphs that display the results from adaptation assays (e.g., Figures 1B and 1E) or cell and nuclear morphology (e.g., Figures 1D and 1G) are too thick. This makes it sometimes difficult to distinguish the actual percentages of cells in each category, particularly in the experiments monitoring nuclear division.

      Fixed

      (2) While both the adaptation assay and the analysis of nuclear division in Figures 1E and 1G, respectively, show a complete DDC-dependent arrest at 4h, the Western blot in Figure 1F suggests that Rad53 is not phosphorylated at that time point. Do these figures represent independent experiments? Ideally, the analysis of cell budding and nuclear division, which is performed in liquid cultures, and the Western blot displaying Rad53 phosphorylation should correspond to the same experiment.

      Cell budding in liquid cultures and adaptation assays were performed in triplicate with 3 biological replicates and the collective results are shown in each graph showing the percentage of large-budded cells. Western blot samples were collected in each liquid culture experiment. The western blot in 1G is a representative western blot.

      (3) It is somewhat confusing that the blots for the proteins are not displayed in the same order in Figures 2A (Rad53 at the top) and 2B or 2C (Rad53 in the middle).

      Fixed. We now place Rad53, the relevant protein, at the top.

      Reviewer #2 (Recommendations For The Authors):

      (1) Yeast with the two breaks responds to DNA damage checkpoint (DDC) until sometimes 4-15 h post DNA damage. Since the auxin-induced degradation does not completely deplete all the tagged proteins in cells, the results should be more carefully considered and not to interpret if the checkpoint entry or maintenance depends on each target protein's ability to induce Rad53 phosphorylation. It should be theoretically possible if checkpoint maintenance requires only a modest amount of checkpoint factors especially because the experiments involve the induction of one or two DSBs. The low levels of DDC factors may be insufficient for Rad53 activation but could still be effective for cell cycle arrest. Indeed, the Haber group showed that the mating type switch did not induce Rad53 phosphorylation but still invoked detectable DNA damage response. To test such possibilities, the authors might consider employing yet another marker for DDC such as H2A or Chk1 phosphorylation besides Rad53 autophosphorylation. Alternatively, the authors might check if auxin-induced depletion also disrupts break-induced foci formation for checkpoint maintenance or their enrichment at DNA breaks using ChIP assays at various points post-damage.

      DAPI staining of Ddc2-AID cells shows that when IAA is added 4 h after DSB induction (Figure S3A), cells escape G2/M arrest, as evidenced by the increase in large-budded cells with 2 DAPI signals, small-budded cells, and G1 cells. Overexpression of Ddc2 can sustain the checkpoint past 24 h, but without SAC proteins such as Mad2, cells eventually adapt (Figure S6B).

      That Rad9-AID or Rad24-AID in the absence of added auxin (but in the presence of TIR1) is unable to sustain arrest suggests to us that low levels of Rad9 or Rad24 are not sufficient to maintain arrest.  As the reviewer notes, normal MAT switching doesn’t cause Rad53 phosphorylation or arrest, though early damage-induced events such as H2A phosphorylation do occur.  But our point is that Rad9 or Ddc2 is needed to maintain arrest only up to a certain point, after which they become superfluous and a different checkpoint arrest is imposed. At that point apparently a low level of these proteins plays no obvious role.

      (2) It is interesting that DDC no longer responds to the damage signaling after 15 h of DSB-induced prolonged checkpoint arrest after two DNA double-strand breaks. Is this also applicable to other adaptation mutants? The results might improve the broad impact of the current conclusions. It is also possible that the transition from DDC to SPC depends on simply the changes in signaling or in part due to the molecular changes in the status of DNA breaks or its flanking regions. Indeed, the proposed model suggests that the spreading of H2A phosphorylation to centromeric regions induces SAC and thus mitotic arrest. The authors could measure H2A phosphorylation near the centromere using ChIP assays at various intervals post-DNA damage. It is particularly interesting if depletion of Ddc2 at 15 h post DNA damage does not alter the level of H2A phosphorylation at or near centromere.

      Our previous data have suggested that the involvement of the SAC in prolonging DSB-induced arrest involved post-translational modification of centromeric chromatin, such as the Mec1- and Tel1-dependent phosphorylation of histone H2A (Dotiwala et al.). In budding yeast there is also a similar DSB-induced modification of histone H2B (Lee et al.). To ask whether the SAC is intrinsically activated when the regions around centromeres are modified by checkpoint kinase phosphorylation, we examined cell cycle progression in strains in which histone H2A or histone H2B was mutated to its putative phosphomimetic form (H2A-S129E and H2B-T129E). As shown in Figure S11, there was no effect on the growth rate of these strains or of the double mutant, suggesting that cells did not experience a delay in entering mitosis because of these modifications. We note that although histone H2A-S129E is recognized by an antibody specific for phosphorylated histone H2A-S129, the S129E mutation may not be fully phosphomimetic.

      (3) It is puzzling why Rad9-AID or Rad24-AID are proficient for DDC establishment but cannot sustain permanent arrest in the two break cells. It appears Rad53 phosphorylation for DDC is weaker in cells expressing Rad9-AID or Rad24-AID according to Fig.2B and C even though their protein level before IAA treatment is still robust. This might also explain why the results of depleting Rad53 and Rad9 are very different. It also raises concern if the effect of Rad24 depletion on checkpoint maintenance is in part due to the weaker checkpoint establishment. It might be necessary to use the AID2 system to redo Rad24 depletion to exclude such a possibility.

      We believe that the AID mutants are very sensitive to the low level of IAA present in yeast.  The instability of the protein is entirely dependent on the TIR1 SCF factor, so the proteins themselves are not intrinsically defective; they are just subject to degradation.  Overexpressing Rad9 allowed us to evaluate its role at late time points. 

      (4) It is intriguing that the switch from DDC to SAC might take place at around 12 h when yeasts with a single unrepairable break ignore DDC and resume cell cycling (so-called "adaptation"). Since 4h and 15h are far apart and the transition point from DDC to SAC likely takes place between these two points, it will be very helpful to analyze and compare cell cycle exit after 24 h by treating IAA at multiple points between 4-15h.

      When we add IAA to Mad2-AID and Mad1-AID cells 4 h after DSB induction, cells remain arrested for up to 12 h after DSB induction. At 15 h, cells begin to exit checkpoint arrest, indicating that the handoff of checkpoint arrest must occur between 12 and 15 h after DSB induction. If we degraded DNA damage checkpoint proteins at any point before Mad2, Mad1, and Bub2 begin to contribute to checkpoint arrest, arrested cells would likely adapt in a similar manner to when IAA was added 4 h after DSB induction.

      (5) Some of the Western blot quality is poor. For instance, in Figure 6C, Mad1-AID level after IAA addition is not compelling especially because the TIR level (the loading control) is also very low.

      In Figure 6C, although the relative levels of TIR1 are similar in the IAA-treated and untreated samples, there is no detectable Mad1-AID in the IAA-treated samples, indicating that Mad1-AID was successfully degraded by the AID system.

      (6) Fig. 8 is complex. It might be helpful to define the different types of arrows in the figure. The legend also has a spelling error, Rad23 should be Rad24.

      We’ve defined what each arrow means in the legend and corrected the spelling error in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      Much of the manuscript states that two unrepairable DSBs lead to a long and severe G2/M arrest. Two main cytological approaches are used to make this statement: bud size and number on plates after micromanipulation (microcolony assay), and cell and nuclear morphology in liquid cultures. While the latter gives a clear pattern that can be assigned to a G2/M block as expected by DDC, i.e. metaphase-like mononucleated cells with large buds, the former can only tell whether cells eventually reach a second S phase (large budded cells on the plate can be in a proper G2/M arrest, but can also be in an anaphase block or even in the ensuing G1). The authors always performed the microcolony assay, but there are several cases where the much more informative budding/DAPI assay is missing. These include Dun1-aid and others, but more importantly chk1D and its combinations with DDC proteins. Incidentally, for the microcolony assay, it is more accurate to label the y-axis of the corresponding graphs (and in the figure legends and main text) with something like "large budded cells"; "G2/M arrested cells" is misleading.

      Figures have been updated to more accurately reflect what we are measuring.

      The results obtained with the Bfa1/Bub2 partner are intriguing. These two proteins form a complex whose canonical function is to prevent exit from mitosis until the spindle is properly aligned, acting in a distinct subpathway within the SAC that blocks MEN rather than anaphase onset. The data presented by the authors suggest that, on the one hand, both SAC subpathways work together to block the cell cycle. However, why does canonical SAC (Mad1/Mad2) inactivation not lead to a transition from G2/M (metaphase-like) arrested cells to anaphase-like arrest maintained by Bfa1-Bub2? Since Bfa1-Bub2 is a target of DDC, is it possible that DDC knockdown also inactivates this checkpoint, allowing adaptation? On the other hand, can the authors provide more data to confirm and strengthen their claim of a Bfa1-independent Bub2 role in prolonged arrest? Perhaps long-term protein localization and PTM changes. Bub2-independent roles for Bfa1 have been reported, but not vice versa, to the best of my knowledge.

      In the mitotic exit network (MEN), Bfa1/Bub2 prime activation of the pathway by bringing Tem1 to the spindle pole bodies. Phosphorylation of Bfa1 by Cdc5 causes Tem1 to be released, triggering mitotic exit through the MEN. It has been shown that DNA damage in a cdc13-1 ts mutant leads to phosphorylation of Bfa1 in a Rad53- and Dun1-dependent manner. This phosphorylation of Bfa1 could release Tem1 and prime cells to exit checkpoint arrest when they pass through anaphase. Looking at Tem1 localization to spindle pole bodies and its interactions with Bfa1/Bub2 in response to DNA damage might give insight into why cells do not experience an anaphase-like arrest when they are released by deactivation of either the DNA damage checkpoint or the SAC.

      We have previously shown that deletion of BUB2 in a 1-DSB background shortens DSB-induced checkpoint arrest. In a 2-DSB background, ~70-80% of bfa1Δ cells remained in a large-budded state, as measured by an adaptation assay tracking the morphology of G1 cells on a YP-Gal plate and by DAPI staining. Deletion or degradation of Bfa1 might not release cells from arrest because Mad2/Mad1 prevent cells from transitioning into anaphase. Our DAPI data for Bub2-AID show an increase in cells with 2 DAPI signals (transition into anaphase) and in small-budded cells, indicating that degradation of Bub2 releases cells into anaphase and allows them to complete mitosis.

      Further suggestions:

      It would be richer if authors could provide more than one experimental replicate in some panels (e.g., S1A,B; S4A; and S6B).

      S1C confirms that Rad9-AID and Rad24-AID will adapt by 24 h even with the point mutant TIR1(F74G) which has lower basal degradation than TIR1. S4A has been updated with additional experimental replicates. The 48 h timepoint after DSB induction was to show the importance of Mad2 even when Ddc2 is overexpressed.

      Figure 1: Rearrange figure panels when they are first mentioned in the text. For example, it makes more sense to have the plate adaptation assay as panel B for both 1-DSB and 2-DSB strains, budding plus DAPI as panel C, and Rad53 as panel D.

      These figures have been rearranged in the order that they are mentioned in the paper.

      Figure 5: Correct Ph-5-IAA in the Rad53 WBs (it should be 5-Ph-IAA).

      This has been corrected.

      Figure S2: The straight line under the "+IAA" text box is misleading. I think it should also cover the "-2" time point, right? Also, check the figure legend. Information is missing and does not correspond to the figure layout.

      This has been corrected.

      Figure S3: Perhaps "Cell cycle profile as determined by budding and DAPI staining" is a better and more accurate legend title.

      The legend title has been updated to “Cell cycle profile as determined by budding and DAPI staining in Ddc2-AID and Rad53-AID mutants ± IAA 4 h after galactose.”

      Figure S5: Detection of both Rad53 and Ddc2 in the same blot could lead to misinterpretation as hyperphosphorylated Rad53 appears to coincide with Ddc2 migration.

      Figure S5A-B are representative western blots where Rad53 was probed to show activation of the DNA damage checkpoint by Rad53 phosphorylation. When measuring the relative abundance of Ddc2 we did not probe all blots for Rad53.

      Table S1: Include the post-hoc test used for comparisons after ANOVA.

      A Šidák post-hoc test was used in GraphPad Prism for the one-way ANOVA. Prism listed the Šidák test as the recommended test to correct for multiple comparisons. A column has been added to Table S1 to show which post-hoc test was used.
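      For readers unfamiliar with the correction, a minimal sketch of the Šidák adjustment as it is generally defined is given below. It is a hypothetical illustration, not GraphPad Prism's implementation, and the α, m and p values are arbitrary examples rather than data from this study. The per-comparison threshold is 1 - (1 - α)^(1/m) for m comparisons, and a raw p-value is adjusted to 1 - (1 - p)^m.

```python
# Hypothetical illustration of the Sidak correction as generally defined;
# not GraphPad Prism's code, and the example numbers are not from the study.


def sidak_alpha(family_alpha: float, m: int) -> float:
    """Per-comparison threshold that keeps the family-wise error rate at
    family_alpha across m independent comparisons."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / m)


def sidak_adjusted_p(p: float, m: int) -> float:
    """Multiplicity-adjusted p-value for one raw p-value among m comparisons."""
    return 1.0 - (1.0 - p) ** m


if __name__ == "__main__":
    # Example: 3 comparisons at a family-wise alpha of 0.05.
    # The Sidak threshold is slightly less strict than Bonferroni (0.05 / 3).
    print(round(sidak_alpha(0.05, 3), 5))
    print(round(sidak_adjusted_p(0.01, 3), 5))
```

      Because the Šidák threshold assumes independent comparisons, it is marginally more permissive than the Bonferroni threshold α/m while controlling the same family-wise error rate.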

      Page 10, line 4: The putative additive effect of chk1 knockout with Dun1 depletion should also be compared to chk1 alone (in Figure 3A).

      We address the additive effect of chk1 knockout with Dun1-AID depletion in a later section on Page 11, line 6. Since we had not explored possible effects from downstream targets of Rad53 for prolonging checkpoint arrest when Rad53 was depleted, we did not mention the effect of the chk1 knockout on Dun1 depletion.

      Page 14, second paragraph, line 4: "Figure 6A-D", is it not?

      Figure S6A is measuring checkpoint arrest in a deletion of mad2 in a 2-DSB strain. Figure 6A-D shows how degradation of Mad2-AID and Mad1-AID after the handoff of arrest causes cells to exit the checkpoint in a Rad53 independent manner.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback and recognition of our efforts. Our application of CRISPR/Cas9 genome editing tools, coupled with complementary cellular and functional approaches, sheds light on the importance of PfMORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were made on the basis of multiple, unbiased molecular and functional genomic assays that point to the relevance of the PfMORC protein in maintaining the parasite’s chromatin landscape. Although we do not claim to have precise evidence of the step-by-step pathway in which PfMORC is involved, we bring forth first-hand evidence of its role in heterochromatin binding and gene regulation, and of its association with major TFs as well as chromatin remodeling and modifying enzymes. We do, however, agree with the comment regarding the lack of evidence for direct effects of PfMORC KD and have since provided additional evidence by performing ChIP-seq experiments against H3K9me3 and H3K9ac during KD. Our new results are presented in Fig. 5 and show that the level of H3K9me3 decreased significantly during PfMORC KD.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      Validation of the identified interacting partners is indeed critical and essential to understanding their role in directing MORC to its targets. Our protein pull-down experiments were performed with several biological replicates, and several of the interacting partners have also been identified and published by other labs and collaborators. To confirm our results, we completed a direct comparison of our work with previously published work; these results have now been incorporated into the revised manuscript and support both the identified interacting partners and the accuracy of our data. Molecular validation of the novel proteins identified in our pull-down requires the generation of tagged lines, which may take a few more years; this work will be submitted for publication in a follow-up manuscript.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and have performed additional experiments to delve deeper into the multifaceted roles of PfMORC. We have performed additional ChIP-seq analysis under PfMORC-depleted conditions, focusing on the known heterochromatin and euchromatin markers H3K9me3 and H3K9ac, respectively. We hope our new results, presented in Figure 5, shed light on the more direct implications of PfMORC for heterochromatin and gene silencing.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • Why does MORC, which was used in the pull-down, seem to be only minimally enriched in the volcano plot, while a series of proteins (marked in red) and AP2 (highlighted in green) are enriched with log2 fold changes exceeding 15?

      We apologize for the confusion. MORC was detected with the highest number of peptides (97 and 113) and spectra (1041 and 1177), confirming the efficiency of our pull-down. However, given the relatively large size of the MORC protein (295 kDa) and its weak detection in the control (5 and 7 peptides; 16 and 43 spectra), its Log2 FoldChange and Z-statistic after normalization are minimal compared to those of smaller proteins that were not identified in the control samples.
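      As a rough illustration of this point, the sketch below assumes a simple score (log2 of the mean IP spectral count over the mean control count, with a pseudocount). The actual normalization and Z-statistic pipeline behind the volcano plot is not described here, so this is not that method. The MORC spectral counts are taken from the response above; the co-purifying protein and both pseudocount values are hypothetical. It shows how background detection of the bait in the control caps its fold change, while a protein absent from the control can reach very large values driven mainly by the pseudocount.

```python
import math


def log2_fold_change(ip_counts, ctrl_counts, pseudocount=1.0):
    """Hypothetical score: log2(mean IP count / mean control count), with a
    pseudocount added to both means so a zero control count stays finite."""
    ip_mean = sum(ip_counts) / len(ip_counts)
    ctrl_mean = sum(ctrl_counts) / len(ctrl_counts)
    return math.log2((ip_mean + pseudocount) / (ctrl_mean + pseudocount))


# PfMORC spectra from the response: two IP replicates and two control replicates.
morc_fc = log2_fold_change([1041, 1177], [16, 43])

# Hypothetical prey never seen in the control; a tiny (assumed) pseudocount
# lets its fold change exceed that of the far more abundant bait.
prey_fc = log2_fold_change([50, 50], [0, 0], pseudocount=0.001)

print(f"MORC log2FC: {morc_fc:.2f}")
print(f"hypothetical prey log2FC: {prey_fc:.2f}")
```

      Under these assumptions, the ranking is driven largely by whether a protein is detected in the control at all, not by how abundant it is in the IP.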

      Additionally, can you explain why these proteins appear to be enriched at the same fold? 

      We postulate that these proteins may form a complex with a 1:1 stoichiometry. Two of these three proteins have been reported to interact with MORC in several publications, supporting a strong interaction between them.

      Variations in the interactome could result from the washing buffer's stringency.

      We agree that the IP conditions could affect the detection of the interactome as well as the parasite stage used. As indicated below, the overlap with previous publications and the presence of AP2 TFs and chromatin remodelers strongly support our results.

      It would be highly appropriate for the authors, similar to the co-submitted article (Maneesh Kumar Singh et al.), to present their mass spectrometry data in relation to previous purifications in Plasmodium (Bryant et al. 2020; Subudhi et al. 2023; Hillier et al. 2019) and also in Toxoplasma (Farhat et al. 2020). It would be good if authors could also put their results into perspective in light of the following pre-prints:

      We agree with the reviewer’s comment. In this revised manuscript, we compared our IP-MS data to previous published manuscripts. Key proteins including the AP2-P (PF3D7_1107800) and HDAC1 were indeed identified in several experiments validating our initial findings of the formation of large complexes with MORC. However, it’s important to highlight that the MORC protein was not used as the bait protein in previously published papers, and thus some discrepancies can be observed.

      Given the tendency of MORCs to form multiple complexes with AP2 factors, have you explored whether specific AP2s are conserved between Plasmodium and Toxoplasma, within the phylum?

      P. falciparum encodes 27 putative AP2s, while T. gondii has over 60 AP2s, making direct comparison challenging. Some Plasmodium AP2s have multiple counterparts in T. gondii, and conservation is typically limited to the AP2 DNA-binding domains. Attempts to identify sequence homology among AP2s and regions of conservation have been made (PMID: 30959972, PMID: 16040597). Although this information would provide interesting insight, we believe exploring this topic at this time would diverge from our primary objectives and would be more appropriately addressed in future studies.

      Could this conservation be identified either through phylogenetic means or by using tools such as AlphaFold, especially considering not just the AP2 domains but also any existing ACDC domains?

      Although this may reveal important information regarding the association between MORC proteins and AP2 domains, we believe investigating the conservation between AP2 across apicomplexan parasites may prove too challenging and is beyond the scope of this work.

      Most of the genes are depicted without their immediate surroundings (Fig. 2d and Fig S2c, d). For instance, the promoter region of AP2g is not shown (Fig. 2d). It is therefore very challenging to determine the presence or absence of MORC upstream or downstream; considering that this factor, which can create DNA loop protrusions, might bind at a distance from the genes in question.

      All gene coverage plots, including AP2-G, show 500 bp up- and downstream of the displayed gene. We have modified our figure legends to make sure that this information is provided.

      Upon examining Figure S3, it is evident that the authors have indicated a decline in PfMORC expression, represented as percentages over two unique time frames. The methodology behind this quantification remains ambiguous. It's essential for the authors to specify whether normalization was done using a loading control. As a benchmark, Singh et al. (2021) in their Figure 4 transparently used GAPDH as a loading control and included an untreated sample in their western blot analysis.

      We thank the Reviewer for bringing this to our attention. Our initial quantification was performed using ImageJ. To address the Reviewer’s comment, we have repeated the experiment. Our quantitative analysis was performed with Bio-Rad ImageLab software, using aldolase expression as a loading control (50% of the MORC loading). This information has now been incorporated into the supplementary figures (Figure S3).

      There's a striking observation that, despite significant degradation of PfMORC (as depicted in Figures S1 and S3), only the upper band in the western blot diminishes. This inconsistency needs addressing, as it can raise questions about the interpretation of the results.

      We agree with the reviewer's comment. We experienced some challenges when performing a western blot on such a large protein (295 kDa). Our initial attempts required long exposures that may have highlighted non-specific signals from smaller proteins. To address the reviewer’s comment, we have repeated the experiment and made the necessary changes to our western blot protocol. Our new result better reflects the expected downregulation of PfMORC. These changes have been incorporated into our manuscript and Fig. S3.

      Recommendations for improving the writing and presentation.

      MORC KD quantification and consistency with previous findings (Figure S3): When comparing their results with those from another study (Singh et al. 2021), it's critical to ensure that the experimental conditions, especially the methodology for KD and the quantification of protein levels, are similar. If not, a direct comparison might be misleading.

      We greatly appreciate the suggestions and have made efforts to redesign the MORC KD quantifications according to the reviewer’s recommendations.

      While the manuscript mentions the level of KD, it does not delve into the functional consequences of such a decrease in protein levels. It would be of interest to understand how this level of KD affects the parasite's biology, especially in the context of the paper's main findings.

      We have addressed this question by examining the changes in chromatin structure in WT versus KD parasites upon aTc removal. We have also validated this initial result with an additional ChIP-seq experiment against histone marks in WT versus KD parasites upon aTc removal. Our findings showed a significant reduction in H3K9me coverage in heterochromatin regions, specifically at genes associated with antigenic variation and invasion. These findings suggest that PfMORC at least partially regulates gene silencing and chromatin organization. The manuscript has been edited accordingly.

      At the end of page 5, the authors present an interpretation of their findings that suggests a multi-faceted role of PfMORC in regulating stage-specific gene families, particularly the gametocyte-related genes and merozoite surface proteins. While the narrative they present is intriguing, several concerns arise:

      Over-reliance on correlation: The authors draw a direct line between the levels of PfMORC binding and the function of these genes in the parasite's life cycle. However, a mere correlation between PfMORC binding and stage-specific gene activity does not necessarily imply causation. They would need to provide experimental evidence showing that manipulation of PfMORC levels directly impacts these genes' expression.

      We agree with the reviewer's comment. We have, however, partially addressed this issue by comparing our ChIP-seq, RNA-seq and Hi-C experiments. We concluded that several of the transcriptional changes observed were due to an indirect effect of PfMORC KD and were most likely induced by cell cycle arrest and a partial collapse of the chromatin structure. The collapse of the heterochromatin structure was validated by our Hi-C experiment. To further address the reviewers' additional concerns, we have included additional ChIP-seq experiments targeting histone marks to confirm our initial hypothesis. The results of these additional experiments have been incorporated into the revised version of the manuscript.

      Ambiguity surrounding "low levels" and "high levels": The terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. Without quantification or a clear benchmark, these descriptions remain vague.

      We agree with the reviewers that the terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. However, we have quantified the changes in DNA binding using normalized reads (RPKM). In trophozoite and schizont stages, most genes show a mean of <0.5 RPKM normalized reads per nucleotide of PfMORC binding within their promoter region, whereas antigenic gene families such as var and rifin show ~1.5 and 0.5 normalized reads, respectively (Fig. 2b). Similar results were obtained for the gametocyte-specific transcription factor AP2-G, which shows levels of PfMORC binding similar to those observed in var genes (Fig. 2c and S2c, d).

      Shift in Binding Sites: The observed minor switch in PfMORC binding sites from gene bodies to intergenic and promoter regions is mentioned, but without context on how these shifts impact gene expression or any comparative analysis with other proteins showing similar shifts. The claim that this shift implicates PfMORC as an "insulator" is a leap without direct evidence.

      We apologize for the confusion. We have compared our ChIP-seq with RNA-seq results at different time points of the cell cycle and demonstrated that the observed shift has an effect on gene expression. We have edited the manuscript to clarify these results.

      Overextension of PfMORC's Role: The authors suggest that PfMORC moves to the regulatory regions around the TSS to guide RNA Polymerase and transcription factors. This is a substantial claim and would require additional experiments to validate. Simply observing binding in a region is insufficient to assign a specific functional role, especially one as critical as guiding RNA Polymerase. Historically, the MORC family has been primarily linked with gene silencing across apicomplexans, plants, and metazoans. On page 7, the authors noted a minimal overlap between the ChIP-seq and RNA-seq signals (Fig. 4e). They also acknowledged that the pronounced gene expression shifts at schizont stages result from a combination of direct and indirect impacts of PfMORC degradation, which could cause cell cycle arrest and potential heterochromatin disintegration, rather than just decreased PfMORC binding. Therefore, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We agree with the reviewer's comment and have edited the manuscript accordingly.  

      DISCUSSION:

      The authors concluded that "Using a combination of ChIP-seq, protein knock down, RNA-seq and Hi-C experiments, we have demonstrated that the MORC protein is essential for the tight regulation of gene expression through chromatin compaction, preventing access to gene promoters from TFs and the general transcriptional machinery in a stage specific manner."

      Again, the assertion that MORC protein is essential for tight regulation of gene expression, based purely on correlational data (e.g., ChIP-seq showing binding doesn't prove functionality), assumes causality which might not be fully substantiated. The phrase "preventing access to gene promoters from TFs and the general transcriptional machinery in a stage-specific manner" also needs validation. Asserting that MORC is essential for this function might oversimplify the process and overlook other critical contributors.

      We agree with the reviewer’s comments and the conclusion has since been edited accordingly.

      The discussion is quite poor. It would be pertinent to put MORC in perspective within the broader picture of regulatory mechanisms of chromatin state at telomeres and var genes. For instance, how do SIR2 and HDAC1 (associated with MORC) divide the task of deacetylation? Or the contribution of HP1 and other non-coding RNAs.

      We agree with the reviewer’s suggestion. However, in order to put MORC in perspective within a broader picture, we would need to measure changes in localization of several molecular components regulating heterochromatin in WT versus KD condition. This will require access to several molecular tools and specific antibodies that we do not currently have. We have addressed these issues in our discussion.  

      Minor corrections to the text and figures.

      Figure 1d: Could you provide the ID for each AP2 directly on the volcano plot? While some IDs are referenced in the manuscript, visual representation in the plot would facilitate a clearer understanding of their enrichment levels.

      IDs for the unknown AP2 proteins have been added to the volcano plot.

      I recommend presenting Figure S2b as a panel within a primary figure. This change would offer readers a more quantitative understanding of the distinct differences between developmental stages. Notably, there seems to be a limited number of genes in common when considering the total, and there is an apparent lack of enrichment in the ring stage.

      This has been done.

      The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. 

      We have improved the figure legends and added the number of biological replicates as well as the statistical test used to each figure legend.

      Figure 1A: The protein diagram with its domains does not take scale into account.

      The figure has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) The study lacks a direct link between PfMORC's inferred function and the state of heterochromatin in the genome post-depletion.

      We agree with the reviewer's comment and have included additional ChIP-seq experiments to measure changes in histone marks in the PfMORC-depleted parasite line. We show a significant decrease in H3K9me3 marks under PfMORC KD conditions.

      Conducting ChIP-seq on well-known heterochromatin markers such as H3K9me3, HP1, or H3K36me2/3 could shed light on the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      As we do not have access to an anti-HP1 antibody with reasonable affinity, we have not been able to study the impact of MORC KD on HP1, but we have successfully observed its impact on H3K9me3 marks. These results have been added to the revised manuscript (Fig. 5).

      (2) The authors should conduct a more comprehensive analysis of PfMORC's genomic localization, comparing it to ApiAP2 binding (interacting proteins) and histone modifications. This would provide valuable insights.

      We have performed a more comprehensive genome-wide analysis of MORC binding through ChIP-seq in WT and MORC-KD conditions. Our results show that PfMORC localizes to heterochromatin with significant overlap with H3K9 trimethylation (H3K9me3) marks, at or near var gene regions. Upon PfMORC knockdown, H3K9me3 levels were reduced, supporting a possible role of PfMORC in gene repression. Regarding the comparison with AP2 binding, our proteomics datasets have shown extensive interactions between MORC and several AP2 proteins.

      (3) RNA-seq data reveals that only a few genes are affected after 24 hours of PfMORC depletion, with an equivalent number of up-regulated and down-regulated genes. The reasons behind down-regulation resulting from a heterochromatin marker depletion are not clearly established.

      We agree with the reviewer’s comment. At this stage (24 hours), PfMORC depletion is limited and the effects at the transcriptional level are quite restricted. Furthermore, the down-regulated genes most likely reflect an indirect effect of cell cycle arrest. We have edited the manuscript to address this comment.

      The relationship between this data and the partial depletion of PfMORC needs further discussion.

      We agree with the reviewers and have improved our discussion in the revised version of the manuscript.

      (4) The authors did not compare their ChIP-seq data with the genes found downregulated in the RNA-seq data. Examining the correlation between these datasets would enhance the study.

      We apologize for the confusion. We have compared the ChIP-seq and RNA-seq data and identified a very limited number of overlapping genes, indicating that most of the observed changes in gene expression are most likely indirect, resulting from cell cycle arrest and chromatin collapse. We have edited the manuscript to clarify this issue.

      (5) The discussion section is relatively concise and does not fully address the complexity of the data, warranting further exploration.

      We have improved the discussion section in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity, was phosphorylated on S269, and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since a downregulation of Notch signaling was proposed to be important for the production of a specialized blood cell type (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A, including a reduced response to wasp parasitism, and their epistatic analyses indicate that Pkc53E acts upstream of Su(H).

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes which back the main conclusions. Notably Su(H)S269A mutation or Pkc53E deficiency strongly reduces lamellocyte production and the epistatic data are convincing.

      Weaknesses

      The phenotypic analysis of larval blood cells remains rather superficial. Looking at melanized cells is a crude surrogate to quantify crystal cell numbers as it is biased toward sessile cells (with specific location) and does not bring information concerning the percentage of blood cells differentiated along this lineage.

      In Su(H)S269A knock-in or Pkc53E zygotic mutants, the increase in crystal cells in uninfected conditions and the decreased capacity to induce lamellocytes following infection could have many origins which are not investigated. For instance, premature blood cell differentiation could promote crystal cell differentiation and reduce the pool of lamellocytes progenitors. These mutations could also affect the development and function of the posterior signaling center in the lymph gland, which plays a key role in lamellocyte induction.

      Similarly, the mild decrease on resistance to wasp infestation (Fig. 2A) could reflect a constitutive reduction in blood cell numbers in Su(H)S269A larvae rather than a defective down-regulation of Notch activity.

      We fully agree with the reviewer that sessile crystal cell counts are a coarse approach to capturing hemocytes. However, they allowed the screening of numerous genotypes in the course of our kinase candidate screen. We recorded hemocyte numbers in the various genetic backgrounds, with and without wasp infestation. There was no significant difference between Su(H)S269A and the Su(H)gwt control, independent of infection. This is in agreement with earlier observations of unchanged plasmatocyte numbers in N or Su(H) mutants compared to the wild type (Duvic et al., 2002). We noted, however, a small drop in hemocyte numbers in Su(H)S269D and a strong one in Pkc53ED28 mutants in both conditions relative to the control. Presumably, Pkc53E has a more general role in blood cell development, which we have not analysed further. The results are included in the new Figure 1_S1 and Figure 9_S1 supplements. Based on the link between hemocyte numbers and wasp resistance (e.g. McGonigle et al., 2017), we cannot exclude that the lowered resistance of Pkc53ED28 mutants to wasp attacks is partly due to reduced hemocyte numbers, although we did not see significant differences among Su(H)S269A, Pkc53ED28 and the double mutant. We have included this notion in the text.

      Lamellocytes arise in response to external challenges such as parasitoid wasp infestation, both by trans-differentiation from larval plasmatocytes and by maturation of lamellocyte precursors in the lymph gland, yet they barely arise in the Su(H)S269A and Pkc53ED28 mutants.

      We find it hard to envisage, however, that a premature differentiation of plasmatocytes into crystal cells could, in our case, deplete the pool of lamellocyte progenitors in the hemolymph (is there a precedent?). Crystal cells make up about 5% of the hemocyte pool; they are increased at most two-fold in the Su(H)S269A and Pkc53E mutants. Even if these extra crystal cells (now ~10%) had arisen by premature differentiation, there should still be enough plasmatocytes (~80%) remaining with the potential to divide further and transdifferentiate into lamellocytes.

      Indeed, we cannot exclude an effect of the Su(H)S269A mutant on the development and function of the posterior signaling center (PSC) of the lymph gland. We noted, however, a slight but significant enlargement of the PSC in the Su(H)S269A mutant, which, to our understanding, cannot explain the reduced lamellocyte numbers.

      Whereas the authors also present targeted-knock down/inhibition of Pkc53E suggesting that this enzyme is required in blood cells to control crystal cell fate (Fig. 6), it is somehow misleading to use lz-GAL4 as a driver in the lymph gland and hml-GAL4 in circulating hemocytes as these two drivers do not target the same blood cell populations/steps in the crystal cell development process.

      We fully agree with the reviewer that the two driver lines target different blood cell populations/steps in hematopoiesis. The hml-Gal4 driver is regarded as pan-hemocyte, common to both plasmatocytes and pre-crystal cells (e.g. Tattikota et al., 2020). It has been reported to drive expression specifically within differentiated hemocytes prior to or at the stage of crystal cell commitment (Mukherjee et al., 2011). Hence, hml-Gal4 appeared suitable to target sessile and circulating hemocytes prior to their final differentiation into crystal cells or lamellocytes, respectively.

      In the lymph gland, however, hml is expressed within the cortical zone, where it appears specific to the plasmatocyte lineage, and is not present in the crystal cell precursors (Blanco-Obregon et al., 2020). In contrast, lz-Gal4 is specific to differentiating crystal cells in both lineages, i.e. in circulating and sessile hemocytes and in the lymph gland. Hence, we chose lz-Gal4 instead of hml-Gal4, at the risk of driving expression markedly later in the course of crystal cell differentiation. We included this reasoning in the text. Overall, we feel that this choice does not limit our conclusions.

      In addition, the authors do not present evidence that Pkc53E function (and Su(H) phosphorylation) is required specifically in blood cells to promote lamellocyte production in response to infestation.

      We have tried to address this interesting question by several means. Firstly, we show that Pkc53E is indeed expressed in the various cell types of larval hemocytes (new Figure 8 and Figure 8_S1 supplement); hence, Pkc53E has the potential to promote lamellocyte formation. Moreover, RNAi-mediated downregulation of Pkc53E within hemocytes affected crystal cell formation similarly to the Pkc53ED28 mutant, in agreement with a specific requirement within blood cells (Figure 6). Finally, we show a major drop in Notch target gene transcription (NRE-GFP) in response to wasp infestation in isolated hemocytes from Su(H)gwt larvae, in contrast to Su(H)S269A larvae (see new Figure 1G). These data show that Su(H)-mediated Notch activity must be downregulated in hemocytes prior to lamellocyte formation, in agreement with our hypothesis.

      Finally, the conclusion that Pkc53E is (directly) responsible for Su(H) phosphorylation needs to be strengthened. Most importantly, the authors do not demonstrate that Pkc53E is required for Su(H) phosphorylation in vivo (i.e. that Su(H) is not phosphorylated in the absence of Pkc53E following infestation).

      We would very much like to show such results. Unfortunately, the low affinity of our pS269 antibody does not allow any in situ or in vivo experiments. We very much hope to obtain a more specific phospho-S269-Su(H) antibody allowing further in situ studies, for example to show co-localization with Pkc53E.

      In addition, the in vitro kinase assays with bacterially purified Pkc53E (in the presence of PMA or using an activated variant of Pkc53E) only reveal a weak activity on a Su(H) peptide encompassing S269 (Fig. 4).

      The reviewer correctly notes the poor activity of our purified Pkc53EEDDD kinase. This low activity also holds true for the standard peptide (PS), which in fact is accepted even less well than the Swt substrate. Indeed, the commercially available PKCα is an order of magnitude more active. Whether this reflects the poor quality of our isolated protein compared to the commercial PKCα, or a true biochemical property of Pkc53E, remains to be shown. We noted this observation in the manuscript.

      Moreover, while the authors show a coIP between an overexpressed Pkc53E and endogenous Su(H) (Fig. 7) (in the absence of infestation), it has recently been reported that Pkc53E is a cytoplasmic protein in the eye (Shieh et al. 2023), calling for a direct assessment of Pkc53E expression and localization in larval blood cells under normal conditions and upon infestation.

      Indeed, it is interesting that a Pkc53E-GFP fusion protein is cytoplasmic in the eye. The construct reported by Shieh et al., however, corresponds to the B-isoform, which is preferentially expressed in photoreceptors, where it regulates the depolymerization of the actin cytoskeleton.

      Due to the eye-specific expression, we unfortunately cannot use the Pkc53E-B-GFP construct to test for Pkc53E’s distribution in other tissues.

      As this construct is of little use for studying hematopoiesis, we have instead used Pkc53E-GFP (BL59413), derived from a protein trap: again, GFP is primarily seen in the cytoplasm of hemocytes, including lamellocytes of infected larvae. However, in a small number of hemocytes, GFP also appears to be nuclear (Fig. 8A), leaving open the possibility that activated Pkc53E may localize to the nucleus, where it could phosphorylate Su(H) and downregulate Notch activity. As Su(H) enters the nucleus piggyback with NICD, however, phosphorylation may just as well occur at the membrane or within the cytoplasm. We note that these hypotheses require a much more detailed analysis.

      Furthermore, the effect of the PKCa agonist PMA on Su(H)-induced reporter gene expression in cell culture and crystal cell number in vivo is somehow consistent with the authors hypothesis, but some controls are missing (notably western blots to show that PMA/Staurosporine treatment does not affect Su(H)-VP16 level) and it is unclear why STAU treatment alone promotes Su(H)-VP16 activity (in their previous reports, the authors found no difference between Su(H)S269A-VP16 and Su(H)-VP16) or why PMA treatment still has a strong impact on crystal cell number in Su(H)S269A larvae.

      We have added a Western blot showing that the treatment does not affect Su(H)-VP16 expression levels (Figure 5_supplement 1). As STAU is a general kinase inhibitor, it may prevent any inhibitory phosphorylation of Su(H)-VP16 in the HeLa cells, e.g. by Akt1, CAMK2D or S6K, which target T271, the phosphorylation of which is expected to affect the DNA binding of Su(H) as well (Figure 3_supplement 2). Moreover, in the previous report, we used constructs with a different promoter, and we used RBPJ instead of Su(H), which may explain some of the discrepancies. As PMA is not specific to Pkc53E, the altered crystal cell numbers may result from its influence on other kinases involved in blood cell homeostasis, as predicted by our genetic screen (Figure 3_supplement 1).

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide a more elaborate examination of larval blood cell types and blood cell counts under normal conditions and following infestation in the different zygotic mutants as well as upon Pkc53 knock-down. A thorough examination of PSC integrity should be performed and the maintenance of core blood cell progenitors examined. The authors should also clarify when after infestation the LG and larval bleeds are analyzed.

      - a more elaborate examination of larval blood cell types:

      - examination of larval blood cell counts under normal conditions: hemocyte # in gwt, SA, SD, & Pkc

      - examination of larval blood cell counts after infestation: hemocyte # in gwt, SA, SD, & Pkc

      - thorough examination of PSC integrity: in gwt, SA, SD, & Pkc

      - thorough examination of blood cell progenitors: in gwt, SA, SD, & Pkc

      - clarify timing

      Hemocyte numbers of the various genotypes and conditions were recorded and are presented in Figure 1_S1 and Figure 9_S1. Timing was elaborated in the text and the Methods section.

      (2) The authors should clarify why they use lz-GAL4 or hml-GAL4 and what we can infer from using these different drivers.

      See above. The reasoning was included in the text.

      (3) The percentage of hatching of Su(H)S269A and Su(H)gwt flies in the absence of infestation should also be scored; a small decrease in Su(H)S269A viability might explain the observed differences in survival to wasp infestation. Absolute blood cell numbers (in the absence of infestation) have also been correlated with survival to infection and should be checked.

      The percentage of emerging flies and hemocyte numbers in the absence of infestation were recorded and included in Figure 2, Figure 1_S1 and Figure 9_S1.

      (4) Whereas the impact of Su(H)S269A or Pkc53E mutation on lamellocytes production is clear, there is still a substantial reduction in crystal cell production following infestation. So I wouldn't conclude that the Su(H) larvae are "unable" to detect this immune challenge or respond to it (line 116).

      Thank you for pointing this out; we have corrected the text.

      (5) The expression and localization of Pkc53E in larval blood cells should be investigated, for instance using the Pkc53E-GFP line recently published by Shieh et al. (or at least at the RNA level).

      Firstly, we confirmed expression of Pkc53E in hemocytes by RT-PCR (Figure 8_S1 supplement). Secondly, expression of Pkc53E-GFP was monitored in hemocytes (Figure 8). To this end, we used the protein trap (BL59413), since the one published by Shieh et al., 2023 is restricted to photoreceptors.

      (6) It would be interesting to test the anti-pS269 antibody in immunostaining (using Su(H)S269A as negative control).

      Unfortunately, the pS269 antiserum does not work in situ at all.

      (7) The authors must perform a western blot with anti-pS269 in Pkc53e mutant to show that Su(H) is not phosphorylated anymore after wasp infestation.

      The blot gives a negative result.

      (8) It is surprising that no signal is seen in the absence of infestation with anti-pS269: the fact that Su(H)S269A have more crystal cells suggest that there is a constitutive level of phosphorylation of Su(H).

      We fully agree: ideally, we would expect a low level of S269 phosphorylation in the wild type as well. However, given the poor specificity of our antibody, we were happy to see phospho-Su(H) in infected larvae. We are currently working hard to obtain a better antibody.

      (9) The authors should check Su(H)-VP16 levels and phosphorylation status after PMA and/or staurosporine treatment. Some clarifications are also needed to explain the impact of PMA in Su(H)S269 larvae (this clearly suggests that PKC has other substrates implicated in crystal cell development).

      Su(H)-VP16 expression levels were monitored by Western blot and were not noticeably altered (Figure 5_1 supplement). Presumably, Pkc53E is not the only kinase involved in Su(H) phosphorylation or the transduction of stress signals. Moreover, PMA may have a more general effect on larval development and hematopoiesis, affecting both genotypes. We included this reasoning in the text.

      (10) Concerning the writing, the authors forgot to mention and discuss the work of Cattenoz et al. (EMBO J 2020). The presentation of the screen for kinase candidates could be streamlined and better illustrated (notably supplement table 4, which would be easier to grasp as a figure/graph). The discussion could be shortened (notably the part on T cells), and I don't really understand lines 374-376 (why is it consistent?).

      We are sorry for omitting Cattenoz et al. 2020, which we have now included. We fully agree that this paper is of utmost importance to our work. We streamlined the description of the screen and, in addition to table 4, included a new figure summarizing the results graphically (Figure 3_S1 supplement). We shortened the section on T cells and removed the unclear lines.

      Reviewer #2 (Public Review):

      Summary:

      The current draft by Deischel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity and mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; the identification of Pkc53E in the phosphorylation of Su(H) is especially nice. However, I have a number of concerns with the manuscript that are central to the idea linking the phosphorylation of Su(H) via Pkc53E to the modulation of Notch activity. I list them one by one below.

      Strengths:

      I find the study interesting and relevant especially because of the following:

      (1) The identification of Pkc53E in phosphorylation of Su(H) is very interesting.

      (2) The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:

      (1) Epistatic interaction with Notch is needed: Throughout the draft, the authors claim that the role of Pkc53E in the phosphorylation of Su(H) is downstream of Notch activity. Given that the paper title also invokes Notch, I would suggest that the authors show this through a direct epistatic interaction using a Notch condition. If loss of Notch function produces many more lamellocytes and gain of function (GOF) produces fewer, would modulating Pkc53E (and Su(H)) in these backgrounds manifest any change? Likewise, given that gain of Notch function leads to increased crystal cells in homeostasis, it would be nice to see the same genetic combinations under homeostatic conditions.

      While I understand that Su(H) functions downstream of Notch, it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify whether this phosphorylation event of Su(H) via Pkc is part of the canonical interaction proposed in the manuscript and not a non-canonical, Notch pathway-independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, any direct data to show this affirmatively are missing. In our hands, we do find Notch-independent Su(H) function in immune cells; hence, this suggestion stems from our own personal experience.

      The role of Notch in Drosophila hematopoiesis, notably during crystal cell development in both hematopoietic compartments, is well established, as is the role of Su(H) as the integral signal transducer in this context (e.g. Duvic et al., 2002). Notch activity not only promotes crystal cell fate by upregulating target genes but also prevents adoption of the alternative plasmatocyte fate (e.g. Terriente-Felix et al., 2013). We confirmed by qRT-PCR the downregulation of Notch target gene expression in response to wasp infestation, which had been reported earlier by Small et al. (2014). This clearly favors a repression of Notch activity rather than a relief of inhibition by Su(H). A ligand-independent activation of Notch signaling involving Sima/Hif-α, with Su(H) as transcriptional mediator, has been uncovered in the context of crystal cell maintenance in the lymph gland (Mukherjee et al., 2011). However, we are unaware of a corresponding Su(H) activity independent of Notch.

      Certainly, Su(H) acts independently of Notch in terms of gene repression. Here, Su(H) forms a repressor complex together with H and the co-repressors Groucho and CtBP to silence Notch target genes. Accordingly, loss of Su(H) or H may induce the upregulation of the respective genes independently of Notch activity. This has been demonstrated, for example, during wing and heart development (Klein et al., 2000; Kölzer, Klein, 2006; Panta et al., 2020). Moreover, during axis formation in the early embryo, global repression is brought about by Su(H) and relieved by activated Notch (Koromila, Stathopoulos, 2019). In all these instances, Su(H) is thought to act as a molecular switch, and the activation of Notch causes strong expression of the respective genes. Likewise, the loss of DNA binding resulting from the phosphorylation of Su(H) allows the upregulation of repressed Notch target genes in wing imaginal discs, e.g. dpn, as we have demonstrated before with overexpression and clonal analyses (Nagel et al. 2017; Frankenreiter et al., 2021). However, H does not contribute to crystal cell homeostasis, i.e. de-repression of Notch target genes does not appear to be a major driver in this context, which calls for additional mechanisms to downregulate Notch activity. Our work provides evidence that these inhibitory mechanisms involve the phosphorylation of Su(H) by Pkc53E. Formally, we cannot exclude alternative mechanisms. Hence, we have tried to avoid a direct link between Su(H) phosphorylation and the inhibition of Notch activity throughout the text, including the title. Moreover, we have discussed the possible consequences of the loss of DNA binding by Su(H), which could either interfere with the activation of Notch target genes or abrogate their repression.

      In addition, we have performed new experiments addressing the epistasis between Notch and Su(H) during crystal cell formation (Figure 1_supplement 1). To this end, we knocked down Notch activity in hemocytes by RNAi (hml::N-RNAi) in the Su(H)gwt and Su(H)S269A backgrounds, respectively. Indeed, Notch downregulation strongly impairs crystal cell development independently of the genetic background, as expected if Notch were epistatic to Su(H). We attribute the slightly elevated crystal cell numbers observed in the Su(H)S269A background to the increase in embryonic precursors (see Fig. 4; Frankenreiter et al. 2021). Of note, the Notch gain-of-function allele Ncos479 also displayed a similar increase in embryonic crystal cell precursors as well as in crystal cells within the lymph gland (Frankenreiter et al. 2021).

      (2) Temporal regulation of Notch activity in response to wasp-infection and its overlapping dynamics of Su(H) phosphorylation via Pkc is needed:

      First, I suggest that the authors show how Notch activity changes over time after infection. An RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24 and 48 HPI would reveal the dynamics of Notch activity and set the tone for when and how it is modulated. In parallel, this response in the Su(H) phospho-mutant would be good to see and would support the requirement for Su(H) phosphorylation in mounting a strong immune response.

      Indeed, it would be extremely nice to follow the entire process in every detail, ideally at the cellular level. The challenge, however, is quantity: the amount of mRNA isolated from hemocytes was barely quantifiable, although the resulting Ct values were acceptable. We quantified the expression of NRE-GFP, introduced into Su(H)gwt and Su(H)S269A, as well as atilla expression, and were able to generate data for two time windows, 0-6 h and 24-30 h post infection. The data, provided in the extended Figure 1G, show a strong drop of NRE-GFP in the infected Su(H)gwt control compared to uninfected animals, whereas expression in Su(H)S269A plateaus at around 60-70% of the infected Su(H)gwt control. Atilla expression rises sharply in the control but stays low in Su(H)S269A hemocytes.
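      Relative expression values such as those reported here for NRE-GFP and atilla are usually derived from Ct values. The response does not state which normalization was used; purely as an illustration, the widely used Livak 2^-ΔΔCt method is sketched below. It assumes roughly 100% amplification efficiency for both primer pairs and a reference (housekeeping) gene, neither of which is specified in the text; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_calib, ct_ref_calib):
    """Fold change of a target gene by the Livak 2^-ddCt method.

    Assumes ~100% amplification efficiency for target and reference
    primers (each cycle doubles the product). The calibrator is the
    sample that expression is reported relative to, e.g. an uninfected control.
    """
    # Normalize the target to the reference gene within each sample.
    d_ct_sample = ct_target - ct_ref
    d_ct_calib = ct_target_calib - ct_ref_calib
    # Difference of normalized values, converted from cycles to fold change.
    dd_ct = d_ct_sample - d_ct_calib
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target needs one extra cycle in the sample,
# so it is present at half the calibrator level.
print(relative_expression(25.0, 18.0, 24.0, 18.0))  # 0.5
```

      If efficiencies differ markedly from 100%, an efficiency-corrected variant would be more appropriate than this simple form.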

      Second, the dynamics of phosphorylation over a time course are missing. The increased phosphorylation of Su(H) in response to wasp infestation shown in Fig. 2B was measured in whole animals, which implies a global down-regulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells; otherwise the reader is left to assume that it also holds there. Given that the authors have a good antibody, characterizing the same response in circulating immune cells after infection is needed, ideally as a time course of the phosphorylation state at 6, 12, 24 and 48 HPI to gauge its dynamics.

      We would really love to do these experiments. Unfortunately, our pS269 antibody is rather lousy. It does not detect Su(H) protein in tissues or cells, nor does it work on protein extracts in Westerns or for IP. Hence, we have no way so far to demonstrate cell or tissue specificity of Su(H) phosphorylation. So far, we have only been able to detect mCherry-tagged Su(H) proteins pulled down in rather large amounts with the highly specific nanobodies. We have tried very hard to repeat the experiment with hemolymph and lymph glands only, but have failed so far. Hence, we have to state that our antibody is suitable neither for in vivo analyses nor for the detection of phospho-Su(H) at lower levels.

      The authors suggest that this mechanism may be a quick way to down-regulate Notch; hence, a side-by-side comparison of the dynamics of Notch down-regulation (e.g. by RT-PCR of Notch target genes at different time points post infection) with the levels of pS269 would strengthen the central point being proposed.

      We fully agree and hope to address these issues in the future by improving our tools.

      Last, in Fig. 7 the authors show co-immunoprecipitation of Pkc53E-HA with Su(H)gwt-mCh protein from Hml-gal4 hemocytes. I understand that this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a co-IP of the two proteins in immune cells upon infection should be incorporated to strengthen the point.

      We do not fully agree with the reviewer. Although we also think that the interaction between Pkc53E and Su(H) might occur more frequently upon infection, we propose that this is a transient process occurring in several but not all hemocytes at a given time. Moreover, in the described experiment, Pkc53E-HA was expressed in hemocytes via the UAS/Gal4 system. We cannot exclude that this approach causes an overexpression. Hence, we would not expect considerable differences between unchallenged and infested animals.

      (3) In Fig. 5B, the authors show the change in crystal cell numbers as a readout of PMA-induced activation of Pkc53E and the subsequent inhibition of Su(H) transcriptional activity. I would suggest that the authors use a more direct readout: RT-PCR of Su(H) target genes in circulating immune cells would strengthen this point. Crystal cell formation is not controlled by Notch alone, and I am not convinced that this treatment has no other effects on immune cells; for instance, any impact on Hif expression might also lower CC numbers. Hence, the authors need to strengthen this point by showing that the effects act directly on Notch and Su(H) and are not due to any other pathway known to be important for CC development.

      We agree with the Reviewer that the rather general influence of PMA on PKCs might impose systemic stress on the animal. For example, we observed a slight drop in crystal cell numbers also in Su(H)S269A, suggesting that kinases other than Pkc53E that are involved in crystal cell homeostasis were affected. We have included this notion in the text. To provide more conclusive evidence, we also fed Staurosporine to the larvae, which reversed the PMA effect. In addition, we assayed the expression of NRE-GFP in hemocytes of infected animals by qRT-PCR and observed a strong drop in the infected versus uninfected control, but less so in Su(H)S269A. The new data are provided in extended Figures 1G and 5B.

      (4) In addition to the points mentioned above, the data need to be strengthened to further support the main conclusions of the manuscript. I would suggest that the authors present the infection response with details on the timing of the immune response. Characterization of the immune response at the respective time points (as above, or at least 24 and 48 HPI, as is the norm in the field) will be important. Also missing are any changes in overall cell numbers and in other immune cells (plasmatocytes or CC) post infection, which are needed to show the specificity of the effect. These additions would make the analysis more rigorous.

      Total hemocyte numbers of the various genotypes, i.e. control, Su(H)S269A, Su(H)S269D and Pkc53EΔ28, before and after wasp infestation are included in supplementary Figures 1_S1 and 9_S1.

      (5) Finally, what is the authors' view on what leads to the activation of Pkc53E? No upstream input is presented. It would be good to see whether wasp infection leads to increased Pkc53E kinase activity.

      The analysis of the full process is an ongoing project. We propose that ROS produced upon the wasp's sting trigger the subsequent cascade of events. This cascade must culminate in the activation of Pkc53E in the presumptive pre-lamellocyte pool of both lineages, i.e. in plasmatocytes of the hemolymph, presumably in the sessile compartment (Tattikotta et al., 2021), and at the same time in the lymph gland cortex harboring the LM precursors (Blanco-Obregon et al., 2020). One of the known upstream kinases, Pdk1, has a similar impact on crystal cell development as Pkc53E, making its involvement likely. Moreover, we think that other PKCs influence the process as well.

      Without a good readout, e.g. a functional pSu(H) antiserum working in situ or a Pkc activity reporter, it will be quite difficult to follow up on this question. However, we already know that Pkc53E is expressed in hemocytes of all types independent of wasp infestation, in agreement with a role during lamellocyte differentiation. We hope to unravel the process in more detail in the future.

      Overall, I think the findings in their current state are interesting and fill an important gap, but the authors need to strengthen the point with a more detailed analysis that includes new data and presents the current data more rigorously. The data have to show the relationship with Notch pathway modulation upon phosphorylation of CSL much more comprehensively, both in homeostasis and in response to infection; this is entirely missing from the current draft.

      Reviewer #3 (Public Review):

      Diechsel et al. provide important and valuable insights into how Notch signalling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favour lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to screen for the kinases responsible for this phosphorylation and identify Pkc53E as the specific kinase acting on Su(H) at S269. Using mutant analysis, RNAi and biochemical approaches, the authors convincingly show how the Pkc53E-Su(H) interaction is critical for remodelling hematopoiesis upon wasp challenge. The data presented support the overall conclusions. There are a few points below that the authors need to address to strengthen the conclusions:

      (1) The authors should check melanized crystal cells in Su(H)gwt and Su(H)S269A in the presence of PMA and Staurosporine.

      Thank you for the suggestion. We included the results of PMA + Staurosporine feeding in an extended Fig. 5B; they match those from the HeLa cells. Unfortunately, Staurosporine alone was lethal to the larvae at various concentrations, presumably owing to the broad inhibition of kinase activity. This global effect also explains the high crystal cell numbers in the control fed with PMA + STAU compared to untreated animals, as the downregulation of many kinases results in higher crystal cell numbers, a fact uncovered in our genetic screen.

      (2) Data on the number of dead pupae, eclosed flies and emerged wasps post infestation should be collected and included for the following genotypes:

      Pkc53EΔ28, Su(H)S269A, Pkc53EΔ28 Su(H)S269A, Su(H)S269D, Su(H)S269D Pkc53EΔ28

      We extended the data with and without infection. The respective data are shown in a new Fig. 9 and an extended Fig. 2, except for the Su(H)S269D allele: Su(H)S269D is larval lethal, i.e. dies too early for wasp development, and hence could not be included in the assay. Overall, Pkc53EΔ28 matched Su(H)S269A.

      (3) The exact molecular trigger for activation of Pkc53E upon wasp infestation is not clear.

      Indeed, and we would love to know! Perhaps a Ca2+ signal generated by the wasp's breach of the larval cuticle results in Pkc53E activation. The generation of ROS could be involved as well. At this point, we can only speculate. We hope to obtain direct experimental evidence for one or the other hypothesis in the future.

      (4) The authors should check whether activating ROS alone, or inducing Calcium pulses/DUOX activation, can mimic this condition and trigger activation of Pkc53E and thereby phosphorylation of Su(H) at S269.

      The reviewer’s suggestions open up a new field of investigation and are hence beyond the scope of this article. However, we want to pursue research in this direction, although we realize that counting crystal cells is too coarse to give more than a first impression, and that lamellocytes may form in response to the breach of the larval cuticle alone. A major challenge will be direct measurements of Pkc53E activation. To date, we have no tools for this, but ideally, we would like to have a direct, biochemical readout. Although we have been unsuccessful in the past, we want to develop a strong and specific phospho-S269 antibody that also works in situ. Alternatively, we are considering developing a PS-phosphorylation reporter to address these questions.

      (5) Does Pkc53E get activated during sterile inflammation?

      We are in the process of addressing this issue but feel that this topic is beyond the scope of this paper. Our preliminary experiments, however, support the notion of a phospho-dependent regulation of Su(H) also in this context.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide a graphical representation of major phenotypes that form the basis of their investigation and conclusions but have not supplemented the quantitation with images that represent these phenotypes. The authors need to include the following data to strengthen their conclusions:

      (1) The authors should include representative images for each of the genotypes/conditions (in presence and absence of wasp infestation) based on which corresponding plots have been made in Figure 1. Please include this for both circulating lamellocytes in the hemolymph and in the lymph glands since this is one of the main figures presenting the key findings.

      The data have been included in Figure 1-S2 supplement.

      (2) Please include representative images of LG with Hnt staining and corresponding images for melanization for each of the genotypes used in the plots in Figure 6A and B.

      The data have been included in Figure 6-S2 supplement.

      (3) Representative images for each of the genotypes in Figure 7A & B should be included (circulating crystal cells and lymph gland crystal cell numbers).

      Representative images for each of the genotypes for Fig. 7A have been included in Figure 7-S1 and for the old Fig. 7B in Figure 9-S2 supplement, respectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the Editor and the Reviewers for their constructive review. In light of this feedback, we have made a number of changes and additions to the manuscript that we think improve the presentation and, we hope, address the majority of the reviewers' concerns.

      Main changes:

      •   We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

      •   We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

      •   The text of the section Abstraction as “expiring” fitness advantage has been modified.

      •   We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

      All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, that we hope will be addressed by the last two items listed above. In particular, we would like to underline the following points:

      •   The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer-resource models) can give rise to the same phenomenology, we believe that in the context of host-pathogen interactions it is relevant to show explicitly that SIR models can result in partial sweeps.

      •   The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

      •   It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (for example, the SIR model can oscillate, while the effective model cannot). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.

      Public reviews:

      Reviewer 1:

      Summary

      In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

      We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations, but the rise and fall of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if each episode spans only 3-6 years.

      (1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between the influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and, more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss the relation to this broader class of models with emergent neutrality in more detail. Moreover, the authors’ simulations of the model dynamics extend only up to the onset of clonal interference, 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations deeper in the clonal interference regime (e.g. 𝜌/𝑠0 = 5) would show the behaviour in this regime more clearly.

      We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information. These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference, 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude that, while dynamics do tend toward quasi-neutrality under strong interference, this effect alone is unlikely to explain the observed H3N2 influenza dynamics. In our previous publication (Barrat-Charlaix et al., MBE, 2021) we also investigated the effect of epistatic interactions between mutations alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, reproduces them quite readily.

      But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.
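      The contrast between a persistent and an expiring selective advantage can be illustrated with a single-locus Wright-Fisher sketch. This is not the authors' simulation code: it assumes, for illustration only, that the advantage decays exponentially in time, s(t) = s0·exp(-νt), and it ignores clonal interference entirely (no other segregating loci). Population size, trial count and parameter values are arbitrary. With ν = 0 the allele undergoes a classic selective sweep; as ν grows, its fixation probability moves toward its starting frequency, which is the neutral expectation.

```python
import numpy as np


def fixation_probability(x0, s0, nu, N=500, trials=2000, max_gen=50_000, seed=0):
    """Estimate P(fixation) of an allele starting at frequency x0.

    Haploid Wright-Fisher population of size N. The allele's selective
    advantage is assumed to expire exponentially: s(t) = s0 * exp(-nu * t).
    All trials are simulated in parallel as a vector of frequencies.
    """
    rng = np.random.default_rng(seed)
    x = np.full(trials, x0, dtype=float)
    for t in range(max_gen):
        s = s0 * np.exp(-nu * t)
        # Deterministic selection step (frequencies 0 and 1 remain absorbing).
        x_sel = np.clip(x * (1.0 + s) / (1.0 + s * x), 0.0, 1.0)
        # Genetic drift: binomial resampling of N offspring.
        x = rng.binomial(N, x_sel) / N
        if np.all((x == 0.0) | (x == 1.0)):
            break
    return float(np.mean(x == 1.0))


x0 = 0.2
p_neutral = fixation_probability(x0, s0=0.0, nu=0.0)     # close to x0
p_expiring = fixation_probability(x0, s0=0.05, nu=0.05)  # in between
p_sweep = fixation_probability(x0, s0=0.05, nu=0.0)      # close to 1
print(p_neutral, p_expiring, p_sweep)
```

      The three estimates should be ordered neutral < expiring < sweep; the exact values depend on the random seed and the arbitrary parameters.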

      In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

      We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

      (2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

      We agree with the reviewer that the deterministic SIR model is, of course, predictable, and so is a partial sweep. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus, even if we can measure its instantaneous growth rate accurately, we cannot predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, the time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.
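      Point (i) can be made concrete with a deterministic sketch. Assuming, for illustration, that the advantage decays exponentially in time (this specific form is an assumption, not necessarily the manuscript's definition), dx/dt = s(t)x(1-x) with s(t) = s0·exp(-νt) implies that the logit of x rises by the integral of s(t), which is s0/ν. The sweep therefore stalls at a final frequency set by the ratio s0/ν, and two variants with the same measured initial growth rate s0 but different, unknown expiry rates ν end at very different frequencies. The function names below are hypothetical.

```python
import math


def logit(x):
    return math.log(x / (1.0 - x))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def final_frequency(x0, s0, nu):
    """Closed-form end point of dx/dt = s0*exp(-nu*t) * x * (1 - x), for nu > 0.

    The logit of x grows at rate s(t), so over infinite time it rises
    by the integral of s(t), which equals s0 / nu.
    """
    return expit(logit(x0) + s0 / nu)


def simulate(x0, s0, nu, dt=0.01):
    """Forward-Euler integration of the same ODE, as a numerical check."""
    x, t = x0, 0.0
    t_max = 30.0 / nu  # by then s(t) has decayed by a factor of e^-30
    while t < t_max:
        x += dt * s0 * math.exp(-nu * t) * x * (1.0 - x)
        t += dt
    return x


# Same initial growth rate s0, different (unknown) expiry rates nu:
print(final_frequency(0.01, 0.05, 0.05))  # ~0.027: the sweep stalls early
print(final_frequency(0.01, 0.05, 0.01))  # ~0.60: a large partial sweep
```

      Measuring s0 alone thus does not fix the end point; the outcome hinges on ν, which is hard to observe early in a trajectory.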

      (3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While the two sources of emerging neutrality (expiring fitness and clonal interference) are easily tunable in this model, its connection to the SIR model needs to be substantiated: what are the initial selection coefficient 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀) and the selection decay rate 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable a comparison of partial sweep timing in both models and corroborate the mapping of the SIR model onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig. 1 and Fig. 2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

      We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as functions of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀 at the instant the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to the incidence 𝐼wt + 𝐼𝑚. However, as the two models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that the growth rate of a strain diminishes as the strain becomes more frequent, and both give rise to partial sweeps.
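      To give some intuition for why the initial growth rate is governed mainly by immune escape, here is a minimal sketch under a deliberately simplified, assumed model: a single immune group, the resident strain (wt) at its endemic equilibrium, α read as the transmission rate and γ as the recovery rate, S, I and R as population fractions, and hosts immune to wt retaining a fraction f of their susceptibility to the variant m. The manuscript's model with 𝑀 immune groups and cross-group contact 𝑏 differs in detail, so this is not its SI A4 derivation.

```latex
\begin{aligned}
\frac{dI_{\mathrm{wt}}}{dt} &= \alpha S\, I_{\mathrm{wt}} - \gamma I_{\mathrm{wt}}, \\
\frac{dI_{m}}{dt} &= \alpha \left( S + f R \right) I_{m} - \gamma I_{m}.
\end{aligned}
\qquad
\text{Equilibrium } \left(I_{\mathrm{wt}}^{*} > 0\right):\ \alpha S^{*} = \gamma
\;\Longrightarrow\;
s_{0} = \alpha \left( S^{*} + f R^{*} \right) - \gamma = \alpha f R^{*}.
```

      In this toy setting s0 is proportional to the escape parameter f and to the size of the immune pool R*. As the variant spreads, it draws down the pool of hosts it can infect, so its advantage shrinks at a rate that depends on incidence, in line with the qualitative relation described above.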

      In Figure 1, the time it takes a partial sweep to finish is roughly 100-200 generations (bottom right panel). For H3N2 influenza, taking one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower than, but roughly in line with, observations for selective sweeps. This time is harder to define if oscillatory dynamics take place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape on the order of a 20-30% change in susceptibility.
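      As a quick check of the unit conversion above, assuming one generation per week and 52 weeks per year:

```latex
\frac{100\ \text{weeks}}{52\ \text{weeks/year}} \approx 1.9\ \text{years},
\qquad
\frac{200\ \text{weeks}}{52\ \text{weeks/year}} \approx 3.8\ \text{years}.
```

      This is consistent with the quoted range of 2 to 4 years; by the same conversion, the waning time of 200 weeks corresponds to roughly 3.8 years.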

      Reviewer 2:

      Summary

      This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

      Strengths

      This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

      This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

      The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

      We thank the reviewer for their summary, assessment, and constructive critique.

      (1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering if the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data are available about cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

      We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 are obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore consider quantitative validation against data to be beyond the scope of this work.

      (2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

      The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when the growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This can readily be achieved with many immune groups, which reconciles the large effect of many escape mutations with the lack of oscillation by confining the escape to some fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps on predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section which links parameters of the two models. However, we emphasize that beyond the phenomenon of partial sweeps, the dynamics of the two models are different.

      Reviewer 3:

      Summary

      In this work the authors first present a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description into an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

      Strengths

      The idea that a vanishing fitness mechanism that produces partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for the predictability of virus evolution, and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strain coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to apply beyond human RNA viruses, to situations where invading mutants would saturate at intermediate frequencies.

      We thank the reviewer for their positive remarks and constructive criticism below.

      Weaknesses

      The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the random walk coarse-grained description in section 3 of the Results, there is no quantitative relation between the random walk ingredients, importantly 𝑃(𝛽), and the SIR model, just a qualitative reasoning that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

      As also highlighted in the response to other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

      Currently it’s unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effect of these ingredients is reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady-state coexistence between strains, and oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

      The reviewer is correct that the primary effect of using many immune groups is to slow down the rise of the novel variant, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

      Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

      Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR. We also discuss how other models such as ecological models give rise to similar coarse grained models.

      At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

      In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt , which is apparent in equation (A27) of the new SI section. The fact that a new variant shifts the equilibrium frequencies of previous strains in a proportional way is valid if the “antigenic space” is of very high dimension, as explained in the SI section “Change in frequency when adding subsequent strains”. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross-immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently of each other is a necessary simplification.

      In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

      We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency-dependent selection, or ecosystems that adapt on appropriate timescales, generate similar dynamics.

      Recommendations for the authors:

      Reviewer 1:

      (4) Line 74: what does fitness mean?

      Many population dynamics models, including ones used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the average population fitness from the strain’s fitness. In this sentence, fitness is intended in this way.
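      A minimal sketch of the convention described above: each strain's growth rate is its scalar fitness minus the frequency-weighted mean fitness of the population. The function name `growth_rates` and the numbers are hypothetical and are not taken from the manuscript's code.

```python
def growth_rates(fitness, freqs):
    """Growth rate of each strain: its fitness minus the population-average fitness.

    `fitness` and `freqs` are parallel lists; `freqs` is normalised to sum to 1.
    """
    total = sum(freqs)
    weights = [x / total for x in freqs]
    # Population-average fitness, weighted by strain frequency
    mean_fitness = sum(w * f for w, f in zip(weights, fitness))
    return [f - mean_fitness for f in fitness]


# Hypothetical example: three strains with frequencies 0.5, 0.25, 0.25
rates = growth_rates([0.0, 0.03, -0.01], [0.5, 0.25, 0.25])  # ≈ [-0.005, 0.025, -0.015]
```

      Because the mean is subtracted, the frequency-weighted sum of growth rates is zero, so total frequency is conserved. Only fitness differences, not absolute fitness values, affect the dynamics.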

      (5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

      This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

      A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A4 of the supplementary information.

      (6) Fig. 1: In the caption, it is stated that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in, and just below, equation (7).

      This was indeed a typo. It is now fixed.

      (7) Fig. 3: The data analysis should be improved. In order to link the average frequency trajectories to standard population genetics of conditional fixation probabilities, the focal time should always be the time where the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and the second crossing of the threshold frequency is simply omitted). The focal time of first crossing of the threshold frequency can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

      The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.

      In Fig. 3, no trajectory is on a downward path at the focal time, when it crosses the threshold frequency. Our earlier work on the predictability of influenza (Barrat-Charlaix et al., 2021) has a similar figure, which may have created confusion.
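      For comparison, the reviewer's suggested alternative can be sketched as follows: the first upward crossing of the threshold frequency is found by linear interpolation between consecutive sampled time points. The authors instead use the first sampled time at which the trajectory falls in the threshold bin. The function name and example trajectory below are hypothetical.

```python
def first_crossing_time(times, freqs, threshold):
    """Estimate the first time a sampled frequency trajectory reaches `threshold`
    from below, interpolating linearly between consecutive samples.

    Returns times[0] if the trajectory already starts at or above the threshold,
    and None if the threshold is never reached.
    """
    if freqs[0] >= threshold:
        return times[0]
    for i in range(1, len(times)):
        x0, x1 = freqs[i - 1], freqs[i]
        if x0 < threshold <= x1:
            t0, t1 = times[i - 1], times[i]
            # x1 > x0 on an upward crossing, so the division is safe
            return t0 + (threshold - x0) * (t1 - t0) / (x1 - x0)
    return None


# Hypothetical trajectory sampled once per time unit
t_star = first_crossing_time([0, 1, 2, 3], [0.1, 0.2, 0.4, 0.3], 0.3)  # ≈ 1.5
```

      Only the first upward crossing is returned. The later return to 0.3 at time 3 is ignored, which is the property the reviewer's point relies on.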

      (8) Fig. 4: authors write 𝛼/ 𝑠0 in the figure, but should be 𝜈/ 𝑠0.

      Fixed.

      (9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/ 𝑠0, that is panel D (see point 1).

      Fixed.

      (10) Line 477: typo “there will a variety of mutations”.

      Fixed.

      Reviewer 2:

      Should 𝛼 be 𝜈 in Figure 4 legends?

      Thank you very much for spotting this error. We fixed it.

      Equations 4-5 could be further simplified.

      We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1− 𝛿/ 𝛼 term as this quantity appears in different calculations concerning the model. For instance, 𝑆 = 𝛿/ 𝛼 at equilibrium.
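      For readers outside the field, here is a one-line sketch of why 𝑆 = 𝛿/ 𝛼 at equilibrium. It assumes the infected compartment follows the textbook SIR form with transmission rate 𝛼 and recovery rate 𝛿. The manuscript's equation 4 may contain additional terms, for example for waning or multiple immune groups, so this is an illustration rather than the exact model.

```latex
\frac{dI}{dt} = \alpha S I - \delta I = I\,(\alpha S - \delta) = 0,
\quad I^{\ast} > 0
\;\Longrightarrow\;
S^{\ast} = \frac{\delta}{\alpha},
\qquad
1 - S^{\ast} = 1 - \frac{\delta}{\alpha}.
```

      Under this assumed form, 1 − 𝛿/ 𝛼 is the equilibrium fraction of hosts that are not susceptible.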

      The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

      We now introduce 𝑃𝛽(𝛽) at the beginning of the section Ultimate fate of the variant.

      In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

      Fixed.

      For the supplement section B, you might want a more descriptive title than “other”.

      We renamed this section to Expiring fitness model and random walk.

      Reviewer 3:

      To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

      In the first paragraph of page 9 the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous vs heterogeneous populations due to the oscillations. This paragraph is quite speculative, some numerical simulations would be necessary to quantitatively address to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

      Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR given the chosen population and cross-immunity structures with e.g. 10-20 strains. This would allow for a direct comparison of individual strain dynamics rather than the frequency averages, as well as other scalar properties such as higher moments, the coalescent, and the fixation probability once a given frequency is reached. It would also be possible to characterize numerically the SIR P(beta), bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the moment scalings. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics works so it shouldn’t be out of reach.

      Alternatively, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR model more briefly as one possible qualitative way to get there. Either way, the authors should comment on other ways in which this coarse-grained dynamics could arise.

      In the vanishing fitness model, where variants fitnesses are independent, is an infinite dimensional antigenic space implicitly assumed? If that’s the case, it should be explained in the main text.

      A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

      In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR model. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on only a few parameters (𝜈, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

      In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross-immunity parameters 𝑏 and 𝑓.

      Minor comments:

      There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/ 𝑀, then a few paragraphs later the authors introduce that in another limit 𝐶𝑖𝑖 is just much higher than any 𝐶𝑖𝑗, and finally they specify that the former is the fast mixing scenario.

      Another example is in section 2: in the first paragraph the authors put forward that heterogeneity and cross-immunity have different impacts on the dynamics, but the meaning attributed to these different ingredients becomes clear only a while later, after the homogeneous population analysis. Making the writing more consistent would make it easier for the reader to follow the authors’ train of thought.

      We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/𝑀 case, which we hope will make the writing more linear.

      When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

      Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

      In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem strong enough to produce a disordered landscape that defies the notion of scalar fitness in the same way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be negative frequency-dependent selection.

      Our language here was not very precise, and we agree that the phenomenology we describe is related to that of frequency-dependent selection (mediated via the immunity of the host population, which integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and only account for evolution via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes the case of negative frequency-dependent selection.

      I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3 given that there is no stochasticity in growth death dynamics.

      We compare the random walk to genetic drift because of the form of the second moment of the step size; genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.
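      The moment matching can be checked exactly under two properties stated in this reply. First, a new variant that reaches equilibrium frequency 𝛽 rescales all previous strains proportionally by (1 − 𝛽). Second, the variant arises on a background chosen with probability equal to its frequency. The sketch below is our illustration of those two properties, not the manuscript's code; the function names are hypothetical. It computes the mean and second moment of the change in an allele's frequency 𝑥 after one partial sweep and compares them with one generation of neutral binomial sampling.

```python
from fractions import Fraction


def sweep_step_moments(x, beta):
    """Exact mean and second moment of the change in allele frequency x
    caused by one partial sweep that reaches frequency beta."""
    # With probability x the variant arises on a background carrying the allele:
    # the allele frequency becomes x*(1 - beta) + beta.
    up = beta * (1 - x)
    # With probability 1 - x it arises elsewhere: the allele frequency becomes x*(1 - beta).
    down = -beta * x
    mean = x * up + (1 - x) * down
    second = x * up**2 + (1 - x) * down**2
    return mean, second


def neutral_drift_second_moment(x, n):
    """Variance of the frequency change after one generation of neutral
    binomial (Wright-Fisher) sampling of n individuals."""
    return x * (1 - x) / n


x, beta = Fraction(3, 10), Fraction(1, 5)
mean, second = sweep_step_moments(x, beta)  # mean == 0, second == beta**2 * x * (1 - x)
```

      The step has zero mean and second moment 𝛽² 𝑥(1 − 𝑥), the same functional form as 𝑥(1 − 𝑥)/𝑁 for neutral drift, with 𝛽² playing the role of 1/𝑁 per sweep. Exact `Fraction` arithmetic makes the equalities hold without rounding error.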

      In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made more clear what simulations assume what different choice of 𝑃(𝑠).

      We added the expression of 𝑃(𝑠) to the main text. Simulations use the value 𝑠0 = 0.03, which we added to the caption of Figure 4.

      When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

      A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

      Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

      For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021, also indirectly visible from Fig. 3). In Figure 4, we obtain that either a high expiry rate or intermediate expiry rates and clonal interference regimes match this observation.
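      A fixation probability equal to the current frequency is what any zero-mean (martingale) frequency process produces. A quick Monte Carlo check can use the stylized partial-sweep walk: the allele frequency becomes 𝑥(1 − 𝛽) + 𝛽 with probability 𝑥, and 𝑥(1 − 𝛽) otherwise. This sketch assumes a constant 𝛽, no growth phase, and no clonal interference. The cutoff `eps`, which stands in for loss or fixation, is arbitrary. It is not the simulation behind Figure 4, and the function name is hypothetical.

```python
import random


def fixation_probability(x0, beta, n_runs=20000, eps=0.01, seed=1):
    """Monte Carlo estimate of the probability that an allele starting at
    frequency x0 reaches 1 - eps before eps under the partial-sweep walk."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(n_runs):
        x = x0
        while eps < x < 1 - eps:
            if rng.random() < x:
                # Next variant arises on a background carrying the allele
                x = x * (1 - beta) + beta
            else:
                # Next variant arises elsewhere; the allele is diluted
                x = x * (1 - beta)
        if x >= 1 - eps:
            fixed += 1
    return fixed / n_runs


p = fixation_probability(0.3, 0.2)  # close to the starting frequency 0.3
```

      Because each step has zero mean, the expected frequency at absorption equals the starting frequency. The estimate therefore lands near 𝑥0, up to sampling noise and the small bias introduced by `eps`.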

      It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

      We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

      Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

      We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability that sweeps overlap. We fixed the 𝜈 – 𝛼 confusion in the SI and in the caption of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

      Regarding significance:

      This study provides in vivo evidence that caspase-3 is important for synapse elimination in the visual pathway (Figures 3 and 4) and corroborates the previously proposed but not yet validated “synaptosis” hypothesis. But more significantly, we show that caspase-3 is activated in dLGN relay neurons in response to synapse inactivation (Figure 1) when synaptic competition is present (Figure 2), and that caspase-3 is important for efficient elimination of weakened synapses by microglia (Figures 5 and 6). We consider the causal link between synapse weakening/inactivation and caspase-3 activation to be the most important finding of this study and believe it is an error to not include this aspect of the study in the assessment. The mechanism by which neuronal activity influences synapse elimination is a fundamental question in neuroscience, and our study presents a significant advancement in understanding this problem.

      Regarding strength of evidence:

      We do not agree with the assessment that our evidence should be broadly labeled as “incomplete”. In fact, we argue that many concerns raised by the reviewers are not focused on the main claims made in this study.

      (1) Regarding whether caspase-3 activation (not “expression”, which is the term used in the assessment) is isolated to synapses or occurs in entire cells, we show in Figure 1 that both types of signals can be present. The main concern of the reviewers seems to be that activated caspase-3 signals in apoptotic dLGN relay neurons are irrelevant to our analysis and confound interpretation. We argue that this is not the case.

      In Figure 1, we have two sets of controls demonstrating that the observed apoptosis of dLGN relay neurons occurs specifically in response to synapse inactivation. For each animal that received TeTxLC injection in the right eye, activated caspase-3 signal is compared between the left dLGN, where most of the inactivated synapses are located, and the right dLGN, where the minority of the inactivated synapses are located (between Figure 1B and 1C, and between the first and second groups of Figure 1E). We observed apoptotic neurons in the left dLGN, which receives more inactivated synapses, but not in the right dLGN, which receives fewer. The second control is between TeTxLC-injected animals (Figure 1B) and mock-injected animals (Figure 1D). We observed apoptotic relay neurons in the dLGN of TeTxLC-injected animals (Figure 1B) but not of mock-injected animals (Figure 1D). Both controls show that the observed apoptosis of dLGN relay neurons is caused by synapse inactivation.

      In addition, in our synapse inactivation experiment (Figure 1), AAV-hSyn-TeTxLC is injected into the right eye and expressed only in RGCs, not in dLGN relay neurons. Since dLGN relay neurons in this experiment do not receive a perturbation that is independent of synaptic transmission, we conclude that their apoptosis occurs through synapse-dependent mechanisms.

      Furthermore, if the apoptotic neurons are confounding the analysis (as implied by reviewers and editors) and do not occur through synapse-dependent mechanisms, then inhibiting both eyes with TeTxLC (Figure 2C, rightmost group) should cause high levels of caspase-3 activation, like that in the single-inhibition condition. Instead, we observe the opposite (Figure 2C, middle group) – overall caspase-3 activity goes down significantly in the dual-inhibition condition and is closer to the unperturbed condition, which can be explained by a loss of interaction between “strong” and “weak” synapses. Taken together, our data demonstrate that apoptosis of relay neurons in Figure 1 occurs specifically in response to synapse inactivation through synapse-dependent mechanisms, and the activated caspase-3 signal in the neurons should be included in our analysis.

      Why does synaptic caspase-3 activation manifest in different forms: puncta, “blobs”, and cells? This is not surprising when considering the mechanisms that neurons must utilize to spatially confine caspase-3 activation and the nature of the apoptotic signaling cascade. On one hand, it has been proposed that caspase-3 activity in dendrites can be locally confined by proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014 ). On the other hand, caspase-3 activation is known to trigger explosive feedback amplification of apoptotic signaling events (McComb et al., DOI: 10.1126/sciadv.aau9433 ). For caspase-3 activation to remain localized to dendrites, the negative regulation must outweigh the positive feedback amplification. By expressing TeTxLC in RGCs of one eye, we create a strong perturbation that silences a large fraction of the synapses in the retinogeniculate pathway, which likely shifts the balance between positive and negative regulation of caspase-3 activity in some relay neurons. To be more specific, if a given dLGN relay neuron receives too many inactivated synapses, which is likely the case in our perturbation, caspase-3 activity that is initially localized can overwhelm the physiological negative regulation mechanisms that act to spatially confine it, resulting in whole-cell apoptosis. In fact, previous in vitro evidence (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014 ) demonstrated that, while caspase-3 activation in a single distal dendrite can be locally contained, activating apoptosis signaling in dendrites proximal to the cell body can result in whole-cell apoptosis. Similarly, a few inactivated retinogeniculate synapses can elicit locally contained caspase-3 activity in dLGN relay neurons, but a large number of inactivated synapses on a single relay neuron may trigger sufficient caspase-3 activity to lead to whole-cell apoptosis.
      We discuss how to interpret synapse inactivation-induced apoptosis in dLGN relay neurons both in the main text and in the discussion (lines 123-132 and 411-421).

      (2) Regarding microglia, we did not claim that “microglia direct or instruct synapse elimination”. Our main claim is that caspase-3 activation is important for efficient elimination of weakened synapses by microglia. This claim emphasizes a regulatory role for caspase-3 activation in microglia-mediated synapse elimination, not a regulatory role of microglia in synapse elimination. To be more specific, our data suggest that lack of synaptic activity induces caspase-3 activity, and caspase-3 activity in turn influences which synapses are preferentially eliminated by microglia. Therefore, the elimination specificity is fundamentally determined (i.e. instructed) by neuronal activity, not by microglia. We also did not presume the manner in which microglia engage in synapse elimination. We specifically address this point in the discussion at lines 458-465, where we acknowledge that microglia may indirectly mediate synapse elimination by engulfing shed neuronal material. In our title and text, we use the phrase “microglia-mediated synapse elimination”, which is not the same as microglia-instructed synapse elimination and does not presume any instructive/directive role of microglia.

      (3) Regarding whether astrocytes are involved, we did not challenge the notion that astrocytes play important roles in synapse elimination. Rather, our claim is that, unlike what we observed with microglia, the amount of synaptic material engulfed by astrocytes does not robustly depend on whether caspase-3 is present. We acknowledge that there might be a caspase-3-dependent phenotype that we were unable to detect (lines 309-310), and that it is plausible that astrocytes mediate activity-dependent synapse elimination through other caspase-3-independent mechanisms. This claim is not central to our study, and we would like to qualify the statements in the manuscript. We will remove the phrase “but not astrocytes” in line 18 of the abstract.

      In summary, using a state-of-the-art method to inactivate retinogeniculate synapses, we discovered a causal link between synapse weakening/inactivation and caspase-3 activation. Coupled with well-established in vivo assays (e.g., segregation analysis, electrophysiology, and engulfment analysis) that are used in many landmark studies we cite, we provide solid evidence supporting our claim that “caspase-3 is essential for synapse elimination driven by both spontaneous and experience-dependent neural activity”, and that “synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia”.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4). 

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. This is not accurate. We show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. This is not accurate. The apoptotic neurons we observed are relay neurons located in the dLGN (confirmed by their morphology and positive staining of NeuN – Figure S4B-C), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that the active caspase-3 signals in apoptotic dLGN relay neurons are not a confounding factor but a bona fide response to synaptic silencing, and should therefore be included in the quantification. We have two sets of controls (please also see the general response above): the first compares the strongly inactivated dLGN with the weakly inactivated dLGN within each TeTxLC-injected animal, and the second compares the dLGN of TeTxLC-injected animals with that of mock-injected animals. In both comparisons, apoptotic dLGN relay neurons appear only in the dLGN receiving strong synapse inactivation, demonstrating that these cells arise as a consequence of synapse inactivation. It is also unlikely that our perturbation causes cell death through a non-synaptic mechanism. Because mock injections do not cause apoptosis in dLGN neurons, the phenomenon is not related to surgical damage. Because TeTxLC is injected into the eyes and expressed only in presynaptic RGCs, not in postsynaptic relay neurons, the phenomenon is also unlikely to reflect TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons were unrelated to synapse inactivation, injecting TeTxLC into both eyes should produce the same number of apoptotic relay neurons or more; instead, we observed a reduction in dLGN neuron apoptosis, indicating that a synapse-related mechanism is responsible. Taken together, apoptosis of relay neurons in the TeTxLC-inactivated dLGN is causally linked to synapse inactivation, and the active caspase-3 signals in these neurons are true signals that belong in the quantification.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination. 

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not fully understand the experiment the reviewer suggests, namely to “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3-dependent engulfment of synaptic material by microglia, or the preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue. If microglia are eliminated, neither measurement can be made. What could be measured after microglia elimination is the refinement of the retinogeniculate pathway. That experiment would test whether microglia are required for caspase-3-dependent phenotypes, which is not a claim made in the manuscript. Instead, we claim that caspase-3 is required for microglia to preferentially eliminate weak synapses.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and that this caspase-3 activity in turn determines the substrate preference of microglia-mediated synapse elimination. In this model, it is neuronal activity that fundamentally directs synapse elimination. Throughout the manuscript, we use the term “microglia-mediated synapse elimination”. This terminology does not assume a directive or instructive role of microglia in synapse elimination; it only describes the observed engulfment of synaptic material by microglia. Nor do we make assumptions about how microglia engage in synapse elimination. We acknowledge in the discussion (lines 458-465) that microglia may mediate synapse elimination indirectly and passively by engulfing shed neuronal material, a topic that remains debated in the field (Eyo et al., DOI: 10.1126/science.adh7906).

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper. 

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that synapse elimination mediated by caspase-3 activation is required, in an activity-dependent manner, for retinogeniculate circuit refinement and segregation of eye-specific territories in the dLGN. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD, as loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases. 

      Strengths: 

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 could be an underlying mechanism by which microglia select synapses for removal. The current study provides direct in vivo evidence that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration. 

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes. 

      Weaknesses: 

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      The experiments presented in Figure S11 aim to determine whether astrocyte-mediated synapse elimination depends on caspase-3 signaling. We do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We did observe a small decrease in the synaptic material engulfed by astrocytes when caspase-3 is deficient, and we acknowledged that there could be defects we were not able to detect (lines 309-310). The claim that caspase-3 does not regulate astrocyte-mediated synapse elimination is not central to the manuscript, and we will qualify our statements in the text. We will remove the phrase “but not astrocytes” from the abstract (line 18).

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN? 

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases microglia-mediated engulfment of presynaptic terminals of inactivated synapses (Figure 6). We did not measure microglia-mediated engulfment of synaptic material while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in astrocyte-mediated engulfment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied, especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low-temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment, chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of the neuropeptide bursicon in arthropod biology.

      However, the study falls short of its claim that it reveals the molecular mechanisms of a seasonal polyphenism. While cuticle tanning is an important part of the pear psyllid polyphenism, it is not equivalent to it. First, there are other traits that distinguish the two morphs, such as ovarian diapause (Oldfield, 1970), and the role of bursicon signaling in regulating these aspects of the polyphenism was not measured. Thus, the phenotype in pear psyllids, whereby knockdown of bursicon reduces cuticle tanning, seems simply to reproduce the phenotypes of Drosophila bursicon receptor mutants (Loveall and Deitcher, 2010, BMC Dev Biol) in another species (Fig. 2I, 4H). Second, the study fails to address the threshold nature of cuticular tanning in this species, although it is the threshold response (specifically, to temperature and photoperiod) that distinguishes this trait as part of a polyphenism. Whereas miR-6012 was found to regulate bursicon expression, no evidence is provided that this microRNA either responds to or initiates a threshold response to temperature. In principle, miR-6012 could regulate bursicon whether or not it is part of a polyphenism. Thus, the impact of this work would be significantly increased if it could distinguish between seasonal changes of the cuticle and a bona fide reflection of polyphenism.

      Thanks for your valuable suggestion. We agree with the reviewer’s comment that cuticle tanning is not equivalent to the C. chinensis polyphenism. To better reflect the core focus of our research, we have revised the title to "Neuropeptide Bursicon and its receptor mediated the transition from summer-form to winter-form of Cacopsylla chinensis".

      In response to the reviewer's inquiry regarding the threshold nature of cuticular tanning in C. chinensis, we have added a detailed analysis of the phenotypic changes (nymph phenotypes, cuticle pigment absorbance, and cuticle thickness) during the transition from summer-form to winter-form at distinct time points (3, 6, 9, 12, and 15 days) under two temperature conditions (10°C and 25°C). As shown in Figure S1, at 25°C nymphs are light yellow and transparent at 3, 6, and 9 days, and yellow-green or blue-yellow at 12 and 15 days. At 10°C, the end of the abdomen turns black at 3, 6, and 9 days. By day 12, numerous light black stripes appear on the thorax and abdomen of nymphs at 10°C. By day 15, nymphs are black-brown overall, with dark brown stripes on the left and right sides of each thoracic and abdominal segment; the end of the abdomen and the dorsum also show extensive black-brown coloration at 10°C (Figure S1A). The UV absorbance of the total pigment extract at 300 nm increases markedly after 6, 9, 12, and 15 days at 10°C compared with the 25°C group (Figure S1B). Cuticle thickness also increases after 6, 9, 12, and 15 days at 10°C compared with the 25°C group (Figure S1C). The detailed results (L122-143), materials and methods (L647-652), and discussion (L319-322) have been added to the revised manuscript.

      Regarding the response of miR-6012 to temperature, we have already determined its expression at 3, 6, 10 days under different temperatures in the previous Figure 5E. We now included additional time intervals (9, 12, 15 days) in the updated Figure 5E. Our results indicate a significant decrease in the expression levels of miR-6012 after 10°C treatment for 3, 6, 9, 12, 15 days compared to the 25°C treatment group. Detailed information regarding this has been integrated into the Materials and Methods (Line 608-610) of our revised manuscript.

      Strengths:

      This study convincingly identifies homologs of the genes encoding the bursicon subunits and its receptor, showing an alignment with those of another psyllid as well as more distant species. It also demonstrates that the stage- and tissue-specific levels of bursicon follow the expected patterns, as informed by other insect models, thus validating the identity of these genes in this species. They provide strong evidence that the expression of bursicon and its receptor depend on temperature, thereby showing that this trait is regulated through both parts of the signaling mechanism.

      Several parallel measurements of the phenotype were performed to show the effects of this hormone, its receptor, and an upstream regulator (miR-6012), on cuticle deposition and pigmentation (if not polyphenism per se, as claimed). Specifically, chitin staining and TEM of the cuticle qualitatively show difference between controls and knockdowns, and this is supported by some statistical tests of quantitative measurements (although see comments below). Thus, this study provides strong evidence that bursicon and its receptor play an important role in cuticle deposition and pigmentation in this psyllid.

      The study identified four miRNAs which might affect bursicon due to sequence motifs. By manipulating levels of synthetic miRNA agonists, the study successfully identified one of them (miR-6012) to cause a cuticle phenotype. Moreover, this miRNA was localized (by FISH) to the cuticle, body-wide. To our knowledge, this is the first demonstrated function for this miRNA, and this study provides a good example of using a gene of known function as an entry point to discovering others influencing a trait. Thus, this finding reveals another level of regulation of cuticle formation in insects.

      Weaknesses:

      (1) The introduction to this manuscript does not accurately reflect progress in the field of mechanisms underlying polyphenism (e.g., line 60). There are several models for polyphenism that have been used to uncover molecular mechanisms in at least some detail, and this includes seasonal polyphenisms in Hemiptera. Therefore, the justification for this study cannot be predicated on a lack of knowledge, nor is the present study original or unique in this line of research (e.g., as reviewed by Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448). The authors are apparently aware of this, because they even provide other examples (lines 104-108); thus the introduction seems misleading as framed.

      Thanks for your excellent suggestion. We have added the review recommended by the reviewer (Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448) at Line 57 of the revised manuscript. The statement has been revised to “However, the specific molecular mechanism underlying temperature-dependent polyphenism still requires further clarification” (Lines 60-61 of the revised manuscript).

      (2) The data in Figure 2H show "percent of transition." However, the images in 2I show insects with tanned cuticle (control) vs. those without (knockdown). Yet, based on the description of the Methods provided, there appears to be no distinction between "percent of transition" and "percent with tanning defects". This is an important distinction to make if the authors are going to interpret cuticle defects as a defect in the polyphenism. Furthermore, there is no mention of intermediate phenotypes. The data in 2H are binned as either present or absent, and these are the phenotypes shown in 2I. Was the phenotype really an all-or-nothing response? Instead of binning, which masks any quantitative differences in the tanning phenotypes, the authors should objectively quantify the degree of tanning and plot that. This would show whether, and to what degree, intermediate tanning phenotypes occurred, which would test how bursicon affects the threshold response. This comment also applies to the data in Figures 4G and 6G. Since cuticle tanning occurs in more insects than just those with seasonal polyphenism, showing how it responds as a threshold is needed to make claims about polyphenism.

      We appreciate your insightful comments. As shown in Figure 1 of our published paper (Zhang et al., 2023; doi.org/10.7554/eLife.88744.3) and Figures 2C-2I of the current manuscript, the transition from summer-form to winter-form involves not only external cuticular tanning but also changes in internal cuticular chitin levels and cuticle thickness. Although external cuticular tanning is the most prominent and easily observed indicator of this transition, these internal changes are also a significant part of it and should be taken into account. We therefore consider the term "percent of transition" more accurate than "percent with tanning defects" for describing this process.

      In order to provide a more visually comprehensive understanding of the phenotypic changes during the transition from summer-form to winter-form, we have included images at different time points (3, 6, 9, 12, 15 days) under different temperature conditions in Figure S1A of our revised manuscript. Specifically, under the 10°C condition, nymphs exhibit abdomen tanning after 6 and 9 days of treatment, while the thorax remains untanned. By days 12 to 15, both the abdomen and thorax of the nymphs show tanning, resulting in the majority of summer-form nymphs transitioning into winter-form, as depicted in Figure 2I for comparison. This observation indicates the presence of a critical threshold for cuticle tanning of C. chinensis following exposure to 10°C. Nymphs that did not undergo the transition to winter-form succumbed to the cold, highlighting the absence of intermediate phenotypes at 12-15 days under the 10°C condition. The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Additionally, cuticle thickness shows an increase following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). These results highlight the relationship between the threshold of cuticular tanning and the transition process. The detailed description and information have been added in Results (L122-143), Materials and Methods (L647-652), and Discussion (L319-322) of our manuscript.

      (3) This study also does not test the threshold response of cuticle phenotypes to levels of bursicon, its receptor, or miR-6012. Hormone thresholds are the most widespread and, in most systems where polyphenism has been studied, the defining characteristic of a polyphenism (e.g., Nijhout, 2003, Evol Dev). Quantitative (not binned) measurements of a polyphenism marker (e.g., chitin) should be demonstrated to result as a threshold titer (or in the case of the receptor, expression level) to distinguish defects in polyphenism from those of its component trait.

      Thanks for your valuable feedback. In the revised manuscript, we have added data on the phenotypes (Figure S1A), cuticle pigment absorbance (Figure S1B), cuticle thickness (Figure S1C), and expression levels of bursicon (Figures 1E and 1F), its receptor (Figure 3G), and miR-6012 (Figure 5E) in nymphs treated for different periods (3, 6, 9, 12, and 15 days) at both 10°C and 25°C.

      While all these identified markers exhibit a strong correlation with the transition from summer-form to winter-form, it is important to note that they are not suitable as definitive thresholds due to the nature of relative gene expression quantification and chitin content assessment, rather than absolute quantitation. Further, given that tanning hormones are neuropeptides present in trace amounts in insects, unlike steroid hormones, determining their titers poses a considerable challenge.

      (4) Cuticle issue:

      (a) Unlike Fig. 6D and F, Figs. 2D and F do not correspond to each other. Especially the lack and reduction of chitin in ds-a+b! By fluorescence microscopy there is hardly any signal, whereas by TEM there is a decent cuticle. Additionally, the dsGFP control cuticle in 2D is cut obliquely with a thick and a thin chitin layer. This is misleading.

      Thanks for your insightful feedback. We have replaced the previous WGA chitin staining images in the dsCcbursα+β treatment of Figure 2D with new representative images aligning with Figure 2F. Furthermore, the presence of both thin and thick chitin layers observed in the dsEGFP treatment of Figure 2D could potentially be ascribed to the chitin content in the insect midgut or fat body as previously discussed (Zhu et al., 2016). It is notable that during the process of cuticle staining, the chitin located in the midgut and fat body of C. chinensis may exhibit green fluorescence, leading to the appearance of a thin chitin layer. A detailed analysis and elucidation of these observations have been added in the discussion section (Lines 347-352) of our revised manuscript.

      Zhu, K. Y., Merzendorfer, H., Zhang, W., Zhang, J., & Muthukrishnan, S. (2016). Biosynthesis, turnover, and functions of chitin in insects. Annual Review of Entomology, 61, 177–196. https://doi.org/10.1146/annurev-ento-010715-023933

      (b) In Figs. 2F and 4F, the endocuticle appears to be missing, a portion of the procuticle that is produced post-molting. As tanning is also occurring post-molting, there seems to be a general problem with cuticle differentiation at this time point. This may be a timing issue. Please clarify.

      Thank you for your suggestion. The insect cuticle typically comprises three distinct layers (endocuticle, exocuticle, and epicuticle), and the thickness of each layer varies among insect species. Cuticle differentiation is closely linked to the molting cycle (Mrak et al., 2017). In our study, nymphal cuticles after dsEGFP treatment showed normal differentiation, with a thin epicuticle and an endocuticle and exocuticle of comparable width, as illustrated in Figures 2F and 4F. In contrast, nymphs treated with dsCcBurs-α, dsCcBurs-β, or dsCcBurs-R showed impaired development, with only an exocuticle and no discernible endocuticle layer. These findings suggest that the bursicon genes and their receptor play a pivotal role in regulating insect cuticle development (Costa et al., 2016). We have added a discussion of these results in Lines 356-367 of the revised manuscript.

      Mrak, P., Bogataj, U., Štrus, J., & Žnidaršič, N. (2017). Cuticle morphogenesis in crustacean embryonic and postembryonic stages. Arthropod structure & development, 46(1), 77–95. https://doi.org/10.1016/j.asd.2016.11.001

      Costa, C. P., Elias-Neto, M., Falcon, T., Dallacqua, R. P., Martins, J. R., & Bitondi, M. (2016). RNAi-mediated functional analysis of Bursicon genes related to adult cuticle formation and tanning in the Honeybee, Apis mellifera. PloS one, 11(12), e0167421. https://doi.org/10.1371/journal.pone.0167421

      (c) To provide background information, it would be useful to analyze cuticle formation in the summer and winter morphs of controls separately by light and electron microscopy. More baseline data on these two morphs are needed.

      Thanks for your valuable feedback. To provide more background information on cuticle formation, we have added the nymph phenotypes, cuticle pigment absorbance, and cuticle thickness at distinct time points (3, 6, 9, 12, and 15 days) at 10°C and 25°C in Figure S1 of the revised manuscript. We hope these results provide the baseline data on the two morphs that the reviewer requested.

      (d) For the TEM study, it is not clear whether the same part of the insect's thorax is being sectioned each time, or if that matters. There is not an obvious difference in the number of cuticular layers, but only the relative widths of those layers, so it is difficult to know how comparable those images are. This raises two questions that the authors should clarify. First, is it possible that certain parts of the thoracic cuticle, such as those closer to the intersegmental membrane, are naturally thinner than other parts of the body? Second, is the tanning phenotype based on the thickness or on the number of chitin layers, or both? The data shown later in Figure 4I, J convincingly shows that the biosynthesis pathway for chitin is repressed, but any clarification of what this might mean for deposition of chitin would help to understand the phenotypes reported. Also, more details on how the data in Fig. 2G were collected would be helpful. This also goes for the data in Fig. 4 (bursicon receptor knockdowns).

      Thanks for your great comment. The TEM analysis followed a standardized protocol described previously (Zhang et al., 2023). Insect heads were first excised uniformly, and the samples were fixed in 4% paraformaldehyde. Sections were then cut and stained in a consistent manner at a uniform distance above the insect's thorax. The dorsal region of the thorax was chosen for all fluorescence imaging and transmission electron microscopy, specifically to quantify cuticle thickness. Cuticle thickness was measured with the software's built-in measuring tool by selecting the top and bottom of the cuticle along the same line. The cuticle of each nymph was measured at two nearby locations, six nymphs were used per sample, and nine values were randomly selected for plotting. This description has been added to the Materials and Methods (Line 660-668) of the revised manuscript.

      Zhang, S.D., Li, J.Y., Zhang, D.Y., Zhang, Z.X., Meng, S.L., Li, Z., & Liu, X.X. (2023). MiR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis. eLife, 12. https://doi.org/10.7554/eLife.88744

      (5) Tissue issue:

      The timed experiments shown in all figures were done in whole animals. However, we know from Drosophila that Bursicon activity is complex in different tissues. There is, thus, the possibility, that the effects detected on different days in whole animals are misleading because different tissues--especially the brain and the epidermis, may respond differentially to the challenge and mask each other's responses. The animal is small, so the extraction from single tissue may be difficult. However, this important issue needs to be addressed.

      Thanks for your excellent suggestion. We sincerely appreciate the reviewer's recognition of how challenging it is to dissect individual tissues from the tiny early-instar nymphs of C. chinensis. Given the metamorphic transition of C. chinensis across developmental stages, this study focused on the whole-body phenotypic changes, and intact C. chinensis nymphs were therefore used for qPCR analysis. The related descriptions have been added to the Materials and Methods (Lines 513, 517, 553, 555, and 613) and Discussion (Lines 327-329) of the revised manuscript.

      (6) No specific information is provided regarding the procedure followed for the rescue experiments with burs-α and burs-β (How were they done? Which concentrations were applied? What were the effects?). These important details should appear in the Materials and Methods and the Results sections.

      Thanks for your excellent suggestion. For the rescue experiments, CcBurs-R dsRNA was fed together with the burs α-α homodimer, the burs β-β homodimer, or the burs α-β heterodimer protein, each at 200 ng/μL. The CcBurs-α+β heterodimer fully rescued the effect of RNAi-mediated knockdown on CcBurs-R expression, whereas the α+α and β+β homodimers did not (Figure 3F). Feeding the α+β heterodimer also fully rescued the defects in transition percent and morphological phenotype after CcBurs-R knockdown (Figures 4G-4H). The detailed methods and concentrations for the rescue experiments have been added to the Materials and Methods (Line 561-563) and Results (Line 263) of the revised manuscript.

      (7) Pigmentation

      (a) The protocol used to assess pigmentation needs to be validated. In particular, the following details are needed: Were all pigments extracted? Were pigments modified during extraction? Were the values measured consistent with values obtained, for instance, by light microscopy (which should be done)?

      Thanks for your excellent comment. We extracted pigments following the protocol described for Bombyx mori: the cuticles were pulverized in liquid nitrogen and then dissolved in 30 milliliters of acidified methanol (Futahashi et al., 2012; Osanai-Futahashi et al., 2012). All cuticles were thus dissected and treated with acidified methanol to extract their pigments, and the pigments were not modified during extraction. A detailed description has been added to the Materials and Methods (Line 630-633) of the revised manuscript.

      Futahashi, R., Kurita, R., Mano, H., & Fukatsu, T. (2012). Redox alters yellow dragonflies into red. Proceedings of the National Academy of Sciences of the United States of America, 109(31), 12626–12631. https://doi.org/10.1073/pnas.1207114109

      Osanai-Futahashi, M., Tatematsu, K. I., Yamamoto, K., Narukawa, J., Uchino, K., Kayukawa, T., Shinoda, T., Banno, Y., Tamura, T., & Sezutsu, H. (2012). Identification of the Bombyx red egg gene reveals involvement of a novel transporter family gene in late steps of the insect ommochrome biosynthesis pathway. The Journal of biological chemistry, 287(21), 17706–17714. https://doi.org/10.1074/jbc.M111.321331

      (b) In addition, pigmentation occurs post-molting; thus, the results could reflect indirect actions of bursicon signaling on pigmentation. The expression levels of downstream pigmentation genes (ebony, laccase, etc.) should be measured and compared between molting summer and winter morphs.

      Thanks for your valuable suggestion. We have in fact already studied the function of several downstream pigmentation genes, including ebony, Laccase, Tyrosine hydroxylase, Dopa decarboxylase, and Acetyltransferase. The variations in the expression patterns of these genes are closely tied to the molting dynamics of nymphs undergoing the transition between summer-form and winter-form. These findings will be reported in another manuscript currently in preparation, so the detailed results are not included in the current manuscript.

      (8) L236: "while the heterodimer protein of CcBurs α+β could fully rescue the effect of CcBurs-R knockdown on the transition percent (Figure 4G 4H)". This result seems contradictory. If CcBurs-R is the receptor of bursicon, the heterodimer protein of CcBurs α+β should not be able to rescue the effect of CcBurs-R knockdown insects. How can a neuropeptide protein rescue the effect when its receptor is not there! If these results are valid, then the CcBurs-R would not be the (sole) receptor for CcBurs α+β heterodimer. This is a critical issue for this manuscript and needs to be addressed (also in L337 in Discussion).

      Thanks for your insightful suggestion. After administration of dsCcBurs-R to C. chinensis, CcBurs-R expression was reduced by approximately 66-82% (Figure 4A) rather than completely suppressed. The residual endogenous CcBurs-R can therefore still be activated by the fed α+β heterodimer protein, which increases CcBurs-R expression, and the extent of the rescue depends on the dosage of the α+β heterodimer protein. This explains how the α+β heterodimer protein can mitigate the effect of CcBurs-R knockdown on the transition percent. We have added further discussion in Line 396-403 of our revised manuscript.

      (9) Fig. 5D needs improvement (the magnification is poor) and further explanation and discussion. miR-6012 and CcBurs-R seem to be expressed in complementary tissues--do we see internal tissues also (see problem under point 2)? Again, the magnification is not high enough to understand and appreciate the relationships discussed.

      Thanks for your valuable suggestion. To improve the resolution of the magnified images, we conducted FISH co-localization of miR-6012 and CcBurs-R in 3rd instar nymphs and obtained detailed zoomed-in images. As shown in the magnified view of Figure 5D, miR-6012 and CcBurs-R appear to exhibit complementary expression patterns in tissues. During the FISH assays, the epidermis of C. chinensis was made transparent by decolorization treatment. Figure 3G and Figure 5E also reveal an inverse correlation between the expression profiles of CcBurs-R and miR-6012. Consistent with this, the FISH results show a marked difference between the expression levels of CcBurs-R and miR-6012 within the same tissue. We have added the related explanation and discussion in Line 291-293 of our revised manuscript.

      (10) The schematic in Fig. 7 is a useful summary, but there is a part of the logic that is unsupported by the data, specifically in terms of environmental influence on cuticle formation (i.e., plasticity). What is the evidence that lower temperatures influence expression of miR-6012? The study measures its expression over life stages, whether with an agonist or not, over a single temperature. Measuring levels of expression under summer form-inducing temperature is necessary to test the dependence of miR-6012 expression on temperature. Otherwise, this result cannot be interpreted as polyphenism control, but rather the control of a specific trait.

      Thanks for your great suggestion. We did assess miR-6012 expression at specific time points (3, 6, 9, 12, and 15 days) at both 10°C and 25°C. As depicted in Figure 5E, miR-6012 expression was notably lower at 10°C than at 25°C. In addition, the expression level of agomir-6012 in C. chinensis at 25°C showed no significant changes across the same time points (3, 6, 9, 12, and 15 days). Hence, we suggest that the effect of miR-6012 on the seasonal morphological transition depends on temperature.

      Recommendations for the authors:

      The authors report a novel role of Bursicon and its receptor in regulating the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment (10°C) activated the Bursicon signaling pathway during the transition from summer-form to winter-form, which influences cuticle pigment content, cuticle chitin content, and cuticle thickness. Moreover, the authors identified miR-6012 and show that it targets CcBurs-R, thereby modulating the function of Bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of multiple roles of neuropeptide bursicon action in arthropod biology. However, the manuscript does have several major weaknesses, described under "Public review", which the authors need to address.

      Major issues:

      (1) L152-154 Fig S2E and S2F: Bursicon has been shown to be expressed in the CNS in a specific set of neurons. For example, in the larval CNS of Manduca sexta, bursicon expression is restricted to the subesophageal ganglion (SG), thoracic ganglia, and first abdominal ganglion. Pharate pupae and pharate adults show expression of this heterodimer in all ganglia. In Drosophila larvae, expression of a bursicon heterodimer is confined to abdominal ganglia. The additional neurons in the ventral nerve cord express only burs. In pharate adults, bursicon is produced by neurons in the SG and abdominal ganglia. I am wondering where bursicon subunits are expressed in the C. chinensis CNS? Since the authors have the antibodies, it would be useful to include immunocytochemical staining of bursicon alpha and beta in the CNS. The qPCR results from head or other tissues (Fig S2E and S2F) are not the most informative way to document localization of gene expression. Regarding the qPCR results, they show that the cuticle and the fat body express CcBurs-α and CcBurs-β. Can the authors confirm these unexpected results independently?

      Thanks for your insightful comment. In this study, we did not use antibodies directly targeting the bursicon subunits; instead, each bursicon subunit was fused to a histidine tag in the expression vector pcDNA3.1 by homologous recombination. The procedure was as follows: first, the histidine tag was fused to the pcDNA3.1-mCherry vector by homologous recombination to generate the recombinant plasmid pcDNA3.1-his-mCherry. Next, the coding sequences of the two bursicon subunits were inserted into the pcDNA3.1-his-mCherry vector by homologous recombination to produce the recombinant plasmids pcDNA3.1-CcBurs-α-his-mCherry and pcDNA3.1-CcBurs-β-his-mCherry. Finally, the P2A sequence was incorporated into the vector by reverse PCR to yield the recombinant plasmids pcDNA3.1-CcBurs-α-his-P2A-mCherry and pcDNA3.1-CcBurs-β-his-P2A-mCherry. Each bursicon subunit was therefore expressed as a fusion protein carrying the histidine tag, and Western blot analysis with antibodies against the histidine tag was used to detect the tagged bursicon subunits. However, this tag-based approach cannot be used for in vivo immunocytochemical staining of bursicon alpha and beta in the CNS.

      Due to the very small size of C. chinensis nymphs, dissection of the central nervous system (CNS) was not feasible, precluding specific assessment of bursicon expression in the CNS. Prior literature has documented the expression of bursicon subunits in the epidermis and fat body of C. chinensis. Studies suggest that bursicon subunits not only act in the melanization and sclerotization of the insect cuticle but also play significant roles in insect immunity (An et al., 2012). The presence of bursicon subunits in the epidermis, gut, and fat body of C. chinensis may therefore reflect roles in the immune functions of these tissues; further investigation is required to elucidate the specific immune functions they perform.

      An, S., Dong, S., Wang, Q., Li, S., Gilbert, L. I., Stanley, D., & Song, Q. (2012). Insect neuropeptide bursicon homodimers induce innate immune and stress genes during molting by activating the NF-κB transcription factor Relish. PloS one, 7(3), e34510. https://doi.org/10.1371/journal.pone.0034510

      (2) L222: "CcBurs-R is the Bursicon receptor of C. chinensis". Is this statement supported by affinity binding assay results?

      Thanks for your excellent suggestion. We employed a fluorescence-based assay to quantify intracellular calcium ion concentrations and thereby assess the response of the bursicon receptor to bursicon heterodimers and homodimers across varying concentrations. Our findings show that activation of the receptor by the burs α-β heterodimer leads to significant changes in intracellular calcium ion levels, whereas stimulation with the burs α-α and burs β-β homodimers, in conjunction with adipokinetic hormone (AKH), leaves intracellular calcium ion levels unchanged. Consequently, this research identifies CcBurs-R as the bursicon receptor. For further details, please refer to the Materials and Methods (Lines 493-504), Results (Lines 231-239), and Discussion (Lines 377-384) of our revised manuscript.

      (3) L245 Figure 4I-4J: Since knockdown of bursicon and its receptor causes decreased pigment accumulation in the cuticle, it would be useful to examine 1-2 rate-limiting enzyme-encoding genes in the bursicon-regulated cuticle darkening process if possible (as was done for genes involved in cuticle thickening).

      Thanks for your excellent comment. In further studies, we thoroughly analyzed the impact of bursicon and its receptor on the expression levels of Laccase, Tyrosine hydroxylase, Dopa decarboxylase, and Acetyltransferase, as well as the effects of RNA interference targeting these genes on the seasonal morphological transition. The findings underscored the roles of these genes in the bursicon-mediated cuticle darkening process. However, as these results are slated for inclusion in an upcoming manuscript, they are not suitable for incorporation into the current manuscript.

      Minor issues:

      (1) L75 "stronger resistance (Ge et al., 2019; Tougeron et al., 2021)". Stronger resistance to what? Stronger resistance to environmental stress or weather condition? Please clarify.

      Thanks for your excellent suggestion. We have changed the statement to “stronger resistance to weather condition” in Line 75 of our revised manuscript.

      (2) L132 Figure 1A and 1B: Bursicon sequence was first identified and functionally characterized in Drosophila melanogaster: is there any reason why Drosophila bursicon sequences were not included in the comparison?

      Thanks for your excellent comment. We have added the sequence of Burs-α and Burs-β of D. melanogaster in the sequence alignment results of Figure 1A and 1B of our revised manuscript.

      (3) Although the authors clearly identify and validate the function of the bursicon genes and their receptor, there is no mention of whether duplicates of these genes are also present in the pear psyllid. This has been known to happen in otherwise conserved hormone pathways (e.g., insulin receptor in some insects), so a formal check of this should be done.

      Thanks for your excellent comment. As shown in Figure S2A-S2B and 3B, there are two bursicon subunit genes and only one bursicon receptor gene in the insect species we selected, for example Drosophila melanogaster, Diaphorina citri, Bemisia tabaci, Nilaparvata lugens, and Sogatella furcifera. In our transcriptome database of C. chinensis, we likewise identified only two bursicon subunit genes and one bursicon receptor gene.

      (4) Line 41: Here, as in the title, "fascinating" is a subjective judgement that does not improve a study's presentation.

      Thanks for your great comment. We have changed "fascinating" to "transformation" in Line 41 and also revised the title of our revised manuscript.

      (5) Line 44: What makes some fields "cutting-edge" and others not?

      Thanks for your excellent suggestion. The expression of "in cutting-edge fields" has been deleted in Line 44 of our revised manuscript.

      (6) Line 97: This is a peculiar choice of reference for the concept of slower development in cold temperatures. The concept of degree-days and growth rates is old and widespread in entomology.

      Thanks for your insightful comment. The reference of Nyamaukondiwa et al., 2011 in Line 95 has been deleted in our revised manuscript.

      (7) Lines 149-150: What justifies the assumption that higher levels of expression mean a more important role? This gene might be just as necessary for development of the summer form, even if expressed at lower levels.

      Thanks for your excellent suggestion. This sentence has been revised to “Increased gene expression levels may potentially contribute to the transition from summer-form to winter-form in C. chinensis.” in Line 168-169 of our revised manuscript.

      (8) The blue arrow in Fig. 7 is confusing.

      Thanks for your excellent suggestion. In Figure 7, the blue arrow represents the down-regulated expression of miR-6012. We have added a description about the blue arrow in Figure 7 of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether additional HAA-related genes may also exhibit a signature of adaptive introgression. Specifically, the authors considered that if such a signature exists, it most likely consists of only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like that in EPAS1. Therefore, they leveraged two methods, including a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified two additional genes that harbor plausible signatures of adaptive introgression for HAA.

      Strengths: 

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and applied a couple of the recently developed methods for detecting adaptive introgression and polygenic selection. 

      Weaknesses: 

      My main concern with this paper is that I am not too convinced that the reported genomic regions putatively under polygenic selection are indeed of archaic origin. Other than some straightforward population structure characterizations, the authors mainly did two analyses with regard to the identification of adaptive introgression: First, they used one composite likelihood-based method, the VolcanoFinder, to detect the plausible archaic adaptive introgression and found two candidate genes (EP300 and NOS2). Next, they attempted to validate the identified signal using another method that detects polygenic selection based on biological network enrichments for archaic variants.

      In general, I don't see in the manuscript that the choice of methods here is well justified. VolcanoFinder is one among the several commonly used methods for detecting adaptive introgression (e.g., the D, RD, U, and Q statistics, genomatnn, MaLAdapt, etc.). Even if the selection was mild and incomplete, some of these other methods should be able to recapitulate and validate the results, which are currently missing in this paper. Besides, some of the recent papers that studied the distribution of archaic ancestry in Tibetans don't seem to report archaic segments in the two gene regions. These all together made me not sure about the presence of archaic introgression, in contrast to just selection on ancestral variation.

      Furthermore, the authors tried to validate the results by using Signet, a method that detects enrichments of alleles under selection in a set of biological networks related to the trait. However, the authors did not provide a sufficient description of how they defined archaic alleles when scoring the genes in the network. In fact, reading from the method description, they seemed to only have considered alleles shared between Tibetans and Denisovans, but not necessarily exclusively shared between them. If the alleles used for scoring the networks in Signet are also found in other populations such as Han Chinese or Africans, then that would make a substantial difference in the result, leading to potential false positives.

      Overall, given the evidence provided by this article, I am not sure they are adequate to suggest archaic adaptive introgression. I recommend additional analyses for the authors to consider for rigorously testing their hypothesis. Please see the details in my review to the authors. 

      Reviewer #2 (Public Review):

      In Ferraretti et al. they identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They also use Signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis and nitric oxide induction are enriched for archaic introgression.

      Strengths: 

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g., EPAS1, EGLN1), and in this paper, the authors look for more subtle signals in other genes (e.g., EP300, NOS2) to investigate how archaic introgression may be enriched at the pathway level.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression.

      Weaknesses: 

      The manuscript lacks details or justification about how/why some of the analyses were performed. Below are some examples where the authors could provide additional details.

      The authors made specific choices in their window analysis. These choices are not justified or there is no comment as to how results might change if these choices were perturbed. For example, in the methods, the authors write "Then, the genome was divided into 200 kb windows with an overlap of 50 kb and for each of them we calculated the ratio between the number of significant SNVs and the total number of variants." 

      Additional information is needed for clarity. For example, "we considered only protein-protein interactions showing confidence scores ≥ 0.7 and the obtained protein frameworks were integrated using information available in the literature regarding the functional role of the related genes and their possible involvement in high-altitude adaptation." What do the confidence scores mean? Why 0.7?

      In the method section (Identifying gene networks enriched for Denisovan-like derived alleles), the authors write "To validate VolcanoFinder results by using an independent approach". Does this mean that for Signet the authors do not use the regions identified as adaptively introgressed using VolcanoFinder? I thought in the original Signet paper, the authors used a summary describing the amount of introgression of a given region.

      Later, the authors write "To do so, we first compared the Tibetan and Denisovan genomes to assess which SNVs were present in both modern and archaic sequences. These loci were further compared with the ancestral reconstructed reference human genome sequence (1000 Genomes Project Consortium et al., 2015) to discard those presenting an ancestral state (i.e., that we have in common with several primate species)." It is not clear why the authors are citing the 1000 genomes project. Are they comparing with the reference human genome reference or with all populations in the 1000 genomes project? Also, are the authors allowing derived alleles that are shared with Africans? Typically, populations from Africa are used as controls since the Denisovan introgression occurred in Eurasia.

      The methods section for Figures 4B, 4C, and 4D is a little hard to understand. What is the x-axis on these plots? Is it the number of pairwise differences to Denisovan? The caption is not clear here. The authors mention that "Conversely, for non-introgressed loci (e.g., EGLN1), we might expect a remarkably different pattern of haplotypes distribution, with almost all haplotype classes presenting a larger proportion of non-Tibetan haplotypes rather than Tibetan ones." There is clearly structure in EGLN1. There is a group of non-Tibetan haplotypes that are closer to Denisovan and a group of Tibetan haplotypes that are distant from Denisovan...How do the authors interpret this? 

      In the original Signet paper (Gouy et al. 2017), they apply Signet to data from Tibetans. Zhang et al. PNAS (2021) also applied it to Tibetans. It would be helpful to highlight how the approach here is different.

      We thank the Reviewers for appreciating the rationale of our study and for identifying potential issues that deserved to be addressed in order to focus on robust results specifically supported by multiple approaches.

      First, we agree with the Reviewers that the methodologies adopted in the present study should be explained and justified in more depth than in the original version of the manuscript, to make the work intelligible to a broad range of scientists. As reported in detail in the revised version of the text, the VolcanoFinder algorithm, which we used as the primary method to discover new candidate genomic regions affected by adaptive introgression, was chosen among the several approaches developed to detect signatures of this evolutionary process for the following reasons: i) VolcanoFinder is one of the few methods that can jointly test for archaic introgression and adaptive evolution (e.g., the D statistic cannot formally test for the action of natural selection, having been developed to provide genome-wide estimates of allele sharing between archaic and modern groups rather than to identify specific genomic regions enriched for introgressed alleles); ii) the model tested by the VolcanoFinder algorithm differs remarkably from those considered by other methods typically used to test for adaptive introgression, such as the RD, U and Q statistics, which aim to identify chromosomal segments that show low divergence from a specific archaic sequence and/or are enriched in alleles uniquely shared between the admixed group and the source population at a frequency above a certain threshold in the population under study; these statistics are therefore especially suited to an evolutionary scenario in which adaptation was mediated by strong selective sweeps rather than weak polygenic mechanisms (see the answer to comment #1 of Reviewer #1 for further details); iii) VolcanoFinder requires less computational effort than other algorithms, such as genomatnn and MaLAdapt, which must also be trained on large genomic simulations built specifically to reflect the evolutionary history of the population under study, increasing the risk of bias in the results if the information guiding the simulations is inaccurate.

      Despite that, we agree with Reviewer #2 that some criteria formerly used to filter VolcanoFinder results (e.g., normalization of LR scores, a sliding-window approach, and enrichment analysis based on specific confidence scores) might cause the list of genomic regions considered the most likely candidates for adaptive introgression to change erratically depending on the thresholds adopted. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we used raw LR scores and shortlisted the most significant results by focusing on loci with values in the top 5% of the genome-wide distribution of this statistic (see Materials and methods for details).
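
      The top-5% criterion can be illustrated with a short Python sketch. It is a hypothetical example, not the pipeline used in the study. It assumes that raw VolcanoFinder LR scores have already been collected into a dictionary keyed by a locus identifier. It also assumes that the cutoff is a linear-interpolation quantile of the genome-wide LR distribution. The function names are illustrative.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between sorted values
    (the same convention as NumPy's default 'linear' method)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty score list")
    pos = q * (len(xs) - 1)
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def top_lr_candidates(lr_by_locus, fraction=0.05):
    """Hypothetical helper: loci whose raw LR score falls in the top
    `fraction` of the genome-wide LR distribution."""
    cutoff = linear_quantile(lr_by_locus.values(), 1.0 - fraction)
    return {locus for locus, lr in lr_by_locus.items() if lr >= cutoff}


# Toy data: 100 loci with LR scores 1..100 (invented for illustration).
lr = {f"locus{i}": float(i) for i in range(1, 101)}
print(sorted(top_lr_candidates(lr)))
```

      With 100 toy loci the cutoff lies just above 95, so the five highest-scoring loci are retained. Using raw scores and a single genome-wide percentile avoids the extra normalization and window-size choices that the revised analysis sought to remove.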

      Moreover, to further reduce the use of potentially arbitrary filtering thresholds, we decided not to use functional enrichment analysis to prioritize results from the VolcanoFinder method. In particular, a STRING confidence score above 0.7 is generally considered high confidence (string-db.org, Szklarczyk et al. 2014). This score is the approximate probability that a predicted interaction exists between two proteins belonging to the same functional pathway, according to information stored in the KEGG database. Nevertheless, we replaced this prioritization criterion. We now consider as the most robust candidates for adaptive introgression only those genomic regions supported by all the approaches used (i.e., VolcanoFinder, Signet, LASSI and Haplostrips analyses).

      Following the Reviewers’ comments on the use of the Signet algorithm, we realized that the rationale behind this validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as done by Gouy et al. 2017). We used it specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier 2020. We searched for significant networks of genes carrying archaic-derived variants that are observed in the considered Tibetan populations but not in an outgroup population of African ancestry. Accordingly, we used the Signet method as an independent approach to obtain a first validation of the introgressed (but not necessarily adaptive) loci pointed out by the VolcanoFinder results.

      In detail, Reviewer #2 asked which genomic regions were considered in the Signet analysis. The algorithm requires an input score for each gene along the genome. To obtain it, we calculated the average frequency per gene of all the archaic-derived alleles present in the Tibetan dataset but not in the outgroup. Therefore, we did not take into account only the loci identified as significant by VolcanoFinder; instead, we performed an independent genome scan. We then crosschecked significant results from the VolcanoFinder and Signet approaches and shortlisted the genomic regions supported by both. This approach thus differs from that of Zhang et al. 2021, in which the input scores per gene were obtained by considering only loci previously flagged by another method as putatively introgressed. Moreover, as mentioned in the previous paragraph, our approach also differs from that of Gouy et al. 2017. In that study, the input score assigned to each gene was given by the variant with the smallest P-value for a selection statistic, which is informative about putative adaptive events but not about introgression events.

      However, as both Reviewers correctly pointed out, we formerly performed the Signet analysis by considering derived alleles shared between Tibetans and Denisovans, without filtering out alleles also observed in other modern human populations. We agree with the Reviewers that this approach cannot rule out false positives ascribable to ancestral polymorphisms rather than introgressed alleles. Following the Reviewers’ suggestion, we therefore repeated the Signet analysis after removing derived alleles also observed in an outgroup population of African ancestry (i.e., Yoruba), assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. In detail, we considered only those alleles that:

      i) were shared between Tibetans and Denisovans (i.e., Denisovan-like alleles);
      ii) were inferred to be derived by comparison with the ancestral reconstructed reference human genome sequence;
      iii) were completely absent (i.e., had a frequency of zero) in the Yoruba population sequenced by the 1000 Genomes Project.

      Reviewer #1 suggested the possible use of Han Chinese as a further control population, but we decided not to filter out Denisovan-like derived alleles also present in this human group, for two reasons. First, the evidence collected so far suggests that Denisovan introgression into the gene pool of East Asian ancestors predated the split between low-altitude and high-altitude populations (Lu et al. 2016; Hu et al. 2017). Second, as mentioned before, we aimed to use the Signet algorithm to validate introgression events rather than adaptive ones (see the answer to comment #6 of Reviewer #1 for further details).
      Moreover, we would like to remark that we decided to keep the Signet analysis as a validation method in the revised version of the manuscript because: i) comments from both Reviewers converge in suggesting how to effectively improve this approach, and ii) it goes beyond the simple identification of single putative introgressed alleles, instead enabling us to point out those biological functions that might have been collectively shaped by gene flow from Denisovans.
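
      The allele-filtering criteria i)-iii) and the average-frequency input score per gene can be sketched in Python as follows. The sketch is hypothetical. The Variant fields, gene names and frequencies are invented for illustration, and per-population allele frequencies are assumed to be precomputed. Genes without any retained allele are simply left out of the score table here, which may differ from how they were handled in the actual Signet input.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str            # gene the variant is assigned to (hypothetical field)
    allele: str          # allele carried by Tibetans that is being tested
    ancestral: str       # allele in the reconstructed ancestral sequence
    in_denisovan: bool   # allele observed in the Denisovan genome
    tibetan_freq: float  # frequency of `allele` in the Tibetan sample
    yoruba_freq: float   # frequency of `allele` in Yoruba (outgroup)


def is_denisovan_like_derived(v: Variant) -> bool:
    shared = v.in_denisovan and v.tibetan_freq > 0  # criterion i)
    derived = v.allele != v.ancestral               # criterion ii)
    absent_in_outgroup = v.yoruba_freq == 0.0       # criterion iii)
    return shared and derived and absent_in_outgroup


def gene_scores(variants):
    """Average Tibetan frequency of the retained alleles, per gene."""
    totals, counts = {}, {}
    for v in variants:
        if is_denisovan_like_derived(v):
            totals[v.gene] = totals.get(v.gene, 0.0) + v.tibetan_freq
            counts[v.gene] = counts.get(v.gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


variants = [
    Variant("GENE_A", "T", "C", True, 0.30, 0.0),   # kept
    Variant("GENE_A", "G", "A", True, 0.10, 0.0),   # kept
    Variant("GENE_A", "C", "C", True, 0.50, 0.0),   # ancestral: dropped
    Variant("GENE_B", "A", "G", True, 0.40, 0.05),  # seen in Yoruba: dropped
    Variant("GENE_B", "T", "G", False, 0.20, 0.0),  # not Denisovan: dropped
]
print(gene_scores(variants))  # {'GENE_A': 0.2}
```

      In this toy example the GENE_B allele that also segregates in Yoruba is excluded. This is exactly the kind of shared ancestral polymorphism that could otherwise produce a false-positive introgression signal.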

      We validated genomic regions putatively affected by archaic introgression by crosschecking results from the VolcanoFinder and Signet analyses. In addition, following the suggestion of Reviewer #1, we implemented a further validation procedure to formally test for adaptive evolution of the identified candidate introgressed loci. For this purpose, we applied LASSI, a likelihood-based haplotype method (Harris & DeGiorgio 2020), to Tibetan whole-genome data. We chose this approach mainly because: i) it is able to detect and distinguish genomic regions that have experienced different types of selective events (i.e., strong and weak ones); and ii) it has been shown to have greater power to identify them than other selection statistics (e.g., H12 and nSL) (Harris & DeGiorgio 2020). Again, we performed an independent genome scan using the LASSI algorithm. We then crosschecked its significant results with those already supported by the VolcanoFinder and Signet approaches to shortlist genomic regions that have plausibly experienced both archaic introgression and adaptive evolution.
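
      The crosschecking step, in which only regions supported by VolcanoFinder, Signet and LASSI are retained, amounts to an intersection of candidate sets. The Python sketch below is hypothetical and rests on three assumptions. First, each method's output has been reduced to genomic intervals. Second, a gene counts as supported by a method when any of that method's intervals overlaps it. Third, Signet's significant genes are represented by their own coordinates. Gene names and coordinates are invented for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# (chromosome, start, end); half-open coordinates are an assumption
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_hit(genes: Dict[str, Interval], hits: Iterable[Interval]) -> Set[str]:
    """Genes overlapped by at least one candidate interval of one method."""
    hits = list(hits)
    return {name for name, span in genes.items()
            if any(overlaps(span, h) for h in hits)}


def supported_by_all(genes: Dict[str, Interval],
                     hits_by_method: Dict[str, Iterable[Interval]]) -> Set[str]:
    """Genes supported by every method (hypothetical crosscheck rule)."""
    per_method = [genes_hit(genes, hits) for hits in hits_by_method.values()]
    return set.intersection(*per_method) if per_method else set()


genes = {"GENE_A": ("chr1", 1_000, 5_000), "GENE_B": ("chr2", 200, 800)}
hits_by_method = {
    "VolcanoFinder": [("chr1", 4_000, 4_100), ("chr2", 100, 300)],
    "Signet": [genes["GENE_A"], genes["GENE_B"]],
    "LASSI": [("chr1", 0, 2_000)],
}
print(supported_by_all(genes, hits_by_method))  # {'GENE_A'}
```

      Here GENE_B is supported by VolcanoFinder and Signet but not by LASSI, so only GENE_A would be passed on to the Haplostrips step.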

      We retained Haplostrips analysis as a final validation step, performed specifically on chromosomal segments supported by all three methods (VolcanoFinder, Signet, and LASSI). This allowed us to compare Denisovan haplotypes with those observed in three groups. The first is Tibetans, the population under study, in which archaic alleles might have played an adaptive role in response to high-altitude selective pressures. The second is Han Chinese, a sister group whose common ancestors with Tibetans experienced Denisovan admixture but which then evolved at low altitude. The third is Yoruba, an outgroup assumed not to have received gene flow from Denisovans. 

      In conclusion, we believe that the substantial changes made in response to the Reviewers’ suggestions have considerably improved the study, allowing us to focus on more solid results than those previously presented. The single candidate loci supported by all the validation approaches now implemented have received higher priority than the previous ones, which are supported by some but not all of the methods. Interestingly, angiogenesis still stands out as one of the main biological functions shaped by adaptive introgression in human groups of Tibetan ancestry. This provides new evidence that introgressed Denisovan alleles beyond those at EPAS1 have contributed to the complex adaptive responses evolved by Himalayan populations to cope with high-altitude selective pressures.

      Responses to Recommendations For The Authors:

      Reviewer #1:

      The authors mainly relied on one method, VolcanoFinder (VF), to detect adaptive introgression signals. As one of the recently developed methods, VF indeed demonstrated statistical power at detecting mild selection on archaic variants, as well as detecting soft sweeps on standing variations. However, compared to other commonly used methods for detecting adaptive introgression, such as the U and Q stats (Racimo et al. 2017), genomatnn (Gower et al. 2021), or MaLAdapt (Zhang et al. 2023), VF doesn't seem to have better power at capturing mild and incomplete sweeps. And it makes me wonder about the justification for choosing VF over other methods here, which is not clearly explained in the manuscript. If these adaptive introgression candidates are legitimate, even if the signals are mild, at least some of the other methods should be able to recapitulate the signature (even if they don't necessarily make it through the genome-wide significance thresholds). I would be more convinced about the archaic origin of these regions if the authors could validate their reported findings using some of the aforementioned other methods. 

      Following the Reviewer’s suggestion, the revised manuscript gives a fuller explanation of why we chose the adopted methods. In particular, the Materials and methods section (see page 12) now specifies our reasons for using the VolcanoFinder algorithm. 

      First, it is one of the few approaches whose model jointly tests for archaic introgression and for adaptive evolution of the affected genomic regions, without requiring the putative source of introgression to be specified. This mattered to us because we planned to validate the candidate introgressed loci with at least two independent methods that rely on different underlying approaches, and our other algorithm (Signet) explicitly compares modern data with the archaic sequence. The model tested by VolcanoFinder therefore differs from those underlying the RD, U and Q statistics. The RD statistic identifies genomic regions with low divergence from a given archaic reference. The U/Q statistics detect chromosomal segments enriched in alleles that i) are uniquely shared between the admixed group (e.g., Tibetans) and the source population (e.g., Denisovans), and ii) exceed a specific frequency threshold in the admixed population (Racimo et al. 2016). For instance, all the loci that Racimo et al. 2016 considered likely to be involved in adaptive introgression had remarkable frequencies, most of them above 50%. We therefore decided not to use these methods. We believe they are better suited to adaptive introgression events involving a few variants with a strong phenotypic effect, which rise substantially in frequency in the population under selective pressure (i.e., cases such as that of EPAS1). By contrast, it is difficult to choose an arbitrary frequency threshold suitable for detecting weak and/or polygenic selective events. 

      As for MaLAdapt or genomatnn as validation methods, we believe they are more computationally demanding than Signet. Above all, they must be trained on simulated genomic data. This makes their results more prone to bias from simulations that do not accurately reflect the evolutionary history of the population under study.

      Overall, we disagree with the Reviewer’s statement that we mainly relied on a single method to detect adaptive introgression signals. As mentioned above, we used the Signet algorithm specifically to identify genomic regions putatively affected by introgression. Its assumptions are very similar to those described above for the U/Q statistics (e.g., it considers alleles uniquely shared between Tibetans and Denisovans), but it does not require a frequency threshold to shortlist the most likely adaptive introgressed loci. In addition, following another suggestion by the Reviewer, we have now implemented a further approach to provide evidence for adaptive evolution at the candidate introgressed loci (see response to comment #3).  

      As for Signet, the comments from both Reviewers made us realize that the original manuscript did not describe the rationale behind this validation approach well. First and foremost, in the present study we did not use this method to test for natural selection (as Gouy et al. 2017 did). Instead, we used it specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier (2020), searching for significant networks of genes carrying archaic-derived variants observed in the considered Tibetan populations. Signet thus served as an independent first validation of the VolcanoFinder results. However, following suggestions from both Reviewers, we modified the criteria used to filter archaic-derived variants and now exclude alleles shared between Denisovans and the Yoruba outgroup population (see response to comment #6 for further details). 

      To sum up, we think that combining VolcanoFinder with Signet and LASSI offers a good compromise. It keeps the computational effort needed to shortlist the most robust candidate adaptive introgressed loci manageable, and the models tested do not discard a priori genomic signatures of weak and/or polygenic selective events. We also retained Signet as a validation approach in the revised manuscript for two reasons. First, the comments from both Reviewers converge on how to improve this approach effectively. Second, it supports both single-locus validation and a search for the biological functions most affected, collectively, by archaic introgression. This lets us test a more realistic approximation of a polygenic model of adaptation involving introgressed alleles. Indeed, the single candidate loci supported by all the validation approaches now implemented (see responses to comments #3 and #7 for further details) have received higher priority than the previous ones (i.e., EP300 and NOS2, which are now supported by some but not all of the methods). Even so, angiogenesis still stands out as one of the main biological functions shaped by adaptive introgression in the ancestors of Tibetan populations. 

      Besides, I am a little surprised to see that in Supplementary Figure 2, VF didn't seem to capture more significant LR values in the EPAS1 region (positive control of adaptive introgression) than in the negative control EGLN1 region. The authors explained this as the selection on the EPAS1 region being "not soft enough", which I find a bit confusing. If there is no major difference in significant values between the positive and negative controls, how would the authors be convinced the significant values they detected in their two genes are true positives? I would like to see more discussion and justification of the VF results and interpretations.

      This observation, together with the overall comment from Reviewer #2 on how we filtered the VolcanoFinder results, made us realize a problem. Both normalizing the LR scores and using a sliding-window approach can introduce erratic, threshold-dependent changes in the list of genomic regions considered the most likely candidates for adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline developed by Setter et al. 2020, the revised manuscript uses raw LR scores. We shortlist the most significant results by focusing on loci whose values fall in the top 5% of the genome-wide distribution of this statistic (see Materials and methods, page 13 lines 4-16 for further details).
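      As an illustration of the shortlisting rule described above, the following minimal Python sketch keeps SNVs whose raw LR scores fall in the top 5% of the genome-wide distribution. It assumes the scores have already been parsed into (SNV identifier, LR) pairs. The helper name `top_fraction` and its input format are hypothetical and are not part of VolcanoFinder or of the authors' pipeline.

```python
import math


def top_fraction(scores, fraction=0.05):
    """Return the (snv_id, lr) pairs whose LR lies in the top `fraction` of all scores.

    Hypothetical helper: `scores` is assumed to be a list of (snv_id, lr) tuples
    already parsed from VolcanoFinder output. The cutoff is the k-th largest LR,
    where k = ceil(fraction * n), i.e. an empirical upper-tail threshold.
    """
    if not scores:
        return []
    lrs = sorted((lr for _, lr in scores), reverse=True)
    k = max(1, math.ceil(fraction * len(lrs)))  # number of SNVs in the top tail
    cutoff = lrs[k - 1]
    # Keep the input order; every SNV tied at the cutoff is retained.
    return [(snv, lr) for snv, lr in scores if lr >= cutoff]


# Usage: 100 SNVs with LR = 0..99 keep the five highest (snv95..snv99).
example = [(f"snv{i}", float(i)) for i in range(100)]
print([snv for snv, _ in top_fraction(example)])
```

      Ties at the cutoff are all retained, so slightly more than 5% of SNVs can be returned when several share the threshold value.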

      With this approach, we observed a clearer pattern than the one previously described: the distribution of LR scores in the EPAS1 genomic region differs markedly from that of the EGLN1 gene (Figure 2 – figure supplement 1). Specifically, 19 EPAS1 variants have scores in the top 5% of LR values, compared with only three EGLN1 SNVs. LR values were also more clustered in the EPAS1 region and had a higher average than in EGLN1. We report the LR values and the -log(α) scores calculated for these control genes in Supplement tables 3 and 4.

      Nevertheless, we agree with the Reviewer that the VolcanoFinder results need to be confirmed by additional methods, and we did so both for the new candidate adaptive introgressed loci and for the positive/negative controls. The validation analyses testing for archaic introgression and adaptive evolution (i.e., Signet, LASSI and Haplostrips) converged in indicating that Tibetan variability at the EGLN1 gene was shaped only by natural selection, not by archaic introgression (see Results, page 5 lines 3-9, page 6 lines 23-25, page 7 lines 29-36; Discussion page 14 lines 33-36; Figure 2 – figure supplement 1B and Figure 4 – figure supplement 1B, 3B and 3D). This agrees with what was previously proposed (Hu et al., 2017). By contrast, all the validation analyses confirmed adaptive introgression signatures at the EPAS1 genomic region (see Results page 4 lines 32-37, page 5 lines 1-2 and 30-34, page 6 lines 23-29; Figure 3A, 3B and Figure 4 – figure supplement 1A, 3A and 3C). 

      Finally, as already reported in the former version of the manuscript, our choice of considering EPAS1 and EGLN1 respectively as positive and negative controls for adaptive introgression was guided by previous evidence suggesting these loci as targets of natural selection in high-altitude Himalayan populations (Yang et al., 2017; Liu et al., 2022), although only EPAS1 was proved to have been involved also in an adaptive introgression event (Huerta-Sanchez et al., 2014; Hu et al., 2017). 

      With that being said, I suggest the authors try to first validate the signal of positive selection in the two gene regions using methods such as H2/H1 (Garud et al. 2015), iHS (Voight et al. 2006) etc. that have demonstrated power and success at detecting mild sweeps and soft sweeps, regardless of if these are adaptive introgression.

      Following the Reviewer’s suggestion, we also validated the new candidate adaptive introgressed loci with a method that formally tests for natural selection: the LASSI (Likelihood-based Approach for Selective Sweep Inference) algorithm developed by Harris & DeGiorgio (2020). We chose it mainly for two reasons. First, like other approaches, it identifies both strong and weak genomic signatures of positive selection, but it can also distinguish between them by explicitly classifying genomic windows as affected by hard or soft selective sweeps. Second, it has shown greater power than traditional selection scans, such as nSL, H2/H1 and H12, on simulated data. These simulations covered different demographic models and a range of values for the parameters of a selective event (e.g., the time at which the beneficial mutation arose and the selection coefficient s) (see Harris & DeGiorgio 2020 for further details).  

      With this approach, we recapitulated the signatures of natural selection previously observed in Tibetans for both EPAS1 and EGLN1 (Figure 4 – figure supplement 1 and 3C – 3D). We obtained comparable patterns for our previous candidate adaptive introgressed loci (i.e., EP300 and NOS2). We also obtained them for the new loci prioritized in the revised manuscript on the basis of consistent VolcanoFinder, Signet and Haplostrips results (see Results, page 6 lines 30-35; Figure 4C, 4D, Figure 4 – figure supplement 2C and 2D).    

      With regard to the plausible archaic origin of the haplotypes under selection in these gene regions, my concern comes from the fact that other recent studies characterizing the archaic ancestry landscape in Tibetans and East Asians (eg. SPrime reports from Browning et al. 2018, as well as ArchaicSeeker reports from Yuan et al. 2021) didn't report archaic segments in regions overlapping with EP300 and NOS2. So how would the authors explain the discrepancy here, that adaptive introgression is detected yet there is little evidence of archaic segments in the regions? 

      We thank the Reviewer for the comment and the references provided. However, neither of the suggested articles appears to have analysed genomes from individuals of Tibetan ancestry. Moreover, in Yuan et al. 2021 we could not find any table or supplementary table listing genomic segments with signatures of Denisovan-like introgression in East Asian groups. That study only describes enrichment analyses of significant results, and only for the Papuan population. In any case, as reported below in the response to comment #5, the additional validation analyses of this revision bear out the Reviewer’s observation about the original manuscript. EP300 and NOS2 now receive lower priority than other loci with more robust signatures of Denisovan introgression into the gene pool of Tibetan ancestors (i.e., TBC1D1, PRKAG2, KRAS and RASGRF2). Three of these four genes agree with previously published evidence of Denisovan introgression, either into the ancestors of present-day Han Chinese (Browning et al. 2018) or directly into Tibetan genomes (Hu et al. 2017) (see Results, page 5 lines 10-21 and Supplement table 5). That said, not all our candidate adaptive introgression regions appear among the results of Browning et al. 2018. One possible reason is that in Han Chinese this archaic variation evolved neutrally after introgression. VolcanoFinder and LASSI, which also model natural selection, would then not detect enrichment of putative archaic introgressed variants in these segments. Indeed, the SPrime method used by Browning et al. 2018 detects introgression events rather than adaptive introgression. 
For instance, the Denisovan-like regions that study identified with SPrime in Han Chinese do not include the EPAS1 region at all. 

      Additionally, looking at Figure 4 and Supplementary Figure 4, the authors showed haplotype comparisons between Tibetans, Denisovan, and Han Chinese for EP300 and NOS2 regions. However, in both figures, there are about equal numbers of Tibetans and Han Chinese that harbor the haplotype with somewhat close distance to the Denisovan genotype. And this closest haplotype is not even that similar to the Denisovan. So how would the authors rule out the possibility that instead of adaptive introgression, the selection was acting on just an ancestral modern human haplotype?

      We agree with the Reviewer that, with the analyses in the original manuscript, the Haplostrips patterns at EP300 and NOS2 cannot rule out that their adaptive evolution involved ancestral modern human haplotypes. After we modified the analysis pipeline following the Reviewers’ suggestions, LASSI confirmed the role of these loci in complex adaptations to high altitude, consistent with previous studies (Bigham et al., 2010; Zheng et al., 2017; Deng et al., 2019; X. Zhang et al., 2020). However, their putative archaic origin was not confirmed by all the analyses, so they now receive lower priority than other loci.

      Furthermore, I have a question about how exactly the authors scored the genes in their network analysis using Signet. The manuscript mentioned they were looking for enrichment of archaic-like derived alleles, and in the methods section, they mentioned they used SNPs that are present in both Denisovan and Tibetan genomes but are not in the chimp ancestral allele state. But are these "derived" alleles also present in Han Chinese or Africans? If so, what are the frequencies? And if the authors didn't use derived alleles exclusively shared between Tibetans and Denisovans, that may lead to false positives of the enrichment analysis, as the result would not be able to rule out the selection on ancestral modern human variation.

      As mentioned in the response to comment #1, following the suggestions of both Reviewers, we modified how we filter archaic derived variants exclusively shared between Denisovans and Tibetans. As input for the Signet analysis, we retained only alleles that i) were shared between Tibetans and Denisovans (i.e., Denisovan-like alleles), ii) were in their derived state, and iii) were completely absent (i.e., frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project. Yoruba serve as an outgroup under the assumption that only Eurasian H. sapiens populations experienced Denisovan admixture. We decided not to filter out Denisovan-like derived alleles also present in the Han Chinese population. Multiple lines of evidence indicate that Denisovan gene flow into the ancestral East Asian gene pool occurred no sooner than 48–46 thousand years ago (Teixeira et al. 2019; Zhang et al. 2021; Yuan et al. 2021). This predates the split between low-altitude and high-altitude groups, approximately 15 thousand years ago (Lu et al. 2016; Hu et al. 2017). Indeed, traces of this archaic gene flow are still detectable in the genomes of several low-altitude populations of East Asian ancestry (Yuan et al. 2021).
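      The three filtering criteria above can be sketched as a simple predicate over per-site records. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The `Site` record and its field names are hypothetical, and for simplicity the candidate allele at each site is assumed to be the alternative allele. In line with the rationale above, Han Chinese frequencies are deliberately not used as a filter.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-SNV record; field names are illustrative only."""
    snv_id: str
    alt_in_tibetans: bool   # candidate (alt) allele observed in at least one Tibetan
    alt_in_denisovan: bool  # candidate allele carried by the Denisovan genome
    alt_is_derived: bool    # candidate allele differs from the reconstructed ancestral allele
    yri_alt_freq: float     # candidate-allele frequency in Yoruba (1000 Genomes)


def denisovan_like_derived(sites):
    """Return IDs of sites passing criteria i)-iii).

    Han Chinese frequencies are intentionally ignored (see text).
    """
    return [
        s.snv_id
        for s in sites
        if s.alt_in_tibetans and s.alt_in_denisovan  # i) Denisovan-like
        and s.alt_is_derived                         # ii) derived state
        and s.yri_alt_freq == 0.0                    # iii) absent in the Yoruba outgroup
    ]


# Usage: only the first site satisfies all three criteria.
sites = [
    Site("a", True, True, True, 0.0),
    Site("b", True, True, True, 0.02),   # seen in Yoruba -> excluded
    Site("c", True, True, False, 0.0),   # ancestral state -> excluded
    Site("d", True, False, True, 0.0),   # not shared with Denisovan -> excluded
]
print(denisovan_like_derived(sites))
```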

      Concerning the above, I would also suggest the authors replot their Figure 4 and Figure S4 by adding the African population (eg. YRI) in the plot, and examine the genetic distance among the modern human haplotypes, in contrast to their distance to Denisovan.

      Following the Reviewer’s suggestion, after identifying new candidate adaptive introgressed loci with the revised pipeline, we ran the Haplostrips algorithm on a dataset that also includes 27 individuals (i.e., 54 haplotypes) from the Yoruba population sequenced by the 1000 Genomes Project (Figure 4A, 4B, Figure 4 - figure supplement 2A, 2B, 3A).

      Reviewer #2:

      In the methods the authors write "Since composite likelihood statistics are not associated with pvalues, we implemented multiple procedures to filter SNVs according to the significance of their LR values." What does significance mean here?

      After modifications applied to the adopted pipeline of analyses according to the Reviewers’ suggestions (see responses to public reviews and to comments #1, #3, #6, #7 of Reviewer #1), new candidate adaptive introgressed loci have been identified specifically by focusing on variants showing LR values falling in the top 5% of the genomic distribution obtained for such a statistic in order to adhere more strictly to the VolcanoFinder approach developed by Setter et al. 2020. Therefore, the related sentence in the materials and methods section was modified accordingly.

      Signet should be cited the first time it appears in the manuscript. The citation in the references is wrong. It lists R. Nielsen as the last author, but R. Nielsen is not an author of this paper.

      We thank the Reviewer for the comment. We have now mentioned the article by Gouy and Excoffier (2020) in the Results section where the Signet algorithm was first described and we have corrected the related reference.

      I could not find Figure 5 which is cited in the methods in the main text. I assume the authors mean Supplementary Figure 5, but the supplementary files have Figure 4.

      We thank the Reviewer for the comment. We have checked and modified figures included in the article and in the supplementary files to fix this issue.

      I didn't see a table with the genes identified as adaptively introgressed with VolcanoFinder. This would be useful as I believe this is the first time VolcanoFinder is being used on Tibetan data?

      Following the Reviewer’s suggestion, Supplement table 2 now lists all variants with LR scores in the top 5% of the genome-wide distribution of this statistic, together with the associated α parameters computed by VolcanoFinder.

      It is easier for the reviewer if lines have numbers.

      Following the Reviewer’s suggestion, we have included line numbers in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. Utilizing single-nucleus RNA sequencing (snRNA-seq), the study explores how CLA supplementation alters cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles.

      Thanks!

      Strengths:

      Innovative approach: The use of snRNA-seq provides a high-resolution insight into the cellular heterogeneity of pig skeletal muscle, enhancing our understanding of the intricate cellular dynamics influenced by nutritional regulation strategy.

      Robust validation: The study utilizes multiple pig models, including Heigai and Laiwu pigs, to validate the differentiation trajectories of adipocytes and the effects of CLA on muscle fiber type transformation. The reproducibility of these findings across different (nutritional vs genetic) models enhances the reliability of the results.

      Advanced data analysis: The integration of pseudotemporal trajectory analysis and cell-cell communication analysis allows for a comprehensive understanding of the functional implications of the cellular changes observed.

      Practical relevance: The findings have significant implications for improving meat quality, which is valuable for both the agricultural and food industry.

      Thanks!

      Weaknesses:

      Model generalizability: While pigs are excellent models for human physiology, the translation of these findings to human health, especially in diverse populations, needs careful consideration.

      Thanks!

      Reviewer #2 (Public Review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at a previously unobserved depth. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs). The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Thanks!

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Thanks!

      Weaknesses:

      While the authors generated a sizeable comprehensive dataset, cellular and molecular validation needs to be improved. For example, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, yet these data are not validated by other methodologies. Similarly, the authors suggest that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, no cellular and molecular analysis was performed to reveal if these trajectories indeed apply. Attempts to identify JNK signaling pathways appear superficial and do not delve deeper into mechanistic action or transcriptional regulation. Notably, a variety of single cell studies have been performed on mouse/human skeletal muscle and adipose tissues. Yet, the authors need to discuss how the populations they have identified support the existing literature on cell-type populations in skeletal muscle. Moreover, the authors nicely incorporate the two pig models into their results, but the authors only examine one muscle group. It would be interesting to know whether other muscle groups respond similarly or differently to linoleic acid supplementation. Further, it was unclear whether Heigai and Laiwu pigs were both fed conjugated linoleic acid or whether the comparison was between linoleic acid-fed Heigai pigs and Laiwu pigs (as a model of high intramuscular fat). With this in mind, the authors do not discuss how their results could be implicated in human and pig nutrition, such as desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets. Moreover, the experimental designs do not denote the conjugated linoleic acid supplementation duration. Several immunostainings performed could be quantified to validate statements. 
This reviewer also found the Nile Red staining hard to interpret visually and did not appear to support the conclusions convincingly. Within Figure 7, several letters (assuming they represent statistical significance) are present on the graphs but are not denoted within the figure legend.

      Thanks for your suggestions! We have revised our manuscript accordingly.

      For changes in myofiber type, we performed qPCR to verify the changes of muscle fiber type related gene expression after CLA treatment (Figure 2E); for changes of adipocyte and preadipocyte populations, we also performed immunofluorescence staining, qPCR, and western blotting in LDM tissues and FAPs to verify the alterations of cell types after feeding with CLA (Figure 3D, 3E, 6G, 7C, and 7D). Hence, we think these cellular and molecular results could support our conclusions.

      We selected the JNK signaling pathway based on the snRNA-seq dataset and verified it with an activator in in vitro experiments. However, we did not explore the underlying mechanism, and the downstream transcriptional regulators need further investigation. We have added these points to the Discussion (lines 443-448).

      We have added a comparison between different cell-type populations in skeletal muscles (lines 362-368 and 385-390).

      Changes in myofiber type in Laiwu pigs were discussed in our previous study (Wang et al., 2023). Interestingly, in this study we also found that in high-IMF Laiwu pigs the percentage of type IIa myofibers tended to increase (29.37% vs. 23.95%), while the percentage of type IIb myofibers tended to decrease (38.56% vs. 43.75%). We have added this point to the Discussion (lines 392-395).

      We have added information on the treatment to the Materials and methods (lines 469-478). We have also added a discussion of the significance of our study for human and pig nutrition (lines 375-376 and 446-447).

      Our data will be made available upon reasonable request (lines 574-576).

      We have added the duration of CLA supplementation to the Materials and Methods (line 465).

      Porcine FAPs contain few lipid droplets; we have improved the image quality (Figure 7A). In Figure 7, the Nile Red staining could be quantified, and we provide quantification of the Oil Red O staining (Figure 7B and 7J). We have also defined the statistical significance notation in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses

      Cross-species analysis: To strengthen the generalizability of the results, it would be beneficial to include a comparative analysis with other species, such as human, bovine, or rodent models, using publicly available snRNA-seq datasets.

      Thank you. Our previous study compared the conserved and unique signatures of fatty skeletal muscles across species (Wang, Zhou, Wang, & Shan, 2024). Here, we mainly focused on the mechanism by which CLAs regulate intramuscular fat deposition. However, no snRNA-seq or scRNA-seq datasets are yet available on the effects of CLAs on fat deposition in muscle in other species, including human, bovine, or rodent models. Hence, we analyzed the regulatory mechanisms by which CLAs influence intramuscular fat deposition only in pigs.

      Functional link: the authors should discuss in the manuscript how the muscles differ in terms of texture, flavor, aroma, etc. before and after CLA administration or between Heigai and Laiwu to provide context and help readers better understand how the observed high-resolution cellular changes relate to these functional properties of meat.

      Thank you. We have added this to the Introduction (lines 90-98).

      Improve figures: some figures, particularly those involving Oil Red O and Nile Red staining, could be improved by including higher-magnification images to assess the organization of lipid droplets within individual adipocytes (Figure 7A, I, and K).

      Thank you. Porcine FAPs contain few lipid droplets; we have improved the image quality (Figure 7A).

      Reviewer #2 (Recommendations For The Authors):

      All of my comments are above. However, I would recommend improving the writing, as several areas throughout the Results need clarification.

      Thank you. We have carefully revised the manuscript in line with your suggestions.

      Wang, L., Zhao, X., Liu, S., You, W., Huang, Y., Zhou, Y., . . . Shan, T. (2023) Single-nucleus and bulk RNA sequencing reveal cellular and transcriptional mechanisms underlying lipid dynamics in high marbled pork NPJ Sci Food 7: 23. https://doi.org/10.1038/s41538-023-00203-4

      Wang, L., Zhou, Y., Wang, Y., & Shan, T. (2024) Integrative cross-species analysis reveals conserved and unique signatures in fatty skeletal muscles Sci Data 11: 290. https://doi.org/10.1038/s41597-024-03114-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, a causal link was not established. A rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

      We thank Reviewer 1 for their thoughtful review and commentary.  We appreciate the reviewer’s finding that our “claims and conclusions are supported by the presented data.”   

      We note that our findings on the temporal progression of transcriptional changes between P42 and P75 apply to both the Pol II and Pol III transcriptomes. Importantly, in the case of Pol III, only precursor and mature tRNAs are affected at P42, whereas at P75 numerous other Pol III transcripts are also changed. We therefore consider the changes in tRNA to be causal in disease initiation, since they are the earliest direct consequence of the Polr3a mutation.

      To expand on the evidence demonstrating the progressive nature of Polr3-related disease in our mouse model, the revised manuscript includes new immunofluorescence data showing no change in microglial cell density in the cerebral cortex or the striatum at an early stage in the disease (Supplementary Fig. S6F, G). This is in striking contrast to the findings at a later time (P75), when the number of microglia increased significantly in the Polr3a mutant and the microglia exhibited an activated morphology (Fig. 4G,H).

      We agree with the reviewer that it will be interesting in the future to assess the impact of the Polr3a mutation in different neural cell types and to explore opportunities for suppressing disease phenotypes. 

      Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs. Furthermore, their study suggested that RNA Pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, the study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour. The authors should have used a conditional mouse line to delete the gene in specific brain areas to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than a specific function of altered tRNA levels. This is very evident from their bulk RNA sequencing study. This study provides some discrete pieces of information rather than a coherent story. My enthusiasm for publication of this article in eLife is dampened for the reasons given under Weaknesses.

      Reviewer 2’s summary contains two misstatements: 

      Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs.

      Our experiments document the effect of a neurodegenerative disease-causing mutation in RNA polymerase III on the Pol III transcriptome with a particular focus on the tRNAome (i.e. the mature tRNA population). Experiments on the maturation and transport of tRNA were not performed as there was no indication that these processes might be negatively impacted at the earliest time point (P42). Additional comments about tRNA maturation and export are provided under points 8 and 9 (see below). 

      The study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour.

      This comment misstates the purpose of our study while overlooking the important results. As stated in the abstract, our goal was to develop “a postnatal whole-body mouse model expressing pathogenic Polr3a mutations to examine the molecular mechanisms by which reduced Pol III transcription results primarily in central nervous system phenotypes.”

      Accordingly, our work provides the first molecular analysis of RNA polymerase III transcription in an animal model of Polr3-related disease. The novelty and importance of the findings, as stated in the abstract, include the discovery that a global reduction in tRNA levels (and not other Pol III transcripts) at an early stage in the disease precedes the frank induction of integrated stress and innate immune responses, activation of microglia and neuronal loss at later times. These later events readily account for the observed neurobehavioral deficits that collectively include risk assessment, locomotor, exploratory and grooming behaviors. 

      Strengths:

      The study created a mouse model to investigate the role of RNA Pol III transcription. Furthermore, the study provided RNA-seq analysis of the mutant mice and highlighted specific transcripts affected by the RNA Pol III mutation.

      Weaknesses:

      (1) The abstract is not clearly written. It is hard to tell what the objective of the study is and why it is important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative diseases?

      We have modified the abstract to more clearly frame the objective of the study and its importance as reflected in the title “Molecular basis of neurodegeneration in a mouse model of Polr3-related disease”. We hope the reviewer will agree that the fourth sentence of the abstract, unchanged from the initial submission, clearly outlines the objective of the study.  

      (2) How are cerebral pathology and exocrine pancreatic atrophy related? How do altered tRNA levels connect these two axes?

      It is not known how cerebral pathology and exocrine pancreatic atrophy are related beyond their shared Pol III dysfunction in our mouse model of Polr3-related disease. We anticipate that altered tRNA levels connect these two axes. Indeed, the pancreas and the brain are both known to be highly sensitive to perturbations affecting translation (Costa-Mattioli and Walter, 2020 Science doi: 10.1126/science.aat5314). Changes to the tRNA population in the cerebrum and cerebellum of Polr3a mutant mice were extensively documented in the manuscript (e.g. Figs. 3, 5 and 6).  We also found reduced tRNA levels in the pancreas of the mutant mice but did not report these findings due to the absence of a stable reference transcript in total RNA from the atrophied pancreatic tissue, even at the earliest time point examined (P42). 

      (3) The authors mention that the previously observed reduction in mature tRNA levels was also recapitulated in their study. Why, then, is this study novel?

      Our study reports the novel finding that a pathogenic Polr3a mutation causes a global reduction in the steady-state levels of mature tRNAs, i.e. the levels of all tRNA decoders were reduced, with the vast majority of these reaching statistical significance (Fig. 6D and 6F). In the introduction we refer to several studies that examined the effect of pathogenic Polr3 mutations on the levels of Pol III-derived transcripts. We noted that these studies examined only a small number of Pol III transcripts in CRISPR-Cas9 engineered cell lines, patient-derived fibroblasts and patient blood. Thus, no study until now has tested for or reported a global defect in the abundance of mature tRNAs in any model of Polr3-related disease. Moreover, no previous study of Polr3-related disease has analyzed Pol III transcript levels in the brain or in any other tissue.

      (4) It is intuitive that a deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as in other organs. Hence, the growth defect observed in Polr3a mutant mice is not very specific but rather a general phenomenon.

      While we agree with the simple assumption that a “deficit in Pol III transcription likely would affect protein synthesis in all brain areas as well as other organs”, this turned out not to be the case. In fact, a novel finding of our study is that not all Polr3a mutant tissues show a translation stress response despite reduced Pol III transcription and reduced mature tRNA levels. This implies that in some tissues the reduction in tRNA levels caused by the Polr3a mutation is not sufficient to affect protein synthesis, at least to a point where the Integrated Stress Response is induced. The underlying basis for the growth deficit has not been defined in this work. However, we noted in the discussion that a growth defect was previously seen in mice where expression of the Polr3a mutation was restricted to the Olig2 lineage.  In the present postnatal whole-body inducible model, we anticipate that the diminished growth of the mice results from a combination of hormonal and nutritional deficits caused by cerebral and pancreatic dysfunction.

      (5) The authors observed a specific myelination defect in the cortex and hippocampus but not in the cerebellum. This is an interesting observation. What is the link between reduced tRNA levels and myelin depletion in the hippocampus or cortex? Why is myelination not affected in the cerebellum?

      We agree that the specific myelin defect observed in the cortex and hippocampus, but not the cerebellum, is an interesting observation. Pol III dysfunction in this model and reduced tRNA levels are common to both cerebra and cerebella, yet the pathological consequences differ between these regions.  While we do not know why this is the case, the cells that oligodendrocytes support in these regions are functionally different. We suggest in the discussion that subtle defects in oligodendrocyte function in the cerebellum may be uncovered using more sensitive or specific assays than the ones we have employed to date.  In addition, consistent with our findings in other tissues where Pol III transcription and tRNA levels are reduced but phenotypes are lacking, we suggest that oligodendrocytes in the cerebellum may have a different minimum threshold for Pol III activity than in other regions of the brain. 

      (6) How was the locomotor activity measured? A detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in terms of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      We used a behavioral spectrometer with video tracking and pattern-recognition software to quantify ~20 home cage-like behaviors, including locomotor activity, as part of our phenotypic characterization of the mice. This experimenter-unbiased approach reported several metrics of locomotion, specifically, total Track length (the total distance traveled in the instrument), Center Track length and the time spent running (Run Sum) and standing still (Still Sum) in a longitudinal study (Figs. 2A-C and Supplemental Fig. S3A-C). The Materials and Methods section on mouse behavior has been amended to provide a detailed description of these experiments. 

      locomotion is primarily cerebellum dependent

      While we agree that the cerebellum plays a critical role in balance and locomotion, regions of the cerebrum that are affected in our mice, including the primary motor cortex and the basal ganglia (Fig. 4), also have important roles in locomotor activity and control. 

      (7) A correlation between the behavioural changes and the RNA-seq data is missing. A number of transcripts are affected, mostly very general factors for cellular metabolism, and most of them are transcribed by RNA Pol II. How does a Pol III mutation influence RNA Pol II-driven transcription? I did not find differential expression of any specific transcripts associated with the behavioural changes. What is the motivation for the transcriptomic analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolic effect that indirectly influences myelination.

      The differentially expressed mRNAs identified in our RNAseq analysis at P75 reflect both direct and secondary consequences of dysfunctional Pol III transcription on Pol II transcription. These effects can be achieved by multiple mechanisms. Induction of the Integrated Stress Response (ISR) due to insufficient tRNA can be considered a direct consequence of diminished Pol III transcription on Pol II transcription. An example of a secondary response is the activation of microglia and the innate immune response (which is known to accompany prolonged activation of the ISR), and the loss of neurons and oligodendrocytes. These changes are documented in Figs. 3 and 4. Importantly, loss of neurons, activated microglia and reduced oligodendrocyte numbers are each readily reconciled with changes in behavior.  

      None of these transcripts are very specific for myelination 

      The RNAseq data at P75 indicates only a modest reduction in oligodendrocyte-specific gene expression (as defined by single-cell RNAseq studies of purified cell populations, Mackenzie et al., 2018 Sci. Rep. doi: 10.1038/s41598-018-27293-5). Despite this, some oligodendrocyte-specific transcripts with well-known roles in myelination were down-regulated in the Polr3a mutant (e.g. Plp1, Mog and Mobp). In addition, steroid synthesis pathway transcripts involved in the production of cholesterol, an abundant and essential component of myelin, were also downregulated (Supplementary Fig. S4E).

      (8) Which genes identified by the transcriptomic analysis regulate the maturation of tRNA? The authors should at least perform an RNAi study to identify possible factors and analyze their importance in tRNA maturation.

      Of the many proteins involved in the maturation of tRNA (Phizicky and Hopper, 2023 RNA doi: 10.1261/rna.079620.123), RNAseq analysis at P75 identified only amino-acyl tRNA synthetases as being differentially-expressed (fold change >1.5, p adj. < 0.05, Table S1). These genes are canonical indicators of the ATF4-dependent Integrated Stress Response and their upregulation is widely interpreted as an attempt to restore efficient translation. In addition, our analysis of Pol III transcripts at P75 identified a reduction in the level of RppH1 (Fig. 3C), the RNA component of RNase P, which removes the 5’ leader of precursor tRNAs.  However, at P42, there was no effect on RppH1 abundance, or the expression of amino-acyl tRNA synthetase genes (Fig. 5C and Table S3).  Thus, an RNAi study to identify and analyze a possible factor involved in the maturation of tRNA is neither warranted nor relevant to the current body of work.
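
      The differential-expression criterion quoted above (fold change >1.5, p adj. < 0.05) can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline: it assumes the cutoff is applied symmetrically to up- and down-regulated genes on a log2 scale (the form reported by tools such as DESeq2), and the function name and any gene names used with it are hypothetical.

```python
import math


def differentially_expressed(results, fc_cutoff=1.5, padj_cutoff=0.05):
    """Return genes with |fold change| > fc_cutoff and adjusted p-value < padj_cutoff.

    results: iterable of (gene, log2_fold_change, padj) tuples.
    The fold-change cutoff is converted to log2 and applied in both directions
    (an assumption; the response does not say whether it is one-sided).
    """
    log2_cutoff = math.log2(fc_cutoff)  # log2(1.5) is about 0.585
    return [
        gene
        for gene, log2_fc, padj in results
        if abs(log2_fc) > log2_cutoff and padj < padj_cutoff
    ]
```

      For example, a gene with a log2 fold change of -0.8 and p adj. = 0.001 passes (a 1.74-fold decrease), whereas one with a log2 fold change of 0.5 does not (a 1.41-fold increase).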

      (9) What factors influence tRNA transport to the cytoplasm? It is possible that the Polr3a mutation affects cytoplasmic transport of tRNA. The authors should study this aspect using an imaging experiment.

      Our analysis of tRNA populations in this study employed total cellular RNA and thus reflects the abundance of mature tRNA from all cellular compartments. We have not assessed whether the reduction in tRNA abundance caused by the Polr3a mutation alters the dynamics of tRNA transport from the nucleus to the cytoplasm. However, we consider it highly unlikely that the Polr3a mutation would have a significant effect on cytoplasmic transport of tRNA. Imaging experiments along these lines are beyond the scope of the current study.

      (10) Does alteration of the cytoplasmic level of tRNA affect translation? The authors should perform a translation assay using bio-orthogonal amino acid (AHA) labelling.

      It is not known whether the reduced tRNA levels affect translation globally in the Polr3a mutant, but we predict that this may not be the case. Since tissues (heart and kidney) and brain regions (cerebrum and cerebellum) that share a decrease in tRNA abundance do not share activation of the Integrated Stress Response (a reporter of aberrant translation), we anticipate that effects on translation may be limited to specific regions or cell populations and to specific mRNAs within these cells. The current study provides the foundation for further work to address these questions.

      Reviewer #1 (Recommendations For The Authors):

      Below are a few comments, mostly regarding typographical errors, presentation, and clarity, that we believe would enhance this manuscript:

      On the heatmaps generated, it would be ideal to place "WT" before "KI," with "WT" on the left. This will maintain consistency with the rest of the manuscript, where "WT" conditions precede "KI" conditions, as observed in the bar graphs and dot plots.

      All heatmaps have been remade with WT on the left and KI on the right to maintain consistency throughout the manuscript. 

      The authors mention in several instances (Discussion Pg 19 Line 2, for instance) the analysis of changes in the "Pol II transcriptome." Is this a typographical error?

      The reference to the Pol II transcriptome is not a typographical error (Discussion Pg 19 Line 2). Here and elsewhere in the manuscript, we are distinguishing between changes to the Pol III transcriptome and the timing of subsequent changes to the Pol II transcriptome. The text has been edited to clarify this relationship in several places.

      (1) Introduction, Page 4, last paragraph.

      Analysis of the Pol III transcriptome reveals a common decrease in pre-tRNA and mature tRNA populations and few if any changes among other Pol III transcripts across multiple tissues. Analysis of the Pol II transcriptome reveals activation of the integrated stress response in cerebra but not in other surveyed tissues.

      (2) Results, page 8, 2nd paragraph

      To investigate the molecular changes to Pol III transcript levels caused by the Polr3a mutation and any secondary effects on the Pol II transcriptome, we initially focused on the cerebra of adult mice at P75.

      (3) Discussion, Page 19, second paragraph

      Pol III dysfunction and the reduction in the cerebral tRNA population at P42 coincide with behavioral deficits and precede substantial downstream alterations in the Pol II transcriptome, which include induction of an innate immune response (IR) and an ISR, and indicators of neurodegeneration (i.e., activation of cell death pathways and loss of mitochondrial DNA). These findings suggest a causal role for the lower tRNA abundance and/or altered tRNA profile in disease progression.

      In supplementary figure 1, authors validated the expression of their systems using flow cytometry and observed a high level of recombination frequency in different tissue types. Can the flow cytometry data distinguish between cell types within the cerebrum (neurons/microglia/astrocytes)?

      The flow cytometry experiments reported in Supplementary Fig. S1 used a dual tdTomato-EGFP reporter to assess recombination. The cerebral and cerebellar samples were gated on fluorescence from endogenous expression of tdTomato (red), EGFP (green) and DAPI (blue) staining. In principle, flow cytometry could be used to distinguish between cell types within the cerebrum (neurons/microglia/astrocytes). However, this would require (i) an antibody to a cell surface marker on the cell type of interest and (ii) a fluorescent probe conjugated to the primary antibody or a fluorescent secondary antibody that is spectrally well resolved from the emission spectra of tdTomato, EGFP and DAPI.

      Results section 1: Is there any particular reason why P28 was chosen as the commencement of tamoxifen injection?

      P28 was chosen so that any effect of the Polr3a mutation on development and differentiation would be limited in the tissues we examined. 

      Fig 1C: The number of asterisks does not match between the graph and the figure legend.

      Fig. 1C has been corrected to match the number of asterisks in the graph and figure legend.

      Results section 3:

      This section seemed a little brief, especially compared with the depth of the succeeding sections. The authors could state in greater detail which behaviors were quantified. In S3A-C, my understanding is that the animals were placed in an open-field test. This procedure could be briefly mentioned in the Methods, as well as in the main text.

      In the legends of S3, a bracket is missing for "(D-F)" on line 5. Additionally, the legends for each bar graph could be aligned consistently across all graphs, as far as spatial constraints allow.

      Detailed methods pertaining to the measurement and calculation of home cage-like behaviors reported by the behavioral spectrometer have been added to the Methods section on Mouse Behavior. 

      In the Results, Figs. S3A-C show anxiety-like behaviors, measured as the number and duration of visits to, and the distance traveled in, a 15 cm² central area of the arena. Figs. 2A-C show locomotor behaviors including Tracklength, Run sum and Still sum. The open field-like behavior is reported as total Tracklength in the behavioral spectrometer, i.e. the total distance traveled in the arena. This is now more clearly described in the main manuscript and the Methods section: “overall locomotor activity was decreased in Polr3a-tamKI mice as indicated by the reduced track length at P42, P49, P56 and P63 (Fig. 2A).”

      The legend of Fig. S3 now includes the missing bracket in "(D-F)" on line 5.

      The legends within each bar graph are now consistent and aligned as much as spatial constraints allow.

      Results section 4:

      Similar to our earlier questions for S1, is it possible to distinguish samples derived from different cell types (neurons/glia)? In figure 4, this is mainly done post-hoc, based on the known gene expression. Maybe the authors could discuss this small limitation? In Fig S4C, the color contrast for the heatmap legend needs to be corrected.

      It is not possible to accurately distinguish different neural cell sub-types, such as different types of neurons, or different types of oligodendrocytes in bulk RNAseq. Hence, we have reported only high confidence correlations based on known gene expression signatures (Fig. 4). We discuss only the data for which we can draw confident conclusions. The heatmap and legend in Fig. S4C has been amended. 

      Results section 5:

      In figure S5A, the alignment of asterisk significance markers could be adjusted.

      Asterisks have been realigned in Fig. S5A.

      Reviewer #2 (Recommendations For The Authors):

      The Methods section should include detailed procedures.

      A detailed description of the methods pertaining to the measurement and calculation of behaviors using the behavioral spectrometer has been added to the Methods section.

      Statistical tests should be described in detail.

      Statistical tests are detailed in the Methods section “Statistical Analysis”. Additional details pertaining to calculations of behavioral data have been added to the “Mouse behavior” section of the Methods.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Weaknesses:

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features were recorded before the onset of post-stroke epilepsy. This should be explicitly clarified.

      The data utilized in our analysis were collected during the first examination or test conducted after the patients' admission. We specifically excluded any patients with a history of epilepsy, ensuring that all cases of epilepsy identified in our study occurred after admission. Therefore, the features we analyzed were collected after the patients' admission but prior to the onset of post-stroke epilepsy.

      Reviewer #3 (Public review):

      Weaknesses:

      The writing of the article could be significantly improved.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. We revised our code to run a 5-fold cross-validation version; it brought only a modest improvement, because our model had already reached an AUC of 0.99. Given that our dataset contains more than 20000 records, we consider a 7:3 train/test split sufficient for training and evaluating the model. We trained the models across the 5 folds, averaged them, and plotted the 5-fold test ROC curves: the results show some improvement, but it is not substantial, and the best models are still the tree-based models. The 5-fold cross-validation code and the test ROC curves have been uploaded to GitHub at https://github.com/conanan/lasso-ml/lasso_ml_cross_validation.ipynb as an external resource.
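
      For readers unfamiliar with the procedure, the sketch below shows stratified 5-fold cross-validation in plain Python: every record is used for testing exactly once, and each fold keeps roughly the same proportion of positive cases as the full dataset. It is not the authors' notebook. `fit_and_score` is a hypothetical callback standing in for training a model on the training indices and computing its test AUC; in practice, scikit-learn's `StratifiedKFold` provides the same kind of splitting.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets its share.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


def cross_validate(labels, fit_and_score, k=5, seed=0):
    """Return the per-fold scores and their mean (fit_and_score is hypothetical)."""
    scores = [fit_and_score(train, test)
              for train, test in stratified_kfold_indices(labels, k, seed)]
    return scores, sum(scores) / len(scores)
```

      The spread of the per-fold scores, in addition to their mean, is what indicates how sensitive a model is to the particular split.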

      External validation results may be biased or overoptimistic, since the authors state that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and sensitivity estimates. The specificity for the external validation set has not been disclosed.

      Thank you for your valuable feedback regarding the external validation results. We appreciate your concerns about potential bias and overoptimism in our estimations of positive predictive value (PPV) and sensitivity.

      To clarify, we have uploaded the code for external validation on GitHub at https://github.com/conanan/lasso-ml. The results indicate that the PPV is 0.95 and the specificity is 0.98.

      While we focused on collecting more positive cases because of their lower occurrence rate, this approach allows us to better evaluate the model's ability to predict positive samples, which is crucial in clinical settings. We believe that emphasizing positive cases enhances the model's utility for practical applications, so a small degree of overoptimism is acceptable.
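
      Because PPV depends on how common positive cases are, while sensitivity and specificity do not depend on prevalence directly, a cohort enriched for positives tends to yield a higher PPV than the same classifier would achieve at the real-world prevalence. The sketch below computes the metrics from a confusion matrix and uses Bayes' rule to re-estimate PPV at another prevalence. The specificity of 0.98 is taken from the response above; the sensitivity of 0.90 and the prevalence values are hypothetical placeholders, not results from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Expected PPV of the same classifier at a different prevalence (Bayes' rule)."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Hypothetical sensitivity; specificity as reported for the external cohort.
SENS, SPEC = 0.90, 0.98
for prevalence in (0.50, 0.10, 0.05):
    print(f"prevalence={prevalence:.2f}  PPV={ppv_at_prevalence(SENS, SPEC, prevalence):.3f}")
```

      With these placeholder values, the expected PPV falls from about 0.978 at 50% prevalence to 0.833 at 10% and 0.703 at 5%, which is why reporting specificity alongside PPV helps readers judge an enriched validation cohort.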


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses 1:

      The methodology needs further consideration. The Discussion needs extensive rewriting.

      Thank you for your advice. We have revised the Discussion.

      Reviewer #2 (Public Review):

      Weaknesses 2:

      There are many typos and unclear statements throughout the paper.

      There are some issues with SHAP interpretation. SHAP in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.

      Thank you for pointing out that SHAP analysis is essentially a means of interpreting the model. In our research, we compared the SHAP analysis with traditional statistical methods such as regression analysis, and found the SHAP results to be consistent with the regression results for variables such as white blood cell count (see Table 1). This agreement leads us to believe that the SHAP analysis provides reliable insights in this context.
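
      SHAP values are Shapley-value attributions: each feature receives its average marginal contribution to the model's output, relative to a baseline, over all orderings of the features. The sketch below computes exact Shapley values for tiny hypothetical models by enumerating coalitions, with features outside a coalition set to baseline values (one of several conventions). It is meant only to show what the numbers represent; libraries such as shap use faster algorithms or approximations, and this is not the authors' pipeline. Because the attributions describe how the model uses each feature for one prediction, they are not by themselves estimates of statistical effect size.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition take their baseline value. Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def linear(z):
    """Hypothetical linear model: attributions equal weight * (x - baseline)."""
    return 2 * z[0] + 3 * z[1] - z[2]


def product_model(z):
    """Hypothetical pure interaction: credit is split equally between features."""
    return z[0] * z[1]


print(shapley_values(linear, [1, 2, 3], [0, 0, 0]))
print(shapley_values(product_model, [1, 1], [0, 0]))
```

      The attributions always sum to the difference between the prediction and the baseline prediction; the interaction example shows that when features act jointly, the split of credit is a modelling convention rather than a measured effect.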

      The Data Collection section is very poorly written, and the methodology is not clear.

      Thank you for your advice; we have revised the Data Collection section.

      There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.

      Thank you for the advice on hyperparameter tuning. We originally built the models with the scikit-learn, XGBoost, and LightGBM packages in Python 3.10 using their default settings, which was not appropriate and may lead to less certain conclusions. We have now performed a grid search to select and optimize the hyperparameters, which improved the models. Random forest (RF) remains the best-performing model.
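      A grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. scikit-learn's `GridSearchCV` does this with built-in cross-validation; the standard-library sketch below only illustrates the logic. The grid values and the toy scoring function are hypothetical; in practice `score_fn` would fit the model on training data and return a validation metric such as AUC.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination and keep the best one.

    `score_fn` is assumed to return a validation score where higher is better.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical random-forest grid; the actual search space is not given here.
rf_grid = {"n_estimators": [100, 300, 500], "max_depth": [None, 10, 20]}

def toy_score(params):
    """Stand-in for a validation AUC; peaks at n_estimators=300, max_depth=10."""
    depth_penalty = params["max_depth"] if params["max_depth"] is not None else 30
    return -abs(params["n_estimators"] - 300) - depth_penalty

best_params, best_score = grid_search(rf_grid, toy_score)
print(best_params, best_score)
```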

      The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?

      The patient selection procedure is shown in Figure 1. Of the 42,079 records in the stroke database, we excluded hemorrhagic stroke (4,565), history of stroke (2,154), transient ischemic attack (TIA; 3,570), stroke of unclear cause (561), and records missing important data (6,496), leaving 24,733 patients with new-onset ischemic or lacunar stroke. We then excluded patients whose seizures might be attributable to other potential causes (brain tumor, intracranial vascular malformation, traumatic brain injury, etc.; 865), patients with a history of seizures (152), patients who died in hospital (1,444), and patients who were lost to follow-up (no outpatient records and unreachable by phone) or died within 3 months of the stroke (813). In total, 21,459 cases were included in this study.
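      The selection counts can be checked with a short tally: subtracting the first five exclusion groups from the 42,079 database records gives 24,733, and subtracting the remaining four groups gives the final 21,459 cases. The dictionary labels below are our own shorthand for the groups listed in the response.

```python
TOTAL_RECORDS = 42079

# Exclusions applied to the full stroke database.
initial_exclusions = {
    "hemorrhagic stroke": 4565,
    "history of stroke": 2154,
    "TIA": 3570,
    "unclear cause": 561,
    "missing important data": 6496,
}
# Exclusions applied to the new-onset ischemic/lacunar stroke group.
later_exclusions = {
    "other potential seizure causes": 865,
    "seizure history": 152,
    "in-hospital death": 1444,
    "lost to follow-up or died within 3 months": 813,
}

new_onset = TOTAL_RECORDS - sum(initial_exclusions.values())
final_cohort = new_onset - sum(later_exclusions.values())
print(new_onset, final_cohort)  # 24733 21459
```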

      There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning. The code is:

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imbalanced-learn (imblearn) library. The sampling_strategy='auto' parameter tells the algorithm to determine the resampling target from the class distribution (for the SMOTE step, 'auto' oversamples every class except the majority class). The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.
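      To show how many synthetic points are created and what the cleaning step removes, here is a simplified standard-library sketch of the two steps that SMOTEENN combines. It is an illustration, not imbalanced-learn's implementation. SMOTE interpolates between a minority sample and one of its nearest minority neighbours until the classes are balanced, which is what 'auto' requests. ENN then drops any sample whose 3 nearest neighbours do not all share its label. The function names and toy data are hypothetical. With the library itself, `smoteenn.fit_resample(X_train, y_train)` returns the resampled data, and it should be applied to training data only.

```python
import math
import random

def nearest(points, i, k, pool):
    """Indices from `pool` of the k points closest to points[i], excluding i."""
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def smote(X, y, minority, k=5, seed=42):
    """Add synthetic minority points until the minority matches the largest other class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    target = max(y.count(c) for c in set(y) if c != minority)
    n_new = target - len(minority_idx)
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(X, i, k, minority_idx))
        gap = rng.random()  # random position on the segment from X[i] to X[j]
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_out.append(minority)
    return X_out, y_out, n_new

def enn(X, y, k=3):
    """Drop every sample whose k nearest neighbours do not all share its label."""
    everyone = range(len(X))
    keep = [i for i in everyone
            if all(y[j] == y[i] for j in nearest(X, i, k, everyone))]
    return [X[i] for i in keep], [y[i] for i in keep]

major_pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0), (0, 0.5)]
minor_pts = [(5, 5), (6, 6)]
X = major_pts + minor_pts
y = [0] * 6 + [1] * 2
X_res, y_res, n_new = smote(X, y, minority=1)
X_clean, y_clean = enn(X_res, y_res)
print(n_new, y_res.count(0), y_res.count(1), len(X_clean))  # 4 6 6 12
```

      In this toy example nothing is removed by ENN because the two classes are well separated; on real data, ENN removes noisy or ambiguous samples near the class boundary, so the final class counts are no longer exactly equal.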

      Did the authors achieve their aims? Do the results support their conclusions?

      Yes, we have achieved some of our aims in predicting PSE, although some problems remain.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.

      The data used in our analysis is from the first examination or test conducted after the patients' admission, retrieved from a PostgreSQL database. First, we extracted the initial admission date for patients admitted due to stroke. Then, we identified the nearest subsequent examination data for each of those patients.

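      The SQL code is as follows (the Chinese patterns match diagnosis names: 梗死/梗塞 = infarction, 脑 = cerebral, 腔隙 = lacunar):

      SELECT TO_DATE(condition_start_date, 'DD-MM-YYYY') AS admission_date
      FROM diagnosis
      WHERE person_id = {}  -- placeholder for the patient ID
        AND (condition_name LIKE '%梗死%' OR condition_name LIKE '%梗塞%')
        AND (condition_name LIKE '%脑%' OR condition_name LIKE '%腔隙%')
      ORDER BY admission_date
      LIMIT 1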
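      The second step, choosing the nearest examination on or after the admission date, is not shown as code in the response. The minimal Python sketch below assumes each patient's examinations are held as (date, results) tuples; the structure and values are hypothetical. Restricting features to the first record on or after admission is what keeps later measurements out of the feature set.

```python
from datetime import date

def first_exam_after(admission, exams):
    """Earliest (exam_date, results) record dated on or after `admission`, else None."""
    eligible = [exam for exam in exams if exam[0] >= admission]
    return min(eligible, key=lambda exam: exam[0]) if eligible else None

# Hypothetical records for one patient.
admission = date(2020, 3, 2)
exams = [
    (date(2019, 11, 5), {"wbc": 6.1}),   # before the stroke admission: ignored
    (date(2020, 3, 4), {"wbc": 9.8}),
    (date(2020, 3, 2), {"wbc": 11.2}),   # first examination after admission
]
print(first_exam_after(admission, exams))
```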

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is certainly very important, but achieving an ideal external validation has been difficult. We tried open-source databases such as MIMIC, but their data do not fit our needs as closely as the records from our own hospital: MIMIC lacks some of the key features we require, as well as the detailed patient follow-up information that is crucial for our analysis. Given these limitations, we collected newer records from the same hospitals in Chongqing, which we believe provides a more suitable dataset for external validation. While this is not a perfect solution, gathering additional data from our local healthcare system is a pragmatic step forward. We plan to continue expanding this Chongqing-based dataset and to report the results of a larger external validation in the future. We are committed to overcoming the challenges of data availability to strengthen the validity and generalizability of our findings.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step in ensuring the reliability and robustness of reported results, especially for datasets of limited size. Because we have more than 20,000 records, we considered a 70:30 train/test split sufficient. We nevertheless revised our code to run 5-fold cross-validation; it yielded little improvement (our model had already reached an AUC of 0.99). We may use this technique in future studies with fewer cases.
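      The reviewer's request (mean scores with confidence intervals across folds) can be sketched as follows. This sketch assumes stratified folds, so each fold keeps the low positive rate, and a t-based 95% interval; 2.776 is the two-sided t value for 4 degrees of freedom (5 folds). One design point matters with an AUC this high: resampling such as SMOTEENN should be fitted on the training folds only, so that no synthetic points derived from test patients enter the held-out fold. Function names, toy labels, and per-fold AUCs are hypothetical.

```python
import random
import statistics

def stratified_kfold(labels, k=5, seed=42):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds

def mean_and_ci(scores, t_crit=2.776):
    """Mean and t-based 95% CI; 2.776 is the two-sided t value for 4 df (5 folds)."""
    m = statistics.mean(scores)
    half = t_crit * statistics.stdev(scores) / len(scores) ** 0.5
    return m, (m - half, m + half)

labels = [0] * 90 + [1] * 10  # toy data with 10% positives
folds = stratified_kfold(labels, k=5)
for fold_no, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != fold_no for i in fold]
    # SMOTEENN resampling and model fitting would use train_idx only, so no
    # synthetic points leak into the held-out test_idx.
    print(fold_no, len(train_idx), len(test_idx))

fold_aucs = [0.80, 0.85, 0.90, 0.95, 1.00]  # hypothetical per-fold AUCs
mean_auc, (lo, hi) = mean_and_ci(fold_aucs)
print(f"AUC {mean_auc:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```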

      Additional context that might help readers

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      Thank you for this helpful advice, which has improved our manuscript. We have added an explanation of both plots. The force plot for the first patient shows how each feature shifts that patient's prediction: a prolonged activated partial thromboplastin time (APTT) contributes most toward PSE, followed by the aspartate aminotransferase (AST) level and other features, whereas this patient's relatively low National Institutes of Health Stroke Scale (NIHSS) score pushes the prediction in the opposite direction. The decision plot collects many such individual model decisions, showing how the model arrives at its predictions.
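      A force plot rests on the additivity of SHAP values: for one patient, the base value (the model's average output over the background data) plus that patient's SHAP values equals the model's output for that patient. Features with positive values push the prediction toward PSE and those with negative values push it away. A decision plot draws the same cumulative sums for many patients at once. The sketch below reproduces this arithmetic with hypothetical numbers, chosen only to mirror the qualitative description above (APTT, then AST, pushing toward PSE; a low NIHSS score pushing the other way). The numbers are not values from the study.

```python
def read_force_plot(base_value, shap_values):
    """Rebuild one patient's model output from its SHAP values.

    SHAP values are additive: base value + sum of per-feature values equals the
    model output for that patient (in the model's output units).
    """
    output = base_value + sum(shap_values.values())
    # Features pushing the prediction towards PSE, strongest first.
    toward = sorted((f for f, v in shap_values.items() if v > 0),
                    key=lambda f: shap_values[f], reverse=True)
    # Features pushing it away from PSE, strongest first.
    away = sorted((f for f, v in shap_values.items() if v < 0),
                  key=lambda f: shap_values[f])
    return output, toward, away

# Hypothetical numbers for illustration only.
base = -3.0
patient = {"APTT": 1.2, "AST": 0.6, "WBC": 0.3, "NIHSS": -0.4}
output, toward, away = read_force_plot(base, patient)
print(round(output, 2), toward, away)
```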

      Reviewer #3 (Public Review):

      Weaknesses3:

      There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

      Thank you for your advice; we have corrected these issues.

      The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

      Thank you for your recommendations. We have made the code available on GitHub at https://github.com/conanan/lasso-ml. The data, however, are private and belong to the hospital; access can be requested from the hospitals through the corresponding author, specifying the purpose of the inquiry.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your valuable advice. Performing n-fold cross-validation is crucial for ensuring the reliability and robustness of results, especially with limited datasets. However, since we have over 20,000 records, we believe that a 70:30 split for training and testing is sufficient.

      We revised our code and implemented 5-fold cross-validation, which provided minimal improvement, as our model has already achieved an AUC of 0.99. We plan to use this technique in future studies if we encounter fewer cases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My comments include two parts:

      (1) Methodology

      a-This study was based on multiple clinical indicators to construct a model for predicting the occurrence of PSE. It involved various multi-class indicators such as the affected cortical regions, locations of vascular occlusion, NIHSS scores, etc. Only using the SHAP index to explain the impact of multi-class variables on the dependent variable seems slightly insufficient. It might be worth considering the use of dummy variables to improve the model's accuracy.

      Thank you for the detailed feedback on the study methodology. The SHAP analysis is primarily a means of interpreting the model; we compared its results with traditional statistical analyses and found them consistent, so we consider the SHAP analysis reliable in this study. We did use dummy variables, especially for the affected cortical regions and the locations of vascular occlusion; for example, the frontal-region variable is 1 if the frontal region is involved. However, these variables had relatively little impact in the machine learning model.
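      Because one patient can have several cortical regions involved at once, the dummy coding described above amounts to one independent 0/1 indicator per region (multi-hot), not a single one-hot category. A minimal sketch follows. The region list and column names are illustrative assumptions, not the study's schema.

```python
def encode_regions(involved, all_regions):
    """Return one 0/1 indicator per region: 1 if the region is involved, else 0."""
    involved = set(involved)
    return {f"region_{r}": int(r in involved) for r in all_regions}

# Illustrative region list and column names (hypothetical).
REGIONS = ["frontal", "parietal", "temporal", "occipital"]
print(encode_regions(["frontal", "temporal"], REGIONS))
```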

      b-The study used Lasso regression to select 20 features to build the model. How was the optimal number of 20 features determined?

      LASSO regression is a commonly used feature-selection method. Because we extracted as many features as possible from the database, the cross-validation curve of the LASSO regression was optimal with 78 features, but that would have made the model too complex. We therefore built models with 10, 15, 20, 25, and 30 features. With 20 features, the model performed well while remaining relatively concise: adding more features contributed little to performance, whereas using fewer reduced it (for example, the AUC of the 15-feature model dropped below 0.95). We therefore selected 20 features.
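      Selecting "the 20 features with the largest absolute value" of the LASSO coefficients means ranking features by |coefficient| and keeping the top k. A minimal sketch follows, assuming the features were standardized before fitting so that coefficient magnitudes are comparable. Features whose coefficient has shrunk to exactly zero are treated as not selected. The feature names reuse abbreviations from this document, but the coefficient values are hypothetical.

```python
def top_k_by_abs(coefficients, k):
    """Names of the k features with the largest |coefficient|, skipping zeros.

    Assumes features were standardized before fitting, so that coefficient
    magnitudes are comparable across features.
    """
    ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
    return [name for name in ranked[:k] if coefficients[name] != 0]

# Hypothetical coefficients; the real model started from a much larger pool.
coefs = {"wbc": 0.8, "d_dimer": -0.6, "nihss": 0.5, "aptt": 0.1, "tg": 0.0}
for k in (2, 3, 5):  # the study compared 10, 15, 20, 25 and 30
    print(k, top_k_by_abs(coefs, k))
```

      Each candidate feature set would then be passed to model building and compared on validation performance.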

      c-The study indicated that the incidence rate of PSE in the enrolled patients is 4.3%, showing a highly imbalanced dataset. If singly using the SMOTE method for oversampling, could this lead to overfitting?

      Thank you for the reminder. Using SMOTE alone for oversampling is indeed inappropriate. We have therefore replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning: the data are first oversampled with SMOTE and then undersampled with ENN, which removes noisy or ambiguous samples near the class boundary. The code is:

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imbalanced-learn (imblearn) library. The sampling_strategy='auto' parameter tells the algorithm to determine the resampling target from the class distribution (for the SMOTE step, 'auto' oversamples every class except the majority class). The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      (2) Clinical aspects:

      Line 8, history of ischemic stroke, this is misexpression, could be: diagnosis of ischemic stroke.

      Line 8, several hospitals, should be more exact; how many?

      Line 74 indicates that the data are from a single centre, this should be clarified.

      Line 4 data collection: The criteria read unclear; please clarify further.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      Line 110, lab parameters: Why is there no blood glucose?

      Because blood glucose fluctuates greatly in many patients and is easily affected by medication and diet, we instead used glycated hemoglobin (HbA1c), which is more stable, as the reference index after consulting experts.

      Line 295, The author indicated that data lost; this should be clarified in the results part, and further, the treatment of missing data should be clarified in the method part.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      I hope to see a table of the cohort's baseline characteristics. The discussion needs extensive rewriting; the author seems to be swinging from the stroke outcome and the seizure, sometimes losing the target.

      Figure 1 shows the patient selection procedure, and Table 1 presents the cohort's baseline characteristics.

      Regarding the swing between stroke outcome and seizure: few articles predict epilepsy directly from relevant indicators, whereas many more address stroke prognosis. We therefore discussed epilepsy as an important factor in prognosis, since otherwise we could not find enough articles to discuss.

      Reviewer #2 (Recommendations For The Authors):

      There are typos and examples of text that are not clear, including:

      "About the nihss score, the higher the nihss score, the more likely to be PSE, nihss score has a third effect just below white blood cell count and D-dimer."

      "and only 8 people made incorrect predictions, demonstratijmng a good predictive ability of the model."

      "female were prone to PSE"

      " Waafi's research"

      "One-heat" (should be one-hot)

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      The Data Collection section is poorly written, and the methodology is not clear. It would be much more appropriate to include a table of all features used and an explanation of what these features involve. It would also be useful to see the mean values of these features to assess whether the feature values are reasonable for the dataset.

      Thank you for the reminder. All data are from the first examination or test after admission, retrieved from a PostgreSQL database. We first extracted the admission date of each patient admitted for stroke, and then extracted information from the nearest subsequent examination. Because the extraction was performed programmatically with SQL rather than manually, we obtained as much data as possible instead of only the features reported in previous studies. Table 1 lists all features used, with their mean ± standard deviation.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage. I would need this clarified before believing the authors achieved their claims of building a predictive model.

      All relevant test results were taken from the first examination after admission; their mean ± standard deviation values are listed in Table 1 (statistical analysis section).

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is very important, but achieving an ideal one is difficult. We tried open-source databases such as MIMIC, but their data did not meet our requirements because they lack many of the features available in our hospital records, as well as follow-up of the relevant patients. In the end, we collected newer records from the same hospitals in Chongqing; we will collect more and report a larger external validation in the future.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step in ensuring the reliability and robustness of reported results, especially for datasets of limited size. Because we have more than 20,000 records, we considered a 70:30 train/test split sufficient. We nevertheless revised our code to run 5-fold cross-validation, which yielded little improvement; we will use this technique in our next study.

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      This suggestion has greatly improved our manuscript. We have added an explanation of both plots. The force plot for the first patient shows how each feature shifts that patient's prediction: a prolonged activated partial thromboplastin time (APTT) contributes most toward PSE, followed by the aspartate aminotransferase (AST) level and other features, whereas this patient's relatively low NIHSS score contributes less to the final result. The decision plot collects many such individual model decisions, showing how the model arrives at its predictions.

      Reviewer #3 (Recommendations For The Authors):

      Abbreviations should not be defined in the abstract (or only in the abstract).

      Please make explicit the purposes of the study you are referring to in "Currently, most studies utilize clinical data to establish statistical models, survival analysis and cox regression."

      Authors affirm: "there is still a relative scarcity of research on PSE prediction, with most studies focusing on the analysis of specific or certain risk factors." This statement is especially curious since the current study uses risk factors as predictors.

      It is not clear to me what the authors mean by "No study has proposed or established a more comprehensive and scientifically accurate prediction model." The authors do not summarize the statistical parameters of previously reported model, or other relevant data to assess coverage or validity (maybe including a Table summarizing such information would be appropriate). In any case, I would try to omit statements that imply, to some extent, discrediting previous studies without sufficient foundation.

      "antiepileptic drugs" is an outdated name. Please use "antiseizure medications"

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      The authors say regarding missing data that they "filled the data of the remaining indicators with missing values of more than 1000 cases by random forest algorithm". Please clarify what you mean by "of more than 1000 cases." Also, provide details on the RF model used to fill in missing data.

      Thank you for pointing this out. "Of more than 1000 cases" was an error, and we have corrected it. The procedure was as follows. First, we collected the values of all laboratory indicators from the first tests after stroke admission (every patient admitted for stroke undergoes a routine blood test, liver and kidney function tests, and so on), excluded indicators with more than 10% missing values, and filled in the missing values of the remaining indicators with a random forest algorithm using the default parameters. We processed the features in order of increasing missingness, starting with the feature with the fewest missing values, since it requires the least information to be filled in. When filling in a feature, missing values in the other features were temporarily replaced with 0. After each regression prediction, the predicted values were placed back into the original feature matrix before the next feature was filled in. Once all features had been processed, imputation was complete.
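      The imputation loop above can be sketched in standard-library Python. The regressor is a pluggable `fit_predict(X_train, y_train, X_missing)` callable (a hypothetical interface). In the described procedure it would wrap scikit-learn's `RandomForestRegressor` with default parameters, e.g. `RandomForestRegressor().fit(X_train, y_train).predict(X_missing)`; the toy mean regressor below stands in only so the example runs without dependencies. Missing values are represented as `None`, and columns with more than 10% missing values are assumed to have been dropped beforehand.

```python
def iterative_impute(rows, fit_predict):
    """Fill missing values (None) column by column, fewest-missing column first."""
    data = [list(r) for r in rows]
    n_cols = len(data[0])
    n_missing = [sum(r[c] is None for r in data) for c in range(n_cols)]
    for target in sorted(range(n_cols), key=lambda c: n_missing[c]):
        if n_missing[target] == 0:
            continue
        # Predictors are all other columns; values still missing are set to 0.
        X = [[0 if r[c] is None else r[c] for c in range(n_cols) if c != target]
             for r in data]
        observed = [i for i, r in enumerate(data) if r[target] is not None]
        missing = [i for i, r in enumerate(data) if r[target] is None]
        preds = fit_predict([X[i] for i in observed],
                            [data[i][target] for i in observed],
                            [X[i] for i in missing])
        # Write predictions back so later columns can use them as predictors.
        for i, p in zip(missing, preds):
            data[i][target] = p
    return data

def mean_regressor(X_train, y_train, X_missing):
    """Toy stand-in for a random forest: predicts the training mean."""
    return [sum(y_train) / len(y_train)] * len(X_missing)

table = [[1, 2], [2, None], [3, 6], [None, 8]]
result = iterative_impute(table, mean_regressor)
print(result)
```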

      Please specify what you mean by negative group and positive group. Avoid tacit assumptions.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      Please provide more details (and references) on the smote oversampling method. Indicate any relevant parameters/hyperparameters.

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning. The code is:

      from imblearn.combine import SMOTEENN
      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imbalanced-learn (imblearn) library. The sampling_strategy='auto' parameter tells the algorithm to determine the resampling target from the class distribution (for the SMOTE step, 'auto' oversamples every class except the majority class). The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      The methodology is presented in an extremely succinct and non-organic manner (e.g., "(Model building) Select the 20 features with the largest absolute value of LASSO."). Please try to improve the narrative.

      LASSO regression is a commonly used feature-selection method. Because we extracted as many features as possible from the database, the cross-validation curve of the LASSO regression was optimal with 78 features, but that would have made the model too complex. We therefore built models with 10, 15, 20, 25, and 30 features. With 20 features, the model performed well while remaining relatively concise: adding more features contributed little to performance, whereas using fewer reduced it (for example, the AUC of the 15-feature model dropped below 0.95). We therefore selected 20 features.

      Many passages of the text need references. For example, those that refer to Levene test, Welch's t-test, Brier score, Youden index, and many others (e.g., NIHSS score). Please revise carefully.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      "Statistical details of the clinical characteristics of the patients are provided in the table." Which table? Number?

      Thank you for pointing this out; we have revised the manuscript accordingly. The clinical characteristics are presented in Table 1.

      Many abbreviations are not properly presented and defined in the text, e.g., wbc count, hba1c, crp, tg, ast, alt, bilirubin, bua, aptt, tt, d_dimer, ck. Whereas I can guess the meaning, do not assume everyone will. Avoid assumptions.

      ROC is sometimes written "ROC" and others, "roc." The same happens for PPV/ppv, and many other words (SMOTE; NIHSS score, etc.).

      Please rephrase "ppv value of random forest is the highest, reaching 0.977, which is more accurate for the identification of positive patients(the most important function of our models).". PPV always refer to positive predictions that are corroborated, so the sentences seem redundant.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      What do you mean by "Complex algorithms". Please try to be as explicit as possible. The text looks rather cryptic or vague in many passages.

      Thank you for pointing this out; we have replaced "Complex algorithms" with "machine learning".

      The text needs a thorough English language-focused revision, since the sense of some sentences is really misleading. For instance "only 8 people made incorrect predictions,". I guess the authors try to say that the best algorithm only mispredicted 8 cases since no people are making predictions here. Also, regarding that quote... Are the authors still speaking of the results of the random forest model, which was said to be one of the best performances?

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      The authors say that they used, as predictors "comprehensive clinical data, imaging data, laboratory test data, and other data from stroke patients". However, the total pool of predictors is not clear to me at this point. Please make it explicit and avoid abbreviations.

      Thank you for pointing this out; we have revised the manuscript and corrected these errors.

      Although the authors say that their code is available upon request, I think it would be better to have it published in an appropriate repository.

      Thank you for the suggestion; our code is now available at https://github.com/conanan/lasso-ml.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigated how the presence of interspecific introgressions in the genome affects the recombination landscape. This research was intended to inform about genetic phenomena influencing the evolution of introgressed regions, although it should be noted that the research itself is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. In this work, yeast hybrids with large (from several to several dozen percent of the chromosome length) introgressions from another yeast species were crossed. Then, the products of meiosis were isolated and sequenced, and on this basis, the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs) was examined. Carrying out the analysis at different levels of resolution, it was found that in the introgressed regions, there is a very significant reduction in the frequency of COs and a simultaneous increase in the frequency of NCOs. Moreover, it was confirmed that introgressions significantly limit the local shuffling of genetic information, and NCOs are only able to slightly contribute to the shuffling, thus they do not compensate for the loss of CO recombination.

      Strengths:

      - Previously, experiments examining the impact of SNP polymorphism on meiotic recombination were conducted either on the scale of single hotspots or the entire hybrid genome, but the impact of large introgressed regions from another species was not examined. Therefore, the strength of this work is its interesting research setup, which allows for providing data from a different perspective.

      - Good quality genome-wide data on the distribution of CO and NCO were obtained, which could be related to local changes in the level of polymorphism.

      Weaknesses:

      (1)  The research is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. Moreover, meiosis is stimulated in hybrids in which introgressions occur in a heterozygous state, which is a very unlikely situation in nature. Therefore, I see the main value of the work in providing information on the CO/NCO decision in regions with high sequence diversification, but not in the context of evolution.

      While we are indeed only examining recombination in a single generation, we respectfully disagree that our results aren't relevant to evolutionary processes. The broad goals of our study are to compare recombination landscapes between closely related strains, and we highlight dramatic differences between recombination landscapes. These results add to a body of literature that seeks to understand the existence of variation in traits like recombination rate, and how recombination rate can evolve between populations and species. We show here that the presence of introgression can contribute to changes in recombination rate measured in different individuals or populations, which has not been previously appreciated. We furthermore show that introgression can reduce shuffling between alleles on a chromosome, which is recognized as one of the most important determinants for the existence and persistence of sexual reproduction across all organisms. As we describe in our introduction and conclusion, we see our experimental exploration of the impacts of introgression on the recombination landscape as complementary to studies inferring recombination and introgression from population sequencing data and simulations. There are benefits and challenges to each approach, but both can help us better understand these processes. In regards to the utility of exploring heterozygous introgression, we point out that introgression is often found in a heterozygous state (including in modern humans with Neanderthal and/or Denisovan ancestry). Introgression will always be heterozygous immediately after hybridization, and depending on the frequency of gene flow into the population, the level of inbreeding, selection against introgression, etc., introgression will typically be found as heterozygous.

      - The work requires greater care in preparing informative figures and, more importantly, re-analysis of some of the data (see comments below).

      More specific comments:

      (1) The authors themselves admit that the detection of NCO, due to the short size of conversion tracts, depends on the density of SNPs in a given region. Consequently, more NCOs will be detected in introgressed regions with a high density of polymorphisms compared to the rest of the genome. To investigate what impact this has on the analysis, the authors should demonstrate that the efficiency of detecting NCOs in introgressed regions is not significantly higher than the efficiency of detecting NCOs in the rest of the genome. If it turns out that this impact is significant, analyses should be presented proving that it does not entirely explain the increase in the frequency of NCOs in introgressed regions.

      We conducted a deeper exploration of the effect of marker resolution on NCO detection by randomly removing different proportions of markers from introgressed regions of the fermentation cross in order to simulate different marker resolutions from non-introgressed regions. We chose proportions of markers that would simulate different quantiles of the resolution of non-introgressed regions and repeated our standard pipeline in order to compare our NCO detection at the chosen marker densities. More details of this analysis have been added to the manuscript (lines 188-199, 525-538). We confirmed the effect of marker resolution on NCO detection (as reported in the updated manuscript and new supplementary figures S2-S10, new Table S10) and decided to repeat our analyses on the original data with a more stringent correction. For this we chose our observed average tract size for NCOs in introgressed regions (550bp), which leads to a far more conservative estimate of NCO counts (As seen in the updated Figure 2 and Table 2). This better accounts for the increased resolution in introgressed regions, and while it's possible to be more stringent with our corrections, we believe that further stringency would be unreasonable. We also see promising signs that the correction is sufficient when counting our CO and NCO events in both crosses, as described in our response to comment 39 (response to reviewer #3).
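      The marker-thinning step described above can be sketched as follows. This sketch assumes markers are sorted base-pair positions and that thinning samples markers uniformly without replacement, with the kept fraction chosen as the ratio of current to target marker spacing. The function names and the 20 bp and 200 bp spacings are hypothetical; in the actual analysis, the full NCO-calling pipeline was rerun on each thinned marker set.

```python
import random

def thin_markers(positions, keep_fraction, seed=0):
    """Randomly keep a fraction of marker positions (bp), returned sorted."""
    rng = random.Random(seed)
    n_keep = round(len(positions) * keep_fraction)
    return sorted(rng.sample(positions, n_keep))

def mean_spacing(positions):
    """Average distance (bp) between consecutive markers."""
    pos = sorted(positions)
    return (pos[-1] - pos[0]) / (len(pos) - 1)

# Hypothetical introgressed region: one marker every 20 bp across 100 kb.
introgressed = list(range(0, 100_000, 20))
# To mimic a non-introgressed resolution of about 200 bp, keep 20/200 = 10%.
fraction = mean_spacing(introgressed) / 200
thinned = thin_markers(introgressed, fraction)
print(len(introgressed), len(thinned), round(mean_spacing(thinned)))
```

      Repeating this for several target spacings (for example, different quantiles of the non-introgressed marker resolution) gives the set of simulated resolutions at which NCO detection can be compared.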

      (2) CO and NCO analyses performed separately for individual regions rarely show statistical significance (Figures 3 and 4). I think that the authors, after dividing the introgressed regions into non-overlapping windows of 100 bp (I suggest also trying 200 bp, 500 bp, and 1kb windows), should combine the data for all regions and perform correlations to SNP density in each window for the whole set of data. Such an analysis has a greater chance of demonstrating statistically significant relationships. This could replace the analysis presented in Figure 3 (which can be moved to Supplement). Moreover, the analysis should also take into account indels.

      We're uncertain of what is being requested here. If the comment refers to the effect of marker density on NCO detection, we hope the response to comment 2 will help resolve this comment as well. Otherwise, we ask for some clarification so that we may correct or revise as appropriate.

      (3) In Arabidopsis, it has been shown that crossover is stimulated in heterozygous regions that are adjacent to homozygous regions on the same chromosome (http://dx.doi.org/10.7554/eLife.03708.001, https://doi.org/10.1038/s41467-022-35722-3).

      This effect applies only to class I crossovers, and is reversed for class II crossovers (https://doi.org/10.15252/embj.2020104858, https://doi.org/10.1038/s41467-023-42511-z). This research system is very similar to the system used by the authors, although it likely differs in the level of DNA sequence divergence. The authors could discuss their work in this context.

      We thank the reviewer for sharing these references. We have added a discussion of our work in the context of these findings in the Discussion, lines 367-376.

      Reviewer #2 (Public Review):

      Summary:

      Schwartzkopf et al characterized the meiotic recombination impact of highly heterozygous introgressed regions within the budding yeast Saccharomyces uvarum, a close relative of the canonical model Saccharomyces cerevisiae. To do so, they took advantage of the naturally occurring Saccharomyces bayanus introgressions specifically within fermentation isolates of S. uvarum and compared their behavior to the syntenic regions of a cross between natural isolates that do not contain such introgressions. Analysis of crossover (CO) and noncrossover (NCO) recombination events shows both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency. These results strongly support the hypothesis that DNA sequence polymorphism inhibits CO formation and has no effect, or a much weaker one, on NCO formation. Finally, the authors show that the presence of introgressions negatively impacts "r", the parameter that reflects the probability that a randomly chosen pair of loci shuffles their alleles in a gamete.
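      The definition of r quoted above can be made concrete with a small sketch. It assumes each sequenced gamete has been reduced to a parent-of-origin label (0 or 1) at a set of loci and weights every locus equally; a genome-wide measure would instead weight loci by the genomic length they represent, so this illustrates the definition rather than reproducing the authors' computation. Function names are hypothetical.

```python
def gamete_shuffling(origins):
    """Share of distinct locus pairs whose alleles come from different parents.

    origins: one 0/1 label per locus giving the parent of origin of the allele
    the gamete carries there. Loci are weighted equally (a simplification).
    """
    n = len(origins)
    if n < 2:
        raise ValueError("need at least two loci")
    n1 = sum(origins)
    n0 = n - n1
    return n0 * n1 / (n * (n - 1) / 2)  # mixed pairs / all pairs


def mean_shuffling(gametes):
    """Average the per-gamete value over a set of sequenced gametes."""
    return sum(gamete_shuffling(g) for g in gametes) / len(gametes)


# One crossover in the middle of a 4-locus chromosome: loci 0-1 from parent 0,
# loci 2-3 from parent 1, so 4 of the 6 locus pairs are mixed.
print(gamete_shuffling([0, 0, 1, 1]))
```

      A non-recombinant gamete scores 0, and each additional crossover (or independently assorting chromosome, if loci span several chromosomes) raises the share of mixed pairs.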

      The authors chose a sound experimental setup that allowed them to directly compare recombination properties of orthologous syntenic regions in an otherwise intra-specific genetic background. The way the analyses have been performed looks right, although this reviewer is unable to judge the relevance of the statistical tests used. Finally, most of their results, which are elegant and of interest to the community, are presented in Figure 2.

      Strengths:

      Analysis of crossover (CO) and noncrossover (NCO) recombination events is compelling in showing both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency.

      Weaknesses:

      The main weaknesses refer to a few text issues and a lack of discussion about the mechanistic implications of the present findings.

      - Introduction

      (1) The introduction is rather long. I suggest specifically referring to "meiotic" recombination (line 71) and to "meiotic" DSBs (line 73), since recombination can also occur outside of meiosis (i.e., in somatic cells).

      We agree and have condensed the introduction to be more focused. We also made the suggested edits to include “meiotic” when referring to recombination and DSBs.

      (2) From lines 79 to 87: the description of recombination is unnecessarily complex and confusing. I suggest the authors simply remind readers that DSB repair through homologous recombination is inherently associated with a gene conversion tract (primarily as a result of the repair of heteroduplex DNA by the mismatch repair (MMR) machinery) that may or may not be associated with a crossover. The former recombination product is a crossover (CO); the latter is a noncrossover (NCO) or gene conversion. Limited marker density may prevent the detection of gene conversions; this causes NCOs to be missed but does not affect CO detection.

      We changed the language in this section to reflect the reviewer’s suggestions.

      (3) In addition, "resolution" in the recombination field refers to the processing of double Holliday junction-containing intermediates by structure-specific nucleases. To avoid any confusion, I suggest avoiding "resolution" and simply using "DSB repair" throughout the text.

      We made the suggested correction throughout the paper.

      (4) Note that there are several studies about S. cerevisiae meiotic recombination landscapes using different hybrids that show different CO counts. In the introduction, the authors refer to Mancera et al 2008, a reference paper in the field. In this paper, the hybrid used showed ca. 90 COs per meiosis, while their reference to Liu et al 2018 in Figure 2 shows fewer than 80 COs per meiosis for S. cerevisiae. This shows that it is not easy to arrive at a definitive CO count per meiosis for a given species. This needs to be taken into account in the Results section (lines 315-321).

      This is an excellent point. We added this context in the results (lines 180-187).

      (5) In line 104, the authors refer to S. paradoxus and mention that its recombination rate is significantly different from that of S. cerevisiae. This is inaccurate since this paper claims that the CO landscape is even more conserved than the DSB landscape between these two species, and they even identify a strong role played by the subtelomeric regions. So, the discussion about this paper cannot stand as it is.

      We agree with the reviewer's point. We also found that the entire paragraph was unnecessary, so it and the sentence in question have been removed.

      (6) Line 150, when the authors refer to the anti-recombinogenic activity of the MMR, I suggest referring to the published work from Martini et al 2011 rather than the not-yet-published work from Copper et al 2021, or both, if needed.

      Added the suggested citation.

      Results

      (7) The clear depletion in CO and the concomitant increase in NCO within the introgressed regions strongly suggest that DNA sequence polymorphism triggers CO inhibition but does not affect NCO formation, or does so to a much lesser extent. Because most COs likely arise from the ZMM pathway (the CO interference pathway mainly relying on Zip1, 2, 3, 4, Spo16, Msh4, 5, and Mer3) in S. uvarum as in S. cerevisiae, and because the effect of sequence polymorphism is likely mediated by the MMR machinery, this would imply that MMR specifically inhibits the ZMM pathway at some point in S. uvarum. The weak effect, or potential absence of an effect, of sequence polymorphism on NCO formation suggests that heteroduplex DNA tracts, at least the way they form during NCO formation, escape the anti-recombinogenic effect of MMR in S. uvarum. A few comments about this could be added.

      We have added discussion and citations regarding the biased repair of DSBs as NCOs in introgressed regions (lines 380-386).

      (8) The same applies to the fact that the CO number is lower in the natural cross compared to the fermentation cross, while the NCO number is the same. This suggests that under similar initiating Spo11-DSB numbers in both crosses, the decrease in CO is likely compensated by a similar increase in inter-sister recombination.

      Thank you to the reviewer for this observation. We agree that this could explain some differences between the crosses.

      (9) Introgressions represent only 10% of the genome, while the decrease in CO is at least 20%. This is a bit surprising especially in light of CO regulation mechanisms such as CO homeostasis that tends to keep CO constant. Could the authors comment on that?

      We interpret these results to reflect two underlying mechanisms. First, the presence of heterozygous introgression does reduce the number of COs. Second, we believe the difference in COs reflects variation in recombination rate between strains. We note that CO homeostasis need not apply across different genetic backgrounds. Indeed, recombination rate is appreciated to significantly differ between strains of S. cerevisiae (Raffoux et al. 2018), and recombination rate variation has been observed between strains/lines/populations in many different species including Drosophila, mice, humans, Arabidopsis, maize, etc. We reference S. cerevisiae strain variability in the Introduction lines 128-130, and have added context in the Results lines 180-187, and Discussion lines 343-350.

      (10) Finally, the frequency of NCOs in introgressed regions is about twice the frequency of COs in non-introgressed regions. Both COs and NCOs result from Spo11-initiated DSBs.

      This suggests that more Spo11-DSBs are formed within introgressed regions and that such DSBs specifically give rise to NCO. Could this be related to the lack of homolog engagement which in turn shuts down Spo11-DSB formation as observed in ZMM mutants by the Keeney lab? Could this simply result from better detection of NCO in introgressed regions related to the increased marker density, although the authors claim that NCO counts are corrected for marker resolution?

      The effect noted by the reviewer remains despite the more conservative correction for marker density applied to NCO counts (as described in the response to Reviewer 1, comment #2). Given that CO+NCO counts in introgressed regions are not statistically different between crosses, it is likely that these regions are simply predisposed to a higher rate of DSBs than the rest of the genome. This is an interesting observation, however, and one that we would like to further explore in future work.

      (11) What could explain chromosome 12 having more shuffling in the natural cross than in the fermentation cross, given that this chromosome carries no introgressed region?

      We added this text to the Results, lines 323-327, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      Technical points:

      (12) In line 248, the authors removed NCO with fewer than three associated markers.

      What is the rationale for this? Is the genotyping strategy not reliable enough to consider events with only one or two markers? NCO events can be rather small and even escape detection due to low local marker density.

      We trust the genotyping strategy we used, but chose to be conservative in our detection of NCOs to account for potential sequencing biases.

      (13) Line 270: The way homology is calculated looks odd to this reviewer, especially the meaning of 0.5 homology. A site is either identical (1 homology) or not (0 homology).

      We've changed the language to better reflect what we are calculating (diploid sequence similarity; see comment #28). Essentially, the metric is the probability that two randomly selected chromatids (one from each parent) will share the same nucleotide at a given locus (akin to calculating the probability of homozygous offspring at a single locus). We average it along a segment of the genome to establish an expected sequence similarity when recombination occurs in that segment.
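      The metric described above can be sketched as follows, assuming each parent's genotype at a site is given as a tuple of alleles (one entry for a haploid, two for a diploid). Under that representation, a site where one parent is heterozygous and shares one allele with a homozygous other parent scores 0.5, which is one way an intermediate value can arise. The function names and genotype encoding are hypothetical.

```python
from collections import Counter


def site_similarity(geno1, geno2):
    """P(allele drawn from parent 1 == allele drawn from parent 2) at one site.

    geno1, geno2: tuples of the alleles each parent carries at the site,
    e.g. ("A",) for a haploid or ("A", "G") for a heterozygous diploid.
    """
    f1, f2 = Counter(geno1), Counter(geno2)
    return sum((f1[a] / len(geno1)) * (f2[a] / len(geno2)) for a in f1)


def segment_similarity(genos1, genos2):
    """Average per-site similarity along a segment (equal-length site lists)."""
    if len(genos1) != len(genos2) or not genos1:
        raise ValueError("need two non-empty genotype lists of equal length")
    scores = [site_similarity(g1, g2) for g1, g2 in zip(genos1, genos2)]
    return sum(scores) / len(scores)


print(site_similarity(("A", "A"), ("A", "A")))  # 1.0
print(site_similarity(("A", "A"), ("G", "G")))  # 0.0
print(site_similarity(("A", "G"), ("A", "A")))  # 0.5
```

      For two haploid parents every site scores either 1 or 0, and the segment average reduces to the fraction of identical sites.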

      (14) Line 365: beware that the estimates are for mitotic mismatch repair (MMR). Meiotic MMR may work differently.

      We removed the citation that refers exclusively to mitotic recombination. The statement regarding meiotic recombination otherwise still reflects results from Chen & Jinks-Robertson.

      (15) Figure 1: there is no mention of potential 4:0 segregations. Did the authors find no such pattern? If not, how did they consider them?

      The program we used to call COs and NCOs (ReCombine's CrossOver program) can detect such patterns, but none were detected in our data.

      Reviewer #3 (Public Review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      (1) While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors used a relatively simple correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, counter to expectations noncrossovers are substantially increased. They then show that, despite this, genetic shuffling overall is substantially reduced in regions of heterozygous introgression. However, it is likely that the correction used to compensate for the effect of increased sequence density is defective, and has not fully compensated for the ascertainment bias due to greater marker density. The simplest indication of this potential artifact is that, when crossover frequencies and "corrected" noncrossover frequencies are taken together, regions of introgression often appear to have greater levels of total recombination than flanking regions with much lower levels of heterology. This concern seriously undercuts virtually all of the novel conclusions of the study. Until this methodological concern is addressed, the work will not be a useful contribution to the field.

      We appreciate this concern. Please see the responses to comments #2 and #38. We further note that our results depicted in Figures 3 and 4 do not rely on any correction or comparison with non-introgressed regions; we therefore consider our results regarding sequence similarity and its effect on the repair of DSBs, and regarding the amount of genetic shuffling with and without introgression, to be novel and important observations for the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 149 - this sentence cites a mixture of papers reporting somatic or meiotic recombination, and as these processes are based on different crossover pathways, they should not be mixed. For example, it is known that in Arabidopsis MSH2 has a pro-crossover function during meiotic recombination.

      Corrected.

      (2) What is unclear to me is how the crosses are planned. Line 308 shows that there were only two crosses (one "natural" and one "fermentation"), but I understand that this is a shorthand and in fact several (four?) different strains were used for the "fermentation cross". At least that's what I concluded from Fig. 1B and its figure caption. This needs to be further explained. Were different strains used for each fermentation cross, or was one strain repeated in several crosses? In Figure 1, it would be worth showing, next to the panel showing "fermentation cross", a diagram of how "natural cross" was performed, because as I understand it, panel A illustrates the procedure common to both types of crosses, and not for "natural cross".

      We thank the reviewer for drawing our attention to confusion about how our crosses were created. We performed two crosses, as depicted in Figure 1A. The fermentation cross is a single cross from two strains isolated from fermentation environments. The natural cross is a single cross from two strains isolated from a tree and insect. Table S1 and the methods section "Strain and library construction" describe the strains used in more detail. We modified Figure 1 and the figure legend to help clarify this. See also response to comment #37.

      (3) The authors should provide a more detailed characterization of the genetic differences between chromosomes in their hybrids. What is the level of polymorphism along the S. uvarum chromosomes used in the experiments? Is this polymorphism evenly distributed? What are the differences in the level of polymorphism for individual introgressions? Theoretically, this data should be visible in Figure 2D, but this figure is practically illegible in the present form (see next comment).

      As suggested, we remade Figure 2D to only include chromosomes with an introgression present, and moved the remaining chromosomes to the supplements (Figure S11). The patterns of markers (which are fixed differences between the strains in the focal cross) should be more clear now. As we detail in the Methods lines 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression).

      (4) Figure 2D should be prepared more clearly, I would suggest stretching the chromosomes, otherwise, it is difficult to see what is happening in the introgression regions for CO and NCO (data for SNPs are more readable). Maybe leave only the chromosomes with introgressions and transfer the rest to the supplement?

      See previous comment.

      (5) How are the Y scales defined for Figure 2D?

      Figure 2D now includes units for the y-axis.

      (6) Are increased CO levels observed in the fermentation cross at the borders of introgressions? This would indicate local compensation for recombination loss in the introgressed regions, similar to that often observed for chromosomal inversions.

      We see no evidence of an increase in CO levels at the borders of introgressions, either through visual inspection or by comparing the average CO rate in all fermentation windows to that of windows at the edges of introgressions. This is included in the Discussion lines 360-366, "While we are limited in our interpretations by only comparing two crosses (one cross with heterozygous introgression and one without introgression), these results are in line with findings in inversions, where heterozygotes show sharp decreases in COs, but the presence of NCOs in the inverted region (Crown et al., 2018; Korunes & Noor, 2019). However, unlike heterozygous inversions where an increase in COs is observed on freely recombining chromosomes (the inter-chromosomal effect), we do not see an increase in COs on the borders flanking introgression or on chromosomes without introgression."

      (7) Line 336 - "We find positive correlations between CO counts..." - you should indicate here that these correlations are between the fermentation and natural crosses; it was quite hard for me to understand what you calculated.

      We corrected the language as suggested.

      (8) The term "homology" usually means "having a common evolutionary origin" and does not specify the level of similarity between sequences, thus it cannot be measured. It is used incorrectly throughout the manuscript (also in the intro). I would use the term "similarity" to indicate the degree of similarity between two sequences.

      We corrected the language as suggested throughout the document.

      (9) Paragraph at line 360 and Figure 3 - was the "sliding window" overlapping or non-overlapping?

      We added clarifying language to the text in both places. We use a 101bp sliding window with 50bp overlaps.
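      A minimal sketch of the windowing described above, assuming "101bp window with 50bp overlaps" means consecutive windows share 50 bp, so window starts advance by 51 bp. That step size, the half-open coordinates, and the function names are assumptions for illustration.

```python
def sliding_windows(seq_len, window=101, overlap=50):
    """Yield half-open (start, end) windows of `window` bp sharing `overlap` bp."""
    step = window - overlap  # 51 bp between window starts with the defaults
    for start in range(0, seq_len - window + 1, step):
        yield start, start + window


def window_means(site_scores, window=101, overlap=50):
    """Mean of per-base scores (e.g. sequence similarity) in each window."""
    return [
        sum(site_scores[s:e]) / (e - s)
        for s, e in sliding_windows(len(site_scores), window, overlap)
    ]


print(list(sliding_windows(300)))
# [(0, 101), (51, 152), (102, 203), (153, 254)]
```

      Bases after the last full window are left out in this sketch; padding or a shorter final window would be alternative choices.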

      (10) Line 369 - what is "...the proportion of bases that are expected to match between the two parent strains..."?

      We clarified the language in this location, and hopefully changes associated with the comment about sequence similarity will make the comment even clearer in context.

      (11) Line 378 - should it refer to Figure S1 and not Figure 4?

      Corrected.

      (12) Line 399 - should refer to Figure 4, not Figure 5.

      Corrected.

      (13) Line 444-449 - the analysis of loss of shuffling in the context of the location of introgression on the chromosome should be presented in the result section.

      We shifted the core of the analysis to the results, while leaving a brief summary in the discussion.

      (14) The authors should also take into account the presence of indels in their analyses, and they should be marked in the figures, if possible.

      We filtered out indels in our variant calling. However, we did analyze our crosses for the presence of large insertions and deletions (Table S2), which can obscure true recombination rates, and found that they were not an issue in our dataset.

      Reviewer #2 (Recommendations For The Authors):

      This reviewer suggests that the authors address the different points raised in the public review.

      (1) This reviewer would like to challenge the relevance of the r-parameter in light of chromosome 12 which has no introgression and still a strong depletion in r in the fermentation cross.

      We added this text to the Results, lines 377-381, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      (2) This reviewer insists on making sure that NCO detection is unaffected by the marker density, notably in the highly polymorphic regions, to unambiguously support Figure 1C.

      We've changed our correction for resolution to be more aggressive (see response to comment #2), and believe we have now adequately adjusted for marker density (see response to comment #38).

      Reviewer #3 (Recommendations For The Authors):

      I regret using such harsh language in the public review, but in my opinion, there has been a serious error in how marker densities are corrected for, and, since the manuscript is now public, it seems important to make it clear in public that I think that the conclusions of the paper are likely to be incorrect. I regret the distress that the public airing of this may cause. Below are my major concerns:

      (1) The paper is written in a way that makes it difficult to figure out just what the sequence differences are within the crosses. Part of this is, to be frank, the unusual way that the crosses were done, between more than one segregant each from two diploids in both natural and fermentation cases. I gather, from the homology calculations description, that each of these four diploids, while largely homozygous, contained a substantial number of heterozygosities, so individual diploids had different patterns of heterology. Is this correct? And if so, why was this strategy chosen? Why not start with a single diploid where all of the heterologies are known? Why choose to insert this additional complication into the mix? It seems to me that this strategy might have the perverse effect of having the heterology due to the polymorphisms present in one diploid affect (by correction) the impact of a noncrossover that occurs in a diploid that lacks the additional heterology. If polymorphic markers are a small fraction of total markers, then this isn't such a great concern, but I could not find the information anywhere in the manuscript. As a courtesy to the reader, please consider providing at the beginning some basic details about the starting strains-what is the average level of heterology between natural A and natural B, and what fraction of markers are polymorphic; what is the average level of heterology between fermentation A and fermentation B in non-introgressed regions, in introgressed regions, and what fraction of markers are polymorphic? How do these levels of heterology compare to what has been examined before in whole-genome hybrid strains? It also might be worth looking at some of the old literature describing S. cerevisiae/S. carlsbergensis hybrids.

      We thank the reviewer for drawing our attention to confusion about the cross construction. These crosses were conducted as is typical for yeast genetic crosses: we crossed 2 genetically distinct haploid parents to create a heterozygous diploid, then collected the haploid products of meiosis from the same F1 diploid. Because the crosses were made with haploid parents, it is not possible for other genetic differences to be segregating in the crosses. We have revised Figure 1 and its caption to clarify this. Further details regarding the crosses are in the Methods section "Strain and library construction" and in Supplemental Table S1. We only utilized genetic markers that are fixed differences between our parental strains to call CO and NCO. As we detail in the Methods lines 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression). We additionally revised Figure 2D (and Figure S11) to help readers better visualize differences between the crosses.

      (2) There are serious concerns about the methods used to identify noncrossovers and to normalize their levels, which are probably resulting in an artifactually high level of calculated noncrossovers in Figure 2. As a primary indication of this, it appears in Figure 2 that the total frequency of events (crossovers + noncrossovers) in heterozygous introgressed regions is substantially greater than that in the same regions in non-introgressed strains, while simply shifting crossovers to noncrossovers would result in no net increase. The simplest explanation for this is that noncrossovers are being undercounted in non-introgressed relative to introgressed heterozygous regions. There are two possible reasons for this: i. The exclusion of all noncrossover events spanning fewer than three markers means that many more noncrossovers will be detected in introgressed heterozygous regions than in non-introgressed regions. Assuming that average heterology is 5% in the former and 1% in the latter, the average 3-marker event will span 60 nt in introgressed regions and 300 nt in non-introgressed regions, so many more noncrossovers will be counted in introgressed regions. A way to check this: look at the number of crossover-associated markers that undergo gene conversion, and use the fraction that involves fewer than 3 markers to adjust noncrossover levels (this is the strategy used by Mancera et al.). ii. The distance used for noncrossover level adjustment (2 kb) is considerably greater than the measured average noncrossover lengths in other studies. The effect of using too long a distance is to differentially under-correct for noncrossovers in non-introgressed regions, while virtually all noncrossovers in heterozygous introgressed regions will be detected. This can be illustrated by simulations that reduce the density of scored markers in heterozygous introgressed regions to the density seen in non-introgressed regions. Because these concerns go to the heart of the conclusions of the paper, they must be addressed quantitatively; if not, the main conclusions of the paper are invalid.
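      The 60 nt and 300 nt figures in point i follow from a back-of-envelope calculation. It assumes markers are spread evenly, so that a tract must span roughly k/d bases to cover k markers when the per-base marker density d is taken to equal the sequence divergence:

```latex
\[
  L_{\min} \approx \frac{k}{d}, \qquad
  \frac{3}{0.05} = 60\ \text{nt (introgressed)}, \qquad
  \frac{3}{0.01} = 300\ \text{nt (non-introgressed)}
\]
```

      Under this approximation, an NCO tract shorter than L_min would usually fall below the three-marker filter, so the filter removes proportionally more events outside introgressions.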

      We adjusted the correction factor (See also response to comment #2) and compared the average number of CO and NCO events in introgressed and non-introgressed regions between crosses (two comparisons: introgression CO+NCO in natural cross vs introgression CO+NCO in fermentation cross; non-introgression CO+NCO in natural cross vs non-introgression CO+NCO in fermentation cross). We found no significant differences between the crosses in either of the comparisons. This indicates that the distribution of total events is replicated in both crosses once we correct for resolution.

      (3) It is important to distinguish the landscape of double-strand breaks from the landscape of recombination frequencies. Double-strand breaks, as measured by uncalibrated levels of Spo11-linked oligos, is a relative number - not an absolute frequency. So it is possible that two species could have a similar break landscape in terms of topography but have absolute levels higher in one species than in the other.

      We agree with this statement, however, we have removed the relevant text to streamline our introduction.

      (4) Lines 123-125. Meiosis alone will produce mosaic genomes in the progeny of the F1; further backcrossing will reduce mosaicism to the level of isolated regions of introgression.

      Adjusted the language to be more specific.

      (5) Please provide actual units for the Y axes in Figure 2D.

      We have corrected the units on the axes.

      (6) Tables (general). Are the significance measures corrected for multiple comparisons?

      In Table 3, the cutoff was chosen to be more conservative than a Bonferroni corrected alpha=0.01 with 9 comparisons (0.0011). In text, any result referred to as significant has an associated hypothesis test with a p-value less than its corresponding Bonferroni-corrected alpha of 0.05. This has been clarified in the caption for Table 3 and in the text where relevant.
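      The per-test threshold quoted above is the family-wise alpha divided by the number of comparisons; a short sketch reproduces the value for Table 3 (alpha = 0.01, 9 comparisons), against which the chosen cutoff is described as more conservative. The function name is illustrative.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests


alpha_table3 = bonferroni_alpha(0.01, 9)
print(f"{alpha_table3:.4f}")  # 0.0011
```

      The same function with family_alpha = 0.05 gives the per-test threshold used for results described as significant in the text.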

    1. Author response:

      The following is the authors’ response to the previous reviews.

      I have added a paragraph that addresses the issue of how landmarks might be used and why they are not. The suggestions made in the "Weaknesses" paragraph were concise and excellent, and I have incorporated them directly into my revised manuscript. This text appears on Page 21 and is shown below. I hope that this is what the editors and reviewers were looking for.

      The requested revision is the second paragraph.

      The first paragraph was not written in response to reviews but was inspired by a recent paper by Mahdev et al (2024) - https://doi.org/10.1038/s41593-024-01681-9. I had already requested to add this reference and was encouraged to do so by the Editors. The Mahdev et al paper was very surprising in that it showed that path integration is not constant but that its "gain" can be recalibrated by self-motion signals. I wondered whether this unexpected capacity extended to path integration also recalibrating the cognitive map and thereby generating the shortcutting behavior we observe. I suggested that, at an abstract level, this would correspond to "coordinate transformation" of the cognitive map. I realize that this is entirely speculative. If the Editors feel that it does not add much to the manuscript and that the speculation goes too far, I will remove the first paragraph and re-submit.

      Added text: P21, just before the heading "Implications for theories of hippocampal representations of spatial maps". There were no other changes made in the paper.

      "Path integration uses self-motion signals to update the animal's estimated location on its internal cognitive map. Path integration gain has been shown to be plastic and regulated by landmarks (52). Remarkably, a recent study has revealed that path integration gain can also be directly recalibrated by self-motion signals alone (53), albeit not as effectively as by landmarks (52, 53). An interesting question for future research is whether self-motion signals can also recalibrate the coordinates of a cognitive map. From this perspective, the Target B to Target A shortcut requires a transformation of the cognitive map coordinates so that the start point is now Target B.

      Extensive research has shown that external cues can control hippocampal neuron place fields (11, 12, 54) and the gain of the path integrator (52), making the failure of mice in our study to use such cues puzzling. The failure to use landmarks may be related to our task being low stakes and our pretraining procedure teaching the mouse that such cues are not necessary. Our results may not generalize to more natural conditions where many reliable prominent cues are available, and where there is urgency to find food or water while avoiding predation (55). Under these more naturalistic conditions the use of distal cues to rapidly find a food reward is more likely to be observed."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation in modulating the function of a therapeutic antibody. As a follow-up to their previous demonstration of how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of FcγRIIIa, and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact glycan composition at the N162 site (or the best possible assessment of it) is not defined.

      We revised the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.” 

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We revised Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We restructured the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data were not previously available, highlighting hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We added the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and removed the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We added an example of the digestions as Figure S2.  

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text. Please correct this.

      Thank you for identifying this typo. We corrected Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated.

      Thank you for drawing our attention to this rounding error. We revised the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/- fucose and +/- kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values across all cell lines because some cell lines showed a significant change and others did not. However, we added the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that some points overlap between the different treatments in this figure. However, our use of the two-tailed Student's t-test, applied to three experiments collected on three different days (each with three technical replicates), provides enough resolution to determine the significance of the difference of the means for the different treatments. This is, by our estimation, a highly rigorous way to collect and analyze the data.

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similarly large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high-mannose and possibly a few hybrid glycans. It is difficult to understand which glycoforms are truly important beyond stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which is mostly Man8 and Man9 structures. In our previous work, these structures likewise increased binding affinity (see PubMed ID 30016589). We believe the most important message is that the composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we are not able to modulate N162 glycan composition without affecting potentially every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, the increased ADCC resulting from kifunensine treatment is consistent with the previously observed increases in binding affinity.

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know from previous work in our lab that N162 glycan composition varies substantially from donor to donor. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, different NK cell N162 glycan compositions may coincide with different indications. This is an area we are quite interested in pursuing.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for the reviewing of the revised manuscript.

      We are very grateful to the Reviewers for their time and for the appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete agreement between our observations and earlier data from cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) may reflect properties of cultured astrocytes that differ from those of their counterparts in slices or in vivo, as discussed in the manuscript.

      The study by T.R. Murphy et al. (Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and noted that '... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.' Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux under basal conditions.

      We have discussed the study by Igarashi et al. (NeuroReport 2013); our current data may help to explain the cellular mechanisms underlying their finding.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (i.e., processes vs. soma vs. vascular end-feet, which differ in AQP4 expression).

      Following the suggestion, we now provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As shown in Fig. S2, acute application of TGN-020 induced Ca2+ oscillations in astrocyte end-foot regions where GCaMP6 labeling lines the profile of the blood vessel. On average, basal Ca2+ signals in the end-feet are stronger than those observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as is the effect of TGN-020 (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig. S2 vs. Fig. 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe these data in Fig. S2 and on page 8, lines 20-23.

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains whereas the basal GCaMP6 fluorescence in the soma is weak. In the present condition, it is difficult to unambiguously discriminate astrocyte soma from the highly intermingled processes. 

      The authors show that inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling (i.e., the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of the CSD swelling, we have performed new analyses to quantify the duration of swelling, the speed of swelling and the recovery time from swelling in the control condition and in the presence of TGN-020. The new analysis is summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling and slows the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, lines 19-21.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4] have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We now highlight the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, lines 9-10; page 5, lines 1-3; and page 19, lines 3-10).

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Across various experiments, including epifluorescence and light sheet microscopy in mouse brain slices, and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, mainly influencing the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and indicate the main role of AQP4 in this mechanism. However, although the authors state that the report sheds light on the mechanisms by which astrocyte aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and was not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. Conceptually, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques, from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockade of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach employed, the quality of the manuscript raises doubts regarding the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not well prepared in the Introduction and Discussion sections. Important or recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works are missing that would clarify certain conclusions. This especially refers to the basis of the glymphatic system concept and to recently published reports of similar content. The use of TGN-020 instead of, e.g., the available AER-270(271) AQP4 blocker is not explained. While employing various experimental techniques adds depth to the findings, the reasoning behind some of the techniques, especially regarding MRI, is not clear or seemingly inaccurate. Most of the time the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, two different statistical tests appear to have been used for the same or linked types of data, so the results appear contradictory, which seems unlikely based on the figures. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, supplementary experiments also seem to be required to improve the report.

      The current data hint at a tonic water efflux from astrocyte AQP4 under physiological conditions, which helps to understand brain water homeostasis and its functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 to page 15, line 3). We agree that a comprehensive exploration will further advance our understanding.

      The introduction and discussion are now strengthened by incorporating the important advances in glymphatic system while highlighting the relevant studies. 

      The use of TGN-020 was based on its validation in a wide range of ex vivo and in vivo studies, including heterologous expression systems and AQP4 KO mice. Validation of AER-270 (and AER-271, its water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to affect brain water ADC (apparent diffusion coefficient evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after drug application (Giannetto et al., 2024). This likely reflects the fact that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB), whose inhibition could reduce CNS water content independently of AQP4 (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) appears lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now added this information to the manuscript (page 7, lines 1-6 and page 15, lines 7-17).

      The description on the DW-MRI is now updated (page 4, line 10-14). 

      We also performed new experiments and data analysis as described in a point-to-point manner below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two- to three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited or no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate that AQP4 is a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over a comparable time, and also slowed cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov's study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data obtained with acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or may reflect differences in volume responses between cultured astrocytes and astrocytes in brain slices or in vivo, as suggested previously (Risher et al., 2009).

      Soma diameter might be an indicator of cell volume change, yet it is challenging to measure with our current fluorescence imaging method, which is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficiently high Z-axis resolution appears difficult with our present tools.

      We have accordingly updated the Discussion, citing the relevant literature (page 17, line 14 to page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides the unknown TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      This is an important point: TGN-020 only partially blocks AQP4, implying that the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged. We now mention this on page 15, lines 7-9 and 14-17.

      We agree that local delivery of an alternative blocker would provide additional information. However, local delivery requires stereotaxic implantation of a cannula, which would cause inflammation in surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has been reported to influence brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024), bearing in mind that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use the pharmacological compound to achieve acute blocking of AQP4, thereby avoiding the chronic, genetically induced alterations in brain structure, function and water homeostasis. Multiple lines of evidence, including a recent study (Gomolka et al., 2023), have shown that AQP4 KO alters brain water content, extracellular space and cellular structures, which raises concerns about using the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel.

      We now mention the concerns about AQP4 pharmacology, citing additional literature in the field (page 15, lines 8-18).

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting the DW-MRI signal deserve strong consideration.

      A previous study showed that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: TGN-020 blocking of astrocyte AQP4 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have updated the Discussion accordingly (page 15, lines 3-7).

      We agree with the reviewer regarding the structural deviations observed in AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, lines 3-5. Following the Reviewer's suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to being related to astrocyte swelling, the ADC signal changes may also be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, lines 1-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My more broad experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP is used to tag specific cell types. However, as a relatively large protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). Also, IP injection of SRB enables labeling of brain astrocytes without genetic manipulation, thereby avoiding the influence of protein overexpression on astrocyte volume and water transport responses. We now state this point in the manuscript (page 13, line 21 to page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains in astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, in contrast, we performed a global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, lines 18-20).

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects relative changes in astrocyte volume (i.e., transmembrane water transport) but is blind to absolute volume. We therefore have no readout of baseline volume.

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5].

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, our protocol did not reveal a rapid regulatory volume decrease when we applied hypotonic solution to induce astrocyte swelling (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), in which both the peak amplitude and the degree of recovery were reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory volume decrease in astrocytes may occur under specific conditions, although, intriguingly, it has been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon and now briefly discuss this point on page 18, lines 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now page 9, lines 17-18).

      (8) During the discussion on pg 13 line 11, it may be clearer to describe this as the cotransport of water into the cells with ions/metabolites as reviewed by MacAulay 2021 [6].

      We agree; the text has been modified following this suggestion (now page 14, lines 12-13).

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references have now been added to the manuscript together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide the following remarks for the following sections/ pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that under basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key studies (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. it is the mobility of the water protons that DWI is based on, but not the diffusion of molecules in the brain. This should be clarified and based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, lines 10-14).

      - Similarly, I suggest correcting or removing the citations and the sentence part regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what actually ADC reflects as a computational score, and what were the results from previous studies assessing glymphatic systems using DWI. This is especially important when considering the mislocalization of the AQP4 channel.

      We now describe recent studies using DW-MRI to evaluate the glymphatic system (page 4, lines 16-17).

      - 'In the brain, AQP4 is predominantly expressed in astrocytes'-please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To my best knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been reported that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). The word ‘predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion to stay close to the experimental observations (page 5, lines 16-18).

      - The introduction lacks important information on existing AQP4 blockers and their effects, pros and cons on why to use TGN-020. Among others, I would refer to recent work by Giannetto et al 2024, as well as previous work of Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, lines 1-6). We have also updated the Discussion on recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, lines 7-17).

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - this term was never introduced in the methods so it is difficult for the reader to understand it at first sight. -'To this end,' - it is not clear which action refers to 'this'. (is it about previous works or the moment that the brain samples were ready for imaging? Please clarify, as it is only starting to be clear after fully reading the methods.

      We have refined the description to first give the principle of our imaging method and then explain the technical steps. To avoid ambiguity, the phrase ‘To this end’ is removed. The updated text is now on page 6, lines 1-3.

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. Indications of the relevant sub-panel position are now added to figure citations where applicable.

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest familiarizing also with the following works by Igarashi 2011. doi: 10.1007/s10072-010-0431-1, and by Sun 2022. doi: 10.3389/fimmu.2022.870029.

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - ' astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - tissue slice experiment is a very valuable method. However, although the interpretation seems right, the experiment does not comment on the cell swelling that may occur simply due to tissue deterioration, or as a superposition of tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, and the influx of water into astrocytes should also be blocked. Thus, could swelling also be part of another mechanism, as it was also observed in the control group? I suggest this should be addressed thoroughly.

      We performed this experiment in acute brain slices to tightly control the pharmacological environment and gain spatiotemporal information. After slicing, the brain slices recovered for > 1 h prior to recording, so that they were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, whose trend did not change in response to vehicle. TGN-020, in contrast, caused a downward change suggesting intracellular water accumulation and swelling.

      The experiment was performed under basal conditions without active water influx; a decrease in SRB fluorescence therefore hints at intracellular water buildup in astrocytes. This result shows that under basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux, and blocking it causes intracellular water accumulation and swelling.

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence increases upon water efflux and volume shrinking, and vice versa. For panel C, the number of mice is now indicated. In addition, the downward change in SRB fluorescence is now calculated separately for the phases before and after TGN-020 (or vehicle) application, and the panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which was validated by a t-test performed in MATLAB. For clarity, we now add connecting lines to indicate statistical significance between the corresponding groups (Fig 1C, middle). For panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching trend illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.
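      As an illustration only, the correction of the SRB trace for the photobleaching trend (panel D) could be sketched as follows. This is a hypothetical sketch, not the MATLAB analysis used in the study: the function name, the use of a straight line fitted to the pre-application samples as the bleaching trend, and its extrapolation over the whole recording are assumptions (bleaching is often closer to exponential, in which case a different trend model would be needed).

```python
import numpy as np


def bleach_corrected_change(t, f, t_drug):
    """Return (f - trend) / trend, where `trend` is a straight line fitted to
    the pre-application samples (t < t_drug) and extrapolated over the whole
    recording as an assumed estimate of the photobleaching tendency."""
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    pre = t < t_drug                      # baseline samples before drug/vehicle
    slope, intercept = np.polyfit(t[pre], f[pre], 1)  # degree-1 fit: [slope, intercept]
    trend = slope * t + intercept         # extrapolated linear bleaching trend
    return (f - trend) / trend            # relative deviation from the trend
```

      A negative value after application indicates a fluorescence decrease beyond what the extrapolated bleaching trend predicts.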

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised image quality. We have tried to update and align the figure panels.

      - Page 12: ' TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence is now updated (page 12, line 12 – page 13, line 2). It reads ‘The representative images reveal the enough image quality to calculate the ADC, which allow us to examine the effect of TGN-020 on water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogenous. Please familiarize yourself with works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A correlation between ADC score and AQP4 expression ROI-wise would be useful, but it is not substantial to conduct this experiment.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier. - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not refer to the actual impact of the findings. DWI only allows verification of the changes in ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      This sentence is now refined (page 19, line 1-2), followed by the updates commenting on the recent studies employing DW-MRI to evaluate brain fluid transport, including the work of (Giannetto et al., 2024) (page 19, line 3-10). 

      METHODS:

      - Page 18: no total number of mice included in all experiments is provided, as well as no clearly stated number of mice used in each experiment. Please correct.

      We have now double checked the number of the mice for the data presented and updated the figure legends accordingly (e.g., updates in legends fig1, fig5, etc).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize. We did not use external writing help and have now removed the mention of ‘Axcience’. The isoflurane was ISOVET from Piramal; this information is now moved up (page 21, line 11).

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentration of depolarizing ions (Na+, Ca2+) was reduced to lower the potential excitotoxicity during the tissue dissection (i.e., injury to the brain) for preparing the brain slices. Extra sucrose was added to balance the solution osmolarity. This solution has been used previously for the dissection and the slicing steps in adult mice (Jiang et al., 2016). We now add this justification in the text and quote the relevant reference (page 21, lines 14-16).

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      The Glast-CreERT2 is an inducible conditional mouse line that expresses Cre recombinase selectively in astrocytes upon tamoxifen injection. We now add this information in the text (page 22, line 10-11). 

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear if ethanol is of 10% solution or if proportions of ethanol+tamoxifen to oil were of 1:9. The reasoning for each performed step is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from crossing the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We now write it as ‘Glast-CreERT2::Ai95GCaMP6f//WT’, consistent with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: being in-line with legislation was already stated at the beginning of the Methods so I suggest combining for clearance.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We have now added this information to the text (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signal but also background signals consisting of, for instance, tissue autofluorescence, digital camera background noise, and stray light from the environment. The background signal was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and subtracted from all frames of the time-lapse image stack. The traces shown in the figures reflect the full lengths of the time-lapse recordings. F0 was defined as the mean value of the 10 data points immediately preceding the detected fluorescence changes. The text is now updated (page 24, line 21 – page 25, line 5).
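      As an illustration only, the background subtraction and F0 normalization described above could be sketched as follows. This is a hypothetical sketch, not the analysis code used in the study: the function names, the array layout (frames × height × width), the boolean masks for the astrocyte ROI and the cell-free background region, and the per-frame background estimate are assumptions.

```python
import numpy as np


def background_subtracted_trace(stack, roi_mask, bg_mask):
    """Mean ROI fluorescence per frame minus the mean of a cell-free
    background region. `stack` is (T, H, W); masks are boolean (H, W)."""
    roi = stack[:, roi_mask].mean(axis=1)  # (T,) mean over ROI pixels
    bg = stack[:, bg_mask].mean(axis=1)    # (T,) mean over background pixels
    return roi - bg


def delta_f_over_f0(trace, onset, n_baseline=10):
    """(F - F0) / F0, with F0 the mean of the n_baseline samples
    immediately preceding the detected change at index `onset`."""
    if onset < n_baseline:
        raise ValueError("not enough samples before onset")
    f0 = trace[onset - n_baseline:onset].mean()
    return (trace - f0) / f0
```

      If the background is stable over the recording, the per-frame estimate reduces to subtracting a single constant value.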

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually with the soma taken as the center of the region of interest. We now exemplify the delineated region in Fig 1A, bottom.

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      The relevant reference on the barrel cortex is now added (Erzurumlu and Gaspar, 2020) (page 25, lines 19-20).

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed directly at the acquisition date? It is rather difficult to say that it was a habituation, but rather acute imaging. I suggest correcting, that mice were allowed to familiarize themselves with the setup for 30 minutes prior to the imaging start.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiar with the setup only from the day of acquisition is difficult to avoid. It is a major concern, especially when considering norepinephrine as a master driver of neuronal and vascular activity through the brain, and strong activation of the hypothalamic-adrenal axis in response to acute stress. It is well known that the response of monoamines is reduced in animals subjected to chronic vs. acute stress, but still larger than if the stressor is absent.

      Major remark: The animals should, preferably, be imaged at least after 3 days of habituation based on existing knowledge. I suggest exploring the topic of the importance of habituation. It is difficult though, to objectively review these findings without considering stress and associated changes in vascular dynamics.

      We thank the reviewer for helping us make this information more precise. The text is updated accordingly to describe the experiment (now page 26, line 14).

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals is added in Methods (page 27, line 12) and indicated in the legend of Figure 5. 

      - Line 18/19: were the respiratory effects observed after injection of saline or TGN-020? Since DWI was performed, the exclusion of perfusive flow on ADC is impossible.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      The respiratory rate has been recorded. We added the averaged respiratory rate before and after injection of TGN-020 or saline (now, Fig. S6; page 13, line 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added information on the MRI system in the Methods section (page 27, lines 17-21).

      - Page 24: line 3/4: although the achieved spatial resolution of DWI was good and slightly lower than desired and achievable due to limitations of the method itself as well as cryoprobe, it is acceptable for EPI in mice.

      Still, there is no direct explanation provided on the reasoning for using surface instead of volumetric coil, as well as on assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful if such a long echo-time was used alongside lower-than-possible spatial resolution. Longer echo time would lower the SNR of the depicted signal but also would favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, while a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between 250 and 1800 b-values. What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Besides, gradient duration time was optimally chosen; however, I have concerns about the decision for such long gradient separation times.

      If the protocol could have been better optimized, the assessment could have been also performed in respiratory-gated mode, allowing minimization of the effects of one of the glymphatic system driving forces.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with long echo time and high b-value. A volume coil provides more homogeneous SNR across the whole brain than the cryoprobe, but its SNR is lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images acquired with the cryoprobe was sufficient to investigate the ADC (Fig. 5B-C). This is now mentioned in Methods (page 27, lines 17-21).

      We performed DW-MRI scanning for 5 min at each time point using anisotropic resolution and 3 b-values, to investigate the time course of ADC changes following the injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after injection (Igarashi et al., 2011), fast DW-MRI scanning is required. Isotropic DW-MRI with a shorter echo time and more diffusion directions would require a longer scan time at each time point, possibly more than 1 h. We agree that three b-values is the minimum to calculate the ADC and that more b-values would increase accuracy. However, to achieve the temporal resolution needed to capture changes in water diffusion, we decided to use the minimum number of b-values. A previous study also validated the sufficient accuracy of DW-MRI with three b-values (Ashoor et al., 2019). Furthermore, a previous study using a long diffusion time (> 20 ms) and long echo time (40 ms) obtained good mean diffusivity estimates (Aggarwal et al., 2020), supporting that our protocol is adequate to investigate the ADC. We have now updated the description (page 28, lines 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² appeared too high to obtain good image quality. In a previous study, we showed that the ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020).
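      For readers less familiar with the calculation, the ADC from three b-values can be illustrated with the mono-exponential model S(b) = S0·exp(−b·ADC), fitted by linear least squares on ln S. This is a hypothetical sketch, not the DSI Studio pipeline used in the study; the function name is illustrative, and it assumes the signals are already averaged over diffusion directions (trace-weighted) and strictly positive.

```python
import numpy as np


def fit_adc(bvals, signals):
    """Least-squares fit of ln S(b) = ln S0 - b * ADC.
    bvals: (n_b,) in s/mm^2; signals: (n_b,) or (n_b, n_voxels), positive.
    Returns (S0, ADC) with ADC in mm^2/s."""
    b = np.asarray(bvals, dtype=float)
    y = np.log(np.asarray(signals, dtype=float))
    design = np.column_stack([np.ones_like(b), -b])  # columns: ln S0, ADC
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.exp(coef[0]), coef[1]
```

      With b = 0, 250, and 1800 s/mm² the fit has three points per voxel, which is why a third b-value allows the mono-exponential fit to be checked against more than a two-point estimate.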

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      The motion correction and co-registration are described in Methods (page 28, lines 12-14).

      Also, were these trace-weighted images or magnitude images acquired since DTI software was used for processing - while ADC fitting could be reliably done in Matlab, Python, or other software. Thus, was DSI software considering all 3 b-values or just used 0 and 1800 for the calculation of mean diffusivity for tractography (as ADC). The details should be explained.

      DSI Studio was used with all three b-values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We have added this description to Methods (page 28, lines 16-18).

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      Thanks for this suggestion. We have performed new experiments and added control measurements with three phantoms (water, undecane, and dodecane). These new data are summarized in Fig. S7, showing the stability of the ADC throughout the 70 min scanning. We have updated the description in the Methods (page 28, lines 9-11) and in the Results (page 13, lines 6-8).

      - Line 13: were the ROI defined manually or just depicted from previously co-registered Allen Brain atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were depicted with reference to Allen mouse brain atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in Methods (page 28, line 14-16).

      - Line 10: why the average from 1st and 2nd ADC was not considered, since it would reduce the influence of noise on the estimation of baseline ADC?

      We apologize; this was a typo. The baseline was the average of the 1st and 2nd ADC. We have corrected the description (page 28, line 20).
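      As an illustration only, expressing an ADC time course relative to a baseline defined as the average of the 1st and 2nd ADC could be sketched as follows. This is a hypothetical sketch; the function name and the expression of the change as a percentage are assumptions, not the study's stated procedure.

```python
import numpy as np


def adc_percent_change(adc_series):
    """Percent change of each time point relative to a baseline defined
    as the mean of the first two ADC measurements."""
    adc = np.asarray(adc_series, dtype=float)
    baseline = adc[:2].mean()  # average of 1st and 2nd ADC
    return 100.0 * (adc - baseline) / baseline
```

      Averaging two baseline scans reduces the influence of noise in any single measurement on the reference value.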

      STATISTIC:

      Which type of t-test - paired/unpaired/two samples was used and why? Mann-Whitney U-tests are used as a substitution for parametric t-tests when the data are either non-parametric or assuming normal distribution is not possible. In which case was Bonferroni-Holm correction used? - I couldn't find any mention of any multiple-group analysis followed by multiple comparisons. Each section of the manuscript should have a description of how the quantitative data were treated and with which aim. I suggest carefully correcting all figures accordingly, and following the remarks given to Figure 1.

      We used an unpaired t-test for data obtained from samples under different conditions. Indeed, the Mann-Whitney U-test is used when the data deviate from a normal distribution. Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
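      For reference, the Bonferroni-Holm step-down procedure can be sketched as follows. This is a generic illustration of the method, not the study's analysis code; the function name and the default alpha = 0.05 are assumptions.

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down procedure. Returns a list of booleans
    (True = reject H0) in the original order of `pvals`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    reject = [False] * m
    for rank, i in enumerate(order):
        # the k-th smallest p-value (k = rank + 1) is compared with alpha / (m - rank)
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are retained
    return reject
```

      Compared with a plain Bonferroni correction (alpha / m for every test), the threshold relaxes step by step, which retains more power while still controlling the family-wise error rate.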

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is eLife expects that all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends in neurosciences 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales, in addition to a number of unbiased outcomes, to determine how suspension impacts the behaviour of the cells and, in turn, how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.  

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance - was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. In response, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). We found that seeding densities of 1-2 x 10^5 cells/mL and stirring speeds of 50-150 rpm were suitable for the proliferation of these cells. In addition, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations address your concerns regarding the optimization of the bioreactor experiments.

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is that the reported suspension culture system does not require additional ECM to provide biophysical support for iPSCs. This distinguishes it from previous studies using hydrogels and should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, the composition of these media remains proprietary; as such, the signaling that represses differentiation/maintains pluripotency in hiPSC suspension culture has remained unclear. This study provided clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address. In particular, it is important to benchmark the reported suspension system against the current conventional culture system (e.g., adherent feeder-free culture), which is needed to evaluate the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. In this revised manuscript, we have performed additional experiments using a conventional medium, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), comparing suspension culture with the adherent feeder-free culture system in four different hiPSC lines simultaneously. Compared to the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions, reaching levels comparable to those in adherent culture. These results indicate that these chemical treatments in suspension culture are beneficial even when a conventional culture medium is used.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C-E). Overall, treatment with LY333531 and IWR-1-endo in StemFit AK02N medium reversed the decreased expression of undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. In addition, as described above, we have added results using a conventional medium, mTeSR1, in comparison with the adherent feeder-free culture system in four different hiPSC lines simultaneously. These results show that this chemical treatment consistently produced robust effects on hiPSC maintenance across multiple cell lines and multiple conventional media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In this revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared to the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA-1-60/SSEA4-positive cells and OCT4-positive cells and the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines simultaneously (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in the mTeSR1 medium reversed the decreased expression of these undifferentiated markers. These direct comparisons justify why our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In this revised manuscript, we have added results from 5 different hiPSC lines analyzed at the same time (Figure 3C-E). Treatment with LY333531 and IWR-1-endo generally increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that the effects of these chemical treatments in suspension culture are generally robust despite the variability observed among iPSC lines. Unfortunately, it is difficult for us to use human embryonic stem cell lines in this study because of ethical restrictions under Japanese governmental regulations.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In this revised manuscript, we have added data and discussion regarding the specific roles of the two small molecules in these conditions in the Results and Discussion sections. Regarding the Wnt signaling inhibitor, we hypothesized that adding Wnt signaling inhibitors may inhibit the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). In addition, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potential (Dziedzicka et al, 2021). Regarding the PKC inhibitor, the revised manuscript states: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024)) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017))."

      We also found that the expression of the naïve pluripotency markers KLF2, KLF4, KLF5, and DPPA3 was up-regulated in suspension conditions treated with LY333531 and IWR-1-endo, while the expression of OCT4 and NANOG remained at similar levels (Figure 5 - figure supplement 2). Combined with the RT-qPCR data from 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may shift hiPSCs in suspension toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion in the revised manuscript. Regarding Wnt activation in human pluripotent stem cells, several studies have reported that some WNT agonists are expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation results in tight cell-cell interactions. The paracrine effect of WNT agonists within the aggregates may strongly affect neighboring cells and induce spontaneous differentiation into mesendodermal cells. Thus, we think that inhibition of WNT signaling effectively suppresses spontaneous differentiation into mesendodermal lineages in suspension culture.

      Regarding PKCβ activation in human pluripotent stem cells, we have shown by western blotting that phosphorylated PKCβ protein levels are higher in suspension culture than in adherent culture (Figure 3 - figure supplement 1). Treatment with the PKCβ inhibitor effectively suppresses spontaneous differentiation into neuroectodermal lineages. In future work, it will be interesting to examine (1) how and why PKCβ is activated (phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition suppresses neuroectodermal differentiation. Conversely, it will also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for the authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.  

      We apologize for this confusion. In this study, we basically used StemFit AK02N medium (Figures 1-5 and 7-9). For the bioreactor experiments (Figure 6), we used StemFit AK03N medium, which is free of human- and animal-derived components and is GMP grade. To confirm the effect of the IWRLY chemical treatment, we used StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we have clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is outside the primary focus of this study), we have observed some differences among them in maintaining self-renewal characteristics, in preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and in stability or variation between experiments in suspension culture conditions. Despite this heterogeneity among media, the IWRLY chemical treatment generally maintained hiPSC self-renewal stably. We have added a discussion of this issue in the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added results from simultaneous comparisons of 5 different hiPSC lines using StemFit AK02N medium (Figure 3C-E) and 4 different hiPSC lines using mTeSR1 medium (Figure 4 - figure supplement 2). Both sets of results show that treatment with LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      We apologize for this confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematic (Figure 7A). We have also added data on cloning efficiency in comparison with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. While the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, IWR-1-endo treatment in suspension cultures increased the efficiency to more than 20%. However, treatment with LY333531 decreased the efficiency. These results indicate that IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures.  

      Thank you for this insightful comment. In the revised manuscript, we have added RT-qPCR data from 5 different hiPSC lines (Figure 3E) and RNA-seq expression data (Figure 5 - figure supplement 2) for naïve pluripotent state markers. Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggest that IWRLY suspension conditions drive hiPSCs toward a naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. Following your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). From these optimization experiments, we found that seeding densities of 1-2 x 10^5 cells/mL and stirring speeds of 50-150 rpm were suitable for the proliferation of these cells. In addition, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions across the acceptable range of stirring speeds. As for impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results section. We believe that these additional data and explanations address your concerns regarding the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following this suggestion, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we have specified that the medium used for this experiment was StemFit AK02N.

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved the resolution and clarity of these figures. We have also added the x-axis label "enrichment distribution" for gene set enrichment analysis (GSEA) in Figures 1H, 5F, and Figure 5 - figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      We apologize for this unclear description. We have revised it as follows: "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope that the difference between the two points is now clear.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      We apologize for this inaccurate description. We have corrected this phrase to "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition" as recommended.

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C-E) and across 4 hiPSC lines in suspension culture using mTeSR1 medium simultaneously (Figure 4 - figure supplement 2). In both media, treatment with LY333531 and IWR-1-endo reversed the decreased expression of undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. These results show that this chemical treatment produced a consistent, robust effect on hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      We apologize for the incorrect gene nomenclature. We have corrected it throughout the manuscript.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid this misinterpretation, we use 'suspension culture' only for culture in the conventional culture medium and 'IWRLY suspension culture' for culture in the medium supplemented with LY333531 and IWR-1-endo throughout this manuscript.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system for this cell line in the Results section as follows: "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large-scale suspension culture is comparable to adherent culture conditions."
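      For readers comparing these numbers, population doubling time is conventionally calculated from viable cell counts at two time points, assuming exponential growth in between. The standard definition is sketched below; the symbols are illustrative and are not taken from the manuscript or the CiRA Foundation datasheet:

      $$
      \mathrm{PDT} = \frac{(t_1 - t_0)\,\ln 2}{\ln\left(N_1 / N_0\right)}
      $$

      Here N_0 and N_1 are the viable cell numbers at times t_0 and t_1. As a purely hypothetical example, a 4-fold expansion over 48 hours gives PDT = 48 ln 2 / ln 4 = 24 hours, which would fall within the 21.8 - 32.9 hour range quoted for adherent culture.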

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the authors described the cell number and TRA-1-60+/OCT4+ cell %. How does this compare with conventional media (e.g., E8)? It would be nice to see a head-to-head comparison with conventional media; quantification of cell count or survival would be helpful to determine this.  

      To address this issue, we attempted a direct freeze and thaw process using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, since hiPSCs cultured in suspension conditions without IWR1+LY quickly lost their self-renewal ability, these frozen cells could be neither recovered nor counted under these conditions. Our results indicate that the addition of IWR1+LY during the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells at every passage in suspension culture conditions. We have added more explanation in the Materials and Methods section as follows.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal, and this also applies to our method. Stirring speed is key to obtaining hiPSC aggregates of the proper size in suspension culture. The culture period between passages is another key factor in keeping aggregates from exceeding the proper size. Thus, we basically keep the stirring speed at 90 rpm (135 rpm in bioreactor conditions) and passage the cells every 3-5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture systems using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition, which further simplifies the culture system for iPSCs. It would be good to add more discussion on this topic in the manuscript, such as the potential role of E-cadherin in mediating this effect (as RNA-seq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment, together with the associated labor and cost, are concerns for industrial mass culture. In contrast, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used."

      "On the other hand, it is interesting to see whether and how the properties of hiPSCs cultured in IWRLY suspension culture conditions are altered from the adherent conditions. Our transcriptome results in comparison to adherent conditions show that gene expression associated with cell-to-cell attachment, including E-cadherin (CDH1), is more activated. This may be due to the status that these hiPSCs are more dependent on cell-to-cell adhesion where there is no exogenous cell-to-substrate attachment in the three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, studies have shown that human pluripotent stem cells can be cultured using an artificial substrate consisting of recombinant E-cadherin protein alone without any ECM proteins (Nagaoka et al, 2010). Also, cell-to-cell adhesion through gap junctions regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that the cell-to-cell adhesion, such as E-cadherin and gap junctions, are compensatory activated and support hiPSC self-renewal in situations where there are no exogenous ECM components and its downstream integrin and focal adhesion signals are not forcedly activated in suspension culture conditions. It will be interesting to elucidate these molecular mechanisms related to E-cadherin in the hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2,500 cells/cm². My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.

      We apologize for the confusion. We routinely use an automated cell counter (model R1, Olympus), which can accurately count even small cell clumps. We therefore report cell numbers for the passaging of adherent hiPSCs.

      (2) Figure 2D: The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17; a simultaneous comparison would be interesting.

      As recommended, we have added data directly comparing IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, similar to the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      We apologize for not mentioning the oxygen levels used in this study. We generally use normal atmospheric oxygen levels (i.e., 21% O2) in suspension culture conditions. We have added this information to the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green signal (PAX6 and SOX17) overlaps with tdTomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.

      We apologize for the unclear images. We have provided separate images for the green and red channels, as well as an overlay, in Figure 1-figure supplement 1G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added quantitative data on the expression of markers of undifferentiated hiPSCs, including NANOG, OCT4, SSEA4, and TRA-1-60, in five different hiPSC lines in Figure 3C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules were chosen because they putatively affect neuroectodermal induction from the pluripotent state.

      We have added the rationale, with appropriate references, to the Results section as below.

      "We chose these candidate molecules based on previous studies of signaling pathways or epigenetic regulation in neuroectodermal development (reviewed in Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) or in pluripotency safeguards (reviewed in Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017) (Figure 2A; listed in Supplementary Table 1)."

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain more clearly why we switched from Go6983 to LY333531, we reorganized the order of the results and figures. In short, we found that the suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possessed PKCβ inhibitory activity (Figure 2—figure supplement 2B-D). Also, elevated expression of PKCβ in suspension-cultured hiPSCs could affect spontaneous differentiation (Figure 3—figure supplement 1A-C). To further explore the possibility that PKCβ inhibition is critical for maintaining the self-renewal of hiPSCs in suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2—figure supplement 1). Finally, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in terms of proliferation rate and maintenance of OCT4 protein expression in long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after treatment with LY+IWR-1-endo and found that the combination had little or no effect on cell death. We have added the data in Figure 3—figure supplement 2 and the following description in the Results section: "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect, as a positive control for cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effect on apoptosis, whereas the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3—figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has little or no effect on cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. These differences were intended to reflect how long the reprogramming factors are expressed from each vector. Expression of the reprogramming factors should suppress spontaneous differentiation in reprogramming cells, and Sendai viral vectors persist longer than episomal plasmid vectors. We therefore added the chemical inhibitors from the early phase of reprogramming for episomal plasmid vectors and from the late phase of reprogramming for Sendai viral vectors. In the future, the timing of adding these molecules may need further optimization.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      We apologize for the unclear description. We have added an explanation of the cell number and culture conditions used before transferring the cultures to the differentiation media to the Materials and Methods section, as below.

      "As in the maintenance conditions, 4 × 10⁵ hiPSCs were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed on the plate shaker in the CO2 incubator. The next day, the medium was changed to the germ layer-specific differentiation medium."

    1. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure relate to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates, and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and would further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender- or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups.

      We appreciate the reviewer's suggestion and recognize the value of more sophisticated statistical methods. However, we think that the choice of method should not be guided by perceived complexity alone, and we believe that the chosen ANOVA-based approach provides reliable and valid results. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects and site as a random effect. These analyses yielded comparable results, demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity, in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement, which account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as a covariate in the analysis of CS discrimination during acquisition training attenuated the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.
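      The two exclusion rules described above can be sketched as follows. This is a minimal illustration only, assuming that SCR data are held as plain Python lists per participant; the function names, the data layout, and the use of the sample standard deviation (with points exactly at 3 SD retained) are our assumptions, not details taken from the analysis code.

```python
from statistics import mean, stdev


def exclude_nonresponders(scr_by_participant):
    """Drop physiological non-responders, i.e. participants whose SCRs are all 0.

    scr_by_participant: dict mapping a (hypothetical) participant ID to a list of SCRs.
    """
    return {
        pid: scrs
        for pid, scrs in scr_by_participant.items()
        if any(s != 0 for s in scrs)
    }


def exclude_extreme_outliers(values, k=3.0):
    """Keep only data points within k standard deviations of the mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - m) <= k * sd]
```

      For example, in a list of nineteen zeros and a single value of 10, the 10 lies about 4.25 SDs from the mean and is removed, while the zeros are kept.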

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
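      For readers less familiar with the VIF, the sketch below shows how it is computed: the VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all other predictors, which equals the j-th diagonal element of the inverse of the predictors' correlation matrix. This is a minimal pure-Python illustration of that definition; the predictor names and toy values are hypothetical and do not reproduce the VIFs reported above.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(predictors):
    """VIF per predictor: diagonal of the inverse predictor correlation matrix,
    equivalent to 1 / (1 - R_j^2) from regressing predictor j on the others."""
    names = list(predictors)
    corr = [[pearson_r(predictors[a], predictors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}
```

      With two predictors correlated at r = 0.8, for example, both VIFs equal 1 / (1 - 0.64) ≈ 2.78, below both the 5 and the 10 rule of thumb.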

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared trait anxiety and depression scores between individuals exposed and unexposed to childhood adversity using Welch tests, owing to unequal variances.

      At the request of a reviewer, we additionally investigated the association between childhood adversity, as operationalized by the different models used in our exploratory analyses (i.e., cumulative risk, specificity, and dimensional model), and trait anxiety as well as depression scores (see Supplementary Figure 7). Using STAI-T and ADS-K scores as independent variables, we calculated a) for the implementation of the cumulative risk model, a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs, and the association with the number of subscales meeting at least the moderate cut-off in simple linear regression models, and b) for the implementation of the specificity/dimensional models, the association with the CTQ abuse and neglect composite scores in separate linear regression models. At the reviewer's request, we also calculated the Pearson correlations between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores), and conditioned responding in SCRs (see Supplementary Table 8).”
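      The Welch test mentioned above can be illustrated with a short sketch that computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom without assuming equal group variances. This is an illustration under that textbook definition, not the authors' analysis code; the input lists are hypothetical. In practice, scipy.stats.ttest_ind(a, b, equal_var=False) returns the statistic together with a p-value.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variances t-test of mean(a) vs mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

      Because each group contributes its own variance to the standard error, the degrees of freedom are generally non-integer and smaller than in Student's pooled test when group variances differ.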

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently - following the cumulative risk approach, the specificity, and dimensional model (see methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample, with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation was observed, indicating that high levels of depression were associated with reduced CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety levels and CS discrimination during fear acquisition training was not statistically significant, but on a descriptive level, high anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06), although we highlight that this should not be overinterpreted in light of the large sample. However, the two correlations (i.e., of CS discrimination during fear acquisition training with trait anxiety and with depression, respectively) did not differ statistically from each other (z = 0.303, p = 0.762, Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS-, during acquisition training (see Supplementary Table 8).”

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs, defined by CS discrimination and generalization gradients based on SCRs as well as general SCR reactivity, with the exception of a significant, albeit minute, relationship between CS discrimination during acquisition training and depression scores (see above). Although reported associations in the literature are heterogeneous (Lonsdorf et al., 2017), one may speculate that they are mediated by childhood adversity. We conducted additional mediation analyses (data not shown), which, however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate them further, ideally in prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning, using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations used in the literature to classify individuals as exposed or unexposed to childhood adversity (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hamper comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or, in fact, any cut-off) is in principle an arbitrary decision, despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests, at a descriptive level, that the decline in CS discrimination, LDS, and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off because it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest, on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (pages 31-32) and discussion (pages 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition and extinction are considered experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that in the literature typically a specific cut-off (e.g. moderate) is used and the number of individuals that would meet different cut-offs (e.g., low and high) are not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches. Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review), namely dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
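      For readers who want to apply the same dichotomization, the rule above (exposed if at least one CTQ subscale reaches its moderate cut-off) can be sketched in Python as follows. This is an illustrative sketch only, not the analysis code used in the manuscript; the function name `is_exposed` and the subscale keys are hypothetical, while the cut-off values are the ones listed above.

```python
from typing import Mapping

# Cut-offs for at least moderate exposure, as listed in the text
# (Bernstein & Fink, 1998; Häuser et al., 2011). Key names are hypothetical.
MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}

def is_exposed(subscale_scores: Mapping[str, int]) -> bool:
    """Return True if any CTQ subscale score reaches its moderate cut-off."""
    return any(
        subscale_scores[name] >= cutoff
        for name, cutoff in MODERATE_CUTOFFS.items()
    )

# Hypothetical participant: only emotional neglect reaches its cut-off (15).
participant = {
    "emotional_abuse": 9,
    "physical_abuse": 6,
    "sexual_abuse": 5,
    "emotional_neglect": 15,
    "physical_neglect": 7,
}
print(is_exposed(participant))
```

      A single subscale at its cut-off is sufficient, which mirrors the "at least one CTQ subscale" wording of the classification rule.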

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this sentence and carefully proofread the manuscript with this in mind (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions.”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and its impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for pointing out that this was not sufficiently clear. In the main manuscript, we state that participants were recruited as part of a large multicentric study and refer readers to the Supplementary Material for further details (page 11):

      “In total, 1678 healthy participants (mean age = 25.26 years, SD = 5.58 years, female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). These previous reports, including those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), however, addressed research questions different from the ones investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center ‘Fear, anxiety, anxiety disorders’ joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013-2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims, with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question of the association between exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and hence was also of primary interest for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have added a table caption explaining how exposed and unexposed participants were defined. We considered moving the table to the Results section but find it better placed in its current location.

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for pointing out that we did not make this point sufficiently clear. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al., 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution and to have lower error variance, resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear, at least descriptively, to have higher reliability than raw data (Klingelhöfer-Jens et al., 2022). We added the following text to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) used square-root transformations, but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used log-transformation and range-correction for SCR data because these transformations are standard practice in our laboratory and we strive for methodological consistency across projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Moreover, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and to exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Finally, on a descriptive level, this combination of transformations appears to offer greater reliability than raw data (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5
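      The combination of log-transformation and range-correction described in the response to Comment 8 can be sketched for a single participant as below. This is a minimal illustration under stated assumptions, not the laboratory's actual pipeline: we assume log(1 + SCR) is used so that non-responses remain 0, and that the division by the participant's maximum is applied after log-transformation. The function name `log_range_correct` and the example amplitudes are hypothetical.

```python
import numpy as np

def log_range_correct(scr) -> np.ndarray:
    """Log-transform, then range-correct, one participant's SCR amplitudes.

    Assumptions (not stated in the manuscript): log(1 + x) is used so that
    non-responses stay at 0, and the per-participant maximum is taken after
    the log-transformation.
    """
    logged = np.log1p(np.asarray(scr, dtype=float))  # log(1 + SCR)
    peak = logged.max()
    if peak == 0:  # participant without any response: nothing to scale
        return logged
    return logged / peak  # each value divided by the participant's maximum

raw = np.array([0.0, 0.2, 1.5, 0.6])  # hypothetical amplitudes in microsiemens
print(log_range_correct(raw))
```

      Both steps are monotonic, so the rank order of a participant's responses is preserved while all values are rescaled to the range 0 to 1.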

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that listing the references for all R packages takes up considerable space. However, we think it is important to credit the authors of these R packages prominently. If this is an important concern for the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We thank the reviewer for highlighting that the factor time may be of interest as well. However, we think that in our case the time factor is less informative, as the acquisition effect itself is rather strong. Nevertheless, for transparency, we have included a figure in the supplement (Supplementary figure 4) that shows the time course of SCRs trial by trial across the acquisition and generalization phases. This figure shows that the trajectories appear to differ only minimally between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that our chosen analysis approach is unlikely to obscure central time-dependent effects. However, if the reviewer and editor feel strongly about this point, we will consider adding analyses that include the time factor to the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for their attentive reading and have corrected the caption of Figure 3.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for their attentive reading and have corrected this reference. The DOI provided worked on our end, and we hope that it now also works for the reviewers.

    1. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and for providing helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and Reviewer #2’s) main concerns was that, in some cases, the confocal images did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells, collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, for the P1a control condition (Figure 2A), we analyzed 135 neurons from 8 individuals, with 3 to 8 z-planes per hemisphere. It is generally not possible to find a single confocal section that quantitatively reflects the statistics presented in the graphs, and presenting the data as a maximum intensity projection (MIP, i.e., a collapsed z-stack) in a single panel would produce an image too cluttered to reveal any detail. For the reader’s benefit, we have now included additional example confocal sections, both from within a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Reviewer #1 also suggested that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones", based on the presented images, in which more non-P1a neurons (which the reviewer speculates may be pC1-type neurons) appear to be activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not studying pC1 neurons and did not try to determine which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and asked whether P1a neurons, which induce both aggression and courtship behavior when artificially activated (Hoopfer et al., 2015), are also naturally activated during spontaneous performance of these two social behaviors. That earlier artificial-activation result did not exclude the possibility that P1a neurons are inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence supporting the idea that P1a neurons, as a population, play a role in both of these behaviors. Moreover, we provide data identifying P1a neurons activated only during aggression, only during courtship, or during both. However, this does not exclude the possibility that pC1 or other neighboring populations are also activated during aggression (see also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. That control was the purpose of that supplemental experiment.

      Based on those imaging data, the reviewer asked the further question of which [natural] behavioral context (i.e., mating or aggression) induces Hr38 expression in these populations. This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after stimulation offset (Hoopfer et al., 2015). Comparing those behavioral data with the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated second-messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging but that nevertheless play a role in mediating behavioral effects. We still do not understand how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit inhibits aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior due to release of that downstream inhibition (see models in Anderson (2016), Fig. 2d, e). In that case, there would be no apparent inconsistency between the imaging data and the behavioral data. We agree that the reviewer's question is interesting and important, but we feel that answering it with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate this suggestion; our evaluation was in fact essentially the same as the approach the reviewer proposed. We apologize for the confusion caused by the lack of detail in the original manuscript and have now revised the methods section to explain more clearly how we define cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons]” (Fig. 2 and 4).

      Our focus in this paper was to ask simply whether P1a neurons are active or not active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control the behaviors.  It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker to identify those cells driven by a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch with new genotypes permitting dual-labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while in principle interesting, are beyond the scope of the present manuscript.  However, we have inserted text to acknowledge the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes:  Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that some tiny dots labeled by the Hr38 INT probe are more likely to be background signals. We counted INT probe signals as positive only when a cell had a clearly visible dot that also co-localized with the exonic probe's signal, since primary (unspliced) Hr38 transcripts in the nucleus should be positive for both the EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results, which is how we interpreted them originally. We measured the intensity of Hr38EXN expression and defined Hr38EXN-labeled cells as “positive” when their relative intensity exceeded the average by more than 3σ, a stringent criterion. In the revised manuscript, we have added more detailed information to the methods section regarding our criteria for defining cells as positive.
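      The two positivity criteria described above (EXN intensity more than 3σ above the average; INT counted only as a clearly visible dot that co-localizes with the EXN signal) can be expressed as a short Python sketch. The names `Cell`, `exn_threshold`, and `classify` are hypothetical, and which set of intensities serves as the reference for the average and σ is an assumption here (passed in as `reference`), since the response does not specify it.

```python
import statistics
from dataclasses import dataclass

@dataclass
class Cell:
    exn_rel_intensity: float   # Hr38EXN intensity relative to background
    int_dot_visible: bool      # clearly visible intronic (INT) dot
    int_colocalizes_exn: bool  # INT dot overlaps the exonic (EXN) signal

def exn_threshold(reference: list[float], n_sd: float = 3.0) -> float:
    """Average + n_sd * SD of a (hypothetical) reference distribution."""
    return statistics.mean(reference) + n_sd * statistics.stdev(reference)

def classify(cell: Cell, threshold: float) -> tuple[bool, bool]:
    """Return (EXN-positive, INT-positive) for one cell."""
    exn_pos = cell.exn_rel_intensity > threshold
    # An INT dot counts only if it is clearly visible AND co-localizes with
    # EXN, because unspliced nuclear transcripts carry both probe targets.
    int_pos = cell.int_dot_visible and cell.int_colocalizes_exn
    return exn_pos, int_pos

thr = exn_threshold([1.0, 1.2, 0.8, 1.0])  # hypothetical reference values
print(classify(Cell(2.0, True, True), thr))
print(classify(Cell(2.0, True, False), thr))
```

      Requiring co-localization discards isolated INT dots that are more likely background, as described in the response.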

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we explained in the 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method, and in that respect this control experiment succeeded. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells downstream of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data sets for Figure 2 and Figure 2-supplement come from the same experiment. Because of space limitations, we presented only the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and placed the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      Fig. 1: “Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We appreciate the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020]

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions, most of which we have addressed in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38INT signals associated with different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue arose because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than on the percentage of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the percentage of positive cells. We believe this change will help readers better understand the experimental design, since we used the percentage of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Reviewer #2 also suggested adding controls, such as quantification of the intronic and exonic Hr38 probes after exposure to only the first or only the second social context. In response, we have now added data from exposure to the first social context only (Figure 4C and 4D, right column). These new data provide evidence that essentially no Hr38INT signals are detectable 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of analysis. Unfortunately, we are not able to provide the converse dataset, with the second behavioral context only, to show that Hr38INT signals are detected. On this point, we call the reviewer’s attention to Figure 4-supplement-S4A-C, which shows that INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes because the primary (unspliced) RNA transcript detected by the INT probe also contains exonic sequences.

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and, as the reviewer requested, have modified the description originally in the 'Note' section and moved it to the Discussion (text lines 392-404). We have emphasized the difference between the two approaches for comparing neuronal activity during two different behaviors within the same animal. Takayanagi-Kiya et al. used GAL4/UAS and stripe protein expression, detected by immunohistochemistry, to analyze neuronal activity during two different behaviors, whereas we analyzed only Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution and also allows our approach to be extended to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like those in Figure 2-supplement), we would see either a skewed distribution (only negative cells) or a bimodal distribution (one mode for the negative population and the other for the positive population); the shapes of these distributions would likely be hidden in a box-and-whisker format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We have therefore changed the format of the data points from blurry dots to open circles with clear solid lines.

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different; bars that share no letter are significantly different. For example, Aggression: bc vs. Dead: bc means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c, also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no common letter, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without cluttering the figure with asterisks and horizontal bars, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2B-D as it may have been confusing.
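      The reading rule for the letter codes can be written as a one-line set test. This is a minimal sketch that only interprets an existing letter display (it does not run the underlying multiple-comparison test); the letter codes are taken from the example in the response, and the function name is hypothetical.

```python
def not_significantly_different(letters_a: str, letters_b: str) -> bool:
    """Two conditions are NOT significantly different if their letter codes share at least one letter."""
    return bool(set(letters_a) & set(letters_b))

# Letter codes from the worked example in the response.
groups = {"Aggression": "bc", "Dead": "bc", "No Food": "b", "Courtship": "c"}

pairs = [("Aggression", "Dead"), ("Aggression", "No Food"),
         ("Aggression", "Courtship"), ("No Food", "Courtship")]
for g1, g2 in pairs:
    same = not_significantly_different(groups[g1], groups[g2])
    print(f"{g1} vs {g2}: {'no difference' if same else 'different'}")
```

      Only the last pair ('b' vs. 'c') shares no letter, so it is the only pair reported as different.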

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added schematics next to the title of each experiment (P1 -> PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis, given that cell bodies are very small and highly susceptible to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. Author response:

      Reviewer #1 (Public Review): 

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al. showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important, and the conclusions are largely supported by the data. I have a few concerns as follows:

      We thank the reviewer for finding our study “interesting” and “important”, and for noting that the “conclusions are largely supported by the data”.

      (1)  The present data show that activation of dLS neurons produces SIA, however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools. 

      We agree with the reviewer that activating the stress-“trapped” neurons would be a more specific way to induce SIA through septal activation than the activation of the entire dLS pursued by us. In all likelihood, we would expect to see a robust SIA if stress-responsive dLS neurons were specifically activated. We are in the process of acquiring the genetic tools required for “trapping” stress-responsive neurons and expect to be able to perform the experiments suggested by the reviewer in the coming months.

      (2)  Depending on its duration, and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?

      As mentioned by the reviewer, it would be reasonable to suspect that dLS neurons are involved in SIH. However, we believe that the experiments needed to test this hypothesis are outside the scope of this paper, since here we have focused on the circuit mechanisms of SIA. Nevertheless, in the revised Discussion section, we have included the possibility of dLS neurons driving SIH.

      (3)  It is well-accepted that opioid and cannabinoid receptors participate in the SIA, and the evidence is especially strong for the RVM endocannabinoid system. Given this, why did the authors focus their study on the opioid system?

      We agree with the reviewer that dLS-mediated SIA may work through neural circuits centered on RVM neurons expressing receptors for opioids, endocannabinoids, or both. We primarily focused on the opioidergic system in the RVM, as decades of mechanistic work have revealed how the ON, OFF, and neutral neurons modulate pain through endogenous opioids and even mediate SIA. In the revised Discussion, we have included the possibility that both pain-modulatory systems are involved.

      (4)  Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Alternatively, what is the relationship between SIA and the level of stress-induced anxiety?

      We did not test whether silencing the dLS would affect stress-induced anxiety, as our focus was on the pain-modulatory effects of dLS activation. The relationship between levels of SIA and stress-induced anxiety will be interesting to explore in the future. We believe that better behavioral assays than the existing ones would be needed to quantitatively measure levels of stress-induced anxiety and SIA.

      (5)  Direct electrophysiological evidence should be provided to confirm the efficacy of the MP-CNO.

      We agree with the reviewer that ex vivo electrophysiology experiments would substantiate the effectiveness of MP-CNO. However, we do not have the expertise or the instrumentation required to perform these experiments in our laboratory.

      (6)  Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?

      Several lines of evidence point to the involvement of LHA neurons in stress-induced anxiety. We have also shown, using fiber photometry recordings, that the dLS-downstream neurons in the LHA are activated by acute restraint. Thus, we expect that activation of these LHA neurons would cause stress-induced anxiety. However, we chose to focus on the pain-modulatory aspect of the dLS-LHA-RVM circuitry.

      (7)  Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

      Our anatomical studies using transsynaptic anterograde and retrograde viral strategies in Figure 6 show that LHA neurons project directly to the RVM, and that these neurons are sufficient to drive hyperalgesia as well as necessary for SIA.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Shah et al. explore the function of an understudied neural circuitry from the dLS -> LHA -> RVM in mediating stress-induced analgesia. They initially establish this neural circuitry through a series of intersectional tracings. Subsequently, they conduct behavioral tests, coupled with optogenetic or chemogenetic manipulations, to confirm the involvement of this pathway in promoting analgesia. Additionally, fiber photometry experiments are employed to investigate the activity of each brain region in response to stress and pain. 

      Strengths: 

      Overall, the study is comprehensive, and the findings are compelling. 

      We appreciate the reviewer for finding our manuscript “comprehensive” and “compelling”.

      Weaknesses: 

      One noteworthy concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more direct interpretation suggests that intense struggling, rather than stress per se, activates the dLS -> LHA -> RVM pathway that may drive analgesic responses. 

      We agree with the reviewer that our data can be interpreted as showing that “intense struggling”, rather than “acute stress”, might have altered the pain thresholds in mice. However, we would like to point out that the restraint-induced stress model we used has long been regarded as a standard for inducing stress. Moreover, we have demonstrated that dLS activation results in acute stress by measuring blood corticosterone levels, and we have shown that dLS activation causes stress-induced anxiety using light-dark box tests.

      Reviewer #2 (Recommendations For The Authors): 

      Please find below my other comments for improvements. 

      Introduction: The authors claimed that "dLS neurons receive nociceptive inputs from the thalamus and somatosensory cortices." However, citations are missing.

      We have added the citations.

      Figure 1 B&C: Although this paper focuses on the dLS, it would be informative to also include vLS c-Fos images (maybe in a supplementary figure), given that these data appear to be already acquired. The inclusion of vLS data will provide critical information regarding potential specificity (or lack of) across LS subregions in stress responses.

      In the revised manuscript we have added the vLS c-Fos images as suggested by the reviewer. 

      Figure 1D: Quantification of Vgat vs. Vglut neurons is missing. It is unclear if the Vgat neurons are restricted to small clusters.

      We did not add the Vglut vs. Vgat quantification because both our own experiments and publicly available data from the Allen Brain Atlas show that almost all neurons in the LS are GABAergic. We found very few Vglut2-expressing neurons (0-2 per section) in the LS of the mouse brain.

      Figure 1G: The Y-axis label is missing. 

      We have added the axis label in the revised manuscript.

      Figure 2: The authors claimed that dLS neurons are preferentially tuned to stress caused by physical restraint. However, it appears that these neurons are specifically tuned to intense struggle behavior (transient) rather than stress (prolonged).

      We agree with the reviewer that the SIA observed in mice with dLS activation can be interpreted as the effect of transient struggle behavior rather than prolonged stress. However, we would like to point out that acute restraint for one hour is known to produce prolonged stress, which is supported by increased blood corticosterone levels and stress-induced anxiety (Fig1-Fig Supplementary 1).

      Figure 4: The authors provided compelling evidence that dLS neurons synapse on LHA Vglut2 neurons. However, it is unclear if they exclusively target the Vglut2 neurons or also synapse on LHA Vgat neurons.

      We agree with the reviewer. Although the majority of the dLS-downstream neurons in the LHA are glutamatergic, as now shown in Fig. 4D, a few neurons do not express Vglut and thus must be GABAergic.

      Figure 5D: It is unclear if the trace represents dLS or LHA calcium signal (in the main text, the authors claimed both).

      We now indicate the LHA neurons from which we recorded at the top of Figure 5C and D.

      Figure 6 G&H: Presumably, ΔG-Rabies does not transmit across neurons due to the deletion of the glycoprotein (G) gene. Thus, it is unclear why dLS and LHA neurons express mCherry after injecting rabies into RVM.

      The aim of the rabies experiment was to test whether the cells in the LHA that receive inputs from the dLS are the same ones that send projections downstream to the RVM. To this end, we used a monosynaptic rabies virus with retrograde properties. When injected into the RVM, it was taken up by the terminals of LHA neurons in the RVM and traveled to their cell bodies in the LHA. We injected AAV1-Transsyn-Cre into the dLS, so that only the cells downstream of the dLS in the LHA could express the Cre-dependent glycoprotein (G) gene. Thus, the rabies-mCherry virus infected the LHA neurons downstream of the dLS specifically, and jumped a synapse to label the upstream dLS neurons.
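      The intersectional labeling logic described above can be sketched as set operations on a toy circuit. This is a simplified model under stated assumptions: all cell IDs and connections are invented, rabies is assumed to enter LHA cells only via their RVM terminals, and only dLS inputs are modeled (G-complemented rabies would, in principle, also label non-dLS inputs to the starter cells).

```python
# Hypothetical toy circuit: all cell IDs and connections are invented for illustration.
# Each pair is (presynaptic dLS cell, postsynaptic LHA cell).
dls_to_lha = {("dLS_1", "LHA_1"), ("dLS_2", "LHA_2"), ("dLS_3", "LHA_3")}
# LHA cells whose RVM terminals can take up the rabies virus injected into the RVM.
lha_projects_to_rvm = {"LHA_1", "LHA_3", "LHA_4"}

# Step 1: AAV1-Transsyn-Cre in the dLS gives Cre (and thus Cre-dependent G)
# only to LHA cells that receive dLS input.
cre_positive = {post for _, post in dls_to_lha}

# Step 2: rabies-mCherry in the RVM retrogradely infects RVM-projecting LHA cells.
infected = set(lha_projects_to_rvm)

# Step 3: only infected cells that also express G ("starter" cells) can
# complement the G-deleted virus and spread it one synapse upstream.
starters = infected & cre_positive
presynaptic_labeled = {pre for pre, post in dls_to_lha if post in starters}

labeled = infected | presynaptic_labeled
print("starter LHA cells:", sorted(starters))
print("mCherry+ cells:", sorted(labeled))
```

      In this toy case, LHA_4 is labeled but cannot pass the virus on (no Cre, hence no G), and dLS_2 stays unlabeled because its LHA target does not project to the RVM.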

      The authors claim that "RVMpost-LHA neurons may modulate nociceptive thresholds through their local synaptic connections within the RVM, recurrent connections with the PAG, or direct interactions with spinal cord neurons." It is unclear what the "local synaptic connections within the RVM" means. It is also unclear whether there is evidence of recurrent connections between the RVM and PAG.

      By local connections, we meant intrinsic connections within the RVM: some RVMpost-LHA neurons might be interneurons that mediate SIA by modulating the ON or OFF cells. There is some anatomical evidence for ascending inputs from the RVM to the PAG, and we have now included the citation in the relevant section of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful and the data generally support the conclusions.

      Strengths

      HiExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled, and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses

      (1) It is still unclear to me whether or not cells that do not expand remain in the well given the response to point 1. The authors say the cells are digested and washed away but then say that there is a remaining signal from the unexpanded DNA in some cases. I believe this is still a concern that potential users of the protocol should be aware of.

      Although Proteinase K digestion removes most of the unexpanded cells, DNA can sometimes persist. As such, we occasionally observe Hoechst signal underneath cells. This residual DNA is easily distinguished from nuclear Hoechst signal and does not confound interpretation of results. We have added a new supplementary figure that further clarifies this point.

      (2) Regarding the response to point 9, I think this information should be included in the manuscript, possibly in the methods. It is important for others to have a sense of how long imaging may take if they were to adopt this method.

      We have added detailed information to the methods section to address this point, as shown below. In general, we image HiExM samples on the Opera Phenix at 63x with the following parameters: 100% laser power for all channels; 200 ms exposure for Hoechst and 500-1000+ ms exposure for immunostained channels, depending on the strength of the stain and the laser; 60 optical sections with 1 micron spacing; and 4-20 fields of view per well, depending on the cell density and sample size requirements. Therefore, imaging one full 96-well plate (60 wells total, as we avoid the outer wells) takes anywhere from 3 hr to 64 hr depending on the combination of parameters used.
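      The 3 hr to 64 hr range can be approximated from the listed parameters by multiplying wells, fields, optical sections, and summed per-plane exposure. This is a rough sketch, not the authors' calculation: it counts exposure time only (stage movement, autofocus, and camera readout are ignored), and the number of immunostained channels (one in the fast case, three in the slow case) is our assumption, since the response does not state it.

```python
def plate_imaging_hours(n_wells, fields_per_well, z_planes, exposures_s):
    """Exposure-only estimate of plate acquisition time, in hours.

    Assumes channels are acquired sequentially at every optical section and
    ignores stage moves, autofocus, and readout overhead.
    """
    seconds_per_plane = sum(exposures_s)
    total_s = n_wells * fields_per_well * z_planes * seconds_per_plane
    return total_s / 3600

# Fast case (assumed): Hoechst 0.2 s + one immunostained channel at 0.5 s, 4 fields per well.
fast = plate_imaging_hours(60, 4, 60, [0.2, 0.5])
# Slow case (assumed): Hoechst 0.2 s + three immunostained channels at 1.0 s, 20 fields per well.
slow = plate_imaging_hours(60, 20, 60, [0.2, 1.0, 1.0, 1.0])
print(f"{fast:.1f} h to {slow:.1f} h")
```

      Under these assumptions the estimate spans roughly 2.8 h to 64 h, close to the stated range; real runs will be somewhat longer because of the ignored overheads.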

      Reviewer #2 (Public review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit expansion of the gel. They thus engineered a device that can spot a small droplet of hydrogel solution and keep it in place as it polymerises. Because the gel occupies only a small portion of space at the center of each well, it can expand in all directions, and imaging and staining can proceed using liquid handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and with automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal.

      Addition upon revision:

      The authors addressed this reviewer's suggestions.

      Reviewer #3 (Public review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include: 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand toroidal gel within each well.

      Addition upon revision:

      Overall, the authors have adequately addressed most of the concerns raised. There are a few minor issues that require attention.

      Minor comments:

      Figure S10: There appears to be a discrepancy in the panel labeling. The current labels are E-H, but it is unclear whether panels A-D exist. Also, this reviewer thought that panels G and H would benefit from statistical testing to strengthen the conclusions. As a general rule for scientific graph presentation, the y-axis of all graphs should start at zero unless there is a compelling reason not to do so.

      We have revised Figure S10 to address your comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others. 

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that there was no distinction between proteins that evolved early or late in our dataset of coenzyme-binding proteins. The aim of our analysis was purely to observe trends in the age of amino acids vs. age of coenzymes. While no direct inference can be made from this about early life as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as pointed out also by the reviewer). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have more favorable binding free energies. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the example of the haloalkane dehalogenase DhaA, that ancestral sequence reconstruction is a powerful tool for designing more stable, but also more active, proteins. Ancestral sequence reconstruction relies on inferring ancient states of protein families to suggest mutations that lead to proteins more stable than currently existing ones. Their study did not explore ligand-protein interactions specifically but showed that ancient states often have more favorable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer's comment regarding other small molecules, which we assume refers mainly to metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. Intentionally, they were not part of our study, as they have already been studied previously by others (e.g. Bromberg et al., 2022; and reviewed in Frenkel-Pinter et al., 2020) and by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) are enriched in early amino acids such as Asp and Glu, while more recent metal binding sites (e.g. of Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, the involvement of metal ions in coenzyme binding sites was also studied here and pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to expand the discussion of studies concerning inorganic cofactors.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022). For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but not with its being listed as a weakness of our study, nor with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the evolution of the full alphabet), together with prebiotically plausible coenzymes, addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine.

      There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly acknowledge the effort that the reviewer made in the revision of the data and for the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. 

      As invited by the reviewer, we indeed repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to the Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we failed to confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) of the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.  

      In addition, the most recent coenzyme temporality (Post-LUCA) was neglected in the reviewer’s analysis. The difference between F (old) and F (new) is even more pronounced in Post-LUCA than in LUCA vs. Ancient (Supplementary table 5A) and depends much less on Trp. Here, Asp, Ser, Leu, Phe, and Arg dominate the observed phenomenon (Supplementary table 5b). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion, and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.

      The following text (and the additional data) was included in the revised manuscript version:

      “To explore the contribution of individual amino acids to this effect, fractional difference (FD) for early vs. late amino acids among the Ancient, LUCA, and Post-LUCA coenzyme binding was calculated (Supplementary Table 5). The mean FD revealed a similar trend to the amino acid composition analysis (Fig. 3). The amino acids most enriched in LUCA vs. Post-LUCA are Gly, Ser, and Leu (FD of 4.4, 4.3, and 4.1 respectively), while the most depleted include Phe, Arg, and His (FD of -11, -4.2, and -3.2) (Supplementary Table 5B).”
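      The fractional-difference bookkeeping discussed above can be sketched as follows. This is an illustrative sketch only: the four-residue "alphabet" and its percentages are invented (not data from the study), the early/late split is an assumed 10/10 classification that should be checked against the manuscript, and excluded residues are simply dropped from the sums without renormalizing the remaining fractions.

```python
# Assumed early amino acid set (a commonly used 10/10 split); verify against the manuscript.
EARLY = {"Gly", "Ala", "Asp", "Glu", "Val", "Ser", "Ile", "Leu", "Pro", "Thr"}

def fractional_differences(f_a, f_b):
    """Per-amino-acid FD = F_a(aa) - F_b(aa), with F in percent of binding-site residues."""
    return {aa: f_a[aa] - f_b[aa] for aa in f_a}

def early_late_sums(fd, exclude=()):
    """Sum FD over early and late amino acids, dropping excluded residues (no renormalization)."""
    early = sum(v for aa, v in fd.items() if aa in EARLY and aa not in exclude)
    late = sum(v for aa, v in fd.items() if aa not in EARLY and aa not in exclude)
    return early, late

# Invented toy percentages (each class sums to 100), NOT data from the study.
ancient = {"Gly": 40.0, "Asp": 30.0, "Trp": 10.0, "Arg": 20.0}
luca = {"Gly": 30.0, "Asp": 25.0, "Trp": 20.0, "Arg": 25.0}

fd = fractional_differences(ancient, luca)
print(fd)                                           # {'Gly': 10.0, 'Asp': 5.0, 'Trp': -10.0, 'Arg': -5.0}
print(early_late_sums(fd))                          # (15.0, -15.0)
print(early_late_sums(fd, exclude={"Gly", "Trp"}))  # (5.0, -5.0)
```

      Because both composition vectors sum to 100%, the FDs over the full alphabet sum to zero, so the early and late totals are always equal and opposite; dropping Gly and Trp shrinks the early/late contrast but leaves a residual signal whenever other residues also differ.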

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 comprise at least one phosphate (SAM, tetrahydrofolic acid, biopterin, and heme). Except for SAM, the rest have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that allows it to occupy the right-hand side of the Ramachandran plot. This allows it to adopt the alternating αL-αR conformation of the P-loop, forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis, the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning their redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight further in the discussion of the revised manuscript.

      Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Supplementary Tables 6A and 6B show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      The following text was included in the revised manuscript version:

      “Moreover, we investigated whether the observed trend in amino acid occurrence at the binding sites was dominated by the presence of phosphate groups, which are common in many ancient cofactors except for SAM, Tetrahydrofolic acid, Biopterin, and Heme. An additional analysis therefore excluded all phosphate-containing coenzymes indicating that while the trend is less pronounced, it remains even in the absence of phosphate groups (Supplementary Table 6).”

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided, so I could easily have missed something, and I encourage the authors to check my work. If it holds up, it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) were one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russell 2011). Longo et al. recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

      Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as the age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as the role of backbone interactions in early peptide-coenzyme evolution) and observations (such as the occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example: (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzymes of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of early amino acids alone to bind ancient coenzymes. In our humble opinion, all of these points make important contributions toward closing the peptide-coenzyme knowledge gap that has been discussed in a number of previous studies.

      Recommendations for the authors:

      (1) By only focusing on coenzymes, the authors may have overestimated their importance. What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than some possible role in very ancient proteins. Or it might diminish the conjectured importance of coenzymes.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (2) The authors should analyze whether the interactions are with similar types of amino acids in ancient versus early proteins.

      While we appreciate the interesting suggestion, we would like to clarify that we did not aim to elucidate the differences between early and late protein folds - we agree that this might add an interesting perspective to our work, but we feel that it is well beyond the scope of our current study.

      (3) The authors might also wish to do sequence alignments to the structures in early versus late evolving proteins to see how general this pattern of residue usage is beyond the limited set of proteins found in the PDB.

      This is an interesting suggestion, but, as with the previous recommendation, it is not within the scope of this study, in which no distinction between early and late evolving proteins has been made.

      There have been a number of attempts to classify folds as shared among Bacteria, Archaea, and Eukaryota or specific to one or two of these groups of organisms (https://link.springer.com/article/10.1007/s00239-023-10136-x; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9541633/). This does not, however, compare easily with our time scales, where ancient ligands occur well before the last common ancestor.

      We also agree that the set of sequences present in the PDB is biased, but perhaps it is less biased than we have thought. In a recent, fantastic work (https://www.biorxiv.org/content/10.1101/2024.03.18.585509v2), Nicola Bordin and his colleagues from the Orengo group attempted to classify over 200 million structures in the AlphaFold Database into the so-called Encyclopedia of Domains, and they found that nearly 80% of the detected domains can be assigned to already known superfamilies in CATH.

      (4) The authors might wish to consider the results in Skolnick, H. Zhou, and M. Gao. On the possible origin of protein homochirality, structure, and biochemical function. PNAS 2019: 116(52): 26571-26579.

      Based on the editorial recommendation, the following sentence was added in the discussion:

      “It has been implied by computer simulations that coenzymes could bind to proteins with similar propensity even before the onset of protein homochirality, despite lower structural stability and secondary structure content in heterochiral polypeptides (Skolnick et al., 2019).”

    1. Author Response:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; therefore, to narrow the scope, we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to. The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      You have selectively dropped the key last part of that sentence: “... 3D volumes, often in cleared neural tissue”, which is exactly what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we state explicitly that our claims concern mesoSPIM and cleared-tissue data.

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool; we are glad it works for you. The deep learning methods we use cannot “solve” this dataset, and our self-supervised method also reaches an F1-Score (Dice) of ~0.8. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:
      * comparison to thresholding (with the same post-processing as the proposed method)
      * comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
      * comparison to references 8 and 9.
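      As a point of reference for the first suggested baseline, the sketch below shows Otsu thresholding followed by connected-component labelling of a 3D volume, using NumPy only. It is a minimal sketch under stated assumptions, not the CellSeg3D post-processing pipeline: the 6-connectivity, the size filter, and the names `threshold_baseline` and `min_size` are hypothetical choices made for illustration (scikit-image's `threshold_otsu` and `measure.label` provide equivalent, faster building blocks).

```python
from __future__ import annotations

from collections import deque

import numpy as np

# 6-connectivity: face neighbours only (an assumption; 26-connectivity is also common).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def otsu_threshold(volume: np.ndarray, nbins: int = 256) -> float:
    """Intensity that maximizes the between-class variance of the histogram (Otsu)."""
    hist, edges = np.histogram(volume.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_bg = np.cumsum(hist)              # voxel counts at or below each bin
    weight_fg = np.cumsum(hist[::-1])[::-1]  # voxel counts at or above each bin
    mean_bg = np.cumsum(hist * centers) / np.maximum(weight_bg, 1)
    mean_fg = (np.cumsum((hist * centers)[::-1]) / np.maximum(weight_fg[::-1], 1))[::-1]
    # Between-class variance for a split between bin i and bin i + 1.
    variance = weight_bg[:-1] * weight_fg[1:] * (mean_bg[:-1] - mean_fg[1:]) ** 2
    return float(centers[np.argmax(variance)])


def label_components(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """Breadth-first connected-component labelling of a 3D boolean mask."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to an earlier component
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nz, ny, nx = z + dz, y + dy, x + dx
                inside = (0 <= nz < mask.shape[0] and 0 <= ny < mask.shape[1]
                          and 0 <= nx < mask.shape[2])
                if inside and mask[nz, ny, nx] and not labels[nz, ny, nx]:
                    labels[nz, ny, nx] = count
                    queue.append((nz, ny, nx))
    return labels, count


def threshold_baseline(volume: np.ndarray, min_size: int = 1) -> tuple[np.ndarray, int]:
    """Otsu foreground, connected components, then removal of objects below min_size voxels."""
    labels, _ = label_components(volume > otsu_threshold(volume))
    sizes = np.bincount(labels.ravel())
    keep = [i for i in np.flatnonzero(sizes >= min_size) if i != 0]  # drop background (0)
    relabeled = np.zeros_like(labels)
    for new_id, old_id in enumerate(keep, start=1):
        relabeled[labels == old_id] = new_id
    return relabeled, len(keep)


if __name__ == "__main__":
    # Synthetic volume: two bright "cells" plus one isolated bright voxel (noise).
    vol = np.full((20, 20, 20), 0.1)
    vol[2:6, 2:6, 2:6] = 1.0
    vol[12:17, 12:17, 12:17] = 1.0
    vol[10, 0, 19] = 1.0
    print(threshold_baseline(vol)[1], threshold_baseline(vol, min_size=2)[1])
```

      The size filter stands in for the kind of post-processing the reviewer asks to keep identical across methods; in this toy volume it removes the single-voxel object and leaves the two cubes.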

      Refs. 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer, SwinUNETR. Moreover, models in the MONAI package can be used. Note that, to our knowledge, the transformer results are also a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      The originally submitted version already includes a 5-page Dataset Card that fully answers your questions. All cells were labeled entirely by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above: there, you said the method worked well, to the point of doubting how the ground truth was generated and suggesting the data are easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3D semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training one's own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3D microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment and presentation of the results. Further, it is unclear if results of similar quality as reported can be achieved within the GUI by non-expert users.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small and well separated nuclei. It is unclear if the good performance of the novel self-supervised learning method compared to CellPose and StarDist would hold for datasets with other characteristics, such as larger nuclei with a more complex morphology or crowded nuclei.

      StarDist is not pretrained, we trained it from scratch as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them, it is that we can match them with a self-supervised method.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I am uncertain the claims hold for larger and/or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, who is acknowledged in the paper, to check our work), and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed that we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be stronger if a comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short, so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a): this is not a valid experimental setup and amounts to training on your test set. If b): this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user who is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3. Note that the paper provides notebooks to reproduce the experimental results. This is very laudable, but I believe that a more extended description of the experiments in the text would still be very helpful for the reader to understand the set-up. Further, from inspection of these notebooks it becomes clear that hyperparameters were indeed found on the test set (a), so the results are not valid in the current form.

      We apologize for this confusion; we have now expanded the Methods to clarify that the setup is now (b). You can see exactly what we did in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions. For clarity, we now also link each individual notebook in the Methods.
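      The difference between setups (a) and (b) can be made concrete with a small sketch: a post-processing threshold is chosen by maximizing Dice on a validation volume only, and the metric is then computed once on a separate test volume that played no part in the choice. The function names, the candidate grid, and the synthetic volumes are hypothetical and only illustrate the protocol; they are not the code in the figure notebooks.

```python
import numpy as np


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Voxel-wise Dice coefficient between two boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


def tune_threshold(val_prob: np.ndarray, val_truth: np.ndarray, candidates) -> float:
    """Pick the candidate threshold with the best Dice on the validation split only."""
    scores = [dice(val_prob > t, val_truth) for t in candidates]
    return float(candidates[int(np.argmax(scores))])


if __name__ == "__main__":
    # Hypothetical semantic predictions: foreground ~0.6-0.7, background ~0.4.
    val_truth = np.zeros((8, 8, 8), dtype=bool)
    val_truth[2:5, 2:5, 2:5] = True
    val_prob = np.where(val_truth, 0.68, 0.42)

    test_truth = np.zeros((8, 8, 8), dtype=bool)
    test_truth[4:8, 0:3, 1:6] = True
    test_prob = np.where(test_truth, 0.60, 0.42)

    candidates = np.round(np.linspace(0.05, 0.95, 19), 2)
    best = tune_threshold(val_prob, val_truth, candidates)  # test data never seen here
    print(best, dice(test_prob > best, test_truth))          # test split scored once
```

      The key design point is that `tune_threshold` receives only validation arrays, so the reported test score cannot leak information from the ground truth it is evaluated against.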

      (3) I cannot obtain similar results to the ones reported in the manuscript using the plugin. I tried to obtain some of the results from the paper qualitatively. First, I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok; however, the value range is quite narrow (average BG intensity ~0.4, FG intensity 0.6-0.7). I then tried to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency required to reproduce the images (it sounds like you encountered this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to obtain the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      - We are surprised to hear this; did you follow the notebook that directly reproduces the steps to create this figure? (This was linked in the preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      - We have made a video demo for you so that any step that might be unclear is easier to follow: (https://youtu.be/U2a9IbiO7nE).

      - We also expanded the Methods to include the exact values from the notebook in the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, that is why we retrained Cellpose and found good performance results (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.
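      To illustrate the resizing the reviewer refers to, the following sketch estimates the median object size from a label volume (as the diameter of a sphere with the same voxel count) and the resulting rescaling factor toward a target diameter. The 30-pixel target reflects the documented training diameter of Cellpose's cytoplasm models; treating it as the target for 3D mesoSPIM volumes, and the helper names, are assumptions for illustration only.

```python
import numpy as np

# Assumption: Cellpose's cytoplasm models are documented as trained on cells ~30 px across.
CELLPOSE_CYTO_DIAMETER = 30.0


def median_equivalent_diameter(labels: np.ndarray) -> float:
    """Median diameter of spheres with the same voxel counts as the labelled objects."""
    _, voxel_counts = np.unique(labels[labels > 0], return_counts=True)
    diameters = np.cbrt(6.0 * voxel_counts / np.pi)  # sphere volume V = pi * d^3 / 6
    return float(np.median(diameters))


def rescale_factor(labels: np.ndarray, target: float = CELLPOSE_CYTO_DIAMETER) -> float:
    """Factor by which the image would be resized so objects match the target diameter."""
    return target / median_equivalent_diameter(labels)


if __name__ == "__main__":
    labels = np.zeros((20, 20, 20), dtype=np.int32)
    labels[0:4, 0:4, 0:4] = 1          # 64 voxels
    labels[10:15, 10:15, 10:15] = 2    # 125 voxels
    labels[5:8, 15:18, 0:3] = 3        # 27 voxels
    s = rescale_factor(labels)
    print(f"scale factor {s:.2f}, voxel count grows ~{s ** 3:.0f}x")
```

      Because an isotropic rescaling by a factor s multiplies the voxel count by s³, even a modest factor inflates an already GB-scale volume considerably, which is the practical concern raised in the response above.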

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics. We added a link to the paper you mention as well.
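      The difference between a semantic Dice score and an instance-level F1 score can be sketched as follows: Dice compares foreground masks voxel by voxel, whereas F1 counts predicted objects matched to ground-truth objects with IoU above a threshold. This is a minimal NumPy sketch, not StarDist's implementation (which uses an optimal assignment). Here the threshold is restricted to values of at least 0.5, for which each object can have at most one partner with IoU strictly above the threshold, so counting qualifying pairs already yields a one-to-one matching.

```python
import numpy as np


def semantic_dice(pred_labels: np.ndarray, true_labels: np.ndarray) -> float:
    """Voxel-wise Dice between foreground masks; object identities are ignored."""
    pred_fg, true_fg = pred_labels > 0, true_labels > 0
    denom = pred_fg.sum() + true_fg.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(pred_fg, true_fg).sum() / denom)


def iou_matrix(pred_labels: np.ndarray, true_labels: np.ndarray) -> np.ndarray:
    """IoU between every ground-truth object (rows) and predicted object (columns)."""
    true_ids = [i for i in np.unique(true_labels) if i != 0]
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    iou = np.zeros((len(true_ids), len(pred_ids)))
    for r, t in enumerate(true_ids):
        true_mask = true_labels == t
        for c, p in enumerate(pred_ids):
            pred_mask = pred_labels == p
            inter = np.logical_and(true_mask, pred_mask).sum()
            if inter:
                iou[r, c] = inter / np.logical_or(true_mask, pred_mask).sum()
    return iou


def instance_f1(pred_labels: np.ndarray, true_labels: np.ndarray, thresh: float = 0.5) -> float:
    """Object-level F1: a true positive is a (true, pred) pair with IoU > thresh."""
    if thresh < 0.5:
        raise ValueError("this sketch assumes thresh >= 0.5 so that matches are unique")
    iou = iou_matrix(pred_labels, true_labels)
    n_true, n_pred = iou.shape
    if n_true + n_pred == 0:
        return 1.0
    tp = int((iou > thresh).sum())
    # With FP = n_pred - tp and FN = n_true - tp, 2TP / (2TP + FP + FN) simplifies to:
    return 2.0 * tp / (n_true + n_pred)


if __name__ == "__main__":
    truth = np.zeros((10, 10, 10), dtype=np.int32)
    truth[0:4, 0:4, 0:4] = 1
    truth[6:10, 6:10, 6:10] = 2
    pred = np.zeros_like(truth)
    pred[0:4, 0:4, 0:4] = 1       # matches object 1 exactly
    pred[6:10, 6:10, 9:10] = 2    # covers a quarter of object 2 (IoU 0.25)
    pred[0:2, 8:10, 0:2] = 3      # spurious object
    print(semantic_dice(pred, truth), instance_f1(pred, truth))
```

      In this toy example one ground-truth object is matched, one is poorly covered, and one prediction is spurious: the semantic Dice stays at about 0.74, while the instance F1 at IoU 0.5 is 0.4.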

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth.

      Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we have added a new section specifically on limitations. Thanks for raising this good point. Thus, while self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited range of regimes in which it can work. This is why we do not claim it should replace excellent methods like Cellpose or StarDist; rather, it can complement them and be used on mesoSPIM samples, as we show here.

    1. Author Response:

      We thank the reviewers for their thoughtful comments on our manuscript. In this provisional response, we aim to address the major concerns raised and outline a plan for a revised version of the manuscript. A more detailed point-by-point response will follow with the revision.

      The reviewers appreciated our efforts to combine computational modelling with experimental work. However, they also expressed the need for more clarity in explaining how the model was set up, what was simulated, and what the insights and limitations are. In the revision, we plan to improve the discussion section to clarify all of these points. 

      The reviewers also highlighted the need for more transparency regarding the code and the mathematical formulas used in this study. We agree that this is an important issue. While we have already made the software and code for our computational model, along with instructions on how to run it, available in Zenodo (see Ref. 1), and have extensively described the original computational model and formulas in a 13-page supplementary file in our previous study (see Ref. 2), we recognize from the reviewers’ comments that additional transparency is needed. To address this, we will provide an appendix in the revision that includes a full model description, covering the incorporation of cell differentiation and death, a list of parameters, and details on how parameter values were chosen.

      Additionally, in the revised manuscript, we will add a paragraph to more thoroughly discuss the limitations of our approach, as well as avenues for future studies. We hope this will clarify both capabilities and limitations of our model in a way that is more accessible to readers of eLife.

      References:

      1. Virtual Thymus Model (version 2.0). Published: Jun 14, 2024.  doi:10.5281/zenodo.11656320

      2. Aghaallaei, Narges, et al. "αβ/γδ T cell lineage outcome is regulated by intrathymic cell localization and environmental signals." Science Advances 7.29 (2021): eabg3613.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      Given the importance of the Loss of Function (LOF) experiments, we will provide additional evidence for the validity of the dominant-negative strategy and constructs used.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      To clarify redundancies in Hox activity, we will test whether simultaneous expression of dominant-negative forms of more than one Hox gene induces a stronger effect compared to the expression of a single dominant-negative Hox gene.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We agree that this is an excellent additional experiment to corroborate our conclusion and will perform this experiment in our revision.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      To date, Tbx5 is the best marker for the forelimb. While it is true that the Tbx5 expression is broader than the limb field, this occurs only at early stages before forelimb bud formation. We will work towards a further definition of this extra bulge.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We have analysed the cartilage structure of operated embryos with GOF experiments and found no skeletal elements within the ectopic wing bud in the neck. Additionally, in our revision, we can further analyse the wing skeleton of operated embryos with LOF experiments, which would provide more detailed assessments of the impact of dominant-negative Hox genes on wing bud formation.

      Reviewer #2:

      Weaknesses

      (1) In contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We will revise our manuscript to clarify the specificity of the dominant-negative strategy used.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      This is an excellent idea and we will implement the experiment in our revision.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We will incorporate this suggestion and include additional data from our RNA-seq analysis.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      In our revision, we will appropriately expand the discussion on the discrepancies observed between knockout mouse models and our chick embryo experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice, or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      We thank Reviewer 1 for the time and effort dedicated to reviewing our study. The suggestions were very helpful and will improve our manuscript.

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      We thank reviewer 1 for this assessment. 

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling; however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals; however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

      We agree that our analysis of maternal behaviour does not investigate potential contingencies between particular maternal behavioural displays and pup vocalizations (e.g. particular syllable types). The maternal behaviour data collected for this study comprise direct observations, field notes and/or video recordings. Analysing potential contingencies between particular maternal behavioural displays and specific pup vocalizations will require high-speed cameras, which allow this kind of fine-grained analysis. We have planned future studies investigating whether pup vocalizations elicit contingent maternal responses or vice versa. In the revised manuscript, we will add a comment noting that this behaviour will be investigated in greater detail in the future.

      As suggested by Reviewer 1, we will include more information on methods in our revised manuscript to improve clarity. In particular, we will:

      - present more information on different steps of our acoustic analyses

      - provide additional and clearer spectrogram figures representing the different syllable types and categorizations 

      - change the figures accompanying our GLMM analyses following the suggestion of Reviewer 1

      Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      We thank Reviewer 2 for their time and effort dedicated to reviewing our study. The suggestions were very helpful in improving our manuscript.

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal production learning in all its dimensions.

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      The study relies on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, and although it would be unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species limits at present the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the observed maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

      Thank you for your suggestions and comments. 

      Regarding your main comment 1: In the future, we plan to implement temporary captivity experiments to investigate how maternal behaviours affect pup vocal development. This study provides the necessary basis for conducting future playback studies investigating specific behaviours in a controlled environment.

      Regarding your main comment 2: We completely agree that the number of singing males only represents a proxy for the acoustic input that pups receive during ontogeny. In the future, we plan to investigate in detail how the acoustic landscape influences pup vocal development and learning. This will include quantifying how long pups are exposed to song during ontogeny and assessing the influence of different tutors, including a detailed analysis of the song syllables of adult tutors to compare them with the vocal trajectories of song syllables in pups.

      Regarding your main comment 3: We also fully agree that it is unlikely that these results are unique to Saccopteryx bilineata. We are certain that other mammalian vocal learners show parallels to the vocal development and learning processes of S. bilineata. Bats in particular are a promising taxon for comparative studies because their vocal production and perception systems are highly sophisticated (due to their ability to echolocate). This highly social taxon also encompasses a variety of social systems and vocal capacities (e.g. regarding vocal repertoire size, vocal learning capacities, information content, etc.), which support social learning and social feedback, as shown in our study.

      As suggested, in our revised manuscript we will include information on the validation of the ethogram. Furthermore, we will correct all the spelling mistakes; thank you very much for pointing them out!

    1. Author response:

      We thank all the reviewers for their encouraging comments and thoughtful feedback. We are confident that we can incorporate many of the suggestions to provide a clearer overall picture in the revised manuscript. In particular, we agree with the reviewers' concern that some of our methodological decisions, including our choice of metrics, require further clarification. We will focus on revising the methods section to make these decisions more transparent and to address any misunderstandings related to the analysis.

      We also value the request to include more data, such as intermediate results and additional control analyses. We will carefully assess which results to include in the main manuscript and which to provide in an extended supplementary section.

      To offer a more detailed understanding of our quantification of "prediction tendency," we refer to our previous work (Schubert et al., 2023, 2024), where we elaborate on our analytical choices in great detail and provide additional control analyses (e.g., ensuring that the relationship with speech tracking is not driven by participants' signal-to-noise ratio; Schubert et al., 2023).

      Additionally, we would like to clarify that the aim of this manuscript is not to analyze viewing behavior in depth but to replicate the general finding of ocular speech tracking, as presented in Gehmacher et al. (2024). A thorough investigation of specific ocular contributions (e.g., microsaccades or blinks) would require a separate research question and distinct analysis approaches, given the binary nature of such events.

      Nevertheless, we share the reviewers' interest in independent results from the current study, and we plan to carefully select and present the most relevant findings in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al. show that white matter properties can be used to classify subacute back pain patients who will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have underestimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin et al. (2023). In this revised version, we have changed both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin et al.

      In the introduction (page 4, lines 3-18), we now write:

      “Some studies have addressed this question with prognostic models incorporating demographic, pain-related, and psychosocial predictors.1-4 While these models are of great value in showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain, their prognostic accuracy is limited,5 with parameters often explaining no more than 30% of the variance.6-8 A recent notable study in this regard developed a model based on easy-to-use brief questionnaires to predict the development and spread of chronic pain in a variety of pain conditions, capitalizing on a large dataset obtained from the UK Biobank.9 This work demonstrated that only a few features related to assessment of sleep, neuroticism, mood, stress, and body mass index were enough to predict persistence and spread of pain with an area under the curve of 0.53-0.73. Yet, this study is unique in showing such a predictive value of questionnaire-based tools. Neurobiological measures could therefore complement existing prognostic models based on psychosocial variables to improve overall accuracy and discriminative power. More importantly, neurobiological factors such as brain parameters can provide a mechanistic understanding of chronicity and its central processing.”

      And in the conclusion (page 22, lines 5-9), we write:

      “Integrating findings from studies that used questionnaire-based tools and showed remarkable predictive power9 with neurobiological measures that can offer mechanistic insights into chronic pain development could enhance predictive power in CBP prognostic modeling.”

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in small-sample brain imaging studies, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training, making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point regarding the limited sample size and the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development. We believe that the data presented here are nevertheless robust, since they were validated across multiple sites, but they need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”.
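      As an illustration of the separation described above, the following minimal Python sketch shows a single-feature cut-off classifier whose threshold is fixed on a training cohort and then applied unchanged to a held-out cohort. It assumes, for illustration only, that the cut-off is chosen by maximizing Youden's J (sensitivity + specificity - 1) and that higher FA predicts recovery; the function names and FA values are hypothetical and are not the authors' data or their exact procedure.

```python
from typing import List, Sequence


def fit_cutoff(fa: Sequence[float], recovered: Sequence[bool]) -> float:
    """Choose the FA cut-off maximizing Youden's J, using training data only.

    Illustrative assumption: FA >= cut-off is predicted as 'recovered'.
    """
    n_pos = sum(1 for y in recovered if y)
    n_neg = len(recovered) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("training data must contain both outcome groups")
    best_cut, best_j = min(fa), -1.0
    for cut in sorted(set(fa)):
        tp = sum(1 for x, y in zip(fa, recovered) if x >= cut and y)
        tn = sum(1 for x, y in zip(fa, recovered) if x < cut and not y)
        j = tp / n_pos + tn / n_neg - 1.0  # Youden's J statistic
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut


def predict(fa: Sequence[float], cut: float) -> List[bool]:
    """Apply a fixed cut-off; nothing is re-estimated on these data."""
    return [x >= cut for x in fa]


def accuracy(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical training cohort: the cut-off is chosen here only.
train_fa = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52]
train_recovered = [False, False, False, True, True, True]
cut = fit_cutoff(train_fa, train_recovered)

# Hypothetical held-out cohort: used once, for scoring only.
test_fa = [0.41, 0.46, 0.48, 0.55]
test_recovered = [False, True, True, True]
test_acc = accuracy(predict(test_fa, cut), test_recovered)
print(cut, test_acc)  # 0.47 0.75
```

      The design point is that `fit_cutoff` never sees `test_fa`: the only quantity learned is the threshold itself, so there are no hidden layers or weights that could absorb information from the held-out cohort.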

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think underlies the replication we observe in this study. In our case, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
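      For readers less familiar with the measure, Cohen's d for two independent groups is the difference in group means divided by the pooled sample standard deviation, so d > 1 means the means differ by more than one pooled standard deviation. The minimal Python sketch below shows the calculation on hypothetical FA values, not the study's data.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical FA values, not the study's data.
recovered_fa = [0.50, 0.52, 0.54]
persistent_fa = [0.44, 0.46, 0.48]
d = cohens_d(recovered_fa, persistent_fa)
print(round(d, 2))  # 3.0
```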

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is fair, which limits its usefulness for clinical translation. We wanted to emphasize that diffusion images can be obtained in a short period of time; hence, as the predictive accuracy of such models improves, clinical translation becomes closer to reality. In addition, our findings are based on older diffusion data and limited sample sizes coming from different sites with different acquisition sequences. This by itself would limit the accuracy, especially since the evidence shows that sample size also affects model performance (i.e. testing AUC).10 In the revision, we reworded the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample. In the limitations section of the discussion, we now write (page 21, lines 6-9):

      “Even though our model performance is fair, which currently limits its usefulness for clinical translation, we believe that future models would further improve accuracy by using larger, homogeneous sample sizes and uniform acquisition sequences.”
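      Because this exchange hinges on AUC values of 0.65-0.70, it may help to recall what the number measures: the AUC is the probability that a randomly chosen positive case (here, e.g., a recovered patient) scores higher on the predictor than a randomly chosen negative case, with ties counted as one half. This equals the normalized Mann-Whitney U statistic; 0.5 is chance and 1.0 is perfect separation. The minimal Python sketch below uses hypothetical scores.

```python
from itertools import product
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(random positive outranks random negative); ties count 0.5.

    This is the normalized Mann-Whitney U statistic.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictor scores for the two outcome groups.
recovered_scores = [0.6, 0.7, 0.8]
persistent_scores = [0.5, 0.65, 0.9]
print(round(auc(recovered_scores, persistent_scores), 3))  # 0.556
```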

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were useful.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms). 

      We apologize for the lack of clarity; we did run tractography and we did not use a pre-determined streamlined form of the connectome.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps, which included TNPCA for dimensionality reduction and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier, which is a hard classification model.11

      In the methods section (page 30, lines 21-23) we added: “Of note, such models cannot tell us which features are important in classifying the groups. Hence, our model is considered a black-box predictive model, like neural networks.”

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and Supplementary Figure 4 both qualitatively illustrated the shape of the SLF. We have now changed both figures in response to this point and a point raised by Reviewer 3. We now show a 3D depiction of different sub-components of the right SLF (Figure 7) and left SLF (now Supplementary Figure 11 instead of Supplementary Figure 4), with a quantitative estimation of the FA content of the tracts and the number of tracts per component. The results reinforce the TBSS analysis in showing asymmetry in the differences between left and right SLF between the groups (i.e. SBPp and SBPr), in both FA values and number of tracts per bundle.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      - Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer’s concern, we have added a supplementary figure (Fig. S6) showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion, and in the manuscript (page 11, lines 1,2) we write: “Supplementary Figure S6 shows the results in the Mannheim data set if a 30% reduction is used as a recovery criterion in this dataset (AUC = 0.53)”.

      We would like to emphasize several points that support the use of different recovery thresholds for New Haven and Mannheim. The New Haven primary pain ratings relied on a visual analogue scale (VAS), while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data were pre-registered with a definition of recovery at 20% and are part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off.12 Finally, a more recent consensus publication13 from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
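      How the choice of threshold changes group membership can be illustrated with a short Python sketch. It assumes that recovery means a relative reduction in pain intensity from baseline to follow-up strictly greater than the threshold, as in the >20% and >30% criteria discussed above; the function names and pain ratings are hypothetical and do not come from any of the three datasets.

```python
from typing import List, Tuple


def relative_reduction(baseline: float, follow_up: float) -> float:
    """Fractional drop in pain intensity from baseline to follow-up."""
    if baseline <= 0:
        raise ValueError("baseline pain must be positive")
    return (baseline - follow_up) / baseline


def is_recovered(baseline: float, follow_up: float, threshold: float) -> bool:
    """Hypothetical rule: recovered if the reduction strictly exceeds the threshold."""
    return relative_reduction(baseline, follow_up) > threshold


# Hypothetical (baseline, follow-up) pain ratings on a 0-10 scale.
patients: List[Tuple[float, float]] = [(6.0, 4.5), (8.0, 4.0), (5.0, 4.0), (7.0, 6.5)]
for threshold in (0.20, 0.30):
    n_rec = sum(is_recovered(b, f, threshold) for b, f in patients)
    print(f">{threshold:.0%} criterion: {n_rec} of {len(patients)} recovered")
```

      The first hypothetical patient (a 25% reduction) counts as recovered under the 20% criterion but not under the 30% criterion, which is the kind of reclassification that a threshold sensitivity analysis such as Fig. S6 probes.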

      - Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we added these analyses to the results section of the resubmitted manuscript (page 11, lines 13-16): “The correlation between FA values in the right SLF and pain severity in the Chicago data set showed marginal significance (p = 0.055) at visit 1 (Fig. S8A) and higher FA values were significantly associated with a greater reduction in pain at visit 2 (p = 0.035) (Fig. S8B).”

      - Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      We greatly appreciate the reviewer's suggestion to share our data sets, as we strongly support the Open Science initiative. The Chicago data set is already publicly available. The New Haven data set will be shared on the Open Pain repository, and the Mannheim data set will be uploaded to heiDATA or heiARCHIVE at Heidelberg University in the near future. We cannot share the data immediately because this project is part of the Heidelberg pain consortium, “SFB 1158: From nociception to chronic pain: Structure-function properties of neural pathways and their reorganization.” Within this consortium, all data must be shared following a harmonized structure across projects, and no data will be released openly until all projects have completed initial analysis and quality control.

      Reviewer #3 (Public Review):

      Summary:

      The authors suggest a new biomarker of chronic back pain with the potential to predict the outcome of treatment. The authors found a significant difference in a fractional anisotropy measure in the superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      - The number of participants is still low.

      The reviewer raises a very important point of limited sample size. As discussed in our replies to reviewer number 1:

      We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning in which we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no model has ever seen the test data set, and the model performances we report reflect the prognostic accuracy of our model. We write in the limitations section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development. We believe that the data presented here are nevertheless robust, since they were validated across sites, but they need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model.”
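The separation between training and test data described above can be illustrated with a simple FA cut-off classifier. This is a hypothetical sketch, not the authors' SVC pipeline: it assumes that higher right-SLF FA predicts recovery, chooses the cut-off that maximizes balanced accuracy using the training site only, and then applies that frozen cut-off to a held-out site. All names and values are made up for illustration.

```python
# Hypothetical illustration: one FA cut-off fitted on a training site and
# then applied, unchanged, to a held-out test site.

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recovered) and specificity (persistent)."""
    hits_pos = [pred for true, pred in zip(y_true, y_pred) if true]
    hits_neg = [not pred for true, pred in zip(y_true, y_pred) if not true]
    return (sum(hits_pos) / len(hits_pos) + sum(hits_neg) / len(hits_neg)) / 2


def fit_cutoff(fa_train, recovered_train):
    """Choose the cut-off (predict 'recovered' if FA >= cut-off) on training data only."""
    best_cut, best_score = None, -1.0
    for cut in sorted(set(fa_train)):
        preds = [fa >= cut for fa in fa_train]
        score = balanced_accuracy(recovered_train, preds)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut


def predict(fa_values, cut):
    """Apply a frozen cut-off to new, unseen data."""
    return [fa >= cut for fa in fa_values]


# Toy values (not study data).
fa_train = [0.40, 0.42, 0.45, 0.50, 0.52, 0.55]
recovered_train = [False, False, False, True, True, True]
cut = fit_cutoff(fa_train, recovered_train)  # the test site is never seen here
print(cut)                                   # 0.5
print(predict([0.48, 0.51], cut))            # [False, True]
```

The test values are never passed to `fit_cutoff`, so the test accuracy cannot be inflated by tuning on the test set; the same principle applies when an SVC's hyperparameters are chosen by cross-validation within the training data.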

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think underlies the replication we observe in this study. Indeed, the effect size in the New Haven and Mannheim data is large (Cohen’s d > 1).
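For reference, Cohen's d is the difference in group means divided by the pooled standard deviation, and under a normal approximation for a two-sided, two-sample comparison the per-group sample size needed scales with 1/d². The sketch below assumes equal group sizes, α = 0.05 and 80% power (illustrative defaults, not values from the study); the helper names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled (n - 1) variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def required_n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
print(required_n_per_group(1.0))      # 16
print(required_n_per_group(0.5))      # 63
```

Under these assumptions an effect of d = 1 needs about 16 patients per group, whereas d = 0.5 needs about 63, which illustrates why a large effect can replicate with modest samples.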

      - An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion. While we cannot directly study tissue microstructure, we further explored the changes observed in the SLF by calculating additional diffusivity measures, namely mean, axial, and radial diffusivity.

      In the results section, we added (page 7, lines 12-19): “We also examined mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) extracted from the right SLF shown in Fig. 1 to further understand which diffusion component differs between the groups. The right SLF MD is significantly increased (p < 0.05) in SBPr compared to SBPp patients (Fig. S3), while the right SLF RD is significantly decreased (p < 0.05) in SBPr compared to SBPp patients in the New Haven data (Fig. S4). Axial diffusivity extracted from the right SLF mask did not differ significantly between SBPr and SBPp (p = 0.28) (Fig. S5).”
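All four scalar measures derive from the three eigenvalues of the diffusion tensor (λ1 ≥ λ2 ≥ λ3): AD is λ1, RD is the mean of λ2 and λ3, MD is the mean of all three, and FA is their normalized dispersion. The minimal Python sketch below uses these standard definitions; the function name is hypothetical, and in practice these maps come from tools such as FSL's dtifit rather than from hand-written code.

```python
from math import sqrt


def dti_scalars(l1, l2, l3):
    """FA, MD, AD and RD from diffusion-tensor eigenvalues, assuming l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0  # mean diffusivity
    ad = l1                    # axial diffusivity: along the principal direction
    rd = (l2 + l3) / 2.0       # radial diffusivity: perpendicular to it
    # Fractional anisotropy: 0 for isotropic diffusion, 1 for diffusion along one axis.
    fa = (sqrt(1.5) * sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
          / sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2))
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


# Illustrative white-matter-like eigenvalues in mm^2/s (not study data).
print(dti_scalars(1.7e-3, 0.3e-3, 0.3e-3))
```

Because MD = (AD + 2·RD)/3, the three diffusivities are not independent, so group differences in MD, AD, and RD are best interpreted jointly.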

      In the discussion, we write (page 15, lines 10-20):

      “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus.14 Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts,15 our results show increased MD and decreased RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or to more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      - Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues need to be addressed so that we can respond appropriately.

      Recommendations For The Authors:

      We thank the reviewers for their constructive feedback, which has significantly improved our manuscript. We have done our best to answer the criticisms that they raised point-by-point.

      Reviewer #2 (Recommendations For The Authors):

      The discovery-replication approach of the current study justifies the use of the term 'robust.' In contrast, previous studies on predictive biomarkers using functional and structural brain imaging did not pursue similar approaches and have not been replicated. Still, the respective biomarkers are repeatedly referred to as 'robust.' Throughout the manuscript, it would, therefore, be more appropriate to remove the label 'robust' from those studies.

      We thank the reviewer for this valuable suggestion. We removed the label 'robust' throughout the manuscript when referring to the previous studies which didn’t follow the same approach and have not yet been replicated.

      Reviewer #3 (Recommendations For The Authors):

      This is, indeed, quite a well-written manuscript with very interesting findings and an interesting patient group. However, a few issues weaken the findings.

      (1) It is a bit frustrating to read at the beginning how important chronic back pain is, and then to see the small number of patients in the studies used. At least the number of healthy subjects could be higher.

      The reviewer raises an important point regarding the number of pain-free healthy controls (HC) in our samples. We first note that our primary statistical analysis focused on comparing recovered and persistent patients at baseline and validating these findings across sites without directly comparing them to HCs. Nevertheless, the data from New Haven included 28 HCs at baseline, and the data from Mannheim included 24 HCs. Although these sample sizes are not large, they have enabled us to clearly establish that the recovered SBPr patients generally have larger FA values in the right superior longitudinal fasciculus than the HCs, a finding consistent across sites (see Figs. 1 and 3). This suggests that the general pain-free population includes individuals with both low and high risk for chronic pain. It also offers one explanation for the reported lack of differences, or inconsistent differences, between chronic low back pain patients and HCs in the literature, as these differences likely depend on the (unknown) proportion of high- and low-risk individuals in the control groups. If, by chance, high-risk individuals are over-represented in the HC group, comparisons between HCs and chronic pain patients are unlikely to yield statistically significant results. Thus, while we agree with the reviewer that the sample sizes of our HCs are limited, this limitation does not undermine the validity of our findings.

      (2) Pain reaction in the brain is in general a quite popular topic and could be connected to the findings or mentioned in the introduction.

      We thank the reviewer for this suggestion. We have now added a summary of the brain's response to pain in general. In the introduction, we now write (page 4, lines 19-22 and page 5, lines 1-5):

      “Neuroimaging research on chronic pain has uncovered a shift in brain responses to pain when acute and chronic pain are compared. The thalamus, primary somatosensory, motor areas, insula, and mid-cingulate cortex most often respond to acute pain and can predict the perception of acute pain16-19. Conversely, limbic brain areas are more frequently engaged when patients report the intensity of their clinical pain20, 21. Consistent findings have demonstrated that increased prefrontal-limbic functional connectivity during episodes of heightened subacute ongoing back pain or during a reward learning task is a significant predictor of CBP.12, 22. Furthermore, low somatosensory cortex excitability in the acute stage of low back pain was identified as a predictor of CBP chronicity.23”

      (3) A clear structural asymmetry is observed in the brain; why not elaborate on this finding further? Would the SLF be a hub in a connectivity analysis? Would the FA changes show along-tract features? Etc.

      The reviewer raises an important point. Our data give grounds to suggest an asymmetry in the role of the SLF in resilience to chronic pain, and we discuss this at length in the Discussion section. In addition, we extended our data analysis using our Population-Based Structural Connectome pipeline on the New Haven data set. With this approach, we studied the number of fiber tracts making up different parts of the SLF on the right and left sides. We also extracted FA values along fiber tracts and compared their averages across groups. Our new analyses are presented in the modified Figure 7 and Fig. S11. These results indeed support the asymmetry hypothesis, and the SLF could be a hub of structural connectivity. Please note, however, that given the discovery-and-validation design of our study, the study of SLF structural connectivity is beyond the scope of this paper, because tract-based connectivity is very sensitive to data collection parameters and is less accurate with single-shell DWI acquisition. We will therefore pursue the study of SLF connectivity in the future with well-powered and more harmonized data.

      (4) Only FA is mentioned; did the authors work with MD, RD, and AD metrics?

      We thank the reviewer for this suggestion, which helps provide a clearer picture of the differences in the right SLF between SBPr and SBPp. We have now extracted MD, AD, and RD for the predictive mask we discovered in Figure 1 and plotted the values comparing SBPr to SBPp patients in Fig. S3, Fig. S4, and Fig. S5 across all sites using one comprehensive harmonized analysis. We have added in the discussion: “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus.14 Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts,15 our results show increased MD and decreased RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or to more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      (5) There are many speculations in the Discussion, however, some of them are not supported by the results.

      We agree with the reviewer and thank them for pointing this out. We have now changed the wording in several places in the Discussion where speculations were not supported by the data. For example, instead of writing (page 16, lines 7-9): “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain is a top-down phenomenon related to visuospatial and body awareness.”, we now write: “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain might be related to a top-down phenomenon involving visuospatial and body awareness.”

      (6) The Methods section is written rather roughly. To obtain all the details needed for a potential replication, one has to jump around the text.

      The reviewer is correct; our Methods section may have lacked sufficiently detailed descriptions. We have therefore described our methodology more extensively. Under “Estimation of structural connectivity”, we now write (page 28, lines 20-21 and page 29, lines 1-19):

      “Structural connectivity was estimated from the diffusion tensor data using a population-based structural connectome (PSC) detailed in a previous publication.24 PSC can utilize the geometric information of streamlines, including shape, size, and location for a better parcellation-based connectome analysis. It, therefore, preserves the geometric information, which is crucial for quantifying brain connectivity and understanding variation across subjects. We have previously shown that the PSC pipeline is robust and reproducible across large data sets.24 PSC output uses the Desikan-Killiany atlas (DKA) 25 of cortical and sub-cortical regions of interest (ROI). The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S6.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”
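The ROI-pair extraction described above can be sketched as follows, assuming the streamlines have already been tracked, assigned to the (dilated) ROIs containing their two endpoints, and summarized by their mean FA. This is a hypothetical simplification for illustration, not the PSC implementation: it skips the segmentation of streamlines passing through multiple ROIs and the outlier removal, and simply tallies the streamline count and mean FA for each undirected ROI pair. The ROI labels are placeholders.

```python
from collections import defaultdict
from statistics import mean


def build_connectome(streamlines):
    """Streamline count and mean FA per undirected ROI pair.

    streamlines: iterable of (roi_a, roi_b, mean_fa) tuples, where roi_a/roi_b are
    the labels of the dilated gray-matter ROIs at the two endpoints (None if an
    endpoint falls outside every ROI).
    """
    counts = defaultdict(int)
    fa_values = defaultdict(list)
    for roi_a, roi_b, fa in streamlines:
        if roi_a is None or roi_b is None or roi_a == roi_b:
            continue  # unassigned endpoint or self-connection: not an edge
        edge = tuple(sorted((roi_a, roi_b)))  # (A, B) and (B, A) are the same edge
        counts[edge] += 1
        fa_values[edge].append(fa)
    return {edge: (counts[edge], mean(fa_values[edge])) for edge in counts}


# Placeholder labels and values (not study data).
tracks = [("rh_supramarginal", "rh_caudalmiddlefrontal", 0.48),
          ("rh_caudalmiddlefrontal", "rh_supramarginal", 0.52),
          ("rh_supramarginal", None, 0.40)]
print(build_connectome(tracks))
```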

      (7) Why not join all the data with harmonisation in order to reproduce the results (TBSS)

      We have followed the reviewer’s suggestion; we used neuroCombat harmonization after pooling all the diffusion weighted data into one TBSS analysis. Our results remain the same after harmonization. 

      In the Supplementary Information we added a paragraph explaining the method for harmonization; we write (SI, page 3, lines 25-34):

      “Harmonization of DTI data using neuroCombat. Because the 3 data sets originated from different sites using different MR data acquisition parameters and slightly different recruitment criteria, we applied neuroCombat29 to correct for site effects and then repeated the TBSS analysis shown in Figure 1 and the validation analyses shown in Figures 5 and 6. First, the FA maps derived using the FDT toolbox were pooled into one TBSS analysis, in which registration to a standard FA template (FMRIB58_FA_1mm.nii.gz, part of FSL) was performed. Next, neuroCombat, as implemented in Python, was applied to the FA maps, with the batch (i.e., site) effect modeled by a vector containing 1 for New Haven, 2 for Chicago, and 3 for Mannheim maps, respectively. The harmonized maps were then skeletonized to allow for TBSS.”
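The core idea of ComBat-style harmonization is to standardize each feature (for example, FA at one voxel), estimate a site-specific shift and scale, and remove them. The Python sketch below is a deliberately simplified location/scale version written for illustration: unlike neuroCombat, it models no biological covariates and applies no empirical Bayes shrinkage across features, and its function name is hypothetical rather than part of the neuroCombat API. Site codes follow the 1/2/3 batch coding described above.

```python
from statistics import mean, pvariance


def harmonize_location_scale(values, batches):
    """Simplified ComBat-style harmonization of ONE feature across sites.

    values:  feature value per subject (e.g. FA at one voxel)
    batches: site code per subject (e.g. 1 = New Haven, 2 = Chicago, 3 = Mannheim)
    No covariates and no empirical Bayes shrinkage, unlike neuroCombat.
    """
    sites = sorted(set(batches))
    site_means = {s: mean([v for v, b in zip(values, batches) if b == s]) for s in sites}
    grand_mean = mean(values)
    # Pooled within-site SD: site means are removed before pooling.
    ss = sum((v - site_means[b]) ** 2 for v, b in zip(values, batches))
    sigma = (ss / len(values)) ** 0.5
    z = [(v - grand_mean) / sigma for v in values]
    # Site-specific location (gamma) and scale (delta) on the standardized data.
    gamma, delta = {}, {}
    for s in sites:
        zs = [zi for zi, b in zip(z, batches) if b == s]
        gamma[s] = mean(zs)
        delta[s] = pvariance(zs) ** 0.5
        if delta[s] == 0:
            raise ValueError(f"site {s} has no within-site variability")
    # Remove the site effects and map back to the original scale.
    return [sigma * (zi - gamma[b]) / delta[b] + grand_mean for zi, b in zip(z, batches)]


print(harmonize_location_scale([1, 2, 3, 11, 12, 13], [1, 1, 1, 2, 2, 2]))
```

After harmonization each site has the same mean and spread for the feature; in the full method, covariates such as age or group would be included in the model so that their effects are preserved while the site terms are removed.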

      And in the results section, we write (page 12, lines 2-21):

      “Validation after harmonization

      Because the DTI data sets originated from 3 sites with different MR acquisition parameters, we repeated our TBSS and validation analyses after correcting for variability arising from site differences using DTI data harmonization as implemented in neuroCombat.29 The harmonization method is described in detail in the Supplementary Methods. The whole-brain unpaired t-test depicted in Figure 1 was repeated after neuroCombat and yielded very similar results (Fig. S9A), showing significantly increased FA in SBPr compared to SBPp patients in the right superior longitudinal fasciculus (MNI coordinates of peak voxel: x = 40; y = -42; z = 18 mm; t(max) = 2.52; p < 0.05, corrected against 10,000 permutations). We again tested the accuracy of local diffusion properties (FA) of the right SLF, extracted from the mask of voxels passing threshold in the New Haven data (Fig. S9A), in classifying the Mannheim and the Chicago patients, respectively, as persistent or recovered. FA values corrected for age, gender, and head displacement accurately classified SBPr and SBPp patients from the Mannheim data set with an AUC = 0.67 (p = 0.023, tested against 10,000 random permutations; Fig. S9B and S7D), and patients from the Chicago data set with an AUC = 0.69 (p = 0.0068; Fig. S9C and S7E) at baseline and an AUC = 0.67 (p = 0.0098; Fig. S9D and S7F) at follow-up, confirming the predictive cluster from the right SLF across sites. The application of neuroCombat significantly changes the FA values, as shown in Fig. S10, but does not change the results between groups.”
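An AUC with a permutation p-value of the kind reported above can be computed as sketched below. The sketch assumes the FA values have already been corrected for age, gender, and head displacement, treats the AUC as the probability that a randomly chosen recovered patient scores higher than a randomly chosen persistent patient, and estimates a one-sided p-value by shuffling the group labels. The function names, the fixed seed, and the one-sided direction are illustrative assumptions, not necessarily the authors' exact procedure.

```python
import random


def auc(scores, labels):
    """P(score of a random positive > score of a random negative); ties count 1/2.

    labels: True = recovered (SBPr), False = persistent (SBPp).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=10_000, seed=0):
    """One-sided p-value: how often shuffled labels reach the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```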

      Minor comments

      (1) In the case of the New Haven data, MB 4 and GRAPPA 2 were used; together, these two factors accelerate imaging eightfold, which often leads to rather poor quality. Was any kind of QA performed?

      We thank the reviewer for identifying this error. GRAPPA 2 was in fact used for our T1-MPRAGE image acquisition but not during the diffusion data acquisition. The diffusion data were acquired with a multi-band acceleration factor of 4.  We have now corrected this mistake.

      (2) Why not include MPRAGE data into the analysis, in particular, for predictions?

      We thank the reviewer for the suggestion. The collaboration on this paper was built around diffusion data. In addition, MPRAGE data from New Haven related to prediction have already been published (10.1073/pnas.1918682117), and MPRAGE data from the Mannheim data set are part of the larger project and will be published elsewhere.

      (3) In preprocessing, the authors wrote: "Eddy current corrects for image distortions due to susceptibility-induced distortions and eddy currents in the gradient coil". However, they did not mention acquiring phase-opposite b0 data. This means that eddy_openmp likely works only as an alignment tool, not as a susceptibility corrector.

      We thank the reviewer for bringing this to our attention. We indeed did not collect b0 data in the phase-opposite direction. eddy_openmp can still be used to correct for eddy current distortions and to perform motion correction, but the absence of phase-opposite b0 data may limit its ability to fully address susceptibility artifacts. This is now noted in the Supplementary Methods under the Preprocessing section (SI, page 3, lines 16-18): “We do note, however, that as we did not acquire data in the phase-opposite direction, the susceptibility-induced distortions may not be fully corrected.”

      (4) Version of FSL?

      We thank the reviewer for raising this point, which we have now addressed in the Supplementary Methods (SI, page 3, lines 10-11): “Preprocessing of all data sets was performed employing the same procedures and the FMRIB diffusion toolbox (FDT) running on FSL version 6.0.”

      (5) Some short sketches about the connectivity analysis could be useful, at least in SI.

      We are grateful for this suggestion that improves our work. We added the sketches about the connectivity analysis, please see Figure 7 and Supplementary Figure 11.

      (6) Machine learning: functions, language, version?

      We thank the reviewer for pointing out these minor points, which we hope to have now addressed in the Methods section of our resubmission by adding a detailed description of the structural connectivity analysis. We added: “The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S7. PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      The script is described and provided at: https://github.com/MISICMINA/DTI-Study-Resilience-to-CBP.git.

      (7) Ethical approval?

      The New Haven data are part of a study that was approved by the Yale University Institutional Review Board. This is mentioned in the description of the “New Haven (Discovery) data set” (page 23, lines 1-2). Likewise, the Mannheim data are part of a study that was approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and conducted in accordance with the Declaration of Helsinki in its most recent form. This is also mentioned under “Mannheim data set” (page 26, lines 2-5): “The study was approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form.”

      (1) Traeger AC, Henschke N, Hubscher M, et al. Estimating the Risk of Chronic Pain: Development and Validation of a Prognostic Model (PICKUP) for Patients with Acute Low Back Pain. PLoS Med 2016;13:e1002019.

      (2) Hill JC, Dunn KM, Lewis M, et al. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 2008;59:632-641.

      (3) Hockings RL, McAuley JH, Maher CG. A systematic review of the predictive ability of the Orebro Musculoskeletal Pain Questionnaire. Spine (Phila Pa 1976) 2008;33:E494-500.

      (4) Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295-1302.

      (5) Silva FG, Costa LO, Hancock MJ, Palomo GA, Costa LC, da Silva T. No prognostic model for people with recent-onset low back pain has yet been demonstrated to be suitable for use in clinical practice: a systematic review. J Physiother 2022;68:99-109.

      (6) Kent PM, Keating JL. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man Ther 2008;13:12-28.

      (7) Hruschak V, Cochran G. Psychosocial predictors in the transition from acute to chronic pain: a systematic review. Psychol Health Med 2018;23:1151-1167.

      (8) Hartvigsen J, Hancock MJ, Kongsted A, et al. What low back pain is and why we need to pay attention. Lancet 2018;391:2356-2367.

      (9) Tanguay-Sabourin C, Fillingim M, Guglietti GV, et al. A prognostic risk score for development and spread of chronic pain. Nat Med 2023;29:1821-1831.

      (10) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (11) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (12) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (13) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

      (14) Lieberman G, Shpaner M, Watts R, et al. White Matter Involvement in Chronic Musculoskeletal Pain. The Journal of Pain 2014;15:1110-1119.

      (15) Mansour AR, Baliki MN, Huang L, et al. Brain white matter structural properties predict transition to chronic pain. Pain 2013;154:2160-2168.

      (16) Wager TD, Atlas LY, Lindquist MA, Roy M, Woo CW, Kross E. An fMRI-based neurologic signature of physical pain. N Engl J Med 2013;368:1388-1397.

      (17) Lee JJ, Kim HJ, Ceko M, et al. A neuroimaging biomarker for sustained experimental and clinical pain. Nat Med 2021;27:174-182.

      (18) Becker S, Navratilova E, Nees F, Van Damme S. Emotional and Motivational Pain Processing: Current State of Knowledge and Perspectives in Translational Research. Pain Res Manag 2018;2018:5457870.

      (19) Spisak T, Kincses B, Schlitt F, et al. Pain-free resting-state functional brain connectivity predicts individual pain sensitivity. Nat Commun 2020;11:187.

      (20) Baliki MN, Apkarian AV. Nociception, Pain, Negative Moods, and Behavior Selection. Neuron 2015;87:474-491.

      (21) Elman I, Borsook D. Common Brain Mechanisms of Chronic Pain and Addiction. Neuron 2016;89:11-36.

      (22) Baliki MN, Petre B, Torbey S, et al. Corticostriatal functional connectivity predicts transition to chronic back pain. Nat Neurosci 2012;15:1117-1119.

      (23) Jenkins LC, Chang WJ, Buscemi V, et al. Do sensorimotor cortex activity, an individual's capacity for neuroplasticity, and psychological features during an episode of acute low back pain predict outcome at 6 months: a protocol for an Australian, multisite prospective, longitudinal cohort study. BMJ Open 2019;9:e029027.

      (24) Zhang Z, Descoteaux M, Zhang J, et al. Mapping population-based structural connectomes. Neuroimage 2018;172:130-145.

      (25) Desikan RS, Segonne F, Fischl B, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006;31:968-980.

      (26) Maier-Hein KH, Neher PF, Houde J-C, et al. The challenge of mapping the human connectome based on diffusion tractography. Nature Communications 2017;8:1349.

      (27) Chiang MC, McMahon KL, de Zubicaray GI, et al. Genetics of white matter development: a DTI study of 705 twins and their siblings aged 12 to 29. Neuroimage 2011;54:2308-2317.

      (28) Zhao B, Li T, Yang Y, et al. Common genetic variation influencing human white matter microstructure. Science 2021;372.

      (29) Fortin JP, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017;161:149-170.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and via physical salience were separable from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had an overly strong prior hypothesis that self-prioritization is associated with attention. They used prior entry to consciousness (awareness) as an index of attention, which is not appropriate. Processes other than attention (e.g., high arousal, high sensitivity) may cause a stimulus to enter consciousness earlier. The self-related/associated stimulus may engage such processes, rather than attention, making it easier to detect. Perhaps the authors could include other methods, such as EEG or MEG, to answer this question.

      We also find the possibility that other mechanisms are responsible for “prior entry” interesting, but we believe there are solid grounds for the hypothesis that it indexes attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al., 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience; see, e.g., Wolfe, 2021), and we either kept other features that are known to be associated with other factors, such as arousal and sensitivity, constant across the two variants (e.g., clearly above-threshold visibility) or varied them between participants (e.g., the colours/shapes used).
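To illustrate how a difference in processing rates produces prior entry, the sketch below assumes a simple exponential race: each stimulus starts being encoded at its own onset, encoding times are exponentially distributed with rates v_probe and v_ref, and the observer reports whichever finishes first. This is a simplification in the spirit of TVA-based temporal-order-judgement models (no perceptual threshold, no response noise, no hierarchical estimation), not necessarily the exact model used in the study; the function name and rate values are hypothetical.

```python
from math import exp


def p_probe_first(v_probe, v_ref, soa):
    """P(report 'probe first') in an exponential race between two stimuli.

    v_probe, v_ref: processing rates (per ms; hypothetical units).
    soa > 0: the probe appears `soa` ms before the reference; soa < 0: after it.
    """
    share = v_probe / (v_probe + v_ref)  # P(probe wins) once both are being encoded
    if soa >= 0:
        done_early = 1.0 - exp(-v_probe * soa)  # probe encoded before the reference appears
        return done_early + (1.0 - done_early) * share
    ref_not_done = exp(-v_ref * (-soa))  # reference not yet encoded when the probe appears
    return ref_not_done * share


for soa in (-40, 0, 40):
    print(soa, p_probe_first(0.05, 0.05, soa), p_probe_first(0.08, 0.04, soa))
```

With equal rates, simultaneous presentation yields 50% "probe first" reports; giving the probe twice the reference's rate raises this to 2/3 at an SOA of 0, which shows up as a shift of the point of subjective simultaneity.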

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs. other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, arousal is unlikely to explain the observed relative processing changes for the self and other identities.
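The argument that a global change in capacity leaves the relative advantage unchanged can be written out under a common simplification of TVA for two stimuli, in which each stimulus's rate is its share of the total capacity C (assuming the perceptual-bias and sensory-evidence terms are absorbed into the attentional weights w):

```latex
\begin{aligned}
v_{\text{probe}} &= C\,w_p, \qquad v_{\text{ref}} = C\,(1 - w_p), \qquad
  w_p = \frac{w_{\text{probe}}}{w_{\text{probe}} + w_{\text{ref}}},\\[4pt]
\frac{v_{\text{probe}}}{v_{\text{probe}} + v_{\text{ref}}}
  &= \frac{C\,w_p}{C\,w_p + C\,(1 - w_p)} = w_p .
\end{aligned}
```

Within this simplified model, a purely arousal-driven change in C rescales both rates by the same factor and cancels out of the relative weight w_p, whereas a stimulus-specific (e.g., self-related) effect would have to act on the attentional weights themselves.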

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature suggesting that prior entry indexes attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates reflect changes in attentional selection rather than other mechanisms (e.g., arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is it possible that social and perceptual (physical properties of the stimulus) salience engage the same attentional process via different processing routes?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Theory of Visual Attention (TVA) from the field of attention, estimated its parameters using a hierarchical Bayesian model, and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We are happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations. This suggests that even small changes (~2 Hz) can hold meaningful implications within the context of attentional selection.
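
      The posterior probability and HDI mentioned here can be read directly from MCMC draws. The following is a minimal sketch that assumes one has posterior draws of the difference in capacity between conditions. The helper names are hypothetical, and the draws in the usage block are synthetic placeholders, not the study's posterior.

```python
import numpy as np


def prob_positive(diff_draws):
    """Share of posterior draws in which the difference exceeds zero."""
    return float(np.mean(np.asarray(diff_draws, dtype=float) > 0))


def hdi(draws, mass=0.95):
    """Narrowest interval that contains `mass` of the posterior draws."""
    x = np.sort(np.asarray(draws, dtype=float))
    n = len(x)
    k = int(np.ceil(mass * n))             # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]    # width of every candidate interval
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic draws of C(condition A) - C(condition B) in Hz; illustration only.
    delta_c = rng.normal(loc=2.0, scale=1.8, size=4000)
    print("P(delta C > 0):", prob_positive(delta_c))
    print("95% HDI:", hdi(delta_c, 0.95))
```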

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.
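
      The argument that relative salience does not depend on overall capacity can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes TVA-style rates, v_i = C · w_i / Σ_j w_j, with hypothetical weight and capacity values. Changing C rescales every rate but leaves each stimulus's share of the total unchanged.

```python
def processing_rates(capacity, weights):
    """Split capacity C (Hz) across stimuli in proportion to attentional weights.

    v_i = C * w_i / sum_j w_j, so v_i / sum_j v_j does not depend on C.
    """
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}


def relative_share(rates, name):
    """Fraction of the total processing rate that goes to one stimulus."""
    return rates[name] / sum(rates.values())


if __name__ == "__main__":
    weights = {"self": 0.6, "other": 0.4}  # hypothetical attentional weights
    for c in (40.0, 42.0):                 # hypothetical capacities, 2 Hz apart
        r = processing_rates(c, weights)
        print(c, r, relative_share(r, "self"))
```

      In this sketch, the absolute rate difference between the two stimuli scales with C. By contrast, each stimulus's share of the total is set by the weights alone and does not change.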

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task typically arise from multiple levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Because the possible effect sizes could not be gauged in advance, corrected analyses may have prevented the detection of small but meaningful effects. The reported effect sizes can therefore help future studies estimate power a priori and conduct well-powered replications of these exploratory effects. Additionally, Bayes factors were provided to convey the strength of the evidence, and all suggested at least moderate evidence in favour of a correlation. Lastly, please note that effects measured within individuals and within the same task (processing rate increases in the social and perceptual decision dimensions of the TOJ task) showed consistent patterns. This suggests that the modulations within tasks were highly predictive of each other, whereas the modulations between tasks were less clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in longer reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings rely heavily on the perceptual matching task and that it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here. First, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically. Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a, or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but it was not an aim of the present study. Instead, the present study (specifically Experiment 2) directly compares the mechanisms and effects of social and perceptual salience. We manipulated perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantified the effects on processing rates. This allows us to directly delineate the contributions of each type of salience. The results suggest additive effects (see also Figure 7). It remains possible that these effects are additive because different perceptual features were used, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows a direct comparison of the effects and mechanisms of social and perceptual salience.
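
      To make “additive” concrete, consider a 2 × 2 design crossing perceptual salience with social association. Here, additivity means that the social effect on processing rates is the same with and without perceptual salience. The following is a minimal sketch of that interaction contrast using hypothetical rate values. The authors' actual inference rests on their Bayesian model, not on this point estimate.

```python
def interaction_contrast(rate):
    """2 x 2 interaction contrast on processing rates.

    `rate` maps (perceptual, social) to a rate, with perceptual in
    {"salient", "plain"} and social in {"self", "other"}. Returns the social
    effect under perceptual salience minus the social effect without it:
    about 0 means additive, > 0 super-additive, < 0 sub-additive.
    """
    social_effect_salient = rate[("salient", "self")] - rate[("salient", "other")]
    social_effect_plain = rate[("plain", "self")] - rate[("plain", "other")]
    return social_effect_salient - social_effect_plain


if __name__ == "__main__":
    # Hypothetical rates (Hz): +3 Hz for self, +6 Hz for perceptual salience.
    additive = {("plain", "other"): 20.0, ("plain", "self"): 23.0,
                ("salient", "other"): 26.0, ("salient", "self"): 29.0}
    print(interaction_contrast(additive))
```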

      We did not expect the social and perceptual decision dimensions to be equivalent. Indeed, the social decision dimension requires additional retrieval of the associated identity, which likely makes it more challenging. This additional retrieval is also likely responsible for the slower responses to the social association compared with the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that the self needs to be task-relevant to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g., f = 0.4, n = 18). Without a contribution from decisional processing levels (which interact with task relevance), the SPE is likely reduced. Smaller self-prioritization effects arising at earlier processing levels may therefore not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study, we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al. present an immersion objective adapter design called RIM-Deep, which can be used to enhance axial resolution and reduce spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al. present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Thank you for your feedback. In fact, we used three distinct clearing techniques (iDISCO, CUBIC, and MACS) to demonstrate the adaptability and versatility of the RIM-Deep adapter.

      Weaknesses:

      Liu et al. claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

      We fully agree with this recommendation. In additional experiments, we will thoroughly compare the RIM-Deep adapter with the official adapter in fluorescent bead experiments. We will also compare their performance on tissue cleared with CUBIC and MACS.

      Reviewer #2 (Public review):

      The authors used different clearing methods to demonstrate the suitability of RIM-Deep for various sample preparation protocols with clearing solutions of different refractive indices. They clearly demonstrate that the RIM-Deep chamber is compatible with all three methods. Brain samples are characterized by complex networks of cells and are often hard to visualize. Despite the dense, complex structure of brain tissue, the RIM-Deep method generated high-quality images of all three samples. As the authors stated, increasing imaging depth often goes hand in hand with purchasing expensive new equipment, exchanging several microscopy parts, or purchasing a new microscopy setup. Innovations like the RIM-Deep chamber might pave the way for cost-effective imaging and expand the applicability of inverted confocal microscopy.

      Weaknesses:

      (1) However, since this study introduces a novel imaging technique aiming to revolutionize imaging of large samples, additional control experiments would strengthen the data. Of the three clearing protocols used (CUBIC, MACS, and iDISCO), only the brain section from Macaca fascicularis cleared with iDISCO was imaged with both the standard chamber and the RIM-Deep method. This comparison indeed shows a more than 2-fold increase in imaging depth, a significant enhancement in microscopy. However, it would have been important to evaluate and show the imaging depth differences in the other two samples. These were cleared with different protocols and treated with clearing solutions whose refractive indices differ from that of iDISCO.

      Thank you for your suggestion. We will investigate the imaging performance of brain tissue cleared with the other two protocols, using both the official adapter and the RIM-Deep method.

      (2) The description of the figures and figure panels should be improved for a better understanding of the experiments performed and the resulting images/data.

      Thank you for your suggestion. We will revise the figure legends in detail.

      (3) While the authors used a Nikon AX inverted laser scanning confocal microscope, the study would benefit from evaluating the performance of the RIM-Deep method using other inverted confocal microscopes or even wide-field microscopes.

      Thank you for your suggestion. We also recognize that evaluating the performance of the RIM-Deep method on other inverted confocal microscopes will help further validate its applicability and robustness. We will supplement these experiments to expand the scope and reliability of RIM-Deep.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity. 

      The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling. 

      We agree with the reviewer that our previous manuscript was not clear regarding the mechanism of the model. We have performed additional simulations and included a derivation of the learning rule to address this, which we explain below.

      Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation. 

      We have included a derivation of our plasticity rules in lines 871-919 in the revised manuscript. Our claim is that predictive plasticity updates the feedforward and recurrent synapses to predict output firing rates. Consistent with this, we have shown that the corresponding cost function measures the discrepancy among the recurrent prediction, the feedforward prediction, and the output firing rate. The resulting feedforward plasticity is the same as our previous rule (Asabuki and Fukai, 2020), which segments the salient patterns embedded in the input sequence. The recurrent plasticity rule suggests that the recurrent prediction learns the statistical model of the evoked activity, enabling the network to replay the learned internal model.

      Similarly, for the inhibitory plasticity, we defined a cost function that evaluates the difference between the firing rate and inhibitory potential within each neuron. This rule is crucial for maintaining balanced network dynamics. See our response below for more details on the role of inhibitory plasticity.

      Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm. 

      First, while the reviewer's suggestion to test with delayed excitation is intriguing and crucial for a more biologically detailed spiking neuron model, we have chosen to maintain the current model configuration. Our use of Poisson spiking neurons, which generate spikes based on instantaneous firing rates, does not heavily depend on precise spike timing information. Therefore, to preserve the simplicity of our results, we kept the model unchanged.

      Second, we agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b in the revised manuscript, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we have revised our claim in the manuscript to clarify that the memory trace is primarily critical for firing rate homeostasis, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      Third, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in decorrelation and prediction, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 560-593 in the revised manuscript.

      Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models. 

      As the reviewer pointed out, previous computational models have demonstrated that recurrent networks with Hebbian-like plasticity can learn appropriate Markovian statistics (Kappel et al., 2014; Asabuki and Clopath, 2024). However, our model differs conceptually from these previous models. Kappel et al. showed that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs). A key difference from our model is that their neural representations acquire sequences through Markovian sampling dynamics, whereas our model does not depend on such ordered sampling. Specifically, in their model, sequential sampling arises from learned structure in the off-diagonal elements of the recurrent connections (i.e., between-assembly connections). In contrast, our network learns to stochastically generate recurrent cell assemblies by relying solely on within-assembly connections. A similar argument applies to the Asabuki and Clopath model. Moreover, our model introduces plasticity rules for all connection types. The Asabuki and Clopath model, by contrast, introduced plasticity only for recurrent synapses projecting onto excitatory neurons, and its cell assembly memberships were preconfigured. We have added clarifying sentences in ll. 757-772 of the revised manuscript to elaborate on this point.

      There are also several related models that have not been mentioned and should be discussed. In ll. 663 ff., the authors discuss the contributions of their model that they claim are novel. However, similar elements seem to exist in Kappel et al. (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning), and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al., Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these. 

      We have clarified how our model differs from previously proposed recurrent network models that perform Markovian sampling. Please see our reply above.

      We have also included additional sentences in ll. 704-709 of the revised manuscript to discuss how our model differs from similar predictive learning models: “It should be noted that while several network models that perform error-based computations like ours exploit only inhibitory recurrent plasticity (Mikulasch et al., 2021; Mackwood et al., 2021; Hertäg and Clopath, 2022; Mikulasch et al., 2023), our model learns the structured spontaneous activity to reproduce the evoked statistics by modifying both excitatory and inhibitory recurrent connections.”

      Reviewer #2 (Public Review):

      Summary: 

      The paper considers a recurrent network with neurons driven by external input. During the external stimulation predictive synaptic plasticity adapts the forward and recurrent weights. It is shown that after the presentation of constant stimuli, the network spontaneously samples the states imposed by these stimuli. The probability of sampling stimulus x^(i) is proportional to the relative frequency of presenting stimulus x^(i) among all stimuli i=1,..., 5. 

      Methods: 

      Neuronal dynamics: 

      For the main simulation (Figure 3), the network had 500 neurons, and 5 non-overlapping stimuli, each activating 100 different neurons, were presented. The voltage u of the neurons is driven by input rates x via the forward weights W and by recurrent input. The inhibitory recurrent weights G are restricted to non-negative values (Dale's law), whereas the other recurrent weights M have no sign restrictions. Neurons spiked with an instantaneous Poisson firing rate, and each spike triggered an exponentially decaying postsynaptic voltage deflection. Neglecting time constants of the postsynaptic responses, the expected postsynaptic voltage reads (in vectorial form) as 

      u = W x + (M - G) f (Eq. 5) 

      where f ≈ phi(u) represents the instantaneous Poisson rate, and phi a sigmoidal nonlinearity. The rate f is only an approximation (symbolized by ≈) of phi(u) since an additional regularization variable h enters (taken up in Point 4 below). The initialisation of W and M is Gaussian with mean 0 and variance 1/sqrt(N), N the number of neurons in the network. The initial entries of G are all set to 1/sqrt(N). 
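
      The following is a minimal rate-based sketch of Eq. 5 as summarized above. It assumes a logistic nonlinearity for phi and finds the self-consistent rates by damped fixed-point iteration. Poisson spiking, postsynaptic time constants, and the regularization variable h are omitted. The input dimension, the stimulus pattern, and the function names are hypothetical choices for illustration.

```python
import numpy as np


def phi(u):
    """Logistic sigmoid (assumed form), written via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def init_weights(n, n_in, rng):
    """W, M ~ N(0, variance 1/sqrt(N)); all entries of G = 1/sqrt(N)."""
    std = n ** -0.25                         # square root of variance 1/sqrt(N)
    W = rng.normal(0.0, std, size=(n, n_in))  # feedforward
    M = rng.normal(0.0, std, size=(n, n))     # recurrent, sign-unconstrained
    G = np.full((n, n), 1.0 / np.sqrt(n))     # recurrent inhibition, non-negative
    return W, M, G


def relax_rates(W, M, G, x, n_iter=200, alpha=0.2):
    """Damped fixed-point iteration of u = W x + (M - G) f with f = phi(u).

    Convergence is not guaranteed for arbitrary weights.
    """
    f = phi(W @ x)
    for _ in range(n_iter):
        u = W @ x + (M - G) @ f
        f = (1.0 - alpha) * f + alpha * phi(u)
    return f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_in = 500, 500                     # n_in is a hypothetical input size
    W, M, G = init_weights(n, n_in, rng)
    x = np.zeros(n_in)
    x[:100] = 1.0                          # one hypothetical stimulus pattern
    f = relax_rates(W, M, G, x)
    print(f.shape, float(f.mean()))
```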

      Predictive synaptic plasticity: 

      The 3 types of synapses were each adapted so that they individually predict the postsynaptic firing rate f, in matrix form 

      ΔW ≈ (f - phi( W x ) ) x^T 

      ΔM ≈ (f - phi( M f ) ) f^T 

      ΔG ≈ (f - phi( G f ) ) f^T but confined to non-negative values of G (Dale's law). 

      The ^T denotes the transpose, and the ≈ again indicates that the ϕ entering the learning rule is not exactly the ϕ determining the rate, only up to the regularization (see Point 4). 
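
      The three rules can be sketched as a single weight update. This sketch assumes a logistic phi and a hypothetical learning rate `eta`, and it ignores the regularization variable h and the spike-based estimation of f. The G update uses phi(G f), matching the fixed point f ≈ phi(G f) stated in the main formal result. Non-negativity of G is enforced by clipping. The function name is hypothetical.

```python
import numpy as np


def phi(u):
    """Logistic sigmoid (assumed form of the nonlinearity)."""
    return 0.5 * (1.0 + np.tanh(0.5 * u))


def plasticity_step(W, M, G, x, f, eta=1e-3):
    """One update of the three predictive rules (regulariser h ignored).

    Each matrix is nudged so that its own prediction phi(.) moves toward
    the actual rate f; G is then clipped to stay non-negative.
    """
    W = W + eta * np.outer(f - phi(W @ x), x)
    M = M + eta * np.outer(f - phi(M @ f), f)
    G = np.maximum(G + eta * np.outer(f - phi(G @ f), f), 0.0)
    return W, M, G


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_in = 50, 20                       # hypothetical sizes
    W = rng.normal(0.0, 0.1, size=(n, n_in))
    M = rng.normal(0.0, 0.1, size=(n, n))
    G = np.full((n, n), 0.1)
    x = rng.random(n_in)                   # hypothetical input rates
    f = rng.random(n)                      # stand-in for observed output rates
    for _ in range(500):
        W, M, G = plasticity_step(W, M, G, x, f, eta=0.01)
    print(np.abs(f - phi(W @ x)).mean())   # feedforward prediction error
```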

      Main formal result: 

      As the authors explain, the forward weight W and the unconstrained weight M develop such that, in expectations, 

      f ≈ phi( W x ) ≈ phi( M f ) ≈ phi( G f ), 

      consistent with the above plasticity rules. Some elements of M remain negative. In this final state, the network displays the behaviour as explained in the summary. 

      Major issues: 

      Point 1: Conceptual inconsistency 

      The main results seem to arise from unilaterally applying Dale's law only to the inhibitory recurrent synapses G, but not to the excitatory recurrent synapses M. 

      In fact, if the same non-negativity restriction were also imposed on M (as it is on G), then their learning rules would become identical, likely leading to M=G. But in this case, the network becomes purely forward, u = W x, and no spontaneous recall would arise. Of course, this should be checked in simulations. 

      Because Dale's law was only applied to G, however, M and G cannot become equal, and the remaining differences seem to cause the effect. 

      Predictive learning rules are certainly powerful, and it is reasonable to consider the same type of error-correcting predictive learning rule, for instance, for different dendritic branches that should all predict the somatic activity. Alternatively, one may postulate the same type of error-correcting predictive plasticity for inhibitory and excitatory synapses, but then the presynaptic neurons should not be identical, as is assumed here. Both types of learning rules, error-correcting and error-forming, have already been considered for different dendritic branches and for inhibitory/excitatory inputs (but with the inhibitory input itself restricted to local input, for instance).

      The model presented above lacked biological plausibility in several key aspects. Specifically, we assumed that the recurrent connection M could change sign through plasticity and be either excitatory or inhibitory, while the inhibitory connection G was restricted to being inhibitory only. This initial setting does not reflect the biological constraint that synapses typically maintain a consistent excitatory or inhibitory type. Furthermore, due to this unconstrained recurrent connectivity M, the original model had two types of inhibitory connections (i.e., the negative part of M and the inhibitory connection G) without providing a clear computational role for each type of inhibition.

      To address these limitations and to understand the role of the two types of inhibition, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in prediction and decorrelation, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 561-593 in the revised manuscript.

      Point 2: Main result as an artefact of an inconsistently applied Dale's law? 

      The main result shows that the probability of a spontaneous recall for the 5 nonoverlapping stimuli is proportional to the relative time the stimulus was presented. This is roughly explained as follows: each stimulus pushes the activity from 0 up towards f =; phi( W x ) by the learning rule (roughly). Because the mean weights W are initialized to 0, a stimulus that is presented longer will have more time to push W up so that positive firing rates are reached (assuming x is non-negative). The recurrent weights M learn to reproduce these firing rates too, while the plasticity in G tries to prevent that (by its negative sign, but with the restriction to non-negative values). Stimuli that are presented more often, on average, will have more time to reach the positive target and hence will form a stronger and wider attractor. In spontaneous recall, the size of the attractor reflects the time of the stimulus presentation. This mechanism so far is fine, but the only problem is that it is based on restricting G, but not M, to non-negative values. 

      As mentioned above, we have included an additional simulation in which all weights are non-negative. We present these new results in Figure 6, before the two-population model (Figure 7) in the revised manuscript, so that readers can follow why the two inhibitory pathways are important.

      Point 3: Comparison of rates between stimulation and recall. 

      The firing rates with external stimulations will be considerably larger than during replay (unless the rates are saturated). 

      This is a prediction that should be tested in simulations. In fact, since the voltage roughly reads as  u = W x + (M - G) f,  and the learning rules are such that eventually M =; G, the recurrences roughly cancel and the voltage is mainly driven by the external input x. In the state of spontaneous activity without external drive, one has  u = (M - G) f ,  and this should generate considerably smaller instantaneous rates f =; phi(u) than in the case of the feedforward drive (unless f is in both cases at the upper or lower ceiling of phi). This is a prediction that can also be tested. 
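      Under the reviewer's premise that learning converges to M ≈ G, Eq. 5 splits into two regimes. A minimal worked version, ignoring h and assuming phi is not saturated:

```latex
\begin{aligned}
\text{evoked:}\quad & u = Wx + (M - G)f \approx Wx, && f \approx \phi(Wx),\\
\text{spontaneous:}\quad & u = (M - G)f \approx 0, && f \approx \phi(0).
\end{aligned}
```

      Since phi is monotonically increasing, the evoked rate exceeds the spontaneous rate for every neuron with (Wx)_i > 0, unless both values sit on the same plateau of phi.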

      Because the figures mostly show activity ratios or normalized activities, it was not possible for me to check this hypothesis with the current figures. So please show non-normalized activities for comparing stimulation and recall for the same patterns. 

      We agree with the reviewer that the activity levels of spontaneous and evoked activity should be compared. We now show the distributions of both activity levels in the new Figure 2d. As expected, evoked activity was stronger than spontaneous activity.

      Point 4: Unclear definition of the variable h. 

      The formal definition of h = h_i is given by (suppressing here the neuron index i and the h-index of tau)

      tau dh/dt = -h   if h > u,
      h = u            otherwise.   (Eq. 10)

      But if Equation 10 is all there is (nothing else is said), h will always become equal to u or will vanish, i.e. either h = u or h = 0 after some initial transient. In fact, as soon as h > u, h decays towards 0 according to the first line. If u > 0, it stops at h = u according to the second line, and there is no reason for h to change further. If u <= 0 while h > u, then h converges to 0 according to the first line and stays there. I guess the authors had issues with the recurrent spiking simulations and tried to fix this with some regularization. However, as presented, it is not clear how this regularization works.
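      The reviewer's argument can be checked with a direct discretisation of Eq. 10. In this sketch (an assumption, not the authors' integration scheme) the decay branch is integrated exactly over each time step, and the trace is caught at h = u whenever it would otherwise cross below it; tau_h = 10 s is the value the authors report. The helper names update_trace and simulate are hypothetical.

```python
import math

TAU_H = 10.0  # s; value of tau_h reported by the authors (l. 853)


def update_trace(h, u, dt, tau=TAU_H):
    """One step of Eq. 10 for a single neuron."""
    if h > u:
        # tau dh/dt = -h: exact exponential decay over dt; if the trace
        # would fall below u during the step, it is caught at h = u
        return max(h * math.exp(-dt / tau), u)
    return u  # h = u otherwise


def simulate(u_of_t, h0, t_end, dt=0.1):
    """Integrate the trace for a voltage time course u_of_t(t)."""
    h = h0
    for step in range(int(round(t_end / dt))):
        h = update_trace(h, u_of_t(step * dt), dt)
    return h


# Constant positive voltage: h decays until it reaches u and then stays there.
h_pos = simulate(lambda t: 0.5, h0=2.0, t_end=200.0)
# Constant negative voltage: h decays towards 0 and stays above u.
h_neg = simulate(lambda t: -1.0, h0=2.0, t_end=200.0)
# A voltage pulse: h tracks the peak, then decays back to the new constant u.
h_peak = simulate(lambda t: 1.0 if t < 5.0 else 0.2, h0=0.0, t_end=100.0)
print(h_pos, h_neg, h_peak)
```

      As the reviewer notes, with a constant voltage the trace ends either at h = u (for u > 0) or decaying towards 0 (for u <= 0); only a time-varying u keeps h away from these two states, acting as a decaying record of the recent peak voltage.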

      We apologize to the reviewer for our unclear definition of h. As the reviewer pointed out, since the memory trace is always positive and larger than (or equal to) the membrane potential, the membrane potential could in principle remain negative, in which case the memory trace would decay to 0 and stay there. However, because the network is always balanced between excitatory and inhibitory inputs, the membrane potential does not remain negative indefinitely. In fact, we trained the network without any manipulations other than the memory trace described in the manuscript, and it learned the assembly structure stably.

      BTW: In Eq. 11 the authors set the gain beta to beta = beta0/h which could become infinite and, putatively more problematic, negative, depending on the value of h. Maybe some remark would convince a reader that no issues emerge from this. 

      We now state in ll. 864-866 of the revised manuscript that no issues arise from this slope parameter.

      Added from discussions with the editor and the other reviewers: 

      Thanks for alerting me to this Supplementary Figure 8. Yes, it looks like the authors did apply there Dale's law for both the excitatory and inhibitory synapses. Yet, they also introduced two types of inhibitory pathways converging both to the excitatory and inhibitory neurons. For me, this is a confirmation that applying Dale's law to both excitatory and inhibitory synapses, with identical learning rules as explained in the main part of the paper, does not work. 

      Adding these two pathways is a strong change from the original model as introduced before, on which all the Figures in the main text are based. Supplementary Figure 8 should come with an analysis of why a single inhibitory pathway does not work. I guess I gave the reason in my Points 1-3: some form of symmetry breaking between the recurrent excitation and recurrent inhibition is required so that, eventually, the recurrent excitatory connections dominate.

      Making the inhibitory plasticity less expressive by applying Dale's law only to the inhibitory synapses seems to be the answer chosen in the Figures of the main text (but this invites the criticism of applying Dale's law unilaterally).

      Applying Dale's law to both types of synapses, but dividing the labor of inhibition into two strictly separate and asymmetric pathways, and hence asymmetric development of excitatory and inhibitory weights, seems to be another option. However, introducing two such separate inhibitory pathways, just to rescue the fact that Dale's law is applied to both types of synapses, is a bold assumption. Is there biological evidence for two such pathways in the inhibitory, but not the excitatory, connections? And what is the computational reasoning for such a separation, apart from some form of symmetry breaking between excitation and inhibition? I guess simpler solutions could be found, for instance by breaking the symmetry between the plasticity rules for the excitatory and inhibitory neurons. All these questions, in my view, need to be addressed to give some insight into why the simulations work.

      The reviewer’s intuition is correct. To effectively learn cell assembly structures and replay their activities, our model indeed requires two types of inhibitory connections. Please refer to our response above for further details. 

      Overall, Supplementary Figure 8 seems to me too important to be deferred to the Supplement. The reasoning behind the two inhibitory pathways should appear more prominently in the main text. Without this, important questions remain. For instance, when thinking in a rate-based framework, the two inhibitory pathways both try to explain the somatic firing rate away. Doesn't this lead to overly strong inhibition? Can a steady state with a positive firing rate caused by the recurrence, in the absence of an external drive, be proven? The argument must include the separation into Path 1 and Path 2. So far, this reasoning has not been provided.

      In fact, it might be that, in a spiking implementation, some sparse spikes survive. I wonder whether at least some of these spikes survive because of the other rescuing construction, the dynamic variable h (Equation 10), which is not transparent and is not taken up in the reasoning either (see my Point 4).

      Perhaps it is helpful for the authors to add this text in the reply to them. 

      We have moved the former Supplemental Figure 8 to the main Figure 7. Please see our response above about the role of dual inhibitory connection types.

      Reviewer #3 (Public Review): 

      Summary: 

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumvent this issue and link this mechanism to cellular monitoring of membrane potential history.

      Strengths: 

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities. 

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law). 

      (3) Links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. Offers a promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation. 

      Weaknesses: 

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly-specific statistics of h reflect these likelihoods. I find this to be a key issue. 

      We agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we revised our claim in the manuscript to clarify that the memory trace is primarily critical for learning to avoid trivial solutions, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Figure 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required since runaway activity is not as damaging to network activity. 

      The reviewer's intuition is correct. The saturating nonlinearity is important for the network to form stable assembly structures. We have added a sentence in ll. 866-868 to mention this.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli? 

      LIF neurons) may solve this problem by utilizing spike-timing statistics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor issues: 

      Figure 1: It would be helpful to display the equation for output rate here as well. 

      We have included the equation in the revised Figure 1a.

      Figure 3c: Typo "indivisual neurons". 

      We have corrected the typo. We thank the reviewer for their careful review.

      Line 325: Do you mean Figure 3f,g? 

      We repeated the task with different numbers of stimuli in Supplementary Figure 1c,d.

      Line 398: Winner-take-all can be misunderstood, as it typically stands for competition in inference, not in learning. 

      We have rephrased it as “unstable dynamics” in l. 400.

      Line 429: Are intra-assembly and within-assembly the same? If so please use consistent terminology. 

      We have made the terminology consistent.

      Line 792 ff.: Please mention that (t) was omitted.

      We have included a sentence to mention it in ll. 847-848 in the revised manuscript.

      Line 817: Should u_i be v_i? 

      We have corrected the term.

      Methods: What is the value of tau_h? 

      We used τ_h = 10 s, which is stated in l. 853.

    1. Author response:

      We are appreciative of the editors’ and reviewers’ positive comments and constructive suggestions, which will help us to improve our manuscript. We will make changes as required by the reviewers. Our primary focus will be on revising and clarifying certain aspects:

      First, recent research has revealed a strong correlation between brain synchronization, a key neural marker, and group decision-making. We aim to bolster our hypothesis by reviewing additional literature, ensuring accuracy in terminology and appropriateness in phrasing.

      Second, we will include additional methodological details, such as the experimental procedures, the significance of the individual-difference variables, and the details of the data analyses.

      Third, although our study introduces a novel perspective, we acknowledge that it uses the conventional fNIRS hyperscanning analyses, which are widely accepted within the research community. Our methodology identifies significant channels via one-sample t-tests, followed by either ANOVAs or independent-sample t-tests, without double dipping.

      We will address all the issues raised by the reviewers. We believe that the manuscript will significantly benefit from the insightful suggestions and invaluable contributions made by the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      In the second round of reviews, Reviewer 2 made three specific comments. The first comment criticises us for not including a set of equations they had requested in their first review. We did, in fact, include the requested equations in our revised submission: they were in the Supplementary Information, they were cited in the main text of our revised manuscript, and our changes were made clear in our response to the reviewer. In the second comment, the reviewer suggested adding one word to a sentence in the abstract. We have made this change (line 23). In the third comment, the reviewer highlighted a sentence that we agree could have been clearer. The sentence can be rectified by adding one word, which we have done (line 232). We believe the changes required to our manuscript are very minor, and we have implemented these two suggested changes, which are highlighted in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Overall, this work is quite comprehensive and is logically and rigorously designed. The phenotypic and functional data on 2C are strong.

      Thank you for your positive feedback on our findings!

      (1) Comment from Reviewer 1 suggesting the mechanistic insights of 2C are primarily derived from transcriptomic and genomic datasets without experimental verification. 

      Thank you for emphasizing the importance of experimental validation to support our transcriptomic and genomic findings. We acknowledge the gap in direct experimental evidence for the mechanistic insights of section 2C and recognize the value of such validation in strengthening our conclusions. However, our current dataset lacks the comprehensive preliminary results necessary for inclusion in the supplemental material. We believe that the mechanistic insights presented offer a substantial foundation for future research, in which we aim to explore these aspects in depth with targeted experimental approaches.

      Reviewer 2

      Together their data may suggest a regenerative effect of 2C in both in vitro and in vivo settings. If confirmed, this study might unlock a therapeutic strategy for cardiac regeneration.

      Thank you for your positive comment on the significance of our findings and the valuable therapeutic potential of 2C in cardiac regeneration!

      (1) Comment from Reviewer 2 pointing out that the main hypothesis (line 50), that Isl1 cells have regenerative properties, is not extremely novel.

      We agree with the reviewer that Isl1-positive cells possess regenerative properties. Following the reviewer’s suggestion, we have revised the original wording (line 46 in the revised manuscript).

      (2) Comment from Reviewer 2 asking for a rationale for the 20x reduction of A-485 concentration, and suggesting that a titration of this compound for the effects tested would be useful.

      As suggested by the reviewer, we have added the titration results of A-485 in Figure 1—figure supplement 1F-G.

      (3) Comment from Reviewer 2 noting that it is difficult to understand what proportion of CMs dedifferentiate to become RCCs. The lineage tracing data suggest that only 0.6%-1.5% of cells undergo this transition, and it is difficult to understand how such a small fraction can have wide effects in the different experimental settings. This is particularly true for the quantification of nuclear and cytosolic area on brightfield pictures: would the same effect on nuclear/cytosolic area be observed in Isl1 KO cells?

      We appreciate the reviewer's insightful observation on the proportion of CMs undergoing dedifferentiation into RCCs and the potential impact of this subset on our experimental outcomes. The lineage tracing data indicating that only 0.6%-1.5% of CMs transition to RCCs indeed reflect a modest proportion. This observation raises valid questions regarding the broader implications of such a limited fraction in the context of cardiac regeneration and the experimental effects reported. It is important to note that while the proportion of CMs dedifferentiating into RCCs is small, the biological significance and potential impact of these RCCs could be disproportionately large. Emerging evidence suggests that even a minimal number of stem or progenitor cells can exert significant effects on tissue repair and regeneration, possibly through paracrine mechanisms or by acting as key signaling centers within the tissue microenvironment (Fernandes et al., 2015). Regarding the specific question about 2C's effects on nuclear/cytosolic area in Isl1 knockout (KO) cells, we appreciate the suggestion and consider that such comparative studies would provide valuable insights toward a comprehensive understanding of the impact of 2C-induced RCCs in future research. In addition, ISL1 KO cells are described in detail in the article published in eLife in 2018 by Quaranta et al.

      (4) Comment from Reviewer 2 asking for the effect of CHIR + I-BET-762 alone. 

      As suggested by the reviewer, we have added the results of CHIR + I-BET-762 in Figure 1—figure supplement 1H.

      (5) Comment from Reviewer 2 suggesting a transparent explanation of the effects of A-485 on acetylation status.

      We thank the reviewer for highlighting the confusion regarding the effects of A-485 on the acetylation status of H3K27Ac and H3K9Ac. Upon re-examination of our data and statements, we recognize the need for clarity in our explanation and the inconsistency it may have caused (lines 223-231 on page 8).

      Initially, our observations suggested a selective effect of A-485 on H3K27Ac based on early experimental results (Figure 7—figure supplement 1). This conclusion was drawn from preliminary analyses that focused predominantly on this specific histone mark. However, upon further comprehensive examination of our data, including additional replicates and more sensitive detection methods, we observed that A-485 also impacts H3K9Ac levels (Figure 7—figure supplement 1F). This latter finding emerged from expanded datasets that were not initially considered in our preliminary conclusions.

      The "further analyses" mentioned referred to these subsequent experimental investigations, which included chromatin immunoprecipitation (ChIP) assays and extended sample sizes, providing a more robust dataset for evaluating the effects of A-485. We understand the importance of transparency and rigor in scientific communication. To address this, we have revised the manuscript to clearly delineate the progression of our analyses and the evidential basis for our revised understanding of A-485's effects. This includes a detailed description of the methodologies employed in our follow-up experiments (line 537 on page 27), the statistical approaches for data analysis (lines 226-227 in supporting information), and how these led to the updated interpretation regarding A-485's impact on histone acetylation (lines 232-269).

      (6) Comment from Reviewer 2 asking about the difference in the y-axis representation of the ChIP peak traces.

      Thank you for raising this question. We did not normalise for sequencing depth, and the y-axis represents the number of counts (line 537 on page 27 and lines 226-227 in supporting information).

      (7) Comment from Reviewer 2 suggesting testing this 2C protocol on mESCs to see whether similar changes occur, and asking how these mouse RCCs differ transcriptionally from Isl1+ progenitor cells isolated from neonatal mice (P1-P5).

      Thank you for your insightful questions. Testing the 2C protocol on mouse embryonic stem cells (mESCs) to observe if similar changes occur presents an excellent opportunity to further validate the versatility and applicability of our findings across different stem cell models. We agree that such experiments would not only strengthen the current study but also provide valuable insights into the conservation of mechanisms across species. We are currently in the process of setting up experiments to address this very question and anticipate that the results will significantly contribute to our understanding of cardiomyocyte differentiation processes. Regarding the transcriptional comparison between mouse regenerative cardiac cells (RCCs) induced by our 2C protocol and Isl1+ progenitors isolated from neonatal mice (P1-P5), this comparison is indeed crucial for delineating the specific identity and developmental potential of the RCCs generated. However, a comprehensive side-by-side transcriptomic analysis is required to systematically identify these differences and understand their biological implications. We plan to undertake this analysis as part of our future studies, which will include detailed RNA sequencing and comparative gene expression profiling to elucidate the transcriptional similarities and differences between these cell populations. These future directions will enhance our current findings, provide a deeper mechanistic understanding, and confirm the potential of the 2C protocol in regenerative medicine applications. We appreciate the reviewer's suggestions and acknowledge the importance of these experiments in advancing the field.

      (8) Comment from Reviewer 2 with a suggestion to have a precise clarification of statistics & data acquisition.

      As suggested by the reviewer, we have clarified the descriptions of statistics and data acquisition (lines 228-233 in supporting information) and added a precise description of the statistical analyses in each relevant paragraph.

      Reviewer 3

      The findings may have a translation potential. The idea of promoting the regenerative capacity of the heart by reprogramming CMs into RCCs is interesting.

      Thank you for your appreciation of the significance and translational potential of our findings!

      (1) Comment from Reviewer 3 suggesting that the mechanism involved in the 2C-mediated generation of RCCs is unclear and that the leads found in the RNA-seq and ChIP-seq data are not experimentally validated.

      We acknowledge the reviewer's concern regarding the lack of experimental validation for the mechanisms identified through RNA-seq and ChIP-seq analyses in the generation of RCCs from the 2C state. We understand the importance of substantiating these molecular leads with empirical data to strengthen our conclusions. Currently, our findings are based on in-depth bioinformatic analyses, which have provided us with valuable insights and a strong basis for hypothesis generation. Moving forward, we plan to prioritize experimental validation of key pathways and targets identified in our study. This will include designing targeted experiments to elucidate the functional roles of these mechanisms in the 2C-mediated generation of RCCs. We appreciate the opportunity to clarify our approach and future directions, and we are committed to addressing this gap in subsequent work.

      (2) Comment from Reviewer 3 questioning whether the very low number of RCCs generated (0.6%-1.5% of cells) can protect the heart from MI, whether 2C affects the survival or metabolism of existing CMs under hypoxic conditions, and what percentage of cells are regenerated by 2C treatment post-MI.

      We appreciate the reviewer's insightful queries regarding the protective effects of 2C treatment against myocardial infarction (MI) given the low percentage of RCCs generated. It is our hypothesis that the benefits of 2C treatment extend beyond mere cell numbers. We propose that 2C may enhance the survival and metabolic resilience of existing CMs under hypoxic conditions, thereby contributing to cardiac protection post-MI. Our future investigations will aim to quantify the precise percentage of cells regenerated by 2C treatment post-MI and explore its broader impacts on cardiac tissue survival and repair mechanisms.

      (3) Comment from Reviewer 3 asking how 2C was administered in mice and whether 2C affects cardiac function under basal conditions or any other aspect of physiology in mice, and suggesting that cardiac structural and functional parameters be examined after administration of 2C.

      We appreciate the reviewer's interest in the potential effects of 2C administration on cardiac function and overall physiology in mice. While we observed a decrease in body weight at P5 compared to controls, our immunofluorescence staining did not indicate any changes in cardiac structure (Figure 4— figure supplement 1E). This suggests that while 2C administration impacts neonatal rat physiology, it does not adversely affect cardiac structure under basal conditions. Further investigations are planned to assess the functional parameters of the heart post-2C administration to comprehensively understand its effects.

      (4) Comment from Reviewer 3 suggesting examination of the potential effects of 2C on other cell types of the heart, including fibroblasts and endothelial cells, in vitro and in vivo.

      We value the reviewer's suggestion to explore the effects of 2C on various cardiac cell types, including fibroblasts and endothelial cells, both in vitro and in vivo. We acknowledge the importance of understanding the broader impact of 2C treatment across different cell populations within the heart, given its potential protective effects. To address this, we are designing a series of experiments to assess 2C's influence on these cell types, aiming to elucidate any changes in their behavior, proliferation, and function following treatment. This comprehensive approach will allow us to better understand the mechanistic basis of 2C's cardioprotective effects.

      (5) Comment from Reviewer 3 suggesting validation of the effect of 2C in a dose-dependent manner.

      As suggested by the reviewer, we have added the dose-dependent effects of 2C (Figure 1—figure supplement 1F-G).

      (6) Comment from Reviewer 3 suggesting an explanation of how A-485 affects H3K27Ac and H3K9Ac.

      We appreciate the reviewer pointing out the discrepancy regarding the effects of A-485 on H3K27Ac and H3K9Ac. Upon re-examination of our data, we realize that our initial interpretation may have overlooked the broader impact of A-485 on histone acetylation patterns. It appears that A-485 does indeed influence both H3K27Ac and H3K9Ac, contrary to our initial statement. This oversight will be corrected in our revised manuscript, where we will provide a more detailed analysis and discussion of A-485's impact on these histone marks, alongside an explanation for the observed effects (lines 223-269 across page 8-9).

      (7) Comment from Reviewer 3 with a correction to use "regeneration" at the screening stage.

      As suggested by the reviewer, we have amended the wording in the text (line 66 on page 3).

      Reviewer 4

      Comment from Reviewer 4 suggesting more information that clarifies and justifies the hypothesis.

      As suggested by the reviewer, we added more information to clarify and justify the hypothesis (lines 39-47 on page 3).

      (1) Comment from Reviewer 4 pointing out the story line is not well developed.

      To address the reviewer’s question, we revised the manuscript to ensure a smooth and coherent logical flow.

      (2) Comment from Reviewer 4 asking about the purpose of choosing to study ISL1-CMs.

      As raised by the reviewer, we have clarified the rationale for using ISL1 as a marker to define RCCs in the revised manuscript (lines 39-47 on page 3).

      (3) Comment from Reviewer 4 pointing out the missing references in lines 57-58.

      Thank you for pointing this out; we have fixed it.

      (4) Comment from Reviewer 4 requesting more explanation of the screening compounds and a presentation of the screening results.

      As suggested by the reviewer, we added additional explanations in lines 65-73 and showed the screening results in Figure 1—figure supplement 1F-H.

      (5) Comment from Reviewer 4 suggesting an in-depth discussion of the findings.

      Thank you for the suggestion; we have included additional discussion at the end of the article.

      (6) Comment from Reviewer 4 suggesting that a conclusion be included in the main text.

      Thank you for the suggestion; we have made this revision.

      (7) Comment from Reviewer 4 regarding cell viability under different concentrations of 2C.

      As suggested by the reviewer, we have added the cell numbers under different doses of 2C treatment (Figure 2F).

      (8) Comment from Reviewer 4 pointing out the missing information in the methods.

      Thank you for the suggestion; we have added the missing information.

      (9) Comment from Reviewer 4 requesting more explanation of Figure S3A.

      As suggested by the reviewer, we have revised the original Fig. S3A (now Figure 2—figure supplement 1).

      (10) Comment from Reviewer 4 pointing out the high variability of mCherry cells (%) in Figure 3J.

      Thank you. We have revised this figure accordingly.

      (11) Comment from Reviewer 4 requesting more explanation of the DNA-binding motif of ISL1 in cells treated with A-485 or 2C.

      Thank you for the suggestion; we have added additional explanations (lines 270-274 on page 9).

      (12) Comment from Reviewer 4 pointing out the unclear labeling in Figure S1B and D.

      Thank you for the suggestion; we have revised the labeling (lines 240-245 in the Supporting Information).

      (13) Comment from Reviewer 4 suggesting a relative quantification of the proteins in Figure 1H.

      Thank you for the suggestion. We have quantified the relative expression levels of the proteins in the original Fig. 1H, as shown in Figure 1F.

      (14) Comment from Reviewer 4 suggesting that detailed information about the compounds be provided in the Methods section.

      Thank you for the suggestion; we have made this revision.

      (15) Comment from Reviewer 4 pointing out insufficient explanations in the figure legends.

      Thank you for the suggestion; we have revised the figure legends.

      (16) Comment from Reviewer 4 suggesting more independent experiments to reduce the high variation behind the non-significant (“ns”) difference between NC and 2C at 60h+3d shown in Figure 2E and F.

      Thank you for the suggestion; we have revised Figure 2F.

      (17) Comment from Reviewer 4 suggesting that a statement of limitations be provided in the text.

      Thank you for the suggestion; we have provided a limitations statement in the revised manuscript (lines 300-311 on page 10).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the impact of Clonal Hematopoiesis of Indeterminate Potential (CHIP) on Immune Checkpoint Inhibitor (ICI) therapy outcomes in NSCLC patients, analyzing blood samples from 100 patients pre- and post-ICI therapy for CHIP, and conducting single-cell RNA sequencing (scRNA-seq) of PBMCs in 63 samples, with validation in 180 more patients through whole exome sequencing. Findings show no significant CHIP influence on ICI response, but a higher CHIP prevalence in NSCLC compared to controls, and a notable CHIP burden in squamous cell carcinoma. Severely affected CHIP groups showed NF-kB pathway gene enrichment in myeloid clusters.

      Strengths:

      The study is commendable for analyzing a significant cohort of 100 patients for CHIP and utilizing scRNA-seq on 63 samples, showcasing the use of cutting-edge technology. The study tackles the vital clinical question of predicting ICI therapy outcomes in NSCLC.

      Weaknesses:

      The manuscript's comparison of CHIP prevalence between NSCLC patients and healthy controls could be strengthened by providing more detailed information on the control group. Specifically, details such as sex, smoking status, and comorbidities are needed to ensure the differences in CHIP are attributable to lung cancer rather than other factors. Including these details, along with a comparative analysis of demographics and comorbidities between both groups and clarifying how the control group was selected, would enhance the study's credibility and conclusions.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a large cohort of patients with metastatic lung cancer pre- and 1-3 weeks post-immunotherapy. The goal was to investigate whether immunotherapy results in changes in CHIP clones (using targeted sequencing and whole exome sequencing) as well as to investigate whether patients with CHIP changed their response to immunotherapy (single-cell RNA sequencing).

      Strengths:

      This represents a large cohort of patients, and comprehensive assays - including targeted sequencing, whole exome sequencing, and single-cell RNA sequencing.

      Weaknesses:

      Findings are not necessarily unexpected. With regards to clonal dynamics, it would be very unlikely to see any changes within a few weeks' time frame. Longer follow-up to assess clonal dynamics would realistically be necessary.

      We truly appreciate the constructive comments by the reviewers and editors. We agree with these comments and have done our best to address them to improve the paper. Please see the following pages.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1-1. In Figure 3B, the changes in frequency are challenging to discern. Consider employing connected line plots or another visual representation to enhance clarity and interpretation.

      Thank you for the suggestion. We modified Figure 3B to efficiently visualize the changes in cell proportion. Please note that the proportional changes in cell populations were not statistically significant by treatment, pathology, or clonal hematopoiesis (CH) severity.

      Comment 1-2. On page 13, Figure 3D is mentioned before Figure 3C. Please re-order to follow the correct sequence.

      We corrected the sequence of the figure and revised the text accordingly.

      Comment 1-3. Supplementary Figure 9 reveals an intriguing observation: the hypoxia and TNF signaling pathways appear to be regulated in opposite directions between CHIP-negative subjects and those with a Variant Allele Frequency (VAF) greater than 0.1. It would be insightful if the authors could delve into the potential implications or interpretations of this finding.

      We appreciate the reviewer's insightful comment. In the GSEA results presented in Supplementary Figure 9 and Figure 3C, we specifically focused on TNF signaling in monocytes and cDCs. Our subsequent analysis revealed that the adaptation of inflammatory signals is enriched in the myeloid cells in the CHIP-severe patients (Supplementary Fig. S12). Following the reviewer’s comment, we found that the leading-edge genes were shared between the TNF signaling and hypoxia pathways in most clusters (Supplementary Fig. S15). Suggested core genes, such as FOS, DUSP1, JUN, and PPP1R15A, play critical roles in the inflammatory phenotypes of myeloid lineages. Based on this finding, we added a paragraph in the Discussion section to address the implications of these shared signatures as follows (lines 340-348):

      “Our GSEA results specifically indicated the enrichment of TNF signaling and hypoxia pathways in most clusters of patients with severe CH (Supplementary Fig. S9). The leading-edge genes from the GSEA results showed that core genes such as FOS, DUSP1, JUN, and PPP1R15A, which are known to play critical roles in the inflammatory phenotypes of immune cells, were shared between the TNF signaling and hypoxia pathways in all significant clusters (Supplementary Fig. S15). Furthermore, gene regulatory network analysis using SCENIC indicated a higher enrichment of inflammatory signatures in myeloid lineages (Supplementary Fig. S9), highlighting the pronounced inflammatory phenotype of CH clones in these cell lineages.”

      Comment 1-4. The plots in Supplementary Figure 12 would benefit from enlargement to improve legibility and facilitate a better understanding of the data presented.

      We improved resolutions and enlarged Supplementary Figure S12.

      Reviewer #2 (Recommendations For The Authors):

      Comment 2-1. The authors state that CHIP is seen at a higher prevalence in the metastatic lung (44/100) vs controls (5/42), however, no in-depth info other than age is given about the normal cohort (Table S2). It would be important to make sure the cohorts are matched with regards to smoking hx, age range, etc before making the claim that CHIP is more frequent in the metastatic lung cancer group.

      Thank you for the comment. To provide additional information on the control cohort, including current smoking habits and sex, we added columns to Table S2. While we tried to match the age distributions between the control group without a history of solid cancer and the lung cancer cohort, we observed that the lung cancer cohort had slightly older ages (mean age, controls vs. lung cancer: 58.9 vs. 64.1 years), a higher prevalence of smoking (current smokers: 11/42 vs. 37/100), and a higher proportion of males (male/female: 18/24 vs. 91/9). Age and smoking are well-known epidemiological contributors to lung cancer and could influence the prevalence of clonal hematopoiesis (CH).

      However, previous studies have reported similar prevalence rates of CH in NSCLC patients, in line with our findings (Bolton et al., 2020 Nat Genet; Hong et al., 2022 Cancer Res). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in healthy populations (Levin et al., 2022 Sci Rep). We have acknowledged these factors as major limitations of our study in the Discussion section as follows (lines 379-390):

      “Also, the distinct characteristics of our cohort could confound our results. Compared to the control group, our cohort was biased toward slightly older age, a higher prevalence of smoking, and a higher proportion of males (mean age: 64.1 vs. 58.9 years; current smokers: 37/100 vs. 11/42; male/female: 91/9 vs. 18/24; Supplementary Figures S1 and S3). However, previous studies have reported similar prevalence rates of clonal hematopoiesis in NSCLC patients, in line with our findings (9,51). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in both healthy populations and NSCLC patients (10,51,52).”

      Comment 2-2. Figure 1 - 1A states there were 100 CHIP and CHIP-PD mutations identified, but in 1B, C, and D there are < 100 bars and/or dots shown. How were the mutations in 1A then triaged to be shown in 1B-D?

      It appears that our poor annotation caused this misunderstanding. In Figure 1A, we showed the number of samples in each study group but did not state this in the legend. We found 67 mutations among the 100 patients and presented the mutational statistics in Figures 1B–D. Accordingly, we have revised the Figure 1 legend to add the clarifying sentence “The numbers indicate sample counts in each group.” (lines 426-427).

      Comment 2-3. Table S4 - would be helpful to have # of variant reads and # of total reads as columns (and also calculate VAF for an additional column).

      Thank you for the comment. We added columns revealing the total number of reads and the number of variant reads in Table S4. Also, we calculated the VAF and included it as a new column as suggested by the reviewer.
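      For readers reproducing the new Table S4 column, VAF is the fraction of reads at a locus that support the variant allele, i.e. variant reads divided by total reads. The short sketch below is an illustration only; the function name and the example read counts are hypothetical and are not taken from Table S4.

```python
def variant_allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Return VAF = variant-supporting reads / total reads at the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


# Hypothetical example (not data from Table S4): 45 variant reads out of 900 total.
print(variant_allele_frequency(45, 900))  # prints 0.05
```

      Keeping the raw read counts alongside the derived VAF, as the revised table now does, lets readers recompute VAF or apply their own thresholds (for example, the VAF > 0.1 grouping used elsewhere in the study).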

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present work from Velloso and collaborators investigated the transcription profiles of resident and recruited hypothalamic microglia. They found sex-dependent differences between males and females and identified the protective role of chemokine receptor CXCR3 against diet-induced obesity.

      Strengths:

      (1) Novelty;

      (2) Relevance, since this work provides evidence about a subset of recruited microglia that has a protective effect against DIO. This provides a new concept in hypothalamic inflammation and obesity.

      Weaknesses:

      (1) Lack of mechanistic insight into the sex-dependent effects;

      (2) Analysis of indirect calorimetry data requires more depth;

      (3) A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Mendes et al provides novel key insights into the role of chemotaxis and immune cell recruitment into the hypothalamus in the development of diet-induced obesity. Specifically, the authors reveal that although transcriptional changes in hypothalamic resident microglia following exposure to high-fat feeding are minor, there are compelling transcriptomic differences between resident microglia and microglia recruited to the hypothalamus, and these are sexually dimorphic. Using independent loss-of-function studies, the authors also demonstrate an important role of CXCR3 and hypothalamic CXCL10 in the hypothalamic recruitment of CCR2+ cells and in metabolism following exposure to high-fat diet-feeding in mice. This manuscript puts forth conceptually novel evidence that inhibition of chemotaxis-mediated immune cell recruitment accelerates body weight gain in high-fat diet-feeding, suggesting that a subset of microglia that express CXCR3 may confer protective, anti-obesogenic effects.

      Strengths:

      The work is exciting and relevant given the prevalence of obesity and the consequences of inflammation in the brain on perturbations of energy metabolism and ensuing metabolic diseases. Hypothalamic inflammation is associated with disrupted energy balance, and activated microglia within the hypothalamus resulting from excessive caloric intake and saturated fatty acids are often thought to be mediators of impairment of hypothalamic regulation of metabolism. The present work reports a novel notion in which immune cells recruited into the hypothalamus that express chemokine receptor CXCR3 may have a protective role against diet-induced obesity. In vivo studies reported herein demonstrate that inhibition of CXCR3 exacerbates high-fat diet-induced body weight gain, increases circulating triglycerides and fasting glucose levels, worsens glucose tolerance, and increases the expression of orexigenic neuropeptides, at least in female mice.

      This work provides a highly interesting and needed overview of preclinical and clinical brain inflammation, which is relevant to readers with an interest in metabolism and immunometabolism in the context of obesity.

      Using flow cytometry, cell sorting, and transcriptomics including RNA-sequencing, the manuscript provides novel insights into transcriptional landscapes of resident and recruited microglia in the hypothalamus. Importantly, sex differences are investigated.

      Overall, the manuscript is perceived to be highly interesting, relevant, and timely. The discussion is thoughtful, well-articulated, and a pleasure to read and felt to be of interest to a broad audience.

      Weaknesses:

      There were no major weaknesses perceived. Some comments for potential textual additions to the results/discussion are listed in recommendations to authors.

      Comments from the authors regarding the evaluation of the article: We publicly express our gratitude for the work of both Reviewers. The comments were timely and constructive and guided us toward preparing a new version of the article which contains novel data that strengthened the overall quality of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Experiments with ovariectomized female mice with (and without) estrogen replacement would help to address the physiological basis of the observed sex-dependent effects.

      We performed an experiment with female C57BL/6J Unib mice, subdivided into Sham, OVX, and OVX+EST groups, which were exposed to HFD for 4 weeks. We monitored the weekly evolution of body weight and food intake. At the end of the protocol, the animals were fasted for 4 hours. Then, we measured fasting blood glucose and estradiol and extracted tissues (hypothalamus and WAT). In the hypothalamus samples, we evaluated, by RT-qPCR, the expression of chemokines, chemokine receptors, and some pro-inflammatory cytokines and neuropeptides. We also evaluated WAT weight relative to body mass. The new results are presented in Supplementary Figure 1.

      Indirect calorimetric analysis of energy expenditure will benefit from ANCOVA analysis using body weight as a covariate. Moreover, locomotor activity should be also controlled.

      All statistical analyses of energy expenditure were corrected for body mass; thus, there was no need for ANCOVA. We have clarified this in the text. The determination of locomotor activity is now included in Supplementary Figures 2 and 3.

      A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      We performed new experiments to determine the expression of genes involved in hypothalamic inflammation and ER stress pathways. These results are shown in Suppl. Fig. 2 and 3.

      Mechanistic inhibition of CXCR3 was performed by CXCL10 immunoneutralization and CXCR3 antagonism. Those approaches are correct and well-performed, however considering the experience of the group in hypothalamic studies, I miss a virogenetic-based knockdown. Do the authors have any data on that?

      This is indeed a great point. Unfortunately, we did not succeed in obtaining the Cre mouse lines that would be needed for the proposed experiments. We have included this as a limitation of the study.

      Reviewer #2 (Recommendations For The Authors):

      There are a few typographical errors for correction:

      -  Page 4, line 157: CCL10 to CXCL10.

      -  Page 6, line 226: makers to markers.

      -  Page 7, lines 283 and 287, Figure 6C: INF to IFN.

      All errors were corrected, as recommended. 

      Parts of the manuscript may be difficult for readers without knowledge of transcriptomics to interpret; thus, further description of several of the figures (e.g. Figure 3 and 4) may be helpful.

      We expanded the text in Results to clarify this issue.

      Could the authors comment on the choice of peripheral administration of CXCR3 antagonist as opposed to central (e.g. icv) administration? Indeed, systemic inhibition of CXCR3 produced significant alterations in body weight gain and glucose tolerance in female mice given high-fat diets and reduced CCR2 and CXCR3 immunostaining in the hypothalamus. Could changes to peripheral (e.g. WAT, liver) immune responses to the diet underlie the metabolic changes observed?

      CXCR3+ cells are present in very small numbers in the hypothalamus under basal conditions. Under HFD, these cells are recruited from the periphery to the CNS; therefore, we believe that ICV treatment with AMG487 would not reduce their recruitment to the hypothalamic parenchyma. In the same animals used for the locomotor activity measurements, we performed RT-qPCR of WAT and liver and analyzed the expression of genes involved in lipid and glucose metabolism. These data are now in Supplementary Figures 2 and 3. We included a comment in the text to explain our rationale for this approach.

      Besides hypothalamic mRNA levels of chemokines and chemokine receptors, does systemic CXCR3 antagonism affect other aspects linked to diet-induced impairments of hypothalamic regulation of energy homeostasis, like inflammation, ER stress and/or mitochondrial dynamics/function? It would be interesting to reveal the consequence of reduced CCR2+ microglial migration to the hypothalamus with chronic high-fat diet exposure.

      We performed new experiments, shown in Supplementary Figures 2 and 3, to address these important questions. In the hypothalamus of females, there were no changes in the expression of transcripts encoding proteins involved in endoplasmic reticulum homeostasis and mitochondrial turnover, whereas in males there was a reduction of Ddit3 and Mfn1. Moreover, in females, the inhibition of CXCR3 caused no changes in the liver expression of lipidogenic and gluconeogenic genes and no changes in the white adipose tissue expression of lipidogenic genes. In the liver of males, there was a reduction in the expression of Fasn and an increase in the expression of G6pc3. As in females, there were no changes in the white adipose tissue expression of lipidogenic genes in males.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Comment 1: Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      Comment 2: There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well-resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 3: The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We have revised our manuscript to use conformational differences instead of conformational changes to describe the differences between the apo and ligand-bound states (see the last paragraph of the introduction section and the third paragraph of the discussion section).

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      Comment 1: I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We have revised the manuscript and acknowledged this limitation in the first paragraph of the results section: 

      “We did not characterize the enzyme activities of the mixed BDCs because the current methods used to evaluate the carboxylase activities of BDCs, such as measuring the ATP hydrolysis or incorporation of radio-labeled CO2, are unable to differentiate the specific carboxylase activity of each BDC.”

      Comment 2: In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We have revised Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking. We have also revised the main text to clearly describe which parts of the holoenzymes were not resolved in the cryo-EM maps and how the complete structures were built.

      Comment 3: In the introduction, I suggest the author provide more information about the previous studies about the structure and reaction mechanisms of BDCs, what is the knowledge gap, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues, are these residues reported as catalytic residues or this is based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in Streptomyces coelicolor PCC, corresponding to G437 and A438 in human PCCβ, were the catalytic residues for the second-step carboxylation reaction (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We have revised the manuscript to introduce the catalytic mechanisms of BDCs elucidated through the investigation of prokaryotic BDCs in the fourth paragraph of the introduction section.

      Comment 4: In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble that in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA regarding its ability to bind to MCC. We have discussed this possibility in the last paragraph of the discussion section. We have also added a supplementary figure (fig. S11) to compare the structures of human MCC holoenzyme in complex with acetyl-CoA and 3-methylcrotonyl-CoA.

      Comment 5: In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well-resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 6: How are the solved structures compared with the latest Alphafold3 prediction?

      Since AlphaFold3 was not released when our manuscript was submitted, we did not compare the solved structures with the AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we can only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to that of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but adopt a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
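      To illustrate why an open-versus-closed domain arrangement inflates the α-subunit RMSDs while the β subunits agree closely, the sketch below computes RMSD as the square root of the mean squared distance between paired atoms. It assumes the two models have already been superposed and their atoms paired one-to-one (for example, Cα atoms); the toy coordinates are hypothetical and this is not the authors' comparison pipeline.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD (same units as input, e.g. Å) between paired, already-superposed coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    # Sum of squared Euclidean distances over all atom pairs.
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


# Hypothetical toy models: three identical "core" atoms plus one atom in a mobile
# domain that is displaced by 8 Å between the two models.
core = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_a = core + [(4.5, 0.0, 0.0)]
model_b = core + [(4.5, 8.0, 0.0)]
print(rmsd(model_a, model_b))  # prints 4.0
```

      Because the deviations are squared before averaging, a rigid-body swing of one domain dominates the value even when most atoms agree, which is consistent with the large α-subunit RMSDs reported above.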

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      DMS-MaP is a sequencing-based method for assessing RNA folding by detecting methyl adducts on unpaired A and C residues created by treatment with dimethylsulfate (DMS). DMS also creates methyl adducts on the N7 position of G, which could be sensitive to tertiary interactions with that atom, but N7-methyl adducts cannot be detected directly by sequencing. In this work, the authors adopt a previously developed method for converting N7-methyl-G to an abasic site to make it detectable by sequencing and then show that the ability of DMS to form an N7-methyl-G adduct is sensitive to RNA structural context. In particular, they look at the G-quadruplex structure motif, which is dense with N7-G interactions, is biologically important, and lacks conclusive methods for in-cell structural analysis. 

      Strengths: 

      - The authors clearly show that established methods for detecting N7-methyl-G adducts can be used to detect those adducts from DMS and that the formation of those adducts is sensitive to structural context, particularly G-quadruplexes. 

      - The authors assess the N7-methyl-G signal through a wide range of useful probing analyses, including standard folding, adduct correlations, mutate-and-map, and single-read clustering. 

      - The authors show encouraging preliminary results toward the detection of G-quadruplexes in cells using their method. Reliable detection of RNA G-quadruplexes in cells is a major limitation for the field and this result could lead to a significant advance. 

      - Overall, the work shows convincingly that N7-methyl-G adducts from DMS provide valuable structural information and that established data analyses can be adapted to incorporate the information. 

      We thank the reviewer for their time, their positive assessment, and their suggestions, which we address below.

      Weaknesses: 

      - Most of the validation work is done on the spinach aptamer and it is the only RNA tested that has a known 3D structure. Although it is a useful model for validating this method, it does not provide a comprehensive view of what results to expect across varied RNA structures. 

      Thank you for your insightful comments. We agree that a more comprehensive view of BASH MaP requires probing a larger variety of RNAs with known 3D structures beyond Spinach and the poly-UG RNA. Although this is outside the scope of this publication, more work is needed to reveal the determinants of N7G reactivity to DMS.

      - It's not clear from this work what the predictive power of BASH-MaP would be when trying to identify G-quadruplexes in RNA sequences of unknown structure. Although clusters of G's with low reactivity and correlated mutations seem to be a strong signal for G-quadruplexes, no effort was made to test a range of G-rich sequences that are known to form G-quadruplexes or not. Having this information would be critical for assessing the ability of BASH-MaP to identify G-quadruplexes in cells. 

      - Although the authors present interesting results from various types of analysis, they do not appear to have developed a mature analysis pipeline for the community to use. I would be inclined to develop my own pipeline if I were to use this method. 

      Thank you for your suggestion. We have more clearly annotated the Python scripts and the GitHub repository, which contain all custom scripts used to analyze BASH MaP data. These changes should make it easier for researchers to use our pipelines.

      - There are various aspects of the DAGGER analysis that don't make sense to me:

      (1) Folding of the RNA based on individual reads does not represent single-molecule folding since each read contains only a small fraction of the possible adducts that could have formed on that molecule. As a result, each fold will largely be driven by the naive folding algorithm. I recommend a method like DREEM that clusters reads into profiles representing different conformations.

      (2) How reliable is it to force open clusters of low-reactivity G's across RNAs that don't already have known G-quadruplexes?

      (3) By forcing a G-quadruplex open it will be treated as a loop by the folding algorithm, so the energetics won't be accurate. 

      (4) It's not clear how signals on "normal" G's are treated. In Figure 5C some are wiped to 0 but others are kept as 1. 

      Thank you for your keen observations regarding the conceptual frameworks used in DAGGER. We have included an analysis complementary to DAGGER in which we applied DANCE, an algorithm that shares an underlying architecture with DREEM, to the Spinach BASH MaP data, and found that DANCE gave results similar to those obtained with DAGGER. However, we have not benchmarked DAGGER’s performance on a range of RNAs or compared its results with those of expectation-maximization algorithms like DREEM and DANCE.

      To minimize the effects of artificially creating loops with tertiary folding constraints, we utilized the RNA folding algorithm CONTRAfold which relies less on direct energetic calculations than other commonly used RNA folding algorithms such as RNAstructure.

      We have updated the main text to clarify how DAGGER handles signals at G’s under different conditions, including the specific logic used to determine which G’s are assigned a 0 or a 1 in the bitvector encoding used in DAGGER analysis.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript introduces BASH MaP and DAGGER, innovative tools for analyzing RNA tertiary structures, specifically focusing on G-quadruplexes. Traditional methods have struggled to detect and analyze these structures due to their reliance on interactions on the Hoogsteen face of guanine, which are not readily observable through conventional probing that targets Watson-Crick interactions. BASH MaP employs dimethyl sulfate and potassium borohydride to enhance the detection of N7-methylguanosine by converting it into an abasic site, thereby enabling its identification through misincorporation during reverse transcription. This method provides higher precision in identifying G-quadruplexes and offers deeper insights into RNA's structural dynamics and alternative conformations in both in vitro and cellular contexts. Overall, the study is well-executed, demonstrating robust signal detection of N7-Gs with some compelling positive controls, thorough analysis, and beautifully presented figures.

      Strengths: 

      The manuscript introduces a new method to detect G-quadruplexes (G-qs) that simplifies and potentially enhances the robustness and quantification compared to previous methods relying on reverse transcription truncations. The authors provide a strong positive control, demonstrating a 70% misincorporation at endogenous N7-G within the 18S rRNA, which illustrates BASH MaP's high signal-to-noise ratio. The data concerning the detection of positive control G-qs is particularly compelling. 

      Weaknesses: 

      Figure 3E shows considerable variability in the correlations among guanosines, suggesting that the methods may struggle with specificity in determining guanosine participation within and between different quadruplexes. There is no estimation of the method's false-positive rate.

      Thank you for your positive assessment and for taking the time to suggest improvements to this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors aim to develop an experimental/computational pipeline to assess the modification status of an RNA following treatment with dimethylsulfate (DMS). Building upon the more common DMS Map method, which predominantly assesses the modification status of the Watson-Crick-Franklin face of A's and C's, the authors insert a chemical processing step in the workflow prior to deep sequencing that enables detection of methylation at the N7 position of guanosine residues. This approach, termed BASH MaP, provides a more complete assessment of the true modification status of an RNA following DMS treatment and this new information provides a powerful set of constraints for assessing the secondary structure and conformational state of an RNA. In developing this work, the authors use Spinach as a model RNA. Spinach is a fluorogenic RNA that binds and activates the fluorescence of a small molecule ligand. Crystal structures of this RNA with ligand bound show that it contains a G-quadruplex motif. In applying BASH MaP to Spinach, the authors also perform the more standard DMS MaP for comparison. They show that the BASH MaP workflow appears to retain the information yielded by DMS MaP while providing new information about guanosine modifications. In Spinach, the G-quadruplex G's have the least reactive N7 positions, consistent with the engagement of N7 in hydrogen bonding interactions at G's involved in quadruplex formation. Moreover, because the inclusion of data corresponding to G increases the number of misincorporations per transcript, BASH MaP is more amenable to analysis of co-occurring misincorporations through statistical analysis, especially in combination with site-specific mutations. These co-occurring misincorporations provide information regarding what nucleotides are structurally coupled within an RNA conformation. 
By deploying a likelihood-ratio statistical test on BASH MaP data, the authors can identify Gs in G-quadruplexes, deconvolute G-G correlation networks, base-triple interactions and even stacking interactions. Further, the authors develop a pipeline to use the BASH MaP-derived G-modification data to assist in the prediction of RNA secondary structure and identify alternative conformations adopted by a particular RNA. This seems to help with the prediction of secondary structure for Spinach RNA. 
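      The likelihood-ratio test mentioned above asks whether misincorporations at two positions co-occur on the same reads more often (or less often) than expected if they were independent. A minimal sketch of one such test, a G-test on a 2x2 contingency table of reads, is shown below using only the Python standard library; the function names and toy counts are illustrative, and this is not the RING Mapper implementation, which applies its own filtering and significance thresholds.

```python
import math
from typing import Iterable, List, Tuple


def g_test_cooccurrence(reads: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """G-test of independence for misincorporations at two positions (i, j).

    Each read is a pair (modified_at_i, modified_at_j). Returns (G, p_value),
    where p_value uses the chi-square distribution with 1 degree of freedom.
    """
    counts = [[0, 0], [0, 0]]  # counts[a][b]: a = modified at i, b = modified at j
    for at_i, at_j in reads:
        counts[int(at_i)][int(at_j)] += 1
    n = sum(map(sum, counts))
    if n == 0:
        raise ValueError("no reads")
    row = [sum(counts[a]) for a in (0, 1)]
    col = [counts[0][b] + counts[1][b] for b in (0, 1)]
    g = 0.0
    for a in (0, 1):
        for b in (0, 1):
            observed = counts[a][b]
            if observed == 0:
                continue  # 0 * log(0) is taken as 0
            expected = row[a] * col[b] / n
            g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding errors
    # Survival function of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


def make_reads(both: int, only_i: int, only_j: int, neither: int) -> List[Tuple[bool, bool]]:
    """Build toy reads from the four cells of a 2x2 table."""
    return ([(True, True)] * both + [(True, False)] * only_i
            + [(False, True)] * only_j + [(False, False)] * neither)


independent = make_reads(10, 90, 90, 810)  # joint rate equals product of marginals
correlated = make_reads(50, 50, 50, 850)   # co-modification enriched 5-fold
print(g_test_cooccurrence(independent))
print(g_test_cooccurrence(correlated))
```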

      Strengths: 

      The BASH Map procedure and downstream data analysis pipeline more fully identify the complement of methylations to be identified from the DMS treatment of RNA, thereby enriching the information content. This in turn allows for more robust computational/statistical analysis, which likely will lead to more accurate structure predictions. This seems to be the case for the Spinach RNA. 

      Weaknesses: 

      The authors demonstrate that their method can detect G-quadruplexes in Spinach and some other RNAs both in vitro and in cells. However, the performance of BASH MaP and associated computational analysis in the context of other RNAs remains to be determined. 

      Thank you for the time you spent analyzing this manuscript, for your positive assessment, and for your suggestions on improving this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Although the text is clear and coherent, the overall flow of the manuscript comes across as "here's a bunch of stuff I tried." Maybe you're looking to get this out quickly, but it would have been much more impactful (and enjoyable to read) a description of a more polished final product. 

      Thank you for highlighting the strengths and weaknesses of this manuscript. We have revised parts of the main text to improve the overall flow of the manuscript and make it more enjoyable to read.

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments: 

      Major: 

      (1) Analysis of Guanosine Correlations in Figure 3E: In Figure 3E, there is a lot of variability in the correlations among guanosines. For example, G46 shows a strong correlation with G93 (within the same quadruplex) but also correlates with G91, G95 (in different quadruplexes), and G97 (not part of any quadruplex as per the model in Figure 3C). Contrarily, G86 exhibits weak correlations, and G50 along with G89 shows no significant correlations. These findings imply that BASH MaP followed by RING MaP analysis struggles to accurately distinguish between guanosines within the same or different quadruplexes in Spinach. Perhaps there are some opportunities to enhance the specificity in determining guanosine participation within quadruplexes, a great point for the authors to discuss.

      Thank you for your comments and careful analysis of the pattern of correlations produced by BASH MaP. We agree that BASH MaP followed by RING MaP analysis cannot unambiguously distinguish between guanosines within the same or different quadruplex layers. This finding was a surprise, as we initially assumed that quadruplex layers would behave like Watson-Crick base pairs and produce specific signals in the corresponding RING MaP heatmaps. We suspect that this may be because mutations at specific G’s are associated with altered conformations that allow other G’s to form different interactions that affect DMS reactivity. This may be unique to the highly complex structure of Spinach. However, we think BASH MaP clearly provides signals that point to key residues within the G-quadruplex, even if it does not clearly identify all of them.

      This idea is supported by the experiments described in Figure 4, which show that mutation of a single guanosine residue causes a complete breakdown of the hydrogen-bonding network throughout all quadruplex layers. Additionally, DMS methylation of an N7G in a quadruplex is likely to disrupt base-stacking interactions in and around the quadruplex. The compounding effects of a dynamic G-quadruplex and DMS-induced changes to local base stacking explain both the strong correlations with G97, which is base-stacked with the quadruplex, and the inability to identify the specific guanosines that make up each quadruplex quartet. We have further emphasized this point in the updated discussion section.

      (2) Potential Consolidation of Figures 3 and 4: Figure 4 appears quite similar to Figure 3 but employs M2-seq instead of relying on spontaneous mutations. It might be beneficial to merge these figures to demonstrate that M2-seq can more effectively identify correlations between guanosines in quadruplexes. 

      We agree that Figures 3 and 4 appear quite similar, but there is an important distinction between RING MaP and M2-seq analysis. We suspect that the mechanism causing correlations between quadruplex guanosines in RING MaP analysis is “RNA breathing”, in contrast to the spontaneous T7 RNA polymerase-induced mutation model proposed in Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114. To determine whether correlations between guanosines in Spinach BASH MaP experiments rely on spontaneous mutations, we compared the fraction of reads containing misincorporations at pairs of quadruplex guanosines over a range of DMS concentrations. The spontaneous mutation model predicts a linear dependence of this paired quadruplex guanosine signal on DMS dose, while an “RNA breathing” or double-DMS-hit model predicts a quadratic dependence on DMS dose (Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114). Our data are consistent with a quadratic dependence on DMS dose for multiple pairs of G-quadruplex guanosines, whereas pairs of helical G’s show a linear dependence (Supplementary Data Fig. 9). Together, these data suggest that BASH MaP followed by RING MaP analysis detects double-DMS modification events for pairs of quadruplex guanosines. Therefore, BASH MaP with RING MaP analysis provides a complementary approach to M2 BASH MaP and can reveal guanosine correlations in contexts where pre-installed mutations cannot be used, such as the study of endogenously expressed RNAs.
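      The linear-versus-quadratic comparison can be made quantitative by fitting the slope of log(fraction of reads with paired misincorporations) against log(DMS concentration): a slope near 1 is expected for a single DMS hit combined with a pre-existing mutation, and a slope near 2 for two independent DMS hits. The sketch below fits this slope with NumPy on synthetic numbers; it is an illustrative assumption and not necessarily the analysis behind Supplementary Data Fig. 9.

```python
import numpy as np


def dose_exponent(dms_mm, paired_fraction) -> float:
    """Slope of log(paired misincorporation fraction) versus log(DMS dose).

    Approximately 1 for a linear (single-hit) dependence and approximately 2
    for a quadratic (double-hit) dependence. All values must be positive.
    """
    x = np.asarray(dms_mm, dtype=float)
    y = np.asarray(paired_fraction, dtype=float)
    if x.shape != y.shape or x.size < 2 or np.any(x <= 0) or np.any(y <= 0):
        raise ValueError("need at least two positive, paired measurements")
    slope, _intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(slope)


# Synthetic doses (mM) and fractions; these are not measured data.
doses = [20.0, 40.0, 85.0, 170.0]
single_hit = [2e-4 * d for d in doses]       # linear in dose
double_hit = [3e-7 * d ** 2 for d in doses]  # quadratic in dose
print(round(dose_exponent(doses, single_hit), 3))
print(round(dose_exponent(doses, double_hit), 3))
```

      With real data the fitted exponent will be noisy, so fitting several guanosine pairs and comparing their distribution of slopes is more informative than a single fit.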

      (3) Estimation of False Positive Rates: An estimation of the false positive rate for G-quadruplex identification would be invaluable. Since identification currently depends on the absence of DMS modification, it's important to consider how other factors like solvent inaccessibility or library generation might affect the detection and be misinterpreted as G-quadruplexes. Although this could be a subject of future work, some discussion by the authors would enhance the manuscript. 

      We have added a table summarizing sensitivity, positive predictive value, and false positive rate for different G-quadruplex identification schemes.  See Supplementary Table 1.
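      For reference, the three metrics in Supplementary Table 1 are presumably the standard confusion-matrix quantities, where a true positive is a G correctly called as quadruplex-engaged. The sketch below computes them from hypothetical counts (not values from the table); it also shows that the false discovery rate, which is easily confused with the false-positive rate, equals 1 minus the PPV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # quadruplex G's correctly called as quadruplex G's
    fp: int  # non-quadruplex G's incorrectly called as quadruplex G's
    tn: int  # non-quadruplex G's correctly not called
    fn: int  # quadruplex G's that were missed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def positive_predictive_value(self) -> float:
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn)

    def false_discovery_rate(self) -> float:
        return 1.0 - self.positive_predictive_value()


# Hypothetical counts for one identification scheme.
scheme = ConfusionCounts(tp=8, fp=2, tn=88, fn=2)
print(f"sensitivity = {scheme.sensitivity():.2f}")
print(f"PPV         = {scheme.positive_predictive_value():.2f}")
print(f"FPR         = {scheme.false_positive_rate():.3f}")
```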

      Minor: 

      (4) Line 273 Reference Correction: Please adjust the reference in line 273 to accurately reflect that the G-quadruplex experiments compare potassium with lithium, not sodium. 

      In cellulo G-quadruplex reverse transcriptase (RT) stop assays, as described by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537), compared RT stops between DMS-treated mRNA refolded in potassium and sodium buffers. We have clarified in the text that, traditionally, G-quadruplex RT stop assays compare potassium with lithium.

      (5) Consistency in Figure 1 (Panels F and G): Aligning BASH MaP (170 mM DMS) as the y-axis in both panels F and G would visually align the data points and enhance the graphical coherence across these panels. 

      Thank you for noticing the subtleties in our data presentation and for the suggestion on how to improve graphical coherence across panels. We deliberately chose not to use BASH MaP (170 mM DMS) as the y-axis in both panels F and G because we did not want readers to mistakenly assume that the BASH MaP (170 mM DMS) data presented in panels F and G are the same data. In panel F, BASH MaP was performed under standard DMS probing conditions using a pH 7.5 bicine buffer; the purpose of panel F is to show the reproducibility of BASH MaP across DMS concentrations. In panel G, BASH MaP was performed in a pH 8.3 bicine buffer, which promotes the formation of m3U; the purpose of panel G is to show that the borohydride treatment and depurination steps in BASH MaP do not react with DMS-derived m1A, m3C, and m3U in a way that prevents their detection through cDNA misincorporation. Because of these experimental differences, the BASH MaP (170 mM DMS) data points differ between panels F and G, and aligning the axes would confuse readers and detract from the intended message of these panels.

      (6) Statistical Detail in Figure 1E: Incorporating a confidence interval or a P-value in Figure 1E would enrich the statistical depth and provide readers with a clearer understanding of the data's significance. 

      Thank you for suggesting that we include a p-value in Figure 1E to give readers a clearer understanding of the data’s significance. The effect of combining DMS treatment and borohydride reduction on the misincorporation rate of G’s in Spinach is so dramatic that the raw data alone make its significance clear.

      (7) Reevaluation of Figure 2B: Considering the small number of Gs in single-stranded regions and base triples, it might be more informative to move Figure 2B to supplementary information. Focusing on Figure 2C, which consolidates non-quadruplex categories, could provide more impactful insights. 

      Thank you for your suggestion. It is important to first provide an overall characterization of N7G DMS reactivity for G’s in a variety of structural contexts before looking specifically at G-quadruplexes. Panel B is an important part of Figure 2 for the following two reasons:

      First, on seeing the N7G chemical reactivity of Spinach shown in Figure 2A, a reader will likely want to know whether base-paired and single-stranded G’s have similar or different DMS reactivities. Figure 2, panel B shows that single-stranded G’s generally appear to have higher DMS reactivity than base-paired G’s, except for two base-paired G’s that display hyper-reactivity. The basis for this hyper-reactivity is addressed in Figure 4.

      Second, panel B highlights the wide range of N7G DMS reactivities. Because the G-quadruplex G’s display dramatically lower DMS reactivity than single-stranded G’s and hyper-reactive base-paired G’s, the dynamic range of DMS reactivities was difficult to capture in a single panel, and panel C alone would not convey this range appropriately.

      (8) Enhancements to Figure 2G: Improving the visibility of mutation rates in this figure would help. Suggestions include coloring bars by nucleotide type for intuitive visual comparison and adjusting the y-axis to a logarithmic scale to better represent near-zero mutation rates. Additionally, employing histograms or box plots could directly compare DMS reactivities and provide a clearer analysis. 

      Thank you for your suggestions on enhancing the presentation of BASH MaP applied to an mRNA. The main purpose of Figure 2G was to validate whether BASH MaP could detect G’s engaged in a G-quadruplex in cells. In-cell G-quadruplex folding measurements performed by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) identified only a few folded G-quadruplexes, and for these only the 3’ end of the G-quadruplex was detected. We therefore reasoned that the 3’-most G’s of this select set of G-quadruplexes were the only G’s validated as engaged in a G-quadruplex in cells. For the AKT2 mRNA, Guo and Bartel found that 4 G’s appeared to be folded in a G-quadruplex in cells (Supplementary figure 2E). These G’s are indicated at the bottom of the plot with black bars and the label “In-cell G-quadruplex guanosines”. We therefore hypothesized that these G’s would display low DMS reactivity with BASH MaP, while other G’s in the AKT2 mRNA would display higher chemical reactivities. We followed a convention widely used in the field for displaying chemical reactivities, in which black bars indicate low reactivity, yellow bars moderate reactivity, and red bars high reactivity. The data in Figure 2G directly support Guo and Bartel’s prediction of an in-cell folded G-quadruplex in the AKT2 mRNA, because the 4 G’s predicted to be engaged in the G-quadruplex all displayed near-zero DMS reactivities.

      We agree that adjusting the y-axis to a logarithmic scale would better represent near-zero mutation rates. However, the purpose of Figure 2G is not to compare all positions with near-zero mutation rates; the standard display convention is sufficient to show BASH MaP’s ability to validate in-cell G-quadruplex G’s.

      Later in the paper, we go a step further and develop a better criterion than N7G DMS reactivity alone for identifying G’s engaged in a G-quadruplex. For further analysis of G’s with near-zero DMS reactivities, see Figure 3 and Supplementary figure 4, which use RING Mapper to identify lowly reactive G’s that produce co-occurring misincorporations.

      (9) Scale Consistency in Figure 3: Ensuring that the correlation scales are uniform across Panels A, B, D, and E would facilitate easier comparison of the data, enhancing the overall coherence of the findings. Using raw correlation values could also improve clarity and interpretation. 

      Thank you for the suggestions to facilitate easier comparisons of data in Figure 3. We have ensured the correlation scales are uniform across panels A, B, D, and E to enhance the coherence of these findings. We initially visualized the data in Figure 3 by plotting raw correlation values, but we found these values differed between DMS MaP and BASH MaP datasets, likely because of the low-level background mutations introduced by the borohydride reduction step of BASH (see Supplementary figure 3A). However, performing a global normalization of correlation strength values computed by RING mapper enabled clear comparisons between DMS MaP and BASH MaP RING heatmaps and revealed structural domains consistent with the crystal structure of Spinach.

      (10) Correction on Line 506: Please update the reference to M2 BASH MaP for accuracy. 

      Thank you. We have updated the main text to incorporate this comment.

      Reviewer #3 (Recommendations For The Authors): 

      The paper describes multiple applications and multiple methods of analysis of the BASH Map data, which collectively make the manuscript more difficult to follow. The manuscript would become more readable and user-friendly if there were some overview figures to describe the sequencing pipeline and the various computational workflows that the BASH MaP data are fed into (e.g. RING Mapper, DAGGER, M2 BASH MaP, Co-occurring Misincorporations, Secondary Structure Prediction). One or more summary schemes that provide an overview would strongly assist with the clarity and overall content of the paper. 

      Thank you for your suggestions. We have incorporated a summary scheme of the various computational workflows and their use cases in Fig 7.

      Line 165. Here, misincorporation rates for all four nucleotides are discussed, but m3U is not mentioned until the following paragraph. It would be appropriate and clearer to mention this sooner.

      Thank you for your suggestion. We have restructured this section to introduce the DMS modification m3U in an earlier paragraph to increase clarity for readers.

      Line 506: spelling of DAGGER. 

      Thank you. We have updated the main text to incorporate this comment.

      Line 645: I found this paragraph difficult to follow, especially the line starting 649. I thought the logic was to exclude G's involved in tertiary interactions from base-pairing in the secondary structure prediction. Some clarification would be helpful.

      Thank you for your comments. We have restructured the paragraph to emphasize that DAGGER only applies tertiary folding constraints to sequencing reads without misincorporations at G’s engaged in tertiary interactions. We reasoned that sequencing reads with a misincorporation at a G engaged in a tertiary interaction likely come from an RNA molecule which is in an alternative tertiary conformational state. In this specific circumstance, a tertiary folding constraint may impose incorrect restrictions on the folding of RNA molecules due to distinct tertiary conformations.
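      The per-read rule described above can be sketched as follows. This is a hypothetical illustration: the 0/1 bitvector per read (1 = misincorporation), the 0-based positions, the choice that a misincorporation at any one tertiary G disqualifies the read, and the function name are all assumptions for clarity rather than DAGGER's actual code.

```python
from typing import Sequence, Set


def tertiary_constraints_for_read(bitvector: Sequence[int],
                                  tertiary_g_positions: Set[int]) -> Set[int]:
    """Return the positions to force unpaired when folding this read.

    Hypothetical rule: if the read shows no misincorporation at any G engaged in
    a tertiary interaction, those G's are constrained as unpaired; otherwise the
    read probably comes from an alternative tertiary state and no tertiary
    constraint is applied.
    """
    for pos in tertiary_g_positions:
        if not 0 <= pos < len(bitvector):
            raise IndexError(f"position {pos} is outside the read")
    if any(bitvector[pos] == 1 for pos in tertiary_g_positions):
        return set()
    return set(tertiary_g_positions)


quadruplex_gs = {2, 5, 8}
native_like = [0, 1, 0, 0, 0, 0, 1, 0, 0]  # misincorporations only at non-quadruplex sites
alternative = [0, 0, 0, 0, 0, 1, 0, 0, 0]  # misincorporation at the quadruplex G at position 5
print(sorted(tertiary_constraints_for_read(native_like, quadruplex_gs)))
print(sorted(tertiary_constraints_for_read(alternative, quadruplex_gs)))
```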

      Line 817. "Ability to". 

      Thank you. We have updated the main text to incorporate this comment.

      Figure 6F. Mistake in the axis description. 

      Thank you. We have updated the main text to incorporate this comment.

      Consider combining the paragraphs at lines 850 and 903. 

      Thank you for the suggestion. We rearranged paragraphs in the discussion to improve clarity.

      Line 1546. The final conc of DMS would be nice to see here.

      Thank you. We have updated the main text to incorporate this comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild-type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild-type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and might be better presented before the respiratory activity data.

      We thank the reviewer for their thorough evaluation of our work. We carefully considered whether the transcriptional data should be presented before the respirometry data. However, this would disrupt other transitions and the flow of the argument between sections, so we prefer to keep the order of sections as is.

      While the data and conclusions do support the role of MftG in ethanol metabolism, the title of the publication may be misleading as the mutant was able to grow in the presence of other alcohols (Supp Fig S2).

      We agree that ethanol metabolism was the focus of this work and that phenotypes connected to other alcohols were less striking. We, therefore, changed “alcohol” to “ethanol” in the title of the manuscript.

      Furthermore, the authors propose that MftG could not be involved in acetate assimilation based on the detection of acetate in the supernatant and the ability to grow in the presence of acetate. The minimal amount of acetate detected in the supernatant but a comparative amount of acetaldehyde could point to disruption of an aldehyde dehydrogenase.

      We do not agree that MftG might be involved in acetaldehyde oxidation. If MftG were an acetaldehyde dehydrogenase, its disruption would lead to the accumulation of acetaldehyde. However, we observed equal amounts of acetaldehyde in cultures of M. smegmatis WT and ∆mftG grown on ethanol as well as on ethanol + glucose. Furthermore, the amount of acetate detected in the supernatants is not “minimal”, as the reviewer suggests, but higher than or comparable to the acetaldehyde concentration (Figure 3 E and F; note that acetate concentrations are given in g/L and acetaldehyde concentrations in µM). Finally, the accumulation of mycofactocinols in ∆mftG mutants grown on ethanol is inconsistent with MftG being an aldehyde dehydrogenase but strongly supports our hypothesis that MftG is involved in cofactor reoxidation.

      The link between mycofactocin oxidation and respiration is shown, however the mutant has an intact respiratory chain in the presence of ethanol (oxygen consumption with NADH and succinate in Fig 7C) and the NADH/NAD+ ratios are comparable to growth in glucose. Could the lack of growth of the mutant in ethanol be linked to factors other than respiration?

      Indeed, by using NADH and succinate as electron donors, we show that the respiratory chain is largely intact in WT and ∆mftG grown on ethanol. Also, when mycofactocinols were used as electron donors, respiration in WT membranes was comparable to succinate respiration. However, respiration was severely hampered in membranes of ∆mftG when mycofactocinols were offered as the reducing agent. These findings strongly support our hypothesis that MftG is necessary to shuttle electrons from mycofactocin to the respiratory chain, while the rest of the respiratory chain remains intact. The fact that NADH/NAD+ ratios are comparable between ethanol and glucose conditions is interesting and indirectly supports our hypothesis that mycofactocin, not NAD, is the major cofactor in ethanol metabolism. Therefore, we see no evidence that the lack of growth of the mutant in ethanol is linked to factors other than respiration.

      To this end, bioinformatic investigation or other evidence to identify the membrane-bound respiratory partner would strengthen the conclusions.

      We generally agree that identifying the direct interaction partners of MftG is an important next step. However, we are convinced that experimental evidence from several orthogonal approaches is required to unequivocally identify interaction partners of MftG. Nevertheless, we agree that a preliminary bioinformatic study could guide follow-up studies. We therefore attempted to predict interaction partners of MftG using D-SCRIPT and AlphaFold 2. However, this approach did not reveal any meaningful results. Thus, we prefer not to integrate it into the manuscript but briefly summarize our methodology here: To predict potential interaction partners of M. smegmatis mc2 155 MftG (MSMEG_1428), D-SCRIPT (Sledzieski et al. 2021, https://doi.org/10.1016/j.cels.2021.08.010) with the Topsy-Turvy model version 1 (Singh et al. 2022, https://doi.org/10.1093/bioinformatics/btac258) was employed to screen every combination of the MSMEG_1428 amino acid sequence with the amino acid sequence of every potential interaction partner from the M. smegmatis mc2 155 predicted total proteome (6602 combinations in total; UniProt UP000000757, Genome Accession CP000480). Predictions failed for eight potential interaction partners due to size constraints (MSMEG_0019, MSMEG_0400, MSMEG_0402, MSMEG_0408, MSMEG_1252, MSMEG_3715, MSMEG_4727, MSMEG_4757; all amino acid sequences ≥ 2000 aa). Afterward, the top 100 predicted interaction partners, ranked by D-SCRIPT protein-protein interaction score, were subjected to AlphaFold 2 multimer prediction using ColabFold batch version 1.5.5 (AlphaFold 2 with MMseqs2, Mirdita et al. 2022, https://doi.org/10.1038/s41592-022-01488-1) on a Google Colab T4 GPU in a Python 3 environment with the following parameters (msa_mode: MMseqs2 (UniRef+Environmental), num_models = 1, num_recycles = 3, stop_at_score = 100, num_relax = 0, relax_max_iterations = 200, use_templates = False).
      As input, the MSMEG_1428 amino acid sequence was used as protein 1 and the amino acid sequence of the potential interaction partner as protein 2. In addition, proteins of the electron transport chain and the dormancy regulon (dos regulon) were included as potential interaction partners. In total, 222 unique potential MftG interactions were predicted. The AlphaFold 2 interface predicted template modelling (ipTM) score peaked at 0.45 for MftG-MftA. This score, however, lies below the threshold of 0.75, indicating that the predicted interaction is likely false (Yin et al. 2022, https://doi.org/10.1002/pro.4379). Nonetheless, the models with the highest ipTM scores (MftG with MftA, MSMEG_3233, MSMEG_4260, MSMEG_0419, MSMEG_5139, MSMEG_5140) were inspected manually using ChimeraX version 1.8 (Meng et al. 2023, https://doi.org/10.1002/pro.4792). However, no plausible interaction was found.
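      The triage logic of this screen (score every MftG-partner pair, keep the top 100 by D-SCRIPT score for AlphaFold 2 multimer modelling, then judge each model against the 0.75 ipTM threshold) can be summarized in a short sketch. It does not call D-SCRIPT or ColabFold; the function names and the toy score dictionaries are hypothetical placeholders standing in for those tools' output, not results from this screen.

```python
from typing import Dict, List, Tuple

# Threshold quoted above (Yin et al. 2022); models scoring below it are treated as likely false.
IPTM_THRESHOLD = 0.75


def top_candidates(dscript_scores: Dict[str, float], n: int = 100) -> List[str]:
    """Rank partner IDs by D-SCRIPT interaction score (highest first) and keep the top n."""
    ranked = sorted(dscript_scores, key=dscript_scores.get, reverse=True)
    return ranked[:n]


def classify_models(
    iptm_scores: Dict[str, float], threshold: float = IPTM_THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Split multimer models into confident (ipTM >= threshold) and likely-false predictions."""
    confident = [p for p, s in iptm_scores.items() if s >= threshold]
    likely_false = [p for p, s in iptm_scores.items() if s < threshold]
    return confident, likely_false


if __name__ == "__main__":
    # Made-up scores for illustration only (hypothetical partner names).
    dscript = {"partner_A": 0.91, "partner_B": 0.40, "partner_C": 0.77}
    print(top_candidates(dscript, n=2))  # ['partner_A', 'partner_C']
    iptm = {"partner_A": 0.45, "partner_C": 0.30}
    print(classify_models(iptm))  # ([], ['partner_A', 'partner_C'])
```

      In this sketch, as in the screen described above, a candidate only counts as a predicted interaction if its multimer model passes the ipTM threshold; a high D-SCRIPT rank alone is not sufficient.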

      Reviewer #2 (Public Review):

      Summary

      Patrícia Graça et al. examined the role of the putative oxidoreductase MftG in the regeneration of redox cofactors from the mycofactocin family in Mycolicibacterium smegmatis. The authors show that mftG is often co-encoded with genes from the mycofactocin synthesis pathway in M. smegmatis genomes. Using an mftG deletion mutant, the authors show that mftG is critical for growth when ethanol is the only available carbon source, and this phenotype can be complemented in trans. The authors demonstrate that the ethanol-associated growth defect is not due to ethanol-induced cell death but is likely a result of carbon starvation, which was supported by multiple lines of evidence (imaging, transcriptomics, ATP/ADP measurement and respirometry using whole cells and cell membranes). The authors next used LC-MS to show that the mftG deletion mutant has much lower levels of oxidised mycofactocin (MMFT-8 vs MMFT-8H2) compared to WT, suggesting an impaired ability to regenerate mycofactocin redox cofactors during ethanol metabolism. These striking results were further supported by mycofactocin oxidation assays after overexpression of MftG in the native host, and also with recombinantly produced, partially purified MftG from E. coli. The results showed that MftG is able to partially oxidise mycofactocin species. Finally, respirometry measurements with M. smegmatis membrane preparations from WT and mftG mutant cells show that the activity of MftG is indispensable for coupling mycofactocin electron transfer to the respiratory chain. Overall, I find this study to be comprehensive, and the conclusions of the paper are well supported by multiple complementary lines of evidence that are clearly presented.

      Strengths

      The major strengths of the paper are that it is clearly written and presented and contains multiple, complementary lines of experimental evidence that support the hypothesis that MftG is involved in the regeneration of mycofactocin cofactors and assists with coupling electrons derived from ethanol metabolism to the aerobic respiratory chain. The data appear to support the authors' hypotheses.

      We thank the reviewer for their thorough evaluation of our work.

      Weaknesses

      No major weaknesses were identified, only minor weaknesses mostly surrounding presentation of data in some figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig 6 C and D, would it not be expected that MMFT-2H2 would be decreasing over time as MMFT-2 is increasing?

      This is true. MMFT-2H2 is indeed decreasing while MMFT-2 is increasing; however, since the y-axis is drawn on a logarithmic scale, the visible difference is not proportional to the actual changes. The increase of MMFT-2 from a very low starting point is more clearly visible than the decrease of MMFT-2H2, which was added in high quantities.
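      The effect of the logarithmic axis can be illustrated with made-up numbers (not the data in Figure 6): the same absolute change spans a large vertical distance when it starts from a small value and almost none when it starts from a large value. The helper name and the amounts below are hypothetical.

```python
import math


def log10_span(start: float, end: float) -> float:
    """Vertical distance between two positive values on a log10 axis, in decades."""
    return abs(math.log10(end) - math.log10(start))


# Hypothetical amounts: one species falls by 100 units from a high start,
# the other rises by 100 units from a low start.
print(round(log10_span(1000, 900), 3))  # 0.046 decades (barely visible)
print(round(log10_span(1, 101), 3))     # 2.004 decades (clearly visible)
```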

      (2) It would be beneficial to include rationale regarding the electron acceptors tested and why FAD was not included.

      FAD is a prosthetic group of the enzyme and was always a component of the assay. The other electron acceptors were chosen as potential external electron acceptors.

      (3) Bioinformatic analysis to capture possible interacting partners of MftG

      See our response to the previous review.

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      (1) The co-occurrence analysis showed that one genome encoded mftG, but not mftC - do the authors think that this is a mftG mis-annotation?

      This is a good question. We have investigated this case more closely and conclude that this particular mftG is not a misannotation. Instead, it appears that the mftC gene underwent gene loss in this organism. We added on page 8, line 15: “Only one genome (Herbiconiux sp. L3-i23) encoding a bona fide MftG did not harbor any MftC homolog. However, close inspection revealed the presence of mftD, mftF, and a potential mftA gene but a loss of mftB,C and E in this organism.”

      (2) Figure 3A - the complemented mutant strain shows enhanced growth on ethanol when compared to the WT strain with the same mftG complementation vector, suggesting that dysregulation from the expression plasmid may not be responsible for this phenotype. Have the authors conducted whole genome sequencing on the mutant/complement isolate to rule out secondary mutations?

      This is an interesting point. We have not conducted further investigations into the complemented mutant. However, we can confidently state that the complementation was successful in that it restored growth of the ∆mftG mutant on ethanol, thus confirming that the growth arrest of the mutant was due to the lack of mftG activity and not to a secondary mutation. We also observed that both the complemented strain and the overexpression strain, which are based on the same overexpression plasmid, exhibited shorter lag phases, faster growth and higher final cell densities than the wild type. We interpret these data to mean that overexpression of mftG might relieve a growth-limiting step. Notably, this is only an interpretation; we do not make this claim. What we cannot explain at the moment is the observation that the complemented mutant grew to a higher OD than the overexpression strain. This is indeed interesting, and it might be due to an artefact or to complex regulatory effects, which are hard to study without an in-depth characterization of the strains involved. While this goes beyond the scope of this study, we are convinced that our main conclusions are not challenged by this phenomenon.

      (3) Figure 4C - could the yellow fluorescence that suggests growth arrest be quantified in these images similar to the size and septa/replication sites?

      In principle, this is a good suggestion. However, the amount of yellow fluorescence only differed in the starvation condition between genotypes. Since this condition was not a focus of this study, we preferred not to discuss these differences further.

      (4) Figure 4E - the complemented mutant strain has very high error, why is that? Could this phenotype not be complemented?

      It is true that the standard deviation (SD) is relatively high in this experiment. This is due to the fact that single-cell analyses based on microscopic images were conducted here - not bulk measurements of the average fluorescence. This means that the high variance partially reflects phenotypic heterogeneity of the population, rather than inefficient complementation. While it is interesting that not all cells behaved equally, a finding that deserves further investigations in the future, we conclude that the mean value is a good representative for the efficiency of the complementation.

      (5) While the whole cell extract experiment presented in Figure 6A is very clear, could the authors include SDS page or MS results of their partially purified MftG preparations used for figure B-F in the supplementary data to rule out any confounding factors that may be oxidising mycofactocin species in these preparations?

      We did not include SDS-PAGE or MS results since the enzyme preparations obtained were not pure. This is why we refer to the preparation as a “partially purified fraction”. Since we were aware of the risk of confounding factors being present in the preparation, we used two different expression hosts (M. smegmatis ∆mftG and E. coli) and included negative controls, i.e., reactions using protein preparations from the same host that underwent exactly the same purification steps but lacked the mftG gene. For instance, Figure 6A shows the negative control (M. smegmatis ∆mftG) and the verum (M. smegmatis ∆mftG-mftG_His6). Although this control is not shown in panels B to D for clarity, we can assure the reviewer that the proposed activity of MftG has never been detected in any extract of M. smegmatis ∆mftG. For MftG preparations obtained by heterologous expression in E. coli, we also performed empty vector controls and inactivated protein controls. We added a new Supplementary Figure S4 showing one example control. Taken together, the use of two different expression hosts along with the corresponding background controls clearly demonstrates that mycofactocinol oxidation only occurred in protein extracts of bacterial strains that contained the mftG gene. These data therefore indicate that the observed mycofactocinol dehydrogenase activity is connected to MftG and not to any background activity.

      Recommendations:

      • A suggestion - revise sub-titles in the results section to be more 'results-oriented' e.g. rather than 'the role of MftG in growth and metabolism of mycobacteria' consider instead 'MftG is critical for M. smegmatis capacity to utilise ethanol as a sole carbon source for growth' or something similar.

      In principle this is a good idea for many manuscripts. However, we have the impression that this approach does not reflect the complexity and additive aspect of the sections of our manuscript.

      • For clarity, revise all figures to include p-values in the figure legend rather than above the figures (use asterisks to indicate significance).

      We are not sure whether the deletion of p-values in the figures would enhance clarity. We would prefer to leave them within figures.

      • Figure 5B -revise colour legend, it is unclear which bar on the graph corresponds to which strain.

      The figure legend was enlarged to enhance readability.

      • Page 8 - MftG and MftC should be lowercase and italicised as the authors are writing about the co-occurrence of genes encoded in genomes, not proteins.

      Good point, we changed some instances of MftG / MftC to mftG / mftC, to more specifically refer to the gene level. However, in some cases, the protein level is more appropriate, for instance, the phylogenies are based on protein sequences. That is why we used the spelling MftG / MftC in these cases.

      • Page 9 - for clarity move Figure 3 after first in text citation.

      We moved Figure 3.

      • Page 17 - for clarity move Figure 5 after first in text citation.

      We moved Figure 5. We furthermore reformatted the figure legend to fit onto the same page as the figure.

      • Page 20, line 17 - 'was attempted' change to 'was performed'. The authors did more than attempt purification, they succeeded!

      Since purification of MftG was not successful, we prefer the term “attempted” here. However, activity assays indeed indicate successful production of MftG.

      • Page 20, line 19-21 - data showing that the MftG-HIS6 complements ∆mftG could be included in supplementary information.

      Complementation was obvious by growth on media containing ethanol as a sole carbon source.

      • Page 26 line 25 - 'we also we' delete duplicated we.

      Thank you for the hint, we deleted the second instance of “we” in the manuscript.

      • Page 26 Line 26 - 'mycofactocinols were oxidised to mycofactocinols', should this read mycofactocinols were oxidised to mycofactocinones?

      Correct. We changed “mycofactocinols” to “mycofactocinones”

      • Page 28 line 17, huc hydrogenase operon

      We added (“huc operon”).

      • Page 38 line 24, 'Two' not 'to'.

      This is a misunderstanding. “To” is correct.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are from men living with HIV-1 subtype B and are not from the hyperacute infection phase; therefore, a direct head-to-head comparison with the FRESH study is difficult. However, we can highlight and contrast our study with a few examples from acute infection studies, as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year. In our study, we showed that in chronic-treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the first-phase decay of intact viral genomes was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute-treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of participants with predominantly HIV-1 CRF01-AE subtype compared HIV reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in the SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while these infected cells decayed sharply in all acutely treated individuals during the first 12 weeks of therapy. Rates of decay were not provided; therefore, a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in acute-treated (within 100 days) and chronic-treated (>180 days) participants from a predominantly male subtype B cohort. In comparison to the FRESH chronic-treated group, they showed that in chronic-treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated. In acute-treated participants, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated. The differences in the frequency of hypermutation could be explained by differences in the timing of ART initiation relative to infection, specifically in the acute-treated groups, where FRESH participants initiated ART at a median of 1 day after infection. It is also possible that sex- or race-based differences in immunological factors that impact the reservoir play a role.

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Heiner et al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute-treated and 3 chronic-treated male study participants with subtype B HIV. Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses of 6% (94% defective) in participants who initiated therapy during acute/early infection and 3% (97% defective) in those who initiated therapy during chronic infection. In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute-treated participants and 13% intact and 87% defective in chronic-treated participants. These differences could be attributed to the timing of treatment initiation, as early treatment in the aforementioned study ranged from 0.6 to 3.4 months after infection.
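      The decay rates compared above are reported in different units (percent per month, percent per year, log10 copies per month), so setting them side by side requires a unit conversion. The sketch below assumes simple first-order (single-exponential) decay at a constant rate over the stated interval, an assumption the cited studies do not necessarily make; the function names are hypothetical, and the derived half-lives are illustrations of the arithmetic, not estimates reported by any of these studies.

```python
import math


def half_life_from_fractional_loss(loss_per_period: float) -> float:
    """Half-life (in periods) for a constant fractional loss per period, e.g. 0.25 = 25%.

    Assumes exponential decay: N(t) = N0 * (1 - loss) ** t.
    """
    if not 0.0 < loss_per_period < 1.0:
        raise ValueError("loss must lie strictly between 0 and 1")
    return math.log(2) / -math.log(1.0 - loss_per_period)


def half_life_from_log10_decline(log10_drop_per_period: float) -> float:
    """Half-life (in periods) for a constant decline of log10 copies per period.

    Assumes exponential decay: log10 N(t) = log10 N0 - drop * t.
    """
    if log10_drop_per_period <= 0:
        raise ValueError("decline must be positive")
    return math.log10(2) / log10_drop_per_period


# Rates quoted in items a and b above.
print(round(half_life_from_fractional_loss(0.25), 2))   # 2.41 months (intact, chronic-treated FRESH)
print(round(half_life_from_fractional_loss(0.03), 1))   # 22.8 months (defective, chronic-treated FRESH)
print(round(half_life_from_fractional_loss(0.157), 2))  # 4.06 years (intact, Peluso et al.)
print(round(half_life_from_log10_decline(0.31), 2))     # 0.97 months (intact, acute-treated FRESH)
```

      Because each conversion rests on the constant-rate assumption, these numbers are only as reliable as that assumption over the interval in question; as noted in the response to Reviewer #3, the FRESH data have too few time points for a robust half-life estimate.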

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size. 

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).

      Reviewer #2 (Public reviews):

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on comorbidities and coinfections is not available.

      Reviewer #3 (Public reviews):

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represents an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350). A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1 -3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted, however we choose to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2 (Recommendations for The Authors):

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system used in the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list, which will change all the numbering throughout.

      This has been corrected.

      Reviewer #3 (Recommendations for The Authors):

      (1) To allow comparison with past work, I suggest changing decay expressed as % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to obtain a good estimate of the half-life and therefore report decay as percentage per month for the first 6 months.

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317: There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism limiting the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding whether total HIV DNA is part of the reservoir, as defective viruses do not contribute to viremia. However, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion, which are associated with comorbidities and adverse clinical outcomes in people living with HIV. We have explained in the text that total HIV DNA does not distinguish between replication-competent and replication-defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study is an essential area of TB biology.

      We thank the reviewer for the kind assessment of our paper.  

      Strengths: 

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species. 

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins. 

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant. 

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain. 

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      We are glad that this reviewer finds our study of interest and well designed.   

      Weaknesses: 

      (1) The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate the enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division, when PG is most fragile. However, this has yet to be tested in the manuscript: in AFM, the VirR mutant, which produces thicker PG with greater pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, nor were differences in cell division shown that could explain increased vesiculogenesis. These observations, which run counter to the authors' hypothesis, need detailed experimental verification.

      We concur with the reviewer that we do not have direct evidence of a more fragile PG in the virR mutant; our statement is supported by a compendium of different results. However, this statement is framed in the Discussion as a possible scenario, acknowledging that more experiments are needed to establish such a connection. Nevertheless, we now provide additional data on the molecular characterization of virRmut PG using MS, showing a significant increase in the abundance of deacetylated muropeptides, a feature that has been linked to altered lysozyme sensitivity in other, unrelated Gram-positive bacteria (Fig 8G,H).

      (2.1) Transcriptomic data add little of substance, and they do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription.

      We concur with the reviewer that the information provided by transcriptomics and proteomics is somewhat fragmented and, given the low correlation between the two datasets, does not help to explain the phenotype observed in the mutant. Another reviewer raised the same issue, so we have paid special attention to it.

      To refine the biological interpretation of the transcriptomic data, we have integrated the complemented strain (virRmut-Comp) into our analyses. This allowed us to narrow the virR-dependent transcriptomic signature down to the sets of genes that are deregulated in virRmut, in the same direction (up or down), relative to both the WT and the complemented strain. Furthermore, to identify the transcription factors whose regulatory activity appears disrupted in the mutant strain, we resorted to an external dataset (Minch et al. 2015) and found a set of 10 transcriptional regulators whose regulons appear significantly affected in the virRmut strain. While these improvements admittedly do not fully address the reviewer's question, we found that they contribute to a more precise characterization of the VirR-dependent transcriptional signatures, and of the regulons within the pathogen's genome-wide transcriptional regulatory network that appear altered by virR disruption. We acknowledge that the lack of correlation between whole-cell lysate proteomics and transcriptomic data is intriguing, albeit not uncommon in Mycobacterium tuberculosis. However, the differences in vesicle protein cargo between strains share key pathways with the transcriptomic analyses: for example, cell wall biogenesis and peptidoglycan biosynthesis are enriched among the genes and the vesicular proteins that are downregulated in virRmut.

      (2.2) TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl- CoA).

      We also agree with the reviewer that TLC, as performed here, is not quantitative. However, we do not have access to radiolabeling procedures. In the new version of the manuscript, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Our results show a reduction in the pool of SL and DATs in the mutant, indicating that part of the methylmalonyl-CoA pool is diverted to the synthesis of PDIMs.

      (3) The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and the import/export of drugs. The authors should investigate whether the restoration of normal permeability and EV release in the VirR mutant upon cholesterol exposure is due to the maintenance of cell wall lipid balance.

      We concur with the reviewer that cholesterol as a sole carbon source introduces many changes in Mtb cells besides permeability. We therefore investigated the virRmut lipid profile upon exposure to either cholesterol or TRZ (Fig S8), including the WT and virRmut-Comp strains in the analysis. Polar lipid analysis revealed that either cholesterol or TRZ exposure induced a marked reduction in PIM and cardiolipin (DPG) levels in virRmut relative to the WT and complemented strains (Fig S8A). Analysis of apolar lipids indicated that, relative to glycerol MM, virRmut cultured with cholesterol or TRZ showed reduced levels of TDM and DATs compared to the WT and virRmut-Comp strains (Fig S8B). These results suggest that, in the absence of VirR, the modulation of cell permeability by cholesterol and TRZ does not correlate with lipid levels.

      Regarding this section, we have also changed the reference used to annotate the DosR regulon, moving from the definition used in the previous submission (Rustad et al., PLoS One 3(1), e1502 (2008), "The enduring hypoxic response of Mycobacterium tuberculosis") to the more recent ChIP-seq-based characterization of the regulon reported in Minch et al. 2015. This was done to ensure consistency with the transcriptomic analyses in the new Figure 4.

      (4) Finally, the protein interaction data are based on experiments performed once, without statistical analysis. If the interaction between VirR and the LCP proteins is expected to occur at the mycobacterial membrane, how is the split-GFP system, expressed in the cytoplasm, physiologically relevant? No explanation was provided as to why VirR interacts with the truncated versions of the LCP proteins and not with the full-length proteins.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript, this assay has been successfully applied to interrogate interactions between domains of proteins embedded in the mycobacterial membrane. Therefore, we believe that it is valid for interrogating interactions between VirR and Lcp proteins.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis. 

      Strengths: 

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of the diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with the canonical PG-AG ligating LCP proteins, the authors have established that VirR is a central scaffold in the cell envelope remodelling process, which is critical for MEV production.

      We thank the reviewer for the kind assessment of the paper.

      Weaknesses: 

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study. 

      According to Tn mutant and CRISPR databases, virR is a non-essential gene. However, we have tried many times, without success, to disrupt this gene by phage-mediated allelic exchange. So far, there is no precedent of a clean KO mutant of this gene. White et al. generated a virR mutant carrying a deletion of a large C-terminal portion of the protein, essentially replicating the effect of the insertion in the virR Tn mutant. These precedents led us to switch to CRISPR technology.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) The authors monitored cell lysis by measuring the release of a cytoplasmic iron-responsive protein (IdeR). Since EV release is regulated by iron starvation, which is directly sensed by IdeR, another control (unrelated to iron) is needed. A much better approach would be to use hydrophobic/hydrophilic probes to measure changes in the cell wall envelope.

      Does the VirR complemented strain have a faint IdeR band in the supernatant? The authors need to clarify. Also, it's unclear whether the complementation restored normal VirR levels or not. 

      We thank the reviewer for this recommendation. Accordingly, we have complemented these studies with an alternative approach based on serially diluted cultures spotted on solid medium. These results align very well with those of the western blot using IdeR levels in the supernatant as a surrogate of cell lysis.

      We also noticed a faint IdeR band in the supernatant of the complemented strain, which is suggestive of possible cell lysis. However, as shown in another section, this did not translate into increased levels of vesiculation. As shown in a previous paper describing VirR as a genetic determinant of vesiculogenesis, VirR levels in the complemented strain are not just restored but considerably increased. This overexpression could explain the potential artifact of a leaky phenotype in the complemented strain. In addition to that previous study, the proteomic data included in this paper clearly show restoration of VirR levels relative to the WT strain.

      (2) Figure 2C: The data are weak; I don't see any difference in incorporating FDAAs in MM media. Even in the 7H9 medium, differences appear only at the last time point (20 h). What happens at the time point after 20 h (e.g., 48 h)? How do we differentiate between defective permeability or anabolism leading to altered PG? No statistical analysis was performed.

      We apologize for the incomplete assessment of the results in this figure. First, this figure only shows differential incorporation of FDAAs by the different strains in different media. As per previous studies (Kuru et al. (2017) Nat. Protocols), these probes freely enter cells and may be incorporated into PG by at least three different mechanisms, depending on the species: through the cytoplasmic steps of PG biosynthesis and via two distinct transpeptidation reactions taking place in the periplasm. Consequently, the differential labeling observed in virRmut relative to the WT strain may be a consequence of the enlarged PG observed in the mutant. We have repeated the experiment and generated new data. First, we cultured the strains with a blue FDAA (HADA) for 48 h to ensure full labeling. Then, we washed the cells and cultured them in the presence of a second, green FDAA (FDL) for 5 h. The incorporation of FDL relative to HADA was then measured by fluorescence microscopy. This experiment showed that virRmut incorporates more FDL than the other strains, suggesting altered PG remodeling. We have also modified the figure to make the early and late time points of the time course clearer, and applied statistics.

      (3) Many genes (~ 1700) were deregulated in the mutant. Since these transcriptional changes do not correlate at the protein level in WCL, it's important to determine VirR-specificity. RNA-Seq of VirR complemented strain is important.

      We think this was an extremely important point, and we thank the reviewer for raising it. Following their suggestion, we have analyzed and integrated data from the complemented strain, which we have added to the GEO submission, and concluded that differences in expression between the complemented strain and either the WT or virRmut are also common and highly significant. Although this is not completely unexpected, given the nature of our mutants and the fact that the complemented strain shows significantly higher VirR expression than the WT at both the RNA and protein levels, it motivated us to narrow down our definition of VirR-dependent genes by adopting a combined criterion that integrates the complemented strain. Under this approach, we considered as upregulated (downregulated) in virRmut those genes whose expression in that strain is significantly higher (lower) than in both the WT and virRmut-Comp. With this integrated definition, the genes considered (399 upregulated and 502 downregulated) are those whose expression changes are more likely to be genuinely VirR-dependent rather than a non-specific consequence of the mutagenesis protocols. Despite the smaller size of these sets, repeating all our functional enrichment analyses with this combined criterion leads to conclusions largely compatible with those presented in the first version of the paper.
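      The combined criterion described above amounts to intersecting two differential-expression comparisons (virRmut vs WT and virRmut vs virRmut-Comp) and keeping only genes that move in the same direction in both. The sketch below is a hypothetical illustration, not the pipeline used in the study: the table layout, the alpha = 0.05 cutoff, and the gene names are assumptions, since the response does not state the thresholds applied.

```python
# Hypothetical sketch of the combined criterion for VirR-dependent genes.
# Each comparison maps gene -> (log2 fold change of virRmut over the reference
# strain, adjusted p-value). The alpha cutoff is an assumption.
DEResult = dict[str, tuple[float, float]]


def called(res: DEResult, direction: int, alpha: float) -> set[str]:
    """Genes significantly changed in virRmut in one direction (+1 up, -1 down)."""
    return {g for g, (lfc, padj) in res.items() if padj < alpha and direction * lfc > 0}


def virr_dependent(vs_wt: DEResult, vs_comp: DEResult,
                   alpha: float = 0.05) -> tuple[set[str], set[str]]:
    """Keep genes that change in the same direction versus BOTH WT and virRmut-Comp."""
    up = called(vs_wt, +1, alpha) & called(vs_comp, +1, alpha)
    down = called(vs_wt, -1, alpha) & called(vs_comp, -1, alpha)
    return up, down


# Toy input with made-up gene names and values (not data from the study).
vs_wt = {"geneA": (1.5, 0.01), "geneB": (-2.0, 0.001),
         "geneC": (0.8, 0.2), "geneD": (-1.2, 0.001)}
vs_comp = {"geneA": (1.1, 0.03), "geneB": (0.5, 0.01),
           "geneC": (1.0, 0.01), "geneD": (-0.9, 0.02)}
up, down = virr_dependent(vs_wt, vs_comp)
print(sorted(up), sorted(down))  # → ['geneA'] ['geneD']
```

      In the toy input, geneB is down relative to WT but up relative to the complemented strain, so the combined criterion discards it; this is the kind of gene the stricter definition is meant to exclude as a possible artifact of complementation-driven overexpression.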

      (4) Transcriptome data provide no clues about how VirR could mediate expression deregulation. Is there an overlap with the regulations/regulons of any Mtb transcription factors? One clue is DosR; however, DosR only regulates 50-60 genes in Mtb. 

      Again, we thank the reviewer for this recommendation, which we have followed to generate a new Results section entitled “VirR-dependent genes intersect the regulons of key transcriptional regulators of the responses to stress, dormancy, and cell wall remodeling”. As explained in this new section, we used the regulon annotations reported in Minch et al. 2015, where ChIP-seq data were collected on genome-wide DNA binding events for a panel of 143 transcription factors (TFs). The dataset includes 7248 binding events between regulators and DNA motifs in the vicinity of target promoters. After running enrichment analyses with the resulting regulons, we identified 10 transcription factors whose intersections with the sets of up- and downregulated genes in virRmut were larger than expected by chance (one-tailed Fisher exact test, OR>2, FDR<0.1). These regulators (which, as the referee anticipated, included DevR) control key pathways related to cell wall remodeling, stress responses, and the transition to dormancy.
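      The regulon enrichment test described above (one-tailed Fisher exact test, OR>2, FDR<0.1) can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names, the 2x2 table layout, and the example counts are hypothetical, and Benjamini-Hochberg is assumed as the FDR procedure because the response does not name one.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-tailed (greater) Fisher exact test on the 2x2 table [[a, b], [c, d]].

    Rows: in the regulon / not in the regulon.
    Columns: deregulated in virRmut / not deregulated.
    Returns (sample odds ratio, p-value).
    """
    n_total = a + b + c + d
    regulon_size = a + b
    n_deregulated = a + c
    # With all margins fixed, the overlap X is hypergeometric; sum P(X >= a).
    upper = min(regulon_size, n_deregulated)
    tail = sum(
        comb(n_deregulated, k) * comb(n_total - n_deregulated, regulon_size - k)
        for k in range(a, upper + 1)
    )
    p_value = tail / comb(n_total, regulon_size)
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, p_value


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def enriched_regulators(tables: dict[str, tuple[int, int, int, int]],
                        min_or: float = 2.0, max_fdr: float = 0.1) -> list[str]:
    """Regulators whose overlap passes OR > min_or and BH-adjusted p < max_fdr."""
    names = list(tables)
    stats = [fisher_greater(*tables[name]) for name in names]
    q_values = benjamini_hochberg([p for _, p in stats])
    return [name for name, (odds, _), q in zip(names, stats, q_values)
            if odds > min_or and q < max_fdr]


# Hypothetical counts (a, b, c, d) for two made-up regulators; not study data.
tables = {"RegA": (30, 20, 370, 3580), "RegB": (5, 45, 395, 3555)}
print(enriched_regulators(tables))  # → ['RegA']
```

      Here RegA's regulon overlaps the deregulated set far more than the roughly 5 genes expected by chance, while RegB's overlap matches chance (OR = 1.0), so only RegA passes both thresholds. Combining an effect-size cutoff (OR) with an FDR cutoff avoids reporting regulons that are statistically significant only because they are very large.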

      (5) How many proteins that are enriched or depleted in the EVs of the VirR mutant also affected transcriptionally in the mutant? How does VirR regulate the abundance and transport of protein in EVs? 

      While the intersection between genes and proteins upregulated in the virRmut strain at both the transcriptional and vesicular protein levels (N=21) was larger than expected by chance (OR=2.0, p=7.0E-3), the genes and proteins downregulated in virRmut (N=14) were not enriched in each other. These results indicate, at most, a weak correlation between RNA and protein levels (a phenomenon previously observed in Mycobacterium tuberculosis, among other organisms; see Cortés et al. 2013). Admittedly, these omics data are insufficient by themselves to pinpoint the specific regulatory mechanisms through which the absence of VirR affects protein abundance in EVs. For the sake of transparency, we have acknowledged this in the Discussion of the resubmitted manuscript.

      (6) The assumption that a depleted pool of methylmalonyl CoA is due to increased utilization for PDIM biosynthesis is problematic. Without flux-based measurement, we don't know if MMCoA is consumed more or produced less, more so because Acc is repressed in the VirR mutant EVs. Further, MMCoA feeds into the TCA cycle and other methyl-branched lipids. Without data on other lipids and metabolism, the depletion of MMCoA is difficult to explain.

      The differential expression statistics suggest that both effects may be at play, since we observed both a downregulation of enzymes controlling methylmalonyl-CoA synthesis from propionyl-CoA (i.e., Acc, at the protein level) and an upregulation of enzymes involved in its incorporation into DIM/PDIMs (i.e., pps genes). Combined, these effects would favor a reduced rate of methylmalonyl-CoA production and a faster rate of consumption, thus contributing to the depleted levels observed. We nevertheless concur with the reviewer that fluxomics analyses would shed light on this question more decisively, and we have acknowledged this in the Discussion too.

      (7) Figure 5: Deregulation of rubredoxins and copper indicates impaired redox balance and respiration in the mutant. The data are difficult to connect with permeability, as TRZ is mycobactericidal and is also known to affect the respiratory chain. The authors need to investigate whether, in addition to permeability, VirR is essential for maintaining bioenergetics.

      The data related to rubredoxins and copper have been modified after reanalyzing the transcriptomic data including the complemented strain. Nevertheless, we found that some features of the stress response may be impaired in the mutant, including the response to oxidative stress. In this regard, we found that the mutant is more sensitive to H2O2 than the WT and complemented strains. These data are now included as Fig S3 in the new version of the manuscript.

      (8) Differential regulation of the DosR regulon and cholesterol growth could also be linked to differences in metabolism, redox, and respiration. What is the phenotype of VirR mutants in terms of growth and respiration in the presence of cholesterol/TRZ?

      We thank the reviewer for this suggestion. Accordingly, we have added a new section to the Results suggesting that other aspects of mycobacterial physiology may be affected in the virR mutant when cultured in the presence of cholesterol or TRZ:

      “Modulation of EV levels and permeability in virRmut by cholesterol and TRZ. We next asked what effect culturing virRmut with either cholesterol or TRZ could have on cell growth, permeability, and EV production. Cholesterol has been shown to affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability (Lu et al., 2017). We monitored virRmut growth for 20 days in MM supplemented with glycerol, with cholesterol as the sole carbon source, or with glycerol plus TRZ at 3 µg ml-1. While cholesterol significantly enhanced the growth of virRmut after 5 days relative to glycerol medium, supplementation of glycerol medium with TRZ restricted growth throughout the time course (Fig S5A). Permeability measurements under the same conditions indicated that the enhanced cell permeability observed in glycerol MM was reduced when virRmut was cultured with cholesterol as the sole carbon source. Conversely, the presence of TRZ increased cell permeability relative to the medium containing glycerol alone (Fig S5C). As we previously observed for the WT strain, either condition (Chol or TRZ) also modified vesiculation levels in the mutant accordingly (Fig S5B). These results strongly indicate that other aspects of mycobacterial physiology besides permeability are also affected in the virR mutant and may contribute to the observed enhanced vesiculation.”

      (9) The PDIM TLC is not clear; both DimA and DimB should be clearly shown. It will also be necessary to show other methyl-branched lipids, such as SL-1 and PAT/DAT, because an increase in PDIM can divert methylmalonyl-CoA from the biosynthesis of SL-1 and PAT/DAT. Studies have shown that SL-1, PAT/DAT, and PDIM are tightly regulated, such that an increase in one lipid pool can affect the abundance of the others. Quantitative assays using 14C acetate/propionate are the most appropriate for these experiments.

      We regret that the TLC analysis could not be performed with radiolabeling, as we do not have access to this approach. To address the reviewer's question of whether other methyl-branched lipids may explain the altered flux of methylmalonyl-CoA, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Notably, we observed reduced levels of these lipids (SL-1 and PAT/DAT) in virRmut cultured in glycerol relative to the WT and complemented strains, suggesting that, in the absence of VirR, excess PDIM synthesis can divert methylmalonyl-CoA from the biosynthesis of SL-1 and PAT/DAT (Fig S8B).

      (10) Figure 8: Interaction between VirR and Lcp proteins. Since these interactions are happening in the membrane, using a split GFP system where proteins are expressed in the cytoplasm is unlikely to be relevant.

      Also, the experiments in Figure 8C were performed once, and their representation needs to be clarified; the split-GFP assay needs a positive control, and the negative control (CtpC) is not indicated in the figure.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript, this assay has been successfully applied to interrogate interactions between domains of proteins embedded in the mycobacterial membrane. Therefore, we believe that it is valid for interrogating interactions between VirR and Lcp proteins.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Authors should consider making more effort to mine the omics data and integrate them. Given the amount of data that is generated with the omics, they need to be looked at together to find out threads that connect all of them. 

      In the resubmitted version of the paper, we have followed the reviewer's recommendation by incorporating new analyses that integrate the virRmut-C strain, and we have placed the differences found in the context of broader transcriptional regulatory networks (new Figure 4) and of metabolic pathways related to PDIM biosynthesis from methylmalonyl-CoA (Figure 6I, already present in the first submission). We consider that these additions contribute to a deeper interpretation of the omics data along the lines suggested by the reviewer.

      (2) The interpretation given by the authors in lines 387-390 lacks sufficient support and hence should be moved to the Discussion.

      We thank the reviewer for this recommendation. We believe that these new analyses and integration studies now support the above statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I recommend being explicit regarding how the animals were habituated to blood sampling.

      On lines 109-111 we have added a more detailed explanation of how mice were habituated to blood sampling. This includes details that mice were held and had their tails palpated for approximately 5 minutes per day.

      Were any mice excluded due to loss or movement of the implant over time? Any details to allow replication of long-term measurements like this should be included.

      No mice lost their cannulas during experimentation, so we have added a sentence to this effect on lines 303-304. We have also noted that there was a slight decrease in signal over the months of experimentation. A statement has also been added on line 318 clarifying that two mice were lost between the pregnancy and lactation stages of the experiment because they were euthanised due to dystocia.

      The text states that synchronized episodic activity reappeared as early as 3 days after birth, citing Figure 6c as evidence. There is no 6c. Figure 6b shows day 5 after birth.

      This has been corrected.

      The methods state mRNA levels had to be "above background" to be counted as colocalization. At how many fold/what percent above background was a cell considered positive for expression?

      Positive hybridisation was scored according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      Please ensure figure titles or the data graphs explicitly give the genotype of the mice in all figures (or state the mice are wildtype).

      Genotype has been added to figure titles where possible. Genotypes are always given in figure legends and tables and/or explicitly stated on the figure itself.

      Figure 4's title states events are "perfectly" correlated, which is a subjective term. I recommend saying "consistently" or "temporally" correlated, depending on your meaning.

      This has been amended to read “consistently correlated”.

      Reviewer #2 (Recommendations For The Authors):

      The comments below aim to clarify the paper's methodology and results but do not detract from my overall enthusiasm for this work.

      - Given past studies demonstrating prolactin action in the brain, particularly the MPOA/MPN, is essential for maternal behavior, can the authors please clarify why this behavior is retained in the cam2a prlr knockout mice? The authors mention that prlr in the MPOA is only knocked down 50% compared to WT controls. Is this sufficient to retain maternal behavior?

      In our experience, 50% of Prlr expression in the MPOA is sufficient to retain normal maternal behaviour in most animals, including those in this experiment. Our original paper describing this (Brown et al., PNAS 2017;114(40):10779-10784) showed relatively normal behaviour with, for example, vGAT- and vGlut-mediated knockouts, and even a double knockout; only when we achieved a complete KO with AAV-Cre did we see failure of maternal behaviour. We have added a statement on lines 157-159 regarding this. We have an additional paper in preparation specifically characterising the maternal behaviour and lactation outcomes in this line of mice, and we find that most animals display normal maternal behaviour, with slightly impaired milk production in later lactation.

      - Supplementary Figure 1. Can the authors please clarify the criteria for a cell to be positive for prlr? The methods state that the signal must be "above background level." How was the background measurement obtained? In the negative control?

      As above, positive hybridisation was scored according to the manufacturer’s protocol, and a statement to this effect has been added on line 144.

      - Lines 310-314: This sentence describes RNAscope analysis of prlr knockdown in kisspeptin cells and refers to Extended Figure 3 - but I believe this is in Supplementary Figure 1.

      This has been corrected.

      - Figure 3-4: When mice return to estrous cycles, the amplitude of episodic kisspeptin neuron activity is the same as 24 hours after weaning, which appears much lower than in virgin females. Does this reach significance? If so, do the authors know why kisspeptin activity is still suppressed, and can they comment on why this may not affect estrous cyclicity?

      This does not reach significance; see Supplementary Table 1 (4C) for statistics. Therefore, no further analysis was done, and this question would need to be examined in a follow-up experiment. Given the scheduled recording mode used here (5 s on, 15 s off), amplitude was not a very accurate measure and has been reported as relative within each mouse. There is also the additional issue of a gradual reduction in signal amplitude over time in these long-term experiments, although much larger signals were detected after ovariectomy at the end of the experiment. At present, we have not tried to interpret whether the changes in amplitude are informative.

      - Fiber photometry studies: Please indicate whether a post-mortem examination of GCaMP transfection and fiber photometry placement was conducted, and what region of the ARC was imaged.

      Brains from these mice were collected; however, postmortem analysis of cannula placement or GCaMP6 transfection was not carried out in all mice. This was based on our experience with this method: the quality and characteristic pattern of activity, as well as the corresponding LH secretion following an SE, indicated successful cannula placement and transfection. Incorrectly placed cannulae failed to show SEs. In a trial with 3 mice, cannula placement was found to be in the caudal ARC (cARC), with GFP (attached to GCaMP) restricted to the cARC. A statement has been added on lines 306-313 regarding this.

      - Were male mice removed before birth? Please add to the methods section if not included.

      Yes, male mice were removed after a sperm plug was seen and were never present at parturition. We have inserted additional details on line 95 to this effect.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 172: n=7-8 per group, yet in Supplementary Figure 2, n=6 per group.

      These refer to different groups of mice. N=7-8 refers to the group size of mice in Figure 2 that were given mifepristone or vehicle control, whereas the n number in Supplementary Figure 2 refers to the mice in the pilot study. The n number for the pilot study has been added on line 194.

      (2) Line 314: Extended = suppl; Figure 3 = 1.

      This has been corrected.

      (3) Line 451: Figure 6C, does not exist.

      This has been corrected.

      Line 590: Reference 23 could be replaced by Ordog T et al 1998 Am J Physiol 274,E665 because it is later and more relevant to the topic.

      This reference has been replaced with the suggested reference.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of the pig-origin bacterium Bacillus velezensis HBXN2020, covering its genome sequence, in vivo safety, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Thank you very much for reading and commenting on our manuscript.

      Strengths:

      An extensive study of the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Thank you very much for your reading and comments our manuscript.

      Weaknesses:

      The main results are descriptive, without mechanistic insight. Additionally, most of the results and analyses are presented as separate parts, without links between them or a narrative that delivers a concise message.

      Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. The manuscript results and analysis sections have been extensively revised. We appreciate your review and feedback.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species are potentially useful as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals, but data for human use are scarce, particularly with respect to Salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test its ability to control colitis in mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and its survival under acid and temperature stress. The susceptibility of HBXN2020 to various antibiotics and its activity against various pathogenic bacteria were tested. In the latter case, the authors set out to determine whether HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data indicating that this is indeed the case are presented.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors administered the strain to mice and monitored weight, together with cytokine profiles. Treated mice displayed no significant weight loss, and expression of inflammatory cytokines remained unchanged. Blood cell profiles of treated mice were consistent with those of untreated mice. No significant differences in tissues, including the colon, were observed.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit the growth of Salmonella Typhimurium (STm) and demonstrated that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infected mice with STm to induce colitis and measured the ability of HBXN2020 to control colitis. The first outcome measure was a reduction of STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colonic pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate its effects on the microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to a state akin to that seen in healthy mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Thanks for the comments and the positive reception of the manuscript.

      (3) Mouse experiments are very convincing.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins it.

      Thank you for pointing this out. We apologize that the manuscript does not include an investigation of the underlying mechanism. In future work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way, the authors could compare their approach of using Bacillus spores to the current gold-standard treatment.

      We greatly appreciate your valuable comments, which have improved the quality and depth of the manuscript. Based on your suggestion, we have added this comparison to the revised manuscript, and the updated content is marked in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel Bacillus strain that can produce a great variety of secondary metabolites and exhibits high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies on the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Thank you for your suggestion. This study is an exploratory investigation preceding any application of Bacillus velezensis; its main purpose is to explore the application potential of Bacillus velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Most of my previous comments have been well addressed; here are a few examples. In my last comment, I requested a colitis mouse model that closely resembles the diarrheal disease caused by Salmonella in mammals. The available statement is not convincing; please check https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2225501/ and https://pubs.rsc.org/en/content/articlelanding/2020/fo/d0fo01017k, and please replace "colitis" with a normal infection model. The current statement is incorrect.

      Thank you for your valuable suggestion, which has improved the quality of the manuscript. We have corrected this in the revised manuscript as suggested and marked the updated content in the revised manuscript.

      The updated content is presented in lines 2, 29, 38, 46, 48, 199, 204, 246, 248, 282, 307, 310, 316, 431, 433, 464, 466, 473, 494, 497, 499, 504, 513, 518, 525, 706, 710 and 735 of the revised manuscript.

      Certain parts remain overstated; in my view, the language and logical flow should be revised thoroughly.

      Here are suggestions to improve the logical flow of the manuscript.

      (1) Probiotic sampling and isolation

      (2) in vitro assessment

      (3) genomic sequencing and in silico safety assessment (Crit Rev Food Sci Nutr. 2023;63(32):11244-11262), which should be cited as an appropriate reference.

      (4) in vivo assay for safety evaluation, not "biosafety" (which has a different meaning).

      (5) infection model and protection assay.

      We greatly appreciate your valuable comments, which have improved the quality and depth of the manuscript. Following your suggestion, we have done our best to correct these problems in the revised manuscript. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      Also, please pay attention to the logical links and transition sentences between parts, so that each part connects to the next.

      We greatly appreciate your valuable comments, which have improved the quality of the manuscript. Following your suggestion, we have corrected this in the revised manuscript and marked the updated content.

      Finally, there are also many typos and errors; please correct them throughout the text. For example, line 521: "Stain", and more.

      Thanks for pointing this out. Based on your suggestion, we have corrected these errors in the revised manuscript and marked the updated content.

      The updated content is presented in lines 753, 1055 and 1087 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The revised manuscript by Wang and colleagues attempts to address concerns raised during the first round of review.

      All minor comments have been addressed and in general, the major concerns have been partially addressed in the revised manuscript.

      The outstanding concerns relate to the mechanistic basis of the observations. The authors made no attempt to address this in a meaningful manner. Secondly, the issue of comparing the responses with standard therapy (such as anti-inflammatories) was also handled in a somewhat dismissive manner, referring to other ongoing/future work. The clinical utility of the findings is hard to ascertain if there is no comparison with the current gold-standard therapeutic approach.

      I have no further suggestions for the authors, save for those previously made.

      Thank you for pointing this out. We apologize that the manuscript does not include an investigation of the underlying mechanism. In future work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      Secondly, regarding the comparison of oral Bacillus spore treatment with the current gold-standard treatment, we have added this to the revised manuscript. We appreciate your review and feedback and have marked the updated content in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      This is a revision; the authors have addressed all my concerns, and the manuscript is now acceptable.

      Thank you very much for your comments and recognition of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript, Zhou et al. describe a deaminase and reader protein-assisted RNA m5C sequencing method. The general strategy is similar to DART-seq for m6A sequencing, but with one difference: in DART-seq, m6A sites are always followed by C, which can be deaminated by the fused APOBEC1 to provide high resolution of m6A sites, whereas for m5C no such obvious conserved motif exists; therefore, the detection resolution is much lower. In addition, the authors used two known m5C-binding proteins, ALYREF and YBX1, to guide the fused deaminases, but it is not clear whether these two binding proteins can bind most m5C sites and compete with other m5C-binding proteins.

      Thank you for your kind suggestion. Previous reports (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y) performed RNA affinity chromatography and mass spectrometry analyses using biotin-labelled oligonucleotides with or without m5C, and the results showed that ALYREF and YBX1 bound m5C-modified oligonucleotides more strongly. Moreover, these two m5C-binding proteins are also responsible for binding m5C in mRNA, so we chose to exploit their ability to bind m5C to construct the DRAM detection system for transcriptome-wide m5C detection. We hope to propose a suitable detection strategy for RNA m5C, and there will certainly be room to optimize the DRAM system as m5C-binding proteins are studied in more depth. We have discussed this issue in lines 75-82 and 315-318.

      It is well known that two highly modified m5C sites exist in 28S RNA and many m5C sites exist in tRNA, the authors should validate their methods first by detecting these known m5C sites and evaluate the possible false positives in rRNA and tRNA.

      Thank you for your kind suggestion. We attempted PCR amplification of sequences flanking m5C sites 3782 and 4447 on 28S rRNA, as well as multiple m5C sites on tRNA, including m5C48 and m5C49 on tRNAVal, m5C48 and m5C49 on tRNAAsp, and m5C48 on tRNALys.

      However, Sanger sequencing revealed no valid mutations, as shown in Figure S3. We believe this outcome indicates that the DRAM system is better suited to transcriptome-wide m5C detection in mRNAs. This is supported by current reports that ALYREF and YBX1 act as m5C-binding proteins for mRNAs (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y). These results and descriptions have been added in lines 136-143.

      In mRNA, it is not clear what is the overlap between the technical replicates. In Figures 4A and 4C, they detected more than 10K m5C sites, and most of them did not overlap with sites uncovered by other methods. These numbers are much larger than expected and possibly most of them are false positives.

      Thank you for your kind suggestion. We observed significant overlap between replicates when comparing the data across biological replicates, as shown in Figure S4C and described in lines 174-175. We considered a region to carry an m5C modification only when editing events were detected in at least two biological replicates, ensuring a high-stringency screening process (details in the revised Methods, lines 448-455, and Figure 3F). With more in-depth research into m5C readers, we aim to achieve more accurate detection in the future.

      In addition, the detection sensitivity and accuracy are unclear, since the method is neither single-base resolution nor quantitative.

      Thank you for your suggestion. As shown in Figure 3G, editing sites of the DRAM system were enriched within approximately 20 nt upstream and downstream of the m5C site. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these motifs do not characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). This limitation complicates single-base resolution analysis by the DRAM system. Nevertheless, we believe that with further exploration of m5C sequence features, precise single-base resolution detection can be achieved in the future. This point is also discussed in lines 314-322.

      Regarding the quantitative capacity of the assay, we conducted additional experiments in which we progressively reduced the expression levels of the fusion proteins. Sanger sequencing revealed that the A-to-G and C-to-U editing efficiency within the m5C region decreased significantly as fusion protein expression diminished (Figure S9). These findings suggest that the editing efficiency of the DRAM system depends on the amount of transfected plasmid, and that the ratio of editing efficiency to transfection efficiency could aid the quantitative analysis of m5C with the DRAM system. The corresponding results have been added as Figure S9 and discussed in lines 263-271.

      There are no experiments showing that the detected m5C sites respond to writer proteins such as NSUN2 and NSUN6, nor any determination of the motifs of these writer proteins.

      Thank you for your kind suggestion. We have performed a motif enrichment analysis based on the sequences spanning 10 nt upstream and downstream of DRAM editing sites. The results of this analysis have been added as Figure S4D and described in lines 168-171. Unfortunately, we did not identify any clear sequence preferences for the m5C sites catalyzed by the methyltransferases NSUN2 and NSUN6, which have previously been associated with "G"-rich sequences and the "CUCCA" motif. This limitation is mainly due to the DRAM detection system's inability to achieve single-base resolution for m5C detection, as explained in the response above.

      Reviewer #2:

      (1) The use of two m5C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m5C.

      To substantiate the authors' claim that ALYREF or YBX1 binds m5C-modified RNAs to an extent that distinguishes this binding from binding to non-modified RNAs, it is recommended to provide data on the affinity of these supposedly proven m5C readers for non-modified versus m5C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). However, using dot blots, as in so many published studies, to show the modification specificity of an antibody or of protein binding is insufficient as an argument, because no antibody or protein encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research. It becomes a pertinent problem if they are used as a platform for base editing, as in the work presented in this manuscript.

      We thank the reviewer for the valuable suggestion. Previous studies have shown that while ALYREF and YBX1 can bind mRNAs without the m5C modification, their binding affinity for m5C-modified oligonucleotides is significantly higher than for unmethylated controls. This has been demonstrated through experiments such as in vitro tractography, electrophoretic mobility shift assay (EMSA) (doi:10.1038/cr.2017.55), and UHPLC-MRM-MS/MS. Additionally, isothermal titration calorimetry measurements and PAR-CLIP experiments have shown that mutations in the key amino acids responsible for m5C binding in ALYREF and YBX1 significantly reduce their ability to bind m5C (doi: 10.1038/s41556-019-0361-y).

      Although MeRIP analysis was unsuccessful in our laboratory, likely due to the poor specificity of the m5C antibody, we instead performed RNA pulldown experiments. These experiments showed that, compared with the DRAM fusion proteins, the DRAMmut fusion proteins had virtually no ability to bind m5C-modified RNA, while their binding to non-modified RNA was not significantly affected. The RNA pulldown results have been added as Figures S1E and S1F and described in lines 110-111. Therefore, we believe that by including the DRAMmut group, the DRAM system can effectively exclude false-positive mutations caused by nonspecific binding of the DRAM reader protein to non-m5C-modified mRNAs.

      (2) Since the authors use a system that results in transient overexpression of base editor fusion proteins, they might introduce adventitious binding of these proteins to RNAs. It is unclear which promoter drives construct expression, but it stands to reason that part of the data is based on artifacts caused by overexpression. Could the authors test whether manipulating the expression levels of these fusion proteins results in different editing levels at the same RNA substrate?

      Thank you for pointing this out. To investigate how different expression levels of these proteins influence A-to-G and C-to-U editing within the same m5C region, we conducted a gradient transfection using plasmid amounts of 1500 ng, 750 ng and 300 ng. This approach allowed us to progressively reduce the expression levels of the fusion proteins. Sanger sequencing revealed that the A-to-G and C-to-U editing efficiency within the m5C region decreased significantly as fusion protein expression diminished. These findings suggest that the editing efficiency of the DRAM system depends on the amount of transfected plasmid, and that the ratio of editing efficiency to transfection efficiency may assist the quantitative analysis of m5C with the DRAM system. The corresponding results and hypotheses have been added to Figure S9 and discussed in lines 263-271 of the revised manuscript.

      (3) Using sodium arsenite treatment of cells as a means to change the m5C status of transcripts through the downregulation of the two major m5C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m5C sites to be detected by the fusion proteins.

      Thank you for pointing this out. We used bisulfite sequencing PCR to determine that the m5C levels in RPSA and AP5Z1 were significantly reduced after sodium arsenite treatment. This was followed by a significant decrease in editing frequency detected by the DRAM system in sodium arsenite-treated samples compared to untreated samples. This reduction aligns with the decreased editing efficiency observed in methyltransferase-deficient cells (as shown in Figures 2G and 2H), which initially convinced us that these results reflected the DRAM system's ability to monitor dynamic changes in m5C levels.

      However, as the reviewer pointed out, sodium arsenite treatment could potentially inactivate the fusion proteins, leading to the observed reduction in editing efficiency. This possibility has not been conclusively ruled out in our current experiments. Optimizing this validation may require the future development of more specific m5C inhibitors. In light of this, we have revised our previous results and conclusions in lines 235-244, and discussed these points in lines 308-315.

      (4) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than an Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

      Thank you for your kind suggestion. We have visualized the data from Supplementary Tables 2 and 3 in Figure 3F as a screening flowchart for high-confidence editing sites. Supplementary Table 3 lists only the DRAM-mutated genes, which is why it contains a single row of letters and numbers. As requested, we have added descriptions of each column and reorganized Supplementary Tables 2 and 3 accordingly.

      (5) The authors state that "plotting the distribution of DRAM-seq editing sites in mRNA segments (5'UTR, CDS, and 3'UTR) highlighted a significant enrichment near the initiation codon (Figure 3F).", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. We have replaced "near the initiation codon" with "in the CDS" in lines 192-193.

      (6) The authors state that "In contrast, cells expressing the deaminase exhibited a distinct distribution pattern of editing sites, characterized by a prevalence throughout the 5'UTR.", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. This distribution was actually characterized by a prevalence throughout the "3'UTR", but not "5'UTR". We have also made the necessary changes in lines 193-195.

      (7) The authors claim in the final conclusion: "In summary, we developed a novel deaminase and reader protein assisted RNA m5C methylation approach...", which is not what the method entails. The authors deaminate As or Cs close to m5C sites based on the binding of a deaminase-containing protein.

      Thank you for your kind suggestion, and we have made the necessary changes in lines 331-334.

      (8) The authors claim that "The data supporting the findings of this study are available within the article and its Supplementary Information." However, no single accession number for the deposited sequencing data can be found in the text or the supplementary data. Without the primary data, none of the claims can be verified.

      Thank you for pointing this out. The sequencing data from this study have already been deposited in the GEO database (GEO accession number: GSE254194, GEO token: ororioukbdqtpcn), and we will ensure they are made publicly available in a timely manner.

      (a) To underscore point (1), a recent publication (https://doi.org/10.1038/s41419-023-05661-y) reported: "To further identify the potential mRNAs regulated by ALYREF, we performed RNA-seq analysis in control or ALYREF knockdown T24 cells. After knockdown of ALYREF, 143 mRNAs differentially expressed, including 94 downregulated mRNAs (NC reads >100, |Fold change | >1.5, P-value <0.05). Functional enrichment analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG) indicated that regulated mRNAs by ALYREF are chiefly enriched in canonical cancer-related pathways (Fig. S4A), including TGF-β signaling, MAPK signaling, and NF-κB signaling, strongly supporting the oncogenic function of ALYREF in tumor progression. Among these 94 downregulated genes, 11 mRNA showed a significant reduction in m5C methylation after NUSN2 silencing in T24 cells, combined with previously transcriptome-wide RNA-BisSeq data of T24 cells [21] (Fig. 4A)."

      These results indicate that 94 mRNAs are regulated by ALYREF in bladder cancer-derived cells. Of those, very few (11) mRNA identities respond to NSUN2-dependent RNA methylation mediated by ALYREF binding. The question then arises: is that number sufficient to claim that ALYREF is an m5C-binding protein?

      And if so, how does the identification of 10,000+ edits by DRAM-Seq compare with the 94 mRNAs that are regulated by ALYREF? Were these 94 mRNAs identified by DRAM-Seq?

      Thank you for your kind suggestion. Previous reports, including that of Yang et al. (doi: 10.1038/cr.2017.55) and the study you refer to, have detailed the close relationship between ALYREF and m5C modification. The ALY/REF export factor (ALYREF) was identified as the first nuclear m5C reader, and many mRNAs were shown to be regulated by it; ALYREF is therefore considered an m5C-binding protein.

      As requested, we compared the DRAM-edited mRNAs with the 94 reported mRNAs and found that only 55.32% (52 of 94) of the mRNAs regulated by ALYREF were detected by the DRAM system. This indicates that the DRAM system specifically targets certain mRNAs, as illustrated in Figure S4E. The relevant results are described and discussed in lines 175-179.

      (b) Line 123:

      "The deep sequencing results showed that the deamination rates of RPSA and SZRD1 were 75.5% and 27.25%, respectively. (Fig. 2A, B)."

      The Figure shows exactly the opposite of bisulfite-mediated deamination. These are the cytosines that were not deaminated by the chemical treatment and therefore can be sequenced as cytosines and not thymidines. Hence, the term deamination rate is wrong.

      Thank you for your kind suggestion. We have changed "deamination rates" to "m⁵C fraction" in lines 129-130.

      (c) Line 157:

      "DRAM-seq analysis further confirmed that DRAM was detected in an m5C-dependent manner, with minimal mutations in AP5Z1 and RPSA mRNAs in methyltransferase knockout cells compared to wild-type cells (Fig. 3C, D)."

      There is no indication of what the authors mean by minimal mutation in these Figures. The term "minimal mutation" should be reconsidered as well.

      Thank you for your kind suggestion. We intended to express that "Mutations in AP5Z1 and RPSA mRNA are reduced in methyltransferase-deficient cells." There was an issue with the initial formulation, and we have made the necessary changes in lines 165-167.

      (d) Line 167:

      "To further delineate the characteristics of the DRAM-seq data, we compared the distribution of DRAM-seq editing sites within the gene structure, specifically examining their occurrences in the 5'untranslated region (5'UTR), 3' untranslated region (3'UTR), CDS and ncRNA."

      Which part of a coding RNA is meant by "ncRNA"?

      Thank you for pointing this out. This category actually refers to intergenic or intronic regions, not ncRNA. We have corrected this label in Figure 3G and in lines 186-189 of the revised manuscript.

      (e) Line 189:

      "Subsequently, we assessed the capacity of DRAM-seq to detect m5C on a transcriptome-wide scale, comparing its performance to BS-seq that have been previously reported with great authority."

      The term "great authority" is not a scientific term. Please remove adulation of senior authors.

      Thank you for your kind suggestion. We removed this unsuitable expression and made the necessary changes in lines 207-208.

      (f) Line 233:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing requires half to one mg of RNA input. Please check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (g) Line 247:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing requires half to one mg of RNA input. Please check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (h) Line 292:

      "Since m5C lacks a fixed motif, DRAM has an apparent limitation in achieving single-base resolution for detecting m5C."

      m5C deposition by NSUN2 and NSUN6 occurs in particular motifs that were coined Type I and II motifs. Hence, this statement is not correct.

      Thank you for your kind suggestion. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these motifs do not account for all m5C sites, and an m5C site is not guaranteed to carry either motif downstream (doi: 10.1038/s41594-019-0218-x). Therefore, we have replaced the expression “fixed motif” with “fixed base composition for characterizing all m5C modification sites” in line 317.
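      The point that motif presence is not guaranteed can be illustrated with a minimal Python sketch that labels a candidate C by its 3' context. The search window and the assumption that either motif may start anywhere in that window are hypothetical simplifications, not the exact definitions of the cited study; a site carrying neither motif falls through as "unclassified".

```python
import re

# Hypothetical simplification: look for each motif anywhere in a short window 3' of the C.
TYPE_I = re.compile("[ACGU]GGG")   # "NGGG", N = any RNA base
TYPE_II = re.compile("UCCA")


def classify_m5c_site(seq, pos, window=6):
    """Classify the C at 0-based index `pos` of an RNA sequence by its downstream context."""
    if seq[pos] != "C":
        raise ValueError("position does not hold a C")
    downstream = seq[pos + 1 : pos + 1 + window]
    type_i = TYPE_I.search(downstream) is not None
    type_ii = TYPE_II.search(downstream) is not None
    if type_i and type_ii:
        return "Type I and II"
    if type_i:
        return "Type I"
    if type_ii:
        return "Type II"
    return "unclassified"


print(classify_m5c_site("AACAGGGA", 2))
print(classify_m5c_site("ACUCCAA", 1))
print(classify_m5c_site("ACAAAAAA", 1))
```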

      (i) Line 390:

      "1 μl of total cellular RNA was used for sequencing library gene..."

      1 μL does not allow us to deduce which RNA mass was used for cDNA synthesis.

      Thank you for your kind suggestion. According to our cDNA synthesis protocol, we corrected “1μl” to “1μg” in lines 422-423.

      (j) Line 405:

      "...was assessed on the Agilent 5400 system (Agilent, USA) and quantified by QPCR (1.5 nM)"

      What does the 1.5 nM refer to in this sentence?

      Thank you for your kind suggestion. Here, "1.5 nM" means that the concentration of the constructed library should be no less than 1.5 nM. We have revised this expression in the Methods (lines 436-438).
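      qPCR reports library molarity directly. When only a mass concentration and a mean fragment size (e.g. from a fragment analyzer trace) are available, the standard dsDNA conversion using an average of about 660 g/mol per base pair can serve as a cross-check against the 1.5 nM minimum. The following is a minimal Python sketch; the function names and the example values are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximate average)


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert dsDNA mass concentration (ng/uL) to molarity (nM).

    ng/uL equals mg/L, so nM = conc * 1e-3 / (660 * length) * 1e9.
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def passes_threshold(molarity_nm, minimum_nm=1.5):
    """True if the library meets the stated minimum concentration."""
    return molarity_nm >= minimum_nm


molarity = library_molarity_nm(2.0, 400)   # hypothetical 2 ng/uL library, 400 bp mean size
print(f"{molarity:.2f} nM, passes: {passes_threshold(molarity)}")
```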

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain more or less the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of the mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the current time point and the previous time point in the stochastic simulation, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the burst frequency and the burst size, as well as the rate of mRNA removal. We will expand this section with explanations of all parameters and terms in the revised manuscript.
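      To show how quantities like mRNA-curr and mRNA-prev can enter a stochastic simulation, here is a minimal Gillespie sketch of a telegraph promoter, mRNA and protein. It is not the authors' model: a single translation step replaces the TASEP, `demand_factor` is a hypothetical modifier (not one of the Table S1 functions) that lowers the per-mRNA initiation rate when the mRNA count has risen since the previous sampling point, and all rate values are illustrative assumptions.

```python
import random
import statistics
from bisect import bisect_right
from itertools import accumulate


def demand_factor(m_curr, m_prev, alpha=0.5):
    """Hypothetical ribosome-demand modifier; alpha = 0 removes the coupling."""
    return 1.0 / (1.0 + alpha * max(0, m_curr - m_prev))


def simulate(t_end=500.0, dt=1.0, k_on=0.05, k_off=0.5, k_m=20.0, g_m=1.0,
             k0=5.0, g_p=0.1, alpha=0.5, seed=1):
    """Return (time, mRNA, protein) at sampling points spaced dt apart.

    m is mRNA-curr; m_prev is the mRNA count at the previous sampling point.
    """
    rng = random.Random(seed)
    t, gene, m, p = 0.0, 0, 0, 0
    m_prev, next_sample, trace = 0, dt, []
    while t < t_end:
        rates = [
            k_on if gene == 0 else k_off,               # promoter switches state
            k_m * gene,                                 # transcription while ON
            g_m * m,                                    # mRNA decay
            k0 * demand_factor(m, m_prev, alpha) * m,   # translation
            g_p * p,                                    # protein decay
        ]
        cumulative = list(accumulate(rates))
        total = cumulative[-1]          # > 0 because a switching rate is always > 0
        tau = rng.expovariate(total)
        if t + tau >= next_sample:
            # m_prev only changes at sampling points; by memorylessness we can stop
            # at the boundary, record the state, update m_prev and redraw tau.
            t = next_sample
            trace.append((t, m, p))
            m_prev = m
            next_sample += dt
            continue
        t += tau
        event = bisect_right(cumulative, rng.random() * total)
        if event == 0:
            gene = 1 - gene
        elif event == 1:
            m += 1
        elif event == 2:
            m -= 1
        elif event == 3:
            p += 1
        else:
            p -= 1
    return trace


for a in (0.0, 0.5):
    proteins = [p for _, _, p in simulate(alpha=a)[100:]]   # discard burn-in
    cv = statistics.pstdev(proteins) / statistics.fmean(proteins)
    print(f"alpha={a}: protein CV = {cv:.3f}")
```

Comparing runs with `alpha=0` and `alpha>0` over many seeds is one way to probe whether a given demand function changes protein noise at fixed mean.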

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime, protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that stochastic simulation would better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.
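      For reference, the three-term decomposition the reviewer mentions can be sketched in Python for the linear telegraph (three-stage) model. The expression below follows from the exact second-moment (Lyapunov) equations of that linear model; it does not include ribosome-demand coupling, which makes the model nonlinear, so it cannot replace simulations for that part. Function and parameter names are hypothetical.

```python
def protein_noise_terms(k_on, k_off, k_m, g_m, k_p, g_p):
    """Split protein CV^2 of the linear telegraph model into three terms.

    k_on/k_off: promoter switching rates; k_m: transcription rate while ON;
    g_m, g_p: mRNA and protein decay rates; k_p: translation rate per mRNA.
    """
    on_fraction = k_on / (k_on + k_off)
    mean_m = k_m * on_fraction / g_m
    mean_p = k_p * mean_m / g_p
    k = k_on + k_off
    protein_term = 1.0 / mean_p                          # protein birth-death noise
    mrna_term = (1.0 / mean_m) * g_p / (g_m + g_p)       # mRNA copy-number noise
    promoter_term = ((k_off / k_on) * g_m * g_p * (k + g_m + g_p)
                     / ((g_m + g_p) * (k + g_m) * (k + g_p)))  # bursty transcription
    return protein_term, mrna_term, promoter_term


terms = protein_noise_terms(k_on=1, k_off=1, k_m=10, g_m=1, k_p=10, g_p=1)
print(terms, sum(terms))
```

Because `k_p` appears only in the first term, raising the translation rate in this linear model only shrinks 1/⟨p⟩ and leaves the transcriptional contributions unchanged.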

      Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence for the hypothesis would require ribosome occupancy maps of mRNA molecules at the single-cell level and at time intervals that closely match the burst frequency of the genes. This is beyond currently available methodologies. However, other lines of evidence support our model. For example, earlier work in cell-free systems showed that constraining the resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, genome-wide analysis of expression noise in yeast revealed that the association between protein noise and translational efficiency was strongest in the group of genes with the most bursty transcription (Supplementary fig. S20).

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      Although we agree with the reviewer that the effect of translational efficiency on protein noise may not be as substantial as that of transcriptional bursting, it has been observed in bacteria, yeast and Arabidopsis (Ozbudak et al., 2002; Blake et al., 2003; Wu et al., 2022). In addition, the positive relationship between translational efficiency and protein noise contrasts with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the strength of this association, but to understand the basis of the influence of translational efficiency on protein noise.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We will revise the figure captions to include more details as per the reviewer’s suggestion.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      For all published datasets where we had measurements from a large number of genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). For experiments that we performed on a limited number of promoters, we used the measure of coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible. Translational efficiency refers to translation rate which is determined by both the translation initiation rate and the translation elongation rate. The noise at the protein level was quantified from the signal intensity of GFP tagged proteins, which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.
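      The two protein-noise measures mentioned above can be sketched in Python. CV is standard; the DM function is a simplified version of the distance-to-median idea (log CV² minus a running median of log CV² among genes of similar mean expression). The window size and the use of a rank-based window are assumptions, and the authors' exact DM and adjusted-noise computations may differ.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD used here)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def distance_to_median(means, cv2s, window=5):
    """DM per gene: log10(CV^2) minus the running median of log10(CV^2)
    over the `window` genes closest in mean expression (hypothetical window)."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    log_cv2 = [math.log10(cv2s[i]) for i in order]
    half = window // 2
    dm = [0.0] * len(means)
    for rank, gene in enumerate(order):
        lo, hi = max(0, rank - half), min(len(order), rank + half + 1)
        dm[gene] = log_cv2[rank] - statistics.median(log_cv2[lo:hi])
    return dm


print(coefficient_of_variation([80, 100, 120]))
print(distance_to_median([1, 2, 3, 4, 5], [0.1, 0.1, 1.0, 0.1, 0.1]))
```

Subtracting the running median removes the generic dependence of noise on mean expression, so DM compares genes at similar expression levels.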

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they are not new, but we included these results for setting the baseline for comparison with simulation results that appear in the later part of the manuscript where we included translational bursting and ribosome demand in our models.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a baseline initiation rate depending on the mRNA numbers and other variables. We changed the baseline initiation rate to alter the mean protein expression levels. We will elaborate this section in the revised manuscript.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      This is an important observation. Although we changed the translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model (Fig. 3D) that changes in the translation initiation rate were also linked with changes in the translation elongation rate. The translation initiation rate can only increase if the ribosomes already bound to the mRNA traverse it more quickly. This means that an increase in the translation initiation rate will occur only if the translation elongation rate also increases, which lowers the traversal time of the ribosomes along the mRNA (Fig. 3D). Similarly, an increase in the translation elongation rate will allow more ribosomes to initiate translation. Thus, the translation initiation rate and the translation elongation rate are interconnected, as has also been observed experimentally by Barrington et al. (2023). That said, the models can also be expressed in terms of the translation elongation rate instead of the translation initiation rate, and this modification will not change the results of the simulations because of the interconnectedness of the two rates.
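      The coupling argued above can be illustrated with a crude steric-exclusion bound: a new ribosome can load only after the previous one has moved roughly one footprint (about 10 codons) away from the start codon, so the achievable initiation rate is capped by elongation speed divided by footprint length. The following Python sketch is an assumption-laden simplification, not the TASEP model used in the manuscript; the footprint value and the rates are illustrative.

```python
def effective_initiation_rate(k_init, elongation_codons_per_s, footprint_codons=10):
    """Initiation rate actually achieved under a simple start-site clearance limit.

    A ribosome must clear `footprint_codons` (assumed ~10) before the next can load,
    so initiation cannot exceed elongation speed / footprint length.
    """
    clearance_rate = elongation_codons_per_s / footprint_codons  # clearances per second
    return min(k_init, clearance_rate)


for elongation in (2.0, 5.0, 10.0):
    print(elongation, effective_initiation_rate(k_init=1.0, elongation_codons_per_s=elongation))
```

In this toy bound, raising the intrinsic initiation rate has no effect until elongation speeds up, mirroring the interdependence of the two parameters.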

      References

      C. L. Barrington, G. Galindo, A. L. Koch, E. R. Horton, E. J. Morrison, S. Tisa, T. J. Stasevich, O. S. Rissland. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023).

      W. J. Blake, M. Kaern, C. R. Cantor, J. J. Collins, Noise in eukaryotic gene expression. Nature 422, 633-637 (2003).

      P. M. Caveney, S. E. Norred, C. W. Chin, J. B. Boreyko, B. S. Razooky, S. T. Retterer, C. P. Collier, M. L. Simpson, Resource Sharing Controls Gene Expression Bursting. ACS Synth Biol. 6, 334-343 (2017).

      J. R. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Breslow, M. Noble, J. L. DeRisi, J. S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840-846 (2006).

      E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, A. van Oudenaarden, Regulation of noise in the expression of a single gene. Nat Genet. 31, 69-73 (2002).

      O. K. Silander, N. Nikolic, A. Zaslaver, A. Bren, I. Kikoin, U. Alon, M. Ackermann, A genome-wide analysis of promoter-mediated phenotypic noise in Escherichia coli. PLoS Genet. 8, e1002443 (2012).

      H. W. Wu, E. Fajiculay, J. F. Wu, C. S. Yan, C. P. Hsu, S. H. Wu, Noise reduction by upstream open reading frames. Nat Plants. 8, 474-480 (2022).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in establishing pluripotency during reprogramming. Owing to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms in early embryos remain unclear. This manuscript reports the role of OCT4 and SOX2 in mouse early embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared with controls, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of motifs for TFs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, through in-depth analysis of the ATAC-seq and RNA-seq data, they found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they uncovered the role of OS during development from the morula to the ICM, which provides the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data; however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided. The knockout strategy in Fig. 1 is somewhat obscure; for example, how is OCT4 inactivated in Oct4mKO2 heterozygotes? As shown in Figure 1, the Oct4 exons are not deleted and the promoter is not destroyed. How, then, is OCT4 inactivated to form heterozygotes?

      Thank you for your kind suggestions. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below:

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4mKO2) and Oct4 flox alleles. In the Oct4mKO2 allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2EGFP) and Sox2 flox alleles. In the Sox2EGFP allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remain intact.

      (2) Is ZP 3-Cre expressed in the zygotes? Is there any residual protein?

      Thank you for the question. While we have not directly tested for ZP3-Cre expression in zygotes, published transcriptome and proteomics data show that ZP3 is present at both the transcript and protein levels in wild-type zygotes (Deng et al., Science, 2014; Gao et al., Cell Reports, 2017). This suggests that ZP3-Cre could also be expressed in zygotes.

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      Thank you for the question. The motifs enriched in the increased ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The y-axis label of Fig. 4C is missing.

      Thank you for pointing this out. The y-axis shows the average normalized signal (reads per million-normalized pileup signal). We will add the label in the revised version.
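      The y-axis quantity can be illustrated with a minimal Python sketch: raw pileup values are scaled to reads per million mapped reads, and equally long per-peak profiles are then averaged position by position to give the metaplot curve. Function names are hypothetical, and a real pipeline (e.g. one based on deepTools or MACS2 output) may bin and scale differently.

```python
def rpm_normalize(pileup, total_mapped_reads):
    """Scale raw per-bin pileup values to reads per million mapped reads."""
    scale = 1e6 / total_mapped_reads
    return [value * scale for value in pileup]


def average_profile(profiles):
    """Average equally long RPM profiles position by position (metaplot curve)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


peak_a = rpm_normalize([2, 4, 6], total_mapped_reads=2_000_000)   # hypothetical values
peak_b = rpm_normalize([4, 8, 2], total_mapped_reads=2_000_000)
print(average_profile([peak_a, peak_b]))
```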

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to conduct this analysis.

      (6) If Oct4 and Sox2 truly activate Sap30 and Uhrf1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      Thank you for the interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison.

      Thank you for your valuable feedback. However, we are unclear on what is meant by “the molecular changes in these groups appear overestimated.” Could the reviewer kindly provide more details or clarify which specific aspects of the molecular changes they are referring to? This would help us better address the concern.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to address the concern of “enhancer”.

      The definition of "close to" or "near" in lines 183-184 is given in the legend of Figure 2E and in the Methods: in the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included.
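      The gene-selection rule stated above (TSS within 10 kb of a peak center) can be sketched in Python as follows. The input structures, the gene names and the inclusive 10 kb boundary are assumptions for illustration; the authors' pipeline may handle distances or boundaries differently.

```python
from collections import defaultdict


def genes_near_peaks(peak_centers, tss_by_gene, max_distance=10_000):
    """Return genes whose TSS lies within max_distance bp of any peak center.

    peak_centers: iterable of (chrom, center); tss_by_gene: {gene: (chrom, tss)}.
    """
    centers_by_chrom = defaultdict(list)
    for chrom, center in peak_centers:
        centers_by_chrom[chrom].append(center)
    selected = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        if any(abs(tss - c) <= max_distance for c in centers_by_chrom.get(chrom, [])):
            selected.add(gene)
    return selected


peaks = [("chr1", 50_000), ("chr2", 1_000)]                      # hypothetical peaks
tss = {"geneA": ("chr1", 45_000), "geneB": ("chr1", 70_000), "geneC": ("chr3", 1_000)}
print(genes_near_peaks(peaks, tss))
```

For large peak sets, sorting the centers per chromosome and using a binary search would avoid the linear scan per gene.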

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Thank you for the comment. In Figure Supplement 3C, we analyzed published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which show that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched for Sox2 CUT&RUN signals. These data suggest that the decreased peaks/enhancers could be direct targets of Sox2. Unfortunately, we could not find similar published data for Oct4 in embryos.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference in cluster 1, 11 and 3, 8, 14 is whether the peak contains OCT-SOX motif. However, the reviewer cannot get any information of peaks activated by OCT4 rather than SOX2 in cluster 1, 11.

      Thank you for the comment. As the reviewer pointed out, we agree that clusters 3, 8 and 14 are more enriched with OCT-SOX motifs than clusters 1 and 11. However, this is consistent with our observation that the accessibility of peaks in clusters 1 and 11 relies mainly on Oct4, whereas the accessibility of clusters 3, 8 and 14 relies on both Oct4 and Sox2. The word “activate” is probably not accurate, and we will revise the text as follows:

      “Notably, compared to the peaks dependent on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group tended to be already open in the morula, while the latter group became open in the ICM. “

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      Thank you for the comments.

      Lines 153-159 reference two datasets: Figure supplement 3C and 3D.

      In Figure supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks exhibited higher enrichment with Sox2 CUT&RUN signals compared to the increased or unchanged peaks.

      Regarding Figure supplement 3D, we agree that the H3K27ac signal is only slightly more enriched on the decreased peaks than on the unchanged peaks. However, it is important to note that only the 57,512 strongest of the 142,096 unchanged peaks were included in the analysis. We excluded the weaker unchanged peaks because they are less informative; if included, they could reduce the average H3K27ac signal for the unchanged peaks.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      Thank you for the suggestion. We will replace “identify” with “infer”. The revised version is as below:

      “In addition, integration of the ATAC-seq and RNA-seq data allowed us to infer previously unknown targets of Oct4 and Sox2, such as Sap30 and Uhrf1, which are essential for somatic cell reprogramming and embryonic development.”

      (3) The Discussion is lengthy and should be condensed.

      Thank you for the suggestion. We will shorten it.

    1. Author response:

      We thank the editors and reviewers for their valuable feedback and are committed to addressing their suggestions in a revised manuscript. We appreciate the reviewers’ recognition of the value of our findings, including the insights into the consequences of synaptic topography and the investigation of spike initiation zones in DNs, which further advance our understanding of signal processing. Our studies offer broader insights into synaptic organization and its significance for dendritic integration in an ethologically relevant context.

      We particularly appreciate the reviewer's suggestion to elaborate on the electrophysiological properties of DNs and to consider the electrotonic distance in our analysis. We also thank the reviewers for highlighting points that need clarification. In short, our models suggest that DNs effectively distribute synapses to maintain linear encoding of synapse numbers when multiple synapses are coactivated. This supports the results of an earlier study suggesting that synapse number gradients encode the location of an approaching stimulus in these neurons (Dombrovski et al., 2023).

      We also agree with the reviewers that the temporal activation of synapses is highly relevant for this system. However, we have focused on synaptic topography because the characterization of temporal patterns of VPN activity is currently lacking in the field. A more detailed investigation of temporal dynamics is therefore beyond the scope of this study.

      With the publication of the reviewed preprint, we have now made the computational pipeline and models available on GitHub (https://github.com/AusbornLab/VPN-DN-synapse-normalization).

      Reference

      Dombrovski M, Peek MY, Park J-Y, Vaccari A, Sumathipala M, Morrow C, Breads P, Zhao A, Kurmangaliyev YZ, Sanfilippo P, Rehan A, Polsky J, Alghailani S, Tenshaw E, Namiki S, Zipursky SL, Card GM. 2023. Synaptic gradients transform object location to action. Nature 613:534–542. doi:10.1038/s41586-022-05562-8

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. We have added the following text to the Discussion to address this concern: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations, by carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate transdifferentiation of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the transdifferentiation pathway at any given time point, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells after treatment withdrawal, coupled with single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein.

      To address this limitation in the manuscript, we have added the following paragraph to the Discussion: “Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (6). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells after treatment withdrawal, coupled with single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein.”

      Reviewer #1 (Recommendations for the authors): 

      The authors may consider the following suggestions to improve this study: 

      (1) The authors may include a brief background on type 1 diabetes and the model involving BDC2.5 T cells to provide context for readers who may not be familiar with these aspects. 

      We have added this information to the first paragraph in the Results section: “BDC2.5mi/I-Ag7-specific CD4+ T cells comprise a population of autoreactive T cells that contribute to the progression of spontaneous autoimmune diabetes in NOD mice. The pool of T cells bearing this type 1 diabetes-relevant specificity is small and barely detectable in untreated NOD mice, but treatment with cognate pMHCII-NPs leads to the expansion and formation of antidiabetogenic TR1 cells that retain the antigenic specificity of their precursors (3). As a result, treatment of hyperglycemic NOD mice with these compounds results in the reversal of type 1 diabetes (3).”

      (2) It is understandable that further biological and functional experiments are beyond the scope of this paper, but it would be of interest to know how the authors envision future studies based on the transcriptional and epigenetic information obtained thus far. 

      We have added the following text to the Discussion section: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations through extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knockout mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin remodeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (3) The authors may consider adjusting figures where genes are crowded or difficult to read due to small font size. 

      Figures with crowded text have been modified to facilitate reading.

      Reviewer #2 (Public Review): 

      Summary: 

      Building on their previous findings that TFH cells can be converted into TR1 cells, the authors conducted a highly detailed and comprehensive epigenetic investigation to determine whether TR1 differentiation from TFH cells is driven by epigenetic changes. Their evidence indicates that the downregulation of TFH-related genes during the TFH-to-TR1 transition depends on chromatin closure, whereas the upregulation of TR1-related genes does not depend on epigenetic changes. 

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis covers open chromatin regions, histone modifications, and DNA methylation, using both single-cell and bulk techniques to validate their findings. Observations from the different epigenetic perspectives support one another, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T cell lineage commitment processes, the epigenetic similarity between TFH and TR1 cells is intriguing. 

      (2) They performed correlation analysis to examine the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". They have also made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research. 

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. 

      We have added the following text to the Discussion to address this limitation: “Although the TFH-TR1 transdifferentiation was discovered in mice treated with pMHCII-NPs, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Importantly, the discovery of this transdifferentiation process affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area”.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that our scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). 

      This has been clarified in the revised version of the manuscript, by adding the following text to the last paragraph of the Results subsection entitled “Contraction of the chromatin in pMHCII-NP-induced Tet+ vs. TFH cells at the bulk level”: “We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (6). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells)”.

      (2) This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 cells are highly similar. However, the authors did not discuss differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, the H3K27ac modification at IL10 in Tet+ cells showed significantly higher intensity than in KLH-TFH cells, whereas in this article it may be categorized as "possessing the same histone modification region". Addressing this point would strengthen their conclusions.

      We appreciate your suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we mention the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, the epigenetic modifications that TR1 cells inherit from TFH cells were acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. Thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells. 

      This has now been clarified by adding the following text to the end of the Results subsection entitled “H3K4me3, H3K27me3 and H3K27ac marks in genes upregulated during the TFH-to-TR1 cell conversion are already in place at the TFH cell stage”: “We note that, although in the representative chromosome track views shown in Fig. 6C there appear to be differences in the intensity of the peaks, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for histone deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells.” 

      We have also clarified this in the corresponding section of the Methods section (“ATAC-seq and ChIP-seq” under “Bioinformatic and Statistical Analyses”): “Given that peak calling alone does not account for variations in the intensity of histone mark deposition, analysis of differential histone deposition includes both qualitative and quantitative assessments. Whereas qualitative assessment involves evaluating the overall pattern and distribution of the various histone marks, quantitative assessment measures the intensity and magnitude of histone mark deposition.”

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand your concern about the potential redundancy of some results and figures. Our aim in including these analyses was to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, open chromatin status and transcriptional differences; this comprehensive approach helps to ensure that our conclusions are robust and well supported; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts, and are intended to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive data presented will serve as a valuable resource for others in the field who may seek to build on our findings.

      That said, we have carefully reviewed the manuscript to identify and streamline elements that might be perceived as overly redundant, while retaining the depth of analysis that we believe is essential.

      Reviewer #2 (Recommendations for the authors): 

      (1) In Figure 1E, the text states "94% (n=217/231) of the genes associated with chromatin regions that had closed during the TFH-TR1 conversion,", but n=231 does not match the n=1820 downregulated genes provided in Figure 1D. This is one of several examples of numbers that do not match across figures or lack sufficient explanation. Please check those numbers carefully and add explanatory sentences where necessary. 

      We note that the text referring to Figure 1D describes the total number of differentially expressed genes between Tet+ TR1 and Tet+ TFH cells using the scMultiome dataset (n = 2,086 genes downregulated in the former vs. the latter; and n = 266 genes upregulated in the former vs. the latter). The text in the paragraph that follows (referring to Figure 1E) focuses exclusively on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, to ascertain whether or not chromatin closure was indeed associated with such gene downregulation. 

      We have modified the first sentence in the paragraph referring to Figure 1E to clarify this point for the reader: “Further analyses focusing on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, confirmed…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have developed a valuable method based on a fully cell-free system to express a channel protein and integrate it into a membrane vesicle in order to characterize it biophysically. The study presents a useful alternative to study channels that are not amenable to being studied by more traditional methods.

      Strengths:

      The evidence supporting the claims of the authors is solid and convincing. The method will be of interest to researchers working on ionic channels, allowing them to study a wide range of ion channel functions such as those involved in transport, interaction with lipids, or pharmacology.

      Weaknesses:

      The inclusion of a mechanistic interpretation of how the channel protein folds into a protomer or a tetramer to become functional in the membrane would strengthen the study.

      Work from other labs has described key factors that can improve expression and artificial lipid integration of cell-free derived transmembrane proteins (PMIDs: 35520093, 29625253, 26270393). However, a significant number of additional experiments would be needed to elucidate the exact biophysical properties governing channel assembly of synthetically derived polycystins. We carried out additional biochemical experiments to address these concerns (see new Figure 1—figure supplement 1 D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) to estimate how much of the CFE-derived protomers fold and assemble into functional tetramers upon incorporation into SUVs. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. In the absence of the chaperones found in cells, the assembly of synthetically derived protomers into tetramers likely depends on the intrinsic chemical properties of the proteins and on the biophysical principles governing helical membrane proteins inserted into the lipid membrane (PMID: 35133709). We have added our interpretation in lines 111-121.

      Reviewer #2 (Public Review):

      It is challenging to study the biophysical properties of organelle channels using conventional electrophysiology. Conventional reconstitution methods require multiple steps, and the preparations can be contaminated by endogenous ionophores from the host cell lines after purification. To overcome this challenge, in this manuscript, Larmore et al. described a fully synthetic method to assay the functional properties of the TRPP channel family. The TRPP channels are an important organelle ion channel family that natively traffic to primary cilia and the ER. The authors used cell-free protein expression and reconstituted the synthetic channel protein into giant unilamellar vesicles (GUVs), whose single-channel properties can then be measured using voltage-clamp electrophysiology. Using this innovative method, the authors characterized the channels' membrane integration, orientation, and conductance, comparing the results to those of endogenous channels. The manuscript is well written and may be of broad interest to the ion channel community studying organelle ion channels, particularly because the difficulty of patching native cilia has concentrated their functional characterization in very few labs. This method may provide an alternative approach to investigate other channels resistant to biophysical analysis and pharmacological characterization.

      Thank you for evaluating our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be useful to explain how the polycystin protein folds under the experimental conditions used. The expression data shown in Figure 1—figure supplement 1B show different protein concentrations of protomer or tetramer. However, it is not described how each form is identified and distinguished. It is also important to mention in the manuscript that this method is only applicable to membrane channels that do not require chaperones for their folding and insertion into the membrane. How is the tetramer mechanistically formed? In line 184, it is stated that this method can be leveraged for studying the effects of channel subunit composition. Would this method allow the expression of two different subunit proteins in order to produce a heteromeric channel?

      In Figure 1—figure supplement 1B, total fluorescence from the synthesized channel-GFP was measured. Protein concentration was calculated from a linear regression of the GFP standards. Monomeric protein concentration was calculated directly from total fluorescence. Tetrameric protein concentration was calculated by dividing the fluorescence by four and then calculating the concentration from the GFP standards. 
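      The concentration estimate described above can be sketched as follows, assuming an ordinary least-squares standard curve of the form fluorescence = slope × concentration + intercept. The function names, units and example standard values are hypothetical illustrations, not the authors' analysis code or data. Following the description literally, the tetramer estimate divides the fluorescence by four before applying the curve; with a zero intercept this is the same as dividing the monomer concentration by four.

```python
from typing import Sequence, Tuple


def fit_standard_curve(concs: Sequence[float], signals: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of signal = slope * conc + intercept
    to GFP standards of known concentration."""
    n = len(concs)
    if n != len(signals) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(concs) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def conc_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to convert a fluorescence reading to concentration."""
    return (signal - intercept) / slope


def monomer_and_tetramer_conc(total_fluorescence: float, slope: float,
                              intercept: float) -> Tuple[float, float]:
    """Monomer: total fluorescence read directly off the curve.
    Tetramer: fluorescence divided by four (four GFP tags per tetramer),
    then read off the same curve."""
    monomer = conc_from_signal(total_fluorescence, slope, intercept)
    tetramer = conc_from_signal(total_fluorescence / 4, slope, intercept)
    return monomer, tetramer
```

      For hypothetical standards of 0, 0.5, 1 and 2 µM giving 0, 1000, 2000 and 4000 fluorescence units, a sample reading of 3000 units maps to 1.5 µM protomer and 0.375 µM tetramer.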

      This is a good point. Based on your suggestion, we carried out additional biochemical experiments (see new Figure 1—figure supplement 1 D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) to estimate how much of the CFE-derived protomers fold and assemble into functional tetramers upon incorporation into SUVs. As controls, we produced recombinant PKD2-GFP and PKD2L1-GFP channels to serve as elution time standards and to compare the relative production of tetrameric channels generated with the two expression systems. The synthetically derived polycystin channels indeed produced tetramers and protomers, which supports the feasibility of using this method to assay their functional properties. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. We speculate that the assembly of synthetically derived protomers into tetramers likely depends on the intrinsic chemical properties of the proteins and on the biophysical principles governing helical membrane proteins inserted into the lipid membrane (PMID: 35133709). Although this is an interesting question, a systematic analysis of these channel-lipid interactions is beyond the scope of this eLife Report but can be addressed in future studies. The limitation that this method can only characterize channels that fold and integrate into the membrane without the aid of molecular chaperones is now stated in lines 201-205. In principle, the CFE-GUV method can be deployed to co-express different subunits to produce heteromeric channels. We have modified lines 192-197 to be clearer on this point.

      (2) The type of plasmid (and promoter) required for this methodology should be mentioned.

      Added to the methods (lines 210-211): “PKD2 and PKD2L1 are in the pET19b plasmid under the T7 promoter.”

      (3) Since this paper is methodological, it would be useful to have some information about the stability of the GUVs containing the synthetic channel. In Methods, it is stated that GUV vesicles are used on the same day (line 207). And in line 193 it says that the reactions (?) are placed at 4°C for storage.

      Restated in lines 226-228: GUVs are electroformed and used for electrophysiology the same day. SUVs with channel incorporated are stored at 4°C for 3 days.

      (4) A comment reasoning why the PKD2 protein is more frequently incorporated into the membrane in comparison to PKD2L1 should be included. A brief description of the differences between these two proteins would also be helpful for the reader.

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (see new Figure 1C, and Figure 1—figure supplement 1 D, E). In lines 149-155, we note that single channel openings were frequently observed for the high-expressing PKD2L1 channels, but this often resulted in patch instability. As a result, GUV patches with the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. We have revised the text to be clearer on this point.

      (5) There are no methods for preparing hippocampal neurons or IMCD cells shown in Figure 4 Supplement 1. Instead, the method of mammalian cultures provided corresponds to HEK 293T cells.

      This information has been added to lines 273-284.

      (6) Minor:

      In Figure 2C, please include the actual % of the Cell488+Surface647+Clear lumen vesicles.

      Added

      Line 99, 108: Figures 1B and 1C are swapped. Please correct.

      Corrected in figure and figure legends.

      Line 108: misspelling: effect.

      Done

      Line 109: check sentence: verb is missing.

      Sentence now reads “Minimal changes in fluorescence were detected when a control plasmid (Ctrl) encoding a non-fluorescent protein (dihydrofolate reductase) was used in the reaction.”

      Line 145: recoding. Correct.

      Recoding changed to recordings

      Line 169: "from" is missing (recorded from IMCD cilia).

      Added

      Line 169: In Table 1, the PKD2 K+ conductance magnitudes recorded from IMCD cilia were significantly smaller, not larger as stated, than those assayed using CFE-GUV system. Please correct.

      Corrected

      Line 180: "of" is missing (adaptation of CFE derived...).

      Corrected

      Line 182: "to" is missing (generalized to other channels).

      Corrected

      Line 193: "in" 4ºC; correct to "at".

      Corrected

      Line 197: replace "mole" with "mol".

      Corrected

      Line 207: are used "within the" same day.

      Corrected

      Line 210: c-terminally. C- should be a capital letter.

      Corrected

      Line 231: n-terminally. N- should be a capital letter.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      The authors validated their method using PKD2 and PKD2L1 channels, demonstrating the potential of this approach. However, a few points merit further clarification or validation:

      (1) Stability of the protein vesicles for recording. The authors observed membrane instability during voltage transitions. It would be beneficial to discuss potential solutions to enhance stability.

      In lines 197-202, we have added a discussion of potential solutions to enhance stability. CsF could be added to the intracellular saline to stabilize the GUV membranes; CsF is frequently added to stabilize whole-cell membranes in HTS planar patch clamp recording. We did not explore this formulation because Cs+ would limit outward polycystin conductance. We also suggest, but did not test, altering the membrane formulation of GUVs with additional cholesterol to stabilize these recordings.

      (2) Validation. Further discussion on how broadly this method can be applied to other channels would strengthen the manuscript.

      We have included further discussion on this point in lines 190-206. 

      (3) Protein production estimated by a standard GFP absorbance assay. The estimation of protein production using GFP absorption may be affected by improperly folded protein. Additional validation methods could be considered.

      C-terminal GFP fluorescence has been widely used in expression systems as an indicator of proper folding of the target protein upstream of the GFP tag (PMID: 22848743, PMID: 21805523, PMID: 35520093). Nonetheless, we have conducted additional experiments designed to estimate the amount of assembled PKD2 and PKD2L1 channels generated using the CFE method. In the new Figure 1—figure supplement 1 D, E, we carried out fluorescence-detection size-exclusion chromatography and compared channel assembly of recombinant and CFE+SUV-derived PKD2-GFP and PKD2L1-GFP. Here, we clearly observed tetrameric and protomeric forms of the channels using the synthetic approach, which supports the feasibility of using this method to assay their functional properties. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. 

      (4) Single channels were observed more frequently from PKD2 incorporated GUVs compared to PKD2L1. Does this just randomly happen or is there a reason behind this difference?

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (Figure 1C, and Figure 1—figure supplement 1 D, E). This is apparent whether the channels are produced recombinantly in cells or with the cell-free method (Figure 1—figure supplement 1 D, E). In lines 149-155, we note that single channel openings were frequently observed, but that the high expression of PKD2L1 often resulted in patch instability. As a result, GUV patches containing the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. As requested, we have included a brief description of the two proteins in lines 76-78. 

      (5) Additional validation or clarification for examining the channel orientation may strengthen the manuscript.

      We have modified the text to make this point clearer. 

      (6) Advantage and limitations. The authors compared the recordings from hippocampal primary cilia membranes, noting differences in conductance magnitudes compared to the GUV method. Further discussing the limitations and advantages of this approach for the biophysical properties of organelle channels would be beneficial.

      We have revised the final paragraph to discuss the limitations of this method.

      (7) Including experiments that demonstrate ligand-induced activation or inhibition to further validate the current using this method would strengthen the manuscript (optional, not required).

      Despite our best attempts, exchange of the external bath to apply inhibitors (Gd3+, La3+) resulted in GUV patch instability. We plan to investigate ways to stabilize the high-resistance seals so that pharmacological screening can be developed using the CFE+GUV method.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their interest in our studies. In response to their comments, we have conducted additional experiments and made the necessary revisions to the manuscript. The new studies included to address the reviewers’ comments are shown in Figure 1B, 1F, Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, Figure 3—figure supplement 2, Figure 3—figure supplement 3, Figure 4E, Figure 4—figure supplement 1, Figure 5, Figure 5—figure supplement 1, Figure 5—figure supplement 2D, and Figure 6. We are grateful for the critiques, which have helped us substantially improve the quality of the manuscript.

      Below, we have provided a point-by-point response to the reviewers’ comments.  

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors show that disruption of calcineurin, which is encoded by tax-6 in C. elegans, results in increased susceptibility to P. aeruginosa, but extends lifespan. In exploring the mechanisms involved, the authors show that disruption of tax-6 decreases the rate of defecation leading to intestinal accumulation of bacteria and distension of the intestinal lumen. The authors further show that the lifespan extension is dependent on hlh-30, which may be involved in breaking down lipids following deficits in defecation, and nhr-8, whose levels are increased by deficits in defecation. The authors propose a model in which disruption of the defecation motor program is responsible for the effect of calcineurin on pathogen susceptibility and lifespan, but do not exclude the possibility that calcineurin affects these phenotypes independently of defecation.

      We thank the reviewer for providing an excellent summary of our work. We have performed additional experiments as suggested by both the reviewers and believe we have thoroughly addressed all the reviewers' concerns.

      Reviewer #2 (Public Review):

      The manuscript titled "Calcineurin Inhibition Enhances Caenorhabditis elegans Lifespan by Defecation Defects-Mediated Calorie Restriction and Nuclear Hormone Signaling" by Priyanka Das, Alejandro Aballay, and Jogender Singh reveals that inhibiting calcineurin, a conserved protein phosphatase, in C. elegans affects the defecation motor program (DMP), leading to intestinal bloating and increased susceptibility to bacterial infection. This intestinal bloating mimics calorie restriction, ultimately resulting in an enhanced lifespan. The research identifies the involvement of HLH-30 and NHR-8 proteins in this lifespan enhancement, providing new insights into the role of calcineurin in C. elegans DMP and mechanisms for longevity.

      The authors present novel findings on the role of calcineurin in regulating the defecation motor program in C. elegans and how its inhibition can lead to lifespan enhancement. The evidence provided is solid with multiple experiments supporting the main claims.

      Strengths:

      The manuscript's strength lies in the authors' use of genetic and biochemical techniques to investigate the role of calcineurin in regulating the DMP, innate immunity, and lifespan in C. elegans. Moreover, the authors' findings provide a new mechanism for calcineurin inhibition-mediated longevity extension, which could have significant implications for understanding the molecular basis of aging and developing interventions to promote healthy aging.

      (1) The study uncovers a new role for calcineurin in the regulation of C. elegans DMP and a potential novel pathway for enhancing lifespan via calorie restriction involving calcineurin, HLH-30, and NHR-8 in C. elegans.

      (2) Multiple signaling pathways involved in lifespan enhancement were investigated with fairly strong experimental evidence supporting their claims.

      We thank the reviewer for an excellent summary of our work and for highlighting the strengths of the findings.

      Weaknesses:

      The manuscript's weaknesses include the lack of mechanistic details regarding how calcineurin inhibition leads to defects in the DMP and induces calorie restriction-like effects on lifespan.

      The exact site of calcineurin action, i.e., whether in the intestine or enteric muscles (Lee et al., 2005), and the possible molecular mechanisms linking calcineurin inhibition, DMP defects, and lifespan were not adequately explored. Although characterization of the full mechanism is probably beyond the scope of this paper, given the relative simplicity and advantages of using C. elegans as a model organism for this study, some degree of rigor is expected with additional straightforward control experiments as listed below:

      The authors state that tax-6 knockdown animals had drastically reduced expulsion events (Figure 2G), leading to irregular DMP (Lines 144-145), but did not describe the nature of DMP irregularity. For example, did the reduced expulsion events still occur with regular intervals but longer cycle lengths? Or was the rhythmicity completely abolished? The former would suggest the intestine clock is still intact, and the latter would indicate that calcineurin is required for the clock to function. Therefore, ethograms of DMP in both wild-type and tax-6 mutant animals should be included in the manuscript. Along the same line, besides the cycle length, the three separable motor steps (aBoc, pBoc, EMC) are easily measurable, with each step indicating where the program goes wrong, hence the site of action, which is precisely the beauty of studying C. elegans DMP. Unfortunately, the authors did not use this opportunity to characterize the exact behavior phenotypes of the tax-6 mutant to guide future investigations. Furthermore, it is interesting that about 64% of tax-6(p675) animals had normal DMP. The authors attributed this to p675 being a weak allele. It would be informative to further examine tax-6 RNAi as in other experiments or to make a tax-6 null mutant with CRISPR. In addition, in one of the cited papers (Lee et al., 2005), the exact calcineurin loss-of-function strain tax-6(p675) was shown to have normal defecation, including normal EMC, while the gain-of-function mutant of calcineurin tax-6(jh107) had abnormal EMC steps. It wasn't clear from Lee et al., 2005, if the reported "normal defecation" was only referring to the expulsion step or also included the cycle length. Nevertheless, this potential contradiction and the calcineurin gain-of-function mutant are highly relevant to the current study and should be further explored as a follow-up to previously reported results.

      For some of the key experiments, such as tax-6's effects on susceptibility to PA14, DMP, intestinal bloating, and lifespan, additional controls, as is the norm in C. elegans studies, including second alleles and rescue experiments, would strengthen the authors' claims and conclusions.

      We have now included lifespan, survival on P. aeruginosa, and DMP data using an additional knockout allele, tax-6(ok2065). Additionally, we have added ethograms of DMP for both tax-6 RNAi and the tax-6(ok2065) mutant. Our observations indicate that tax-6 inhibition leads to a complete loss of DMP rhythmicity, suggesting that calcineurin is essential for maintaining the DMP clock. While characterizing the DMP, we noticed that expulsion events appeared superficial in the tax-6(ok2065) mutant, with little to no gut content released. Consequently, we examined the movement of gut content and found that both tax-6(ok2065) mutants and tax-6 knockdown animals showed significantly reduced gut content movement. The new findings on the characterization of DMP are presented in Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, and Figure 3—figure supplement 2. The text in the results section reads (lines 160-176): “Next, we investigated whether the reduced number of expulsion events was due to regular intervals with longer cycle lengths or if rhythmicity was entirely disrupted upon tax-6 knockdown. To assess this, we obtained ethograms of the DMP for N2 animals grown on control and tax-6 RNAi. While animals on control RNAi displayed regular cycles of pBoc, aBoc, and EMC, the tax-6 RNAi animals exhibited disrupted rhythmicity (Figure 3A and Figure 3—figure supplement 1). Most tax-6 knockdown animals lacked the pBoc and aBoc steps and had sporadic expulsion events. Isolated pBoc events were occasionally observed, indicating a complete loss of rhythmicity in tax-6 knockdown animals. Ethograms for tax-6(ok2065) animals also showed disrupted rhythmicity (Figure 3B and Figure 3—figure supplement 2). Although the number of expulsion events appeared higher in tax-6(ok2065) animals compared to tax-6 RNAi animals (Figure 3—figure supplement 1 and 2), these expulsion events seemed superficial, releasing little to no gut content. This suggested slow movement of gut content in tax-6(ok2065) animals, leading to constipation and intestinal bloating. We examined gut content movement by measuring the clearance of blue dye (erioglaucine disodium salt) from the gut. The clearance was significantly slower in tax-6(ok2065) animals compared to N2 animals (Figure 3C), indicating impaired gut content movement due to the loss of tax-6. Similarly, tax-6 knockdown animals also showed significantly slowed gut content movement (Figure 3D).”
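      The distinction the reviewer draws (regular but longer cycles versus loss of rhythmicity) can be made quantitative from an ethogram. The Python sketch below is an illustration only, not the authors' analysis: it assumes a cycle is the interval between successive pBoc events and uses a hypothetical coefficient-of-variation cutoff of 0.2 to separate regular from arrhythmic animals.

```python
from statistics import mean, pstdev


def dmp_cycle_stats(pboc_times_s):
    """Mean cycle length (s) and coefficient of variation of DMP cycles.

    pboc_times_s: sorted times (s) of successive pBoc events in one animal.
    A cycle is assumed to be the interval between consecutive pBoc events.
    """
    intervals = [b - a for a, b in zip(pboc_times_s, pboc_times_s[1:])]
    m = mean(intervals)
    return m, pstdev(intervals) / m


def classify_dmp(pboc_times_s, cv_cutoff=0.2):
    """Label an ethogram as 'regular', 'arrhythmic', or 'too few events'.

    cv_cutoff is a hypothetical threshold chosen for illustration.
    """
    if len(pboc_times_s) < 3:  # need at least two intervals to judge rhythm
        return "too few events"
    _, cv = dmp_cycle_stats(pboc_times_s)
    return "regular" if cv <= cv_cutoff else "arrhythmic"


# Evenly spaced ~50 s cycles versus sporadic events.
print(classify_dmp([0, 50, 100, 150, 200]))  # regular
print(classify_dmp([0, 20, 150, 170, 400]))  # arrhythmic
```

      Under these assumptions, long but evenly spaced cycles give a large mean cycle length with a low coefficient of variation, whereas sporadic events give a high coefficient of variation.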

      Moreover, we have added a possible explanation for the apparent contradiction with the tax-6(p675) results of Lee et al., 2005 (lines 154-159): “At the 1-day-old adult stage, about 36% of tax-6(p675) animals showed irregular and slowed DMP, while the remainder had regular DMP (Figure 2H), suggesting that tax-6(p675) is a weak allele. The fraction of the animals with irregular DMP appeared to increase with age, indicating that this phenotype might be age-dependent. This may also explain why tax-6(p675) animals were reported to have a normal defecation cycle in an earlier study (Lee et al., 2005).”

      The second weakness of this manuscript is the data presentation for all survival rate curves. The authors stated that three independent experiments or biological replicates were performed for each group but only showed one "representative" curve for each plot. Without seeing all individual datasets or the averaged data with error bars, there is no way to evaluate the variability and consistency of the survival rate reported in this study.

      We now provide the data for all replicates in the source data files.
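      Per-replicate survival curves of this kind are commonly built with the Kaplan-Meier estimator, treating animals that are lost from the plate as censored. The sketch below is a generic illustration, not the statistical pipeline used in the study; computing one curve per replicate is what allows replicate-to-replicate variability to be inspected.

```python
from collections import Counter


def kaplan_meier(times, died):
    """Kaplan-Meier survival estimate for one replicate.

    times: day on which each animal died or was censored
    died:  True if the animal died that day, False if censored
           (e.g. lost from the plate)
    Returns [(day, survival fraction)] at each day with at least one death.
    Animals censored on a given day are counted as at risk for that day.
    """
    deaths = Counter(t for t, d in zip(times, died) if d)
    leaving = Counter(times)  # deaths plus censorings per day
    at_risk, surv, curve = len(times), 1.0, []
    for day in sorted(leaving):
        if deaths[day]:
            surv *= 1 - deaths[day] / at_risk
            curve.append((day, surv))
        at_risk -= leaving[day]
    return curve


# One hypothetical replicate of five animals; one censored on day 2.
print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, True]))
```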

      Overall, the authors' claims and conclusions are justified by their data, but further experiments are needed to confirm their findings and establish the detailed mechanisms underlying the observed effects of calcineurin inhibition on the DMP, calorie restriction, and lifespan in C. elegans.

      We have conducted additional experiments to elucidate the role of calcineurin in the DMP and to investigate the impact of the DMP on calorie restriction and lifespan in C. elegans, as described in the various responses to the reviewers’ comments. 

      Recommendations for the authors:

      Our specific comments to guide the authors, should they choose to revise the manuscript:

      The RNAi experiments in the eat-2 mutant background are difficult to interpret. If these animals are eating fewer bacteria, it is possible that there is also less tax-6 dsRNA being ingested and therefore less tax-6 inactivation. These experiments should be conducted with a tax-6 null allele.

      We have included lifespan experiments with the eat-2(ad465);tax-6(ok2065) double mutant, along with the individual single mutant controls, as shown in Figure 4E. These results demonstrate that the eat-2 mutation does not further extend the lifespan of the tax-6(ok2065) mutant. Additionally, we confirmed that the eat-2(ad465) mutants do not exhibit defects in feeding-based RNAi (Figure 4—figure supplement 1).

      While aak-2, hlh-30, and nhr-8 mutants may not have an eat phenotype, the negative tax-6 RNAi results should be confirmed with a tax-6 null mutant to obviate the consideration that these background mutations reduce RNAi efficacy.

      The genes hlh-30 and nhr-8 are located very close to tax-6 on chromosome IV (https://wormbase.org//#012-34-5), which made it challenging to generate double mutants. However, we tested the RNAi sensitivity of the hlh-30(tm1978) and nhr-8(ok186) mutants and confirmed that they are not defective in RNAi (Figure 5—figure supplement 1). We also found that tax-6 RNAi disrupted the DMP in both hlh-30(tm1978) and nhr-8(ok186) mutants (Figure 5—figure supplement 2). Furthermore, our results show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B), indicating that tax-6 RNAi was effective in these mutants. Since the phenotype in the aak-2 mutant was only partially observed, we did not conduct further experiments with aak-2 mutants.

      Reviewer #1 (Recommendations For The Authors):

      The low penetrance of defecation cycle defects in tax-6(p675) worms brings into question the role of the defecation deficits in the phenotypes caused by the disruption of tax-6. At the same time, the low penetrance provides a golden opportunity to test this. Do tax-6(p675) worms with a normal defecation cycle length have extended longevity? Increased susceptibility to bacterial pathogens? Smaller body size? Distended lumen? Decreased fat accumulation? Increased pha-4 and nhr-8 expression? It would be relatively straightforward to measure defecation cycle length in individual tax-6(p675) worms, bin them into normal defecation and slow defecation groups, and then compare the above-mentioned phenotypes.

      We appreciate the reviewer's interesting suggestion. However, the DMP defect phenotype in tax-6(p675) worms appears to be age-dependent, with the number of DMP-defective worms increasing as they age. Additionally, we observed that exposure to P. aeruginosa accelerates the onset of DMP defects in tax-6(p675) worms. As a result, tax-6(p675) worms are not suitable for the type of experiments the reviewer suggested. Nevertheless, we believe that the additional data using the tax-6(ok2065) mutant, along with the characterization of ethograms of DMP, firmly establishes the role of calcineurin in maintaining a regular DMP in C. elegans.

      Another way to dissect specific effects of calcineurin disruption from phenotypes resulting from defecation motor program deficits would be to further characterize other worms with deficits in defecation (flr-1, nhx-2, pbo-1 RNAi). It is mentioned that they have decreased lifespan. Do they also show increased susceptibility to bacterial pathogens? Do they show decreased fat? Is their lifespan dependent on HLH-30 and NHR-8?

      We thank the reviewer for this important suggestion. We have now included data with flr-1, nhx-2, and pbo-1 RNAi, which shows that the knockdown of these genes also enhances susceptibility to P. aeruginosa (Figure 3—figure supplement 3G). Knockdown of these genes is already known to reduce fat levels in N2 worms, and we demonstrate that they similarly reduce fat levels in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5B, C, F, G). Additionally, we found that the increased lifespan observed upon knockdown of these genes (as well as with tax-6 knockdown) is dependent on HLH-30 and NHR-8 (Figure 5A, D).

      To place "enhanced susceptibility to pathogen" within the proposed model, it would be important to examine the effect of HLH-30 and NHR-8 disruption on this phenotype. The proposed model suggests that this phenotype is independent of HLH-30 and NHR-8, but this should be tested experimentally. Similarly, it would be important to test the effect of HLH-30 and NHR-8 disruption on defecation cycle length to determine if defecation deficits are upstream or downstream of deficits in the defecation motor program.

      We show that the knockdown of tax-6 leads to defects in the DMP in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5—figure supplement 2). Moreover, we show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B). These results are described as (lines 279-285): “Given that HLH-30 and NHR-8 are essential for lifespan extension upon calcineurin inhibition, we investigated whether these pathways also influence survival in response to P. aeruginosa infection following calcineurin knockdown. Both hlh-30(tm1978) and nhr-8(ok186) animals showed significantly reduced survival upon tax-6 RNAi (Figure 6A, B). These findings suggested that the reduced survival on P. aeruginosa following calcineurin inhibition is independent of HLH-30 and NHR-8 and is more likely due to increased gut colonization by P. aeruginosa resulting from DMP defects (Figure 6C).”

      Is the lifespan of tax-6(p675) increased? This would be important to measure and include in Figure 1.

      Indeed, the lifespan of tax-6(p675) mutants is increased. We have included the lifespan of tax-6(p675) and tax-6(ok2065) in Figure 1F.

      In Figure 2, disruption of tax-6 appears to result in a clear decrease in body size. To what extent is the decrease in fat/worm in Figure 3 simply a result of the worms being smaller? Perhaps, a measurement of Oil-Red-O intensity PER AREA would be a more appropriate measure.

      The ORO intensity values we had shown per animal were already area normalized. We have now indicated this in the Figure Legends.

      There are multiple long-lived mutant strains such as clk-1 and isp-1 that have an increased defecation cycle length. To what extent do these worms exhibit phenotypes similar to tax-6 disruption? isp-1 have increased resistance to bacterial pathogens suggesting that defecation motor program deficits are not sufficient to increase susceptibility to bacterial pathogens.

      We have now examined the clk-1 and isp-1 mutants and found that these mutants exhibit reduced gut colonization by P. aeruginosa compared to N2 animals. This reduction in colonization may be attributed to the slowed pharyngeal pumping rates observed in these mutants. These findings suggest that the phenotypes associated with a slow DMP versus a disrupted DMP could be significantly different. The manuscript with the new data on these mutants reads (lines 177-192): “We then explored whether the disruption of DMP rhythmicity due to tax-6 knockdown affected P. aeruginosa responses similarly to longer but regular DMP cycles. To do this, we studied P. aeruginosa colonization in clk-1(qm30) and isp-1(qm150) mutants, which have regular but extended DMP cycles (Feng et al., 2001; Wong et al., 1995). Interestingly, both clk-1(qm30) and isp-1(qm150) mutants showed significantly reduced intestinal colonization by P. aeruginosa compared to N2 animals (Figure 3—figure supplement 3A-D). This reduced colonization could be attributed to their significantly decreased pharyngeal pumping rates (Wong et al., 1995; Yee et al., 2014), suggesting a lower intake of bacterial food in these mutants. While the survival of clk-1(qm30) animals on P. aeruginosa was comparable to N2 animals (Figure 3—figure supplement 3E), isp-1(qm150) animals exhibited significantly improved survival (Figure 3—figure supplement 3F). Conversely, knockdown of flr-1, nhx-2, and pbo-1 in N2 animals resulted in significantly reduced survival on P. aeruginosa compared to control RNAi (Figure 3—figure supplement 3G). Knockdown of these genes causes complete disruption of DMP rhythmicity, increasing gut colonization by P. aeruginosa (Singh and Aballay, 2019a). Overall, these findings demonstrated that calcineurin is crucial for maintaining the DMP ultradian clock, and its inhibition increases susceptibility to P. aeruginosa by disrupting the DMP.”

      Line 192. This statement is speculative. There is no evidence that HLH-30 is mediating lipid depletion in these worms.

      We have removed this statement. We observed that the knockdown of flr-1, nhx-2, and pbo-1 resulted in significant fat depletion in hlh-30(tm1978) animals (Figure 5B, C). Additionally, tax-6 knockdown also caused a small but significant reduction in fat levels in hlh-30(tm1978) animals. This contrasts with our initial submission, possibly due to the increased number of animals included in the analysis. These findings suggest that the increase in lifespan due to DMP defects requires HLH-30, likely through a mechanism independent of HLH-30’s role in fat depletion. We have updated the manuscript text and model (Figure 6C) accordingly.

      In Figure S2, tax-6 RNAi appears to have a more detrimental effect in pmk-1 mutants than the other mutants. The authors should comment on this.

      We have added the following sentence in the manuscript (lines 123-125): “The knockdown of tax-6 appeared to have a more pronounced effect in pmk-1(km25) mutants than in other mutants, suggesting that inhibition of tax-6 might exacerbate the adverse effects observed in pmk-1(km25) mutants.”

      Reviewer #2 (Recommendations For The Authors):

      Line 192-193: The statement is confusing and not accurate because HLH-30 did not enhance lifespan with or without calcineurin (Figure 4A and S4A, also in Lapierre 2023). The takeaway should be along the lines of calcineurin inhibition enhancing lifespan through HLH-30 or HLH-30 being required for lifespan enhancement via calcineurin inhibition.

      We have removed this statement. We now state (lines 237-239): “Knockdown of tax-6 did not extend the lifespan of hlh-30(tm1978) animals (Figure 5A), indicating that HLH-30 is required for the increased lifespan observed with calcineurin inhibition.”

      Line 261: Similar to the point above. Where is the data showing NHR-8 increases lifespan with or without calcineurin?

      We have removed this sentence.

      Figure 1 legend line 699: animals per condition per replicate >90, but in the Method section Line 317, it says more than 80 animals per condition per replicate. Could be more accurate.

      We have now specified in the Methods section that the exact number of animals per condition is provided in the source data files. Since different lifespan curves within a given figure panel had varying numbers of animals, we have indicated the lower boundary for all curves (including the replicates). The precise number of animals for each lifespan experiment is available in the source data files.

      Figures 2F and G, "tax-6" should be labeled as "tax-6 RNAi" to be consistent with other figures.

      We thank the reviewer for this suggestion and have updated the label to “tax-6 RNAi”.

      In summary, we would like to thank the reviewers again for providing constructive critiques. We believe we have fully addressed all the concerns of the reviewers by carrying out several new experiments and modifying the text. The manuscript has undergone substantial revision and has thereby improved significantly. We do hope that the evidence in support of the conclusions is found to be complete in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      - The authors should think about revising the terminology used to describe electrophysiological data in zebrafish (Fig.5): "posterior" hair cells in a neuromast are sensitive to posterior-to-anterior flow, which is currently termed "anterior". This is confusing because when "posterior" or "anterior" is used, for instance in the labels of the figure, one may get confused about whether this applies to hair-cell position or directionality of the stimulus. It would help to always use clearer terminology for the stimulus (e.g. posterior-to-anterior (P-to-A) as in Kindig 2023, or "from the tail"). Also, the authors may want to clarify what we should see in Fig.5 demonstrating that posterior hair cells, with reversed hair-bundle polarity, actually evince transduction of similar magnitude as anterior hair cells, with normal polarity of their hair bundles. 

      This nomenclature can indeed be confusing. At the reviewers' request, we have changed the terminology to always refer to the direction of flow sensed by the hair cells, for example, HCs that respond to posterior-directed flow or anterior-directed flow. We now denote these HCs as (A to P) and (P to A), respectively, in the figure for clarity. We have modified Figure 5, the Figure 5 legend, and the Results (starting at line 339) to reflect these changes.

      In addition, in our results we now provide more context when comparing the response magnitude of the anterior-sensing hair cells in gpr156 mutants to the response magnitude of the two different orientations of hair cells in controls.

      - Also, does it make sense that there is no defect in MET for mouse otolith organs with deleted GPR156, whereas there is a diVerence in the zebrafish lateral line? It would help motivate the study on mechanoelectrical transduction (see comment of Reviewer 1 below). 

      We previously discussed this point and recognized that subtle effects remain possible in mouse (previously Discussion line 614). We have now modified the text in the Discussion to better emphasize this point (new line 627). The Eatock lab is currently working on developing calcium imaging in the mouse utricle to revisit this question in a future study. "Subtle effects remain possible, however, given the variance in single-cell electrophysiological data from both control and mutant mice. Nevertheless, current results are consistent with normal HC function in the Gpr156 mouse mutant, a prerequisite to interrogate how non-reversed HCs affects vestibular behavior."

      To help motivate transduction studies starting in the second Result paragraph, we added a transition at Line 205 that was indeed lacking:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      (1) One overarching objective of this study was to use the Gpr156 KO model to discover how polarity reversal informs vestibular function (Introduction, overall summary in the last paragraph). Pairing behavioral defects with hair cell orientation is only possible if hair cell transduction is normal, which had to be tested.

      (2) The notion that experiments that produced negative results are unnecessary or not properly motivated can only be applied in retrospect. At early stages we performed electrophysiology because we did not know whether transduction would be normal in the absence of GPR156. We also did not know whether innervation would be normal. The fact that both appear normal makes the Gpr156 KO a better model to address the importance of orientation reversal (conclusion of the Discussion, line 705).

      See also reply to Reviewer #1 below.

      Reviewer #1 (Recommendations For The Authors): 

      Fig1, panel B appears to show different focal planes for Gpr156del/+ and Gpr156del/del.

      Figure 1B had control and mutant panels at slightly different focal planes indeed. We swapped the right (mutant) panel image and adjusted intensities in the control image to match adjustments of the new mutant image.

      Given that this work is largely about polarity and connectivity to neurons, I do not understand the need to assess mechanosensitivity in Gpr156 mutants. Please explain in the text, as follows: "After establishing normal numbers and types of mouse vestibular HCs, we assessed whether HCs respond normally to hair bundle deflections in the absence of GPR156." We did this because... 

      Please see reply above in 'Recommendations for the authors' for comment about the need to assess mechanosensitivity. We agree that this transition was lacking, and we added an explanation as recommended:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      Anyway, the data in Figures 2, 3 and 4 seems somewhat superfluous to the main message of the paper. 

      Please see reply above in 'Recommendations for the authors'. These data may appear superfluous in retrospect, but we could not claim that behavioral changes in Gpr156 mutants reflect the role of the line of polarity reversal if, for example, hair cell transduction was abnormal. We had to perform experiments to figure this out. We were further motivated as data began to emerge from the zebrafish lateral line that showed effects on HC transduction. Although we did not get positive results on this question in the mouse, we think the difference between models should be included as a significant part of the narrative.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive criticism and detailed assessment of our work which helped us to significantly improve our manuscript. We made significant changes to the text to better clarify our goals and approaches. To make our main goal of extracting the network dynamics clearer and to highlight the main advantage of our method in comparison with prior work we incorporated Videos 1-4 into the main text. We hope that these changes, together with the rest of our responses, convincingly demonstrate the utility of our method in producing results that are typically omitted from analysis by other methods and can provide important novel insights on the dynamics of the brain circuits. 

      Reviewer #1 (Public Review):

      (1) “First, this paper attempts to show the superiority of DyNetCP by comparing the performance of synaptic connectivity inference with GLMCC (Figure 2).”

      We believe that the goals of our work were not adequately formulated in the original manuscript, which generated this apparent misunderstanding. Unlike most prior work, which focused on reconstruction of static connectivity from spiking data (including GLMCC), our ultimate goal is to learn the dynamic connectivity structure, i.e. to extract the time-dependent strength of the directed connectivity in the network. Since this formulation is fundamentally different from most of the prior work, the goal here is not to show “improvement” or “superiority” over prior methods that mostly focused on inference of static connectivity, but rather to thoroughly validate our approach and to show its usefulness for the dynamic analysis of experimental data.

      (2) “This paper also compares the proposed method with standard statistical methods, such as jitter-corrected CCG (Figure 3) and JPSTH (Figure 4). It only shows that the results obtained by the proposed method are consistent with those obtained by the existing methods (CCG or JPSTH), which does not show the superiority of the proposed method.”

      The major problem for designing such a dynamic model is the virtual absence of ground-truth data, either as verified experimental datasets or as synthetic data with known time-varying connectivity. In this situation, optimization of the model hyper-parameters and model verification largely become a “shot in the dark”. Therefore, to resolve this problem and make the model generalizable, we adopted a two-stage approach: in the first stage we learn static connections, and in the second stage we infer temporally varying dynamic connectivity. Dividing the problem into two stages enables us to separately compare the results of both stages to traditional descriptive statistical approaches. Static connectivity obtained in stage 1 is compared to classical pairwise CCG (Fig.2A,B) and GLMCC (Fig.2 C,D,E), while dynamic connectivity obtained in stage 2 is compared to pairwise JPSTH (Fig.4D,E).
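      For context, a jitter-corrected CCG subtracts from the raw cross-correlogram the CCG expected when the spike times of one neuron are randomized within coarse windows; this removes slow co-modulation while keeping fast, putatively synaptic, correlations. The Python sketch below illustrates the idea on binary binned spike trains. The 25-bin window, the number of surrogates, and all function names are assumptions for illustration and do not describe the preprocessing used for DyNetCP.

```python
import random


def ccg(x, y, max_lag):
    """Raw cross-correlogram of binned binary spike trains.

    x, y: lists of trials, each a list of 0/1 values per time bin.
    Returns {lag: mean coincidences per trial}; positive lag = y after x.
    """
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for xt, yt in zip(x, y):
            for t in range(len(xt)):
                u = t + lag
                if 0 <= u < len(yt):
                    total += xt[t] * yt[u]
        out[lag] = total / len(x)
    return out


def jitter_train(train, window, rng):
    """Move each spike to a random bin within its non-overlapping window,
    preserving the spike count of every window (interval jitter)."""
    out = [0] * len(train)
    for start in range(0, len(train), window):
        stop = min(start + window, len(train))
        for pos in rng.sample(range(start, stop), sum(train[start:stop])):
            out[pos] = 1
    return out


def jitter_corrected_ccg(x, y, max_lag, window=25, n_surrogates=100, seed=0):
    """Raw CCG minus the mean CCG over jittered surrogates of y."""
    rng = random.Random(seed)
    raw = ccg(x, y, max_lag)
    null = dict.fromkeys(raw, 0.0)
    for _ in range(n_surrogates):
        y_jit = [jitter_train(trial, window, rng) for trial in y]
        for lag, value in ccg(x, y_jit, max_lag).items():
            null[lag] += value / n_surrogates
    return {lag: raw[lag] - null[lag] for lag in raw}


# Hypothetical pair: y fires 2 bins after every x spike.
x = [[1 if t in (10, 40, 70, 100, 130, 160) else 0 for t in range(200)]]
y = [[1 if t in (12, 42, 72, 102, 132, 162) else 0 for t in range(200)]]
corrected = jitter_corrected_ccg(x, y, max_lag=5)
print(max(corrected, key=corrected.get))  # 2 (the imposed delay)
```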

      Importantly, the goal here is therefore not to “outperform” the classical descriptive statistical or any other approaches, but rather to have solid guidance for designing the model architecture and optimizing hyper-parameters. For example, to produce static weights in Fig.2A,B that are statistically indistinguishable from the results of classical CCG, the procedure for selecting the weights that contribute to averaging is designed as shown in Fig.9 and discussed in detail in the Methods. Optimization of the L2 regularization parameter, illustrated in Fig.4 – figure supplement 1, makes it possible to produce dynamic weights very close to cJPSTH, as evidenced by the Pearson coefficient and TOST statistical tests. These comparisons demonstrate that the results of CCG and JPSTH are indeed faithfully reproduced by our model, which, we conclude, is sufficient justification to apply the model to analyze experimental results.
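      The comparison with cJPSTH can be illustrated with a minimal sketch. It computes a PSTH-predictor-corrected JPSTH (the raw joint histogram minus the outer product of the two PSTHs, without further normalization), reads out the coupling time course at a fixed lag along a diagonal, and scores agreement with a model's time-resolved weights by the Pearson coefficient. JPSTH normalization conventions differ between studies, the model weights below are a hypothetical input, and the TOST equivalence test is not reproduced.

```python
def psth(trials):
    """Trial-averaged spike probability per time bin."""
    return [sum(tr[t] for tr in trials) / len(trials) for t in range(len(trials[0]))]


def corrected_jpsth(x, y):
    """PSTH-predictor-corrected JPSTH (unnormalized covariance map).

    entry [t1][t2] = mean_trials(x[t1] * y[t2]) - psth_x[t1] * psth_y[t2]
    """
    n_trials, n_bins = len(x), len(x[0])
    px, py = psth(x), psth(y)
    return [
        [
            sum(xt[t1] * yt[t2] for xt, yt in zip(x, y)) / n_trials - px[t1] * py[t2]
            for t2 in range(n_bins)
        ]
        for t1 in range(n_bins)
    ]


def lag_diagonal(cjpsth, lag):
    """Time course of coupling with y lagging x by `lag` (>= 0) bins."""
    return [cjpsth[t][t + lag] for t in range(len(cjpsth) - lag)]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Two hypothetical trials; in trial 1, y fires one bin after x.
x = [[1, 0, 0, 0], [0, 0, 0, 0]]
y = [[0, 1, 0, 0], [0, 0, 0, 0]]
coupling = lag_diagonal(corrected_jpsth(x, y), lag=1)
model_weights = [0.9, 0.1, 0.0]  # hypothetical time-resolved model output
print(coupling, pearson(coupling, model_weights))
```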

      (3) “However, the improvement in the synaptic connectivity inference does not seem to be convincing.”

      We are grateful to the reviewer for pointing out this issue, which, as mentioned above, we believe results from the failure of the original manuscript to clarify the major motivation for this comparison. The comparison of static connectivity inferred by stage 1 of our model with the results of GLMCC in Fig.2C,D,E is aimed at optimizing two further important parameters: the pair spike threshold and the peak height threshold. In Fig. 2D we show that when the peak height threshold is reduced from a rigorous 7 standard deviations (SD) to just 5 SD, our model recovers 74% of the ground-truth connections, which is in fact better than the 69% recovered by GLMCC for a comparable pair spike threshold of 80. As explained above, we do not intend to emphasize here that our model is “superior”, since that was not our goal; rather, we use this comparison to illustrate the approach for optimizing the thresholds for unit and pair filtering, as described in detail in Fig. 11 and the corresponding section of the Methods.
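      The effect of the peak height threshold can be sketched as a simple rule: a pair is retained if the CCG peak near zero lag exceeds the mean of the CCG flanks by more than k flank standard deviations. This formulation, including the flank definition and the 10-lag peak window, is an assumption for illustration; the exact definitions used in the manuscript are given in its Methods and are not reproduced here.

```python
from statistics import mean, stdev


def passes_peak_threshold(ccg_by_lag, k_sd, peak_window=10):
    """Keep a candidate connection if the largest CCG value within
    +/- peak_window lags exceeds the flank mean by more than k_sd flank SDs.
    Flanks are all lags farther than peak_window from zero (an assumption).
    """
    center = [v for lag, v in ccg_by_lag.items() if abs(lag) <= peak_window]
    flanks = [v for lag, v in ccg_by_lag.items() if abs(lag) > peak_window]
    return max(center) - mean(flanks) > k_sd * stdev(flanks)


# Hypothetical CCG: noisy flanks (mean 2, SD ~1) and a peak of 8 at lag +1.
ccg_example = {lag: 1.0 if lag % 2 == 0 else 3.0 for lag in range(-50, 51)}
for lag in range(-10, 11):
    ccg_example[lag] = 2.0
ccg_example[1] = 8.0
print(passes_peak_threshold(ccg_example, k_sd=7))  # False
print(passes_peak_threshold(ccg_example, k_sd=5))  # True
```

      In this toy example the same peak is rejected at 7 SD but accepted at 5 SD, which is the sense in which relaxing the threshold admits more candidate connections.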

      To address these misunderstandings and better clarify the goal of our work, we changed the text in the Introductory section accordingly. We also incorporated Videos 1-4 from the Supplementary Materials into the main text as Video 1, Video 2, Video 3, and Video 4. In fact, these videos represent the main advantage (or “superiority”) of our model with respect to prior art: the ability to infer the time-dependent dynamics of network connectivity, as opposed to static connections.

      (4) “While this paper compares the performance of DyNetCP with a state-of-the-art method (GLMCC), there are several problems with the comparison. For example: 

      (a) This paper focused only on excitatory connections (i.e., ignoring inhibitory neurons). 

      (b) This paper does not compare with existing neural network-based methods (e.g., CoNNECT: Endo et al. Sci. Rep. 2021; Deep learning: Donner et al. bioRxiv, 2024).

      (c) Only a population of neurons generated from the Hodgkin-Huxley model was evaluated.”

      (a) In general, the model of Eq.1 is agnostic as to whether the connections it recovers are excitatory or inhibitory. In fact, Fig. 5 and Fig.6 illustrate inferred dynamic weights for both excitatory (red arrows) and inhibitory (blue arrows) connections between excitatory (red triangles) and inhibitory (blue circles) neurons. Similarly, inhibitory and excitatory dynamic interactions between connections are represented in Fig. 7 for the larger network across all visual cortices.

      (b) As stated above, the goal of comparing the static connectivity results of stage 1 of our model with other approaches is to guide the choice of thresholds and the optimization of hyperparameters, rather than to claim “superiority” of our model. Therefore, a comparison with the “static” CNN-based model of Endo et al. or the ANN-based static model of Donner et al. (submitted to bioRxiv several months after our submission to eLife) is beyond the scope of this work.

      (c) We chose exactly the same sub-population of neurons from the synthetic HH dataset of Ref. 26 that is used in Fig. 6 of Ref. 26, which allows a direct comparison between the connections reconstructed by GLMCC in Ref. 26 and the results of our model.

      (5) “In summary, although DyNetCP has the potential to infer synaptic connections more accurately than existing methods, the paper does not provide sufficient analysis to make this claim. It is also unclear whether the proposed method is superior to the existing methods for estimating functional connectivity, such as jitter-corrected CCG and JPSTH. Thus, the strength of DyNetCP is unclear.”

      As we explained above, we do not intend to claim that our model is more accurate than existing static approaches. In fact, it is not feasible to estimate connectivity better than direct descriptive statistical methods such as CCG or JPSTH. Instead, comparisons with static (CCG and GLMCC) and temporal (JPSTH) approaches are used here to guide the choice of the model thresholds and to inform the optimization of hyperparameters, so that the predicted dynamic network connectivity is reliable. The main strength of DyNetCP is the inference of dynamic connectivity, as illustrated in Videos 1-4. We demonstrated the utility of the method on the largest in-vivo experimental dataset available today and extracted the dynamics of cortical connectivity in local and global visual networks. This information is unattainable with any other contemporary method we are aware of.

      Reviewer #1 (Recommendations for the Authors):

      (6) “First, the authors should clarify the goal of the analysis, i.e., to extract either the functional connectivity or the synaptic connectivity. While this paper assumes that they are the same, it should be noted that functional connectivity can be different from synaptic connectivity (see Stevenson IH, Neurons Behav. Data Anal. Theory 2023).”

      The goal of our analysis is to extract the dynamics of spiking correlations. In this paper we intentionally avoided assigning a biological interpretation to the inferred dynamic weights. Our goal was to demonstrate that a trove of additional information on neural coding is hidden in the dynamics of neural correlations, information that is typically omitted from the analysis of neuroscience data.

      Biological interpretation of the extracted dynamic weights can follow the terminology of short-term plasticity between synaptically connected neurons (Refs 25, 33-37) or of spike transmission strength (Refs 30-32, 46). Alternatively, temporal changes in connection weights can be interpreted in terms of dynamically reconfigurable functional interactions of cortical networks (Refs 8-11, 13, 47) through which information flows. We also cannot exclude an interpretation that combines both ideas. In any event, our goal here is to extract these signals for a pair (Video 1, Fig. 4), a local cortical circuit (Video 2, Fig. 5), and the whole visual cortical network (Videos 3, 4 and Fig. 7).

      To clarify this point, we added a paragraph to the Discussion section of the revised paper.

      (7) “Finally, it would be valuable if the authors could also demonstrate the superiority of DyNetCP qualitatively. Can DyNetCP discover something interesting for neuroscientists from the large-scale in vivo dataset that the existing method cannot?”

      The model discovers dynamic, time-varying changes in synchronous neuronal spiking (Videos 1-4) that more traditional methods such as CCG or GLMCC cannot detect. The revealed dynamics occur on very short time scales, of the order of just a few ms, during stimulus presentation. Calculations of the intrinsic dimensionality of the spiking manifold (Fig. 8) reveal that up to 25 additional dimensions of the neural code can be recovered using our approach. These dimensions are typically omitted from the analysis of neural circuits with traditional methods.

      Reviewer #2 (Public Review):

      (1) “Simulation for dynamic connectivity. It certainly seems doable to simulate a recurrent spiking network whose weights change over time, and I think this would be a worthwhile validation for this DyNetCP model. In particular, I think it would be valuable to understand how much the model overfits, and how accurately it can track known changes in coupling strength.”

      We are very grateful to the reviewer for this insight. Verifying the model on synthetic data with known time-varying connectivity would indeed be very useful. We did generate a synthetic dataset to test some of the model's performance metrics, i.e., its ability to distinguish True Positive (TP) from False Positive (FP) “serial” or “common input” connections (Fig. 10A,B). Comparison of dynamic and static weights might indeed help to distinguish TP connections from artifactual FP connections.

      Generating a large synthetic dataset with known dynamic connections that mimics interactions in cortical networks is, however, a separate and non-trivial task that is beyond the scope of this work. Instead, we designed a model architecture in which overfitting can be tested in two consecutive stages by comparison with descriptive statistical approaches, CCG and JPSTH. Static stage 1 of the model predicts correlations that are statistically indistinguishable from the CCG results (Fig. 2A,B). Dynamic stage 2 of the model produces dynamic weight matrices that faithfully reproduce the cJPSTH (Fig. 4D,E). Pearson correlation coefficients and TOST (two one-sided tests) equivalence testing enable optimization of the L2 regularization parameter, as shown in Fig. 4 – supplement 1 and described in detail in the Methods section. The ability to test the results of both stages separately against descriptive statistical results is the main advantage of the chosen architecture: it allows us to verify that the model does not overfit and can predict changes in coupling strength at least as well as descriptive statistical approaches (see also our answer to Reviewer #1 above).
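      For readers less familiar with equivalence testing, the kind of comparison described above can be sketched as follows. This is a minimal illustration, not the code used for Fig. 4 – supplement 1: it assumes paired per-bin values from the model and from the cJPSTH, a user-chosen equivalence margin `delta` (hypothetical), and a large-sample normal approximation in place of Student's t.

```python
import math
from statistics import NormalDist, fmean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tost_paired(x, y, delta):
    """TOST equivalence p-value for paired samples x and y.

    Null hypothesis: the mean paired difference lies outside [-delta, +delta].
    A small p-value supports equivalence within +/- delta. Uses a normal
    approximation to the test statistic (assumption; a t-based version would
    use Student's t with n - 1 degrees of freedom instead).
    """
    d = [a - b for a, b in zip(x, y)]
    mean_d = fmean(d)
    se = stdev(d) / math.sqrt(len(d))
    std_normal = NormalDist()
    # H0: mean_d <= -delta. Rejected when mean_d lies well above -delta.
    p_lower = 1.0 - std_normal.cdf((mean_d + delta) / se)
    # H0: mean_d >= +delta. Rejected when mean_d lies well below +delta.
    p_upper = std_normal.cdf((mean_d - delta) / se)
    return max(p_lower, p_upper)


# Hypothetical per-bin weights from the model and from the cJPSTH.
model = [0.10, 0.32, 0.51, 0.70, 0.42, 0.22]
cjpsth = [0.12, 0.30, 0.50, 0.73, 0.40, 0.20]
print(f"r = {pearson_r(model, cjpsth):.3f}")
print(f"TOST p = {tost_paired(model, cjpsth, delta=0.05):.2e}")
```

      A high correlation shows that the two curves co-vary; the TOST p-value adds the stronger statement that their mean difference lies within the chosen margin.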

      (2) “If the only goal is "smoothing" time-varying CCGs, there are much easier statistical methods to do this (c.f. McKenzie et al. Neuron, 2021. Ren, Wei, Ghanbari, Stevenson. J Neurosci, 2022), and simulations could be useful to illustrate what the model adds beyond smoothing.”

      We are grateful to the reviewer for bringing up these very interesting and relevant references, which we added to the Discussion section of the paper. Of special interest is the second one, which calculates the time-varying CCG weight (“efficacy” in the paper's terms) on the same Allen Institute Visual dataset that our work uses. It is indeed an elegant way to extract a time-varying coupling strength similar to what our model generates. The major difference between our model and that of Ren et al., as well as GLMCC and any statistical approach, is that DyNetCP learns the connections of an entire network jointly in one pass, rather than calculating the coupling separately for each pair in the dataset without considering the relative influence of other pairs in the network. Hence, our model can infer connections beyond pairwise ones (see Fig. 11 and the corresponding discussion in Methods) while performing inference with computational efficiency.

      (3) “Stimulus vs noise correlations. For studying correlations between neurons in sensory systems that are strongly driven by stimuli, it's common to use shuffling over trials to distinguish between stimulus correlations and "noise" correlations or putative synaptic connections. This would be a valuable comparison for Figure 5 to show if these are dynamic stimulus correlations or noise correlations. I would also suggest just plotting the CCGs calculated with a moving window to better illustrate how (and if) the dynamic weights differ from the data.”

      Thank you for this suggestion. Note that for all weight calculations in our model, the standard jitter correction procedure of Ref. 33 (Harrison et al., Neural Comput. 2009) is first applied to mitigate the influence of correlated slow fluctuations (slow “noise”). Note also that, to obtain the results in Fig. 5, we randomly split the 440 total experimental trials for this session (when the animal is running, see Table 1) into 352 training and 88 validation trials by selecting, from each configuration of contrast or grating angle, 44 training trials and 11 validation trials. We verified that changing this random selection produced the same results as shown in Fig. 5.
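      The trial split described above can be sketched as a per-configuration (stratified) random split. This is an illustrative sketch only: the configuration labels below are placeholders, and the layout of 8 configurations with 55 trials each is inferred from the counts given (8 × 44 = 352, 8 × 11 = 88), not read from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(trial_configs, n_train_per_config, n_val_per_config, seed=0):
    """Split trial indices into training and validation sets per configuration.

    trial_configs: list where trial_configs[i] is the stimulus configuration
    of trial i (any hashable label). Returns (train_indices, val_indices),
    each sorted.
    """
    rng = random.Random(seed)
    by_config = defaultdict(list)
    for idx, cfg in enumerate(trial_configs):
        by_config[cfg].append(idx)

    train, val = [], []
    for cfg, indices in by_config.items():
        needed = n_train_per_config + n_val_per_config
        if len(indices) < needed:
            raise ValueError(f"configuration {cfg!r} has too few trials")
        shuffled = indices[:]
        rng.shuffle(shuffled)  # a different seed gives a different random split
        train.extend(shuffled[:n_train_per_config])
        val.extend(shuffled[n_train_per_config:needed])
    return sorted(train), sorted(val)


# Placeholder layout: 8 distinct configurations x 55 trials = 440 trials.
configs = [(c, a) for c in ("contrast_A", "contrast_B") for a in (0, 1, 2, 3)]
trial_configs = [cfg for cfg in configs for _ in range(55)]
train_idx, val_idx = stratified_split(trial_configs, 44, 11)
print(len(train_idx), len(val_idx))  # 352 88
```

      Stratifying by configuration keeps every stimulus condition represented in both sets; rerunning with another `seed` is the kind of check that a changed random selection yields the same results.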

      Descriptive statistical results (pairwise cJPSTH) and model results are compared in Fig. 4D,E. The difference between the two is characterized in detail in Fig. 4 – supplement 1 using Pearson correlation coefficients and TOST statistical tests.

      Reviewer #2 (Recommendations for the Authors):

      (4) “The method is described as "unsupervised" in the abstract, but most researchers would probably call this "supervised" (the static model, for instance, is logistic regression).”

      The model architecture is composed of two stages so that parameter optimization is grounded. While the first stage is a regression, the second and most important stage is not. Therefore, we believe the term “unsupervised” is justified.

      (5) “Introduction - it may be useful to mention that there have been some previous attempts to describe time-varying connectivity from spikes both with probabilistic models: Stevenson and Kording, Neurips (2011), Linderman, Stock, and Adams, Neurips (2014), Robinson, Berger, and Song, Neural Computation (2016), Wei and Stevenson, Neural Comp (2021) ... and with descriptive statistics: Fujisawa et al. Nat Neuroscience (2008), English et al. Neuron (2017), McKenzie et al. Neuron (2021).”

      We are very grateful to both reviewers for bringing up these very interesting and relevant references that we gladly included in the discussions within the Introduction and Discussion sections. 

      (6) “In the section "Static connectivity inferred by the DyNetCP from in-vivo recordings is biologically interpretable"... I may have missed it, but how is the "functional delay" calculated? And am I understanding right that for the DyNetCP you are just using [w_i\toj, w_j\toi] in place of the CCG?”

      The functional delay is calculated as the time lag of the maximum (or minimum) of the CCG (or of the static weight matrix). The static weight that the model extracts is indeed the w_i w_j product. We revised the text in this section to clarify these definitions.
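      A minimal sketch of this definition, applied to a raw CCG of binned spike counts, is shown below. It omits the jitter correction used in our analyses and assumes 1 ms bins; the function names are illustrative. The same `functional_delay` step could be applied to a vector of static weights over lags in place of the CCG.

```python
def cross_correlogram(x, y, max_lag):
    """Raw cross-correlogram of two binned spike-count sequences.

    The entry for lag k sums x[t] * y[t + k], so a positive k means that
    spikes of y follow spikes of x. Returns (lags, counts).
    """
    n = len(x)
    lags = list(range(-max_lag, max_lag + 1))
    counts = []
    for k in lags:
        counts.append(sum(x[t] * y[t + k] for t in range(n) if 0 <= t + k < n))
    return lags, counts


def functional_delay(lags, ccg, bin_ms=1.0, inhibitory=False):
    """Lag (in ms) of the CCG maximum, or of its minimum for an inhibitory trough."""
    pick = min if inhibitory else max
    best = pick(range(len(ccg)), key=lambda i: ccg[i])
    return lags[best] * bin_ms


# Toy spike trains: y fires exactly 2 bins after every spike of x.
x = [1 if t % 10 == 0 else 0 for t in range(200)]
y = [x[t - 2] if t >= 2 else 0 for t in range(200)]
lags, ccg = cross_correlogram(x, y, max_lag=5)
print(functional_delay(lags, ccg))  # 2.0
```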

      (7) “P14 typo "sparce spiking" sparse”

      Fixed. Thank you. 

      (8) “Suggest rewording "Extra-laminar interactions reveal formation of neuronal ensembles with both feedforward (e.g., layer 4 to layer 5), and feedback (e.g., layer 5 to layer 4) drives." I'm not sure this method can truly distinguish common input from directed, recurrent cortical effects. Just as an example in Figure 5, it looks like 2->4, 0->4, and 3->2 are 0 lag effects. If you wanted to add the "functional delay" analysis to this laminar result that could support some stronger claims about directionality, though.”

      The time lags for the results of Fig. 5 are indeed small but quantifiable. The left panel of Fig. 5A shows static results with the correlation peaks shifted by 1 ms from zero lag.

      (9) “Methods - I think it would be useful to mention how many parameters the full DyNetCP model has.”

      Overall, once the architecture of Fig. 1C is established, the dynamic weight averaging procedure is selected (Fig. 9), and Fourier features are introduced (Fig. 10), there are just a few parameters to optimize, including the L2 regularization (Fig. 4 – supplement 1) and the loss coefficient (Fig. 1 – figure supplement 1A). Other variables, common to all statistical approaches, include the bin sizes in lag time and in trial time. Decreasing the bin size improves time resolution but decreases the number of spikes in each bin available for reliable inference. Therefore, the spike-count threshold and other related thresholds α𝑠, α𝑤, α𝑝, as well as λ𝑖λ𝑗, need to be adjusted accordingly (Fig. 11), as discussed in detail in Methods, Section 4. We included this sentence in the text.

      (10) “It may be useful to also mention recent results in mice (Senzai et al. Neuron, 2019) and monkeys (Trepka...Moore. eLife, 2022) that are assessing similar laminar structures with CCGs.”

      Thank you for pointing out these very interesting references. We added a paragraph to the “Dynamic connectivity in VISp primary visual area” section comparing our results with these findings. In short, we observed that connections are distributed across the cortical depth with nearly the same maximum weights (Fig. 7A), which is inconsistent with the greatly diminished static connection efficacy within <200 µm from the source observed by Trepka et al., 2022. It is consistent, however, with the work of Senzai et al., 2019, which reveals much stronger long-distance correlations between layer 2/3 and layer 5 during waking than during sleep. In both cases these observations represent static connections averaged over trial time, whereas the results presented in Video 3 and Fig. 7A show strong temporal modulation of connection strength between all layers during stimulus presentation. Therefore, our results demonstrate that tracking dynamic connectivity patterns in local cortical networks can be invaluable in assessing circuit-level dynamic network organization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors utilize recurrent neural networks (RNNs) to explore the question of when and how neural dynamics and the network's output are related from a geometrical point of view. The authors found that RNNs operate between two extremes: an 'aligned' regime in which the output weights and the largest PCs are strongly correlated and an 'oblique' regime where the output weights and the largest PCs are poorly correlated. Large output weights led to oblique dynamics, and small output weights to aligned dynamics. This feature impacts whether networks are robust to perturbation along output directions. Results were linked to experimental data by showing that these different regimes can be identified in neural recordings from several experiments.

      Strengths:

      A diverse set of relevant tasks.

      A well-chosen similarity measure.

      Exploration of various hyperparameter settings.

      Weaknesses:

      One of the major connections to experimental data was the finding that BCI data have neural variance aligned to the outputs.

      Maybe I was confused about something, but doesn't this have to be the case based on the design of the experiment? The outputs of the BCI are chosen to align with the largest principal components of the data.

      The reviewer is correct. We indeed expected the BCI experiments to yield aligned dynamics. Our goal was to use this as a comparison for other, non-BCI recordings in which the correlation is smaller, i.e. dynamics closer to the oblique regime. We adjusted our wording accordingly and added a small discussion at the end of the experimental results, Section 2.6.

      Proposed experiments may have already been done (new neural activity patterns emerge with long-term learning, Oby et al. 2019). My understanding of these results is that activity moved to be aligned as the manifold changed, but more analyses could be done to more fully understand the relationship between those experiments and this work.

      The on- vs. off-manifold experiments are indeed very close to our work. On-manifold initializations, as stated above, are expected to yield aligned solutions. Off-manifold initializations allow, in principle, for both aligned and oblique solutions and are thus closer to our RNN simulations. If, during learning, the top PCs (dominant activity) rotate such that they align with the pre-defined output weights, then the system has reached an aligned solution. If the top PCs hardly change, and yet the behavior is still good, this is an oblique solution. There is some indication of an intermediate result (Figure 4C in Oby et al.), but the existing analysis there did not fully characterize these properties. Furthermore, our work suggests that systematically manipulating the norm of readout weights in off-manifold experiments can yield new insights. We thus view these as relevant results but suggest both further analysis and experiments. We rewrote the corresponding section in the discussion to include these points.

      Analysis of networks was thorough, but connections to neural data were weak. I am thoroughly convinced of the reported effect of large or small output weights in networks. I also think this framing could aid in future studies of interactions between brain regions.

      This is an interesting framing to consider the relationship between upstream activity and downstream outputs. As more labs record from several brain regions simultaneously, this work will provide an important theoretical framework for thinking about the relative geometries of neural representations between brain regions.

      It will be interesting to compare the relationship between geometries of representations and neural dynamics across different connected brain areas that are closer to the periphery vs. more central.

      It is exciting to think about the versatility of the oblique regime for shared representations and network dynamics across different computations.

      The versatility of the oblique regime could lead to differences between subjects in neural data.

      Thank you for the suggestions. Indeed, this is precisely why relative measures of the regime are valuable, even in the absence of absolute thresholds for regimes. We included your suggestions in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      This paper tackles the problem of understanding when the dynamics of neural population activity do and do not align with some target output, such as an arm movement. The authors develop a theoretical framework based on RNNs showing that an alignment of neural dynamics to output can be simply controlled by the magnitude of the read-out weight vector while the RNN is being trained. Small magnitude vectors result in aligned dynamics, where low-dimensional neural activity recapitulates the target; large magnitude vectors result in "oblique" dynamics, where encoding is spread across many dimensions. The paper further explores how the aligned and oblique regimes differ, in particular, that the oblique regime allows degenerate solutions for the same target output.

      Strengths:

      - A really interesting new idea that different dynamics of neural circuits can arise simply from the initial magnitude of the output weight vector: once written out (Eq 3) it becomes obvious, which I take as the mark of a genuinely insightful idea.

      - The offered framework potentially unifies a collection of separate experimental results and ideas, largely from studies of the motor cortex in primates: the idea that much of the ongoing dynamics do not encode movement parameters; the existence of the "null space" of preparatory activity; and that ongoing dynamics of the motor cortex can rotate in the same direction even when the arm movement is rotating in opposite directions.

      - The main text is well written, with a wide-ranging set of key results synthesised and illustrated well and concisely.

      - The study shows that the occurrence of the aligned and oblique regimes generalises across a range of simulated behavioural tasks.

      - A deep analytical investigation of when the regimes occur and how they evolve over training.

      - The study shows where the oblique regime may be advantageous: allows multiple solutions to the same problem; and differs in sensitivity to perturbation and noise.

      - An insightful corollary result that noise in training is needed to obtain the oblique regime.

      - Tests whether the aligned and oblique regimes can be seen in neural recordings from primate cortex in a range of motor control tasks.

      Weaknesses:

      - The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, optimise the output weights instead, which is the usual target (e.g. Sussillo & Abbott 2009 Neuron).

      We understand the reviewer’s surprise. We chose a typical setting (training all weights of an RNN with Adam) to show that we don’t have to fine-tune the setting (e.g. by fixing the output weights) to see the two regimes. However, other scenarios in which the output weights do change are possible, depending on the algorithm and details in the way the network is parameterized. Understanding why some settings lead to our scenario (no change in scale) and others don’t is not a simple question. A short explanation here, nonetheless:

      - Small changes to the internal weights are sufficient to solve the tasks.

      - Different versions of gradient descent and different ways of parametrizing the network lead to different results in which parts of the weights get trained. This goes in particular for how weight scales are introduced, e.g. [Jacot et al. 2018 Neurips], [Geiger et al. 2020 Journal of Statistical Mechanics], or [Yang, Hu 2020, arXiv, Feature learning in infinite-width networks]. One insight from these works is that plain gradient descent (GD) with small output weights leads to learning only at the output (and often divergence or unsuccessful learning). For this reason, plain GD (or stochastic GD) is not suitable for small output weights (the aligned regime). Other variants of GD, such as Adam or RMSprop, don’t have this problem because they shift the emphasis of learning to the hidden layers (here the recurrent weights). This is due to the normalization of the gradients.

      - FORCE learning [Sussillo & Abbott 2009] is somewhat special in that the output weights are simultaneously also used as feedback weights. That is, not only the output weights but also an additional low-rank feedback loop through these output weights is trained. As a side note: By construction, such a learning algorithm thus links the output directly to the internal dynamics, so that one would only expect aligned solutions – and the output weights remain correspondingly small in these algorithms [Mastrogiuseppe, Ostojic, 2019, Neural Comp].

      - In our setting, the output is not fed back to the network, so training the output alone would usually not suffice. Indeed, optimizing just the output weights is similar to what happens in the lazy training regime. These solutions, however, are not robust to noise, and we show that adding noise during the training does away with these solutions.
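      The gradient normalization mentioned in the list above can be illustrated with a single scalar parameter. The sketch below compares one plain GD step with the first (bias-corrected) Adam step using common default hyperparameters; it is a toy illustration, not our training code, and it ignores how gradients propagate through the network.

```python
import math


def gd_step(grad, lr=1e-2):
    """Plain gradient-descent update: proportional to the gradient itself."""
    return -lr * grad


def adam_first_step(grad, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update (step t = 1) for one parameter, with bias correction."""
    m = (1 - beta1) * grad          # first-moment (mean) estimate
    v = (1 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1)         # bias correction at t = 1
    v_hat = v / (1 - beta2)
    return -lr * m_hat / (math.sqrt(v_hat) + eps)


for g in (1.0, 1e-4):
    print(f"grad={g:g}  GD step={gd_step(g):.1e}  Adam step={adam_first_step(g):.1e}")
```

      With a gradient of 1e-4 (for example, when a small readout scales down the gradient reaching the recurrent weights), the GD step shrinks by the same factor, whereas the Adam step stays close to the learning rate because the update is divided by the gradient's own magnitude.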

      To address this issue in the manuscript, we added the following sentence to section 2.2: “While explaining this observation is beyond the scope of this work, we note that (1) changing the internal weights suffices to solve the task, and that (2) the extent to which the output weights change during learning depends on the algorithm and specific parametrization [21, 27, 85].”

      - It is unclear what it means for neural activity to be "aligned" for target outputs that are not continuous time-series, unlike the 1D or 2D oscillations used to illustrate most points here.

      Two of the modeled tasks have binary outputs; one has a 3-element binary vector.

      For any dynamics and output, we compare the alignment between the vector of output weights and the main PCs (the leading component of the dynamics). In the extreme of binary internal dynamics, i.e., two points {x_1, x_2}, there would only be one leading PC (the line connecting the two points, i.e. the choice decoder).
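      As a small illustration of the two-point case, the sketch below computes the leading PC of two states and the absolute cosine between it and an output weight vector. The cosine is used here only as a simple alignment measure; it is an assumption of this sketch and not necessarily identical to the correlation measure defined in the paper.

```python
import math


def leading_pc_two_points(x1, x2):
    """Unit-norm leading PC of a data set consisting of the two states x1 and x2.

    After mean-centering, the data are +/-(x2 - x1) / 2, so all variance lies
    along the line connecting the two points (the choice axis).
    """
    d = [b - a for a, b in zip(x1, x2)]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]


def alignment(w_out, pc):
    """Absolute cosine between the output weight vector and a principal component."""
    dot = sum(a * b for a, b in zip(w_out, pc))
    nw = math.sqrt(sum(a * a for a in w_out))
    npc = math.sqrt(sum(b * b for b in pc))
    return abs(dot) / (nw * npc)


pc = leading_pc_two_points([0.0, 0.0, 0.0], [1.0, 1.0, 0.0])
print(alignment([2.0, 2.0, 0.0], pc))   # readout along the choice axis
print(alignment([1.0, -1.0, 0.0], pc))  # readout orthogonal to it
```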

      - It is unclear what criteria are used to assign the analysed neural data to the oblique or aligned regimes of dynamics.

      Such an assignment is indeed difficult to achieve. The RNN models we showed were at the extremes of the two regimes, and these regimes are well characterized in the case of large networks (as described in the methods section). For the neural data, we find different levels of alignment for different experiments. These differences may not be strong enough to assign different regimes. Instead, our measures (correlation and relative fitting dimension) allow us to order the datasets. Here, the BCI data is more aligned than non-BCI data – perhaps unsurprisingly, given the experimental design of the former and the previous findings for the rotation task [Russo et al, 2018]. We changed the manuscript accordingly, now focusing on the relative measure of alignment, even in the absence of absolute thresholds. We are curious whether future studies with more data, different tasks, or other brain regions might reveal stronger differentiation towards either extreme.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There's so much interesting content in the supplement - it seemed like a whole other paper! It is interesting to read about the dynamics over the course of learning. Maybe you want to put this somewhere else so that more people read it?

      We are glad the reviewer appreciated this content. We think developing these analysis methods is essential for a more complete understanding of the oblique regime and how it arises, and that it should therefore be part of the current paper.

      Nice schematic in Figure 1.

      There were some statements in the text highlighting co-rotation in the top 2 PCs for oblique networks. Figure 4a looks like aligned networks might also co-rotate in a particular subspace that is not highlighted. I could be wrong, but the authors should look into this and correct it if so. If both aligned and oblique networks have co-rotation within the top 5 or so PCs, some text should be updated to reflect this.

      This is indeed the case, thanks for pointing this out! For one example, there is co-rotation for the aligned network already in the subspace spanned by PCs 1 and 3, see the figure below. We added a sentence indicating that co-rotation can take place at low-variance PCs for the aligned regime and pointed to this figure, which we added to the appendix (Fig. 17).

      While these observations are an important addition, we don’t think they qualitatively alter our results, particularly the stronger dissociation between output and internal dynamics for oblique than aligned dynamics.

      Figure 4 color labels were 'dark' and 'light'. I wasn't sure if this was a typo or if it was designed for colorblind readers? Either way, it wasn't too confusing, but adding more description might be useful.

      Fixed to red and yellow.

      Typo "Aligned networks have a ratio much large than one"

      Typo "just started to be explored" Typo "hence allowing to test"

      Fixed all typos.

      Reviewer #2 (Recommendations For The Authors):

      - Explain/discuss in the main text why the initial output weights reliably result in the required internal RNN dynamics (small->aligned; large->oblique) after training. The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, just optimise the output weights instead.

      See the answer to a similar comment by Reviewer #1 above.

      - Page 6: explain the 5 tasks.

      We added a link to methods where the tasks are described.

      - Page 6/Fig 3 & Methods: explain assumptions used to compute a reconstruction R^2 between RNN PCs and a binary or vector target output.

      We added a new methods section, 4.4, where we explain the fitting process in Fig. 3. For all tasks, the target output was a time series with P specified target values in N_out dimensions. We thus always applied regression and did not differentiate between binary and non-binary tasks.
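      One way to obtain such fitting dimensions from PC scores is sketched below. It assumes that D_x is the number of leading PCs needed to explain a fraction `threshold` of the variance and that D_fit is the number of leading PCs whose linear regression reaches the same fraction of the full-fit R^2; the 0.9 default is an illustrative choice, and Methods section 4.4 remains the reference for the actual definitions. Because PC scores are mutually uncorrelated, the R^2 of a regression on the first k PCs equals the sum of the first k squared correlations with the target.

```python
import math
from itertools import accumulate
from statistics import fmean


def _var(a):
    """Population variance of a sequence."""
    m = fmean(a)
    return fmean([(u - m) ** 2 for u in a])


def _corr(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def _smallest_k(fractions, threshold):
    """Number of leading entries needed for a cumulative fraction to reach threshold."""
    for k, value in enumerate(fractions, start=1):
        if value >= threshold - 1e-12:  # small tolerance for rounding
            return k
    return len(fractions)


def fit_dimensions(pc_scores, target, threshold=0.9):
    """Return (D_x, D_fit) for mutually uncorrelated PC score series and a 1D target.

    pc_scores: list of PCs sorted by decreasing variance; each is a list of
    scores over time points. target: list of target values at the same times.
    """
    variances = [_var(pc) for pc in pc_scores]
    total_var = sum(variances)
    var_fractions = [c / total_var for c in accumulate(variances)]

    # With uncorrelated regressors, R^2 on the first k PCs is a running sum of r^2.
    cum_r2 = list(accumulate(_corr(pc, target) ** 2 for pc in pc_scores))
    full_r2 = cum_r2[-1]
    if full_r2 == 0:
        raise ValueError("target is uncorrelated with all PCs")
    r2_fractions = [c / full_r2 for c in cum_r2]

    return _smallest_k(var_fractions, threshold), _smallest_k(r2_fractions, threshold)


# Toy example: three orthogonal, zero-mean PCs with variances 9, 4 and 1.
pcs = [[3, -3, 3, -3], [2, 2, -2, -2], [1, -1, -1, 1]]
print(fit_dimensions(pcs, [1, -1, -1, 1]))  # (2, 3): target on the weakest PC
print(fit_dimensions(pcs, [1, -1, 1, -1]))  # (2, 1): target on the strongest PC
```

      In this toy case, the first target gives D_fit/D_x above 1 (output carried by a low-variance direction, in the spirit of the oblique regime), while the second gives a ratio below 1 (output carried by the dominant PC).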

      - Page 8: methods and predictions are muddled up: paragraph ending "along different directions" should be followed by paragraph starting "Our intuition...". The intervening paragraph ("We apply perturbations...") should start after the first sentence of the paragraph "To test this,...".

      Right, these sentences were muddled up indeed. We put them in the correct order.

      - Page 10: what are the implications of the differences in noise alignment between the aligned and oblique regimes?

      The noise suppression in the oblique regime is a slow learning process that gradually renders the solution more stable. With a large readout, learning separates into two phases: an early phase, in which a “lazy” solution is learned quickly but is not robust to noise, and a second, slower phase, in which learning gradually leads to a more robust solution, the oblique solution. The main text emphasizes the result of this process (noise suppression); in the methods, we follow this process closely. This process is possibly related to other slow learning processes that fine-tune solutions, e.g., [Blanc et al. 2020, Li et al. 2021, Yang et al. 2023]. Furthermore, it would be interesting to see whether such fine-tuning happens in animals [Ratzon et al. 2024]. We added corresponding sentences to the discussion.

      - Neural data analysis:

      (i) Page 11 & Fig 7: the assignment of "aligned" or "oblique" to each neural dataset is based on the ratio of D_fit/D_x. But in all cases this ratio is less than 1, indicating fewer dimensions are needed for reconstruction than for explaining variance. Given the example in Figure 2 suggests this is an aligned regime, why assign any of them as "oblique"?

      We weakened the wording in the corresponding section, and now only state that BCI data leans more towards aligned, non-BCI data more towards oblique. This is consistent with the intuition that BCI is by construction aligned (decoder along largest PCs) and non-BCI data already showed signs of oblique dynamics (co-rotating leading PCs in the cycling task, Russo et al. 2018).

      We agree that Fig 2 (and Fig 3) could suggest distinguishing the regimes at a threshold D_fit/D_x = 1, although we hadn’t considered such a formal criterion.

      (ii) Figure 23 and main text page 11: discuss which outputs for NLB and BCI datasets were used in Figure 7 and the main text; the NLB results vary widely by output type - discuss in the main text; D_fit for NLB-maze-accuracy is missing from panel D; as the criterion is D_fit/D_x, plot this too.

      We now discuss which outputs were used in Fig. 7 in its caption: the velocity of the task-relevant entity (hand/finger/cursor). This was done to have one quantity across studies. We added a sentence to the main text, p. 11, which points to Fig 22 (which used to be Fig 23) and states that results are qualitatively similar for other decoded outputs, despite some fluctuations in numerical values and decodability.

      Regarding Fig 22: D_fit for NLB-maze-accuracy was beyond the manually set y-limit (for visibility of the other data points). We also extended the figure to include D_fit/D_x. We also discovered a small bug in the analysis code which required us to rerun the analysis and reproduce the plots. This also changed some of the numbers in the main text.

      - Discussion:

      "They do not explain why it [the "irrelevant activity"] is necessary", implies that the following sentence(s) will explain this, but do not. Instead, they go on to say:

      "Here, we showed that merely ensuring stability of neural dynamics can lead to the oblique regime": this does not explain why it is necessary, merely that it exists; and it is unclear what results "stability of neural dynamics" is referring to.

      We agree this was not a very clear formulation. We replaced these last three sentences with the following:

      “Our study systematically explains this phenomenon: generating task-related output in the presence of large, task-unrelated dynamics requires large readout weights. Conversely, in the presence of large output weights, resistance to noise or perturbations requires large, potentially task-unrelated neural dynamics (the oblique regime).”

      - The need for all 27 figures was unclear, especially as some seemed not to be referenced or were referenced out of order. Please check and clarify.

      Fig 16 (Details for network dynamics in cycling tasks) and Fig 21 (loss over learning time for the different tasks) were not referenced and have now been removed.

      We also reordered the figures in the appendix so that they would appear in the order they are referenced. Note that we added another figure (now Fig. 17) following a question from Reviewer #1.

    1. Author Response:

      Reviewer #1 (Public review):

      In this study, Deshmukh et al. provide an elegant illustration of Haldane's sieve, the population genetics concept stating that novel advantageous alleles are more likely to fix if dominant because dominant alleles are more readily exposed to selection. To achieve this, the authors rely on a uniquely suited study system, the female-polymorphic butterfly Papilio polytes.

      Deshmukh et al. first reconstruct the chronology of allele evolution in the P. polytes species group, clearly establishing the non-mimetic cyrus allele as ancestral, followed by the origin of the mimetic allele polytes/theseus, via a previously characterized inversion of the dsx locus, and most recently, the origin of the romulus allele in the P. polytes lineage, after its split from P. javanus. The authors then examine the two crucial predictions of Haldane's sieve, using the three alleles of P. polytes (cyrus, polytes, and romulus). First, they report with compelling evidence that these alleles are sequentially dominant, or put in other words, novel adaptive alleles either are or quickly become dominant upon their origin. Second, the authors find a robust signature of positive selection at the dsx locus, across all five species that share the polytes allele.

      In addition to exquisitely exemplifying Haldane's sieve, this study characterizes the genetic differences (or lack thereof) between mimetic alleles at the dsx locus. Remarkably, the polytes and romulus alleles are profoundly differentiated, despite their short divergence time (< 0.5 my), whereas the polytes and theseus alleles are indistinguishable across both coding and intronic sequences of dsx. Finally, the study reports incidental evidence of exon swaps between the polytes and romulus alleles. These exon swaps caused intermediate colour patterns and suggest that (rare) recombination might be a mechanism by which novel morphs evolve.

      This study advances our understanding of the evolution of the mimicry polymorphism in Papilio butterflies. This is an important contribution to a system already at the forefront of research on the genetic and developmental basis of sex-specific phenotypic morphs, which are common in insects. More generally, the findings of this study have important implications for how we think about the molecular dynamics of adaptation. In particular, I found the extensive genetic divergence between the polytes and romulus alleles striking, and it challenges the way I used to think about the evolution of this and other otherwise conserved developmental genes. I think that this study is also a great resource for teaching evolution. By linking classic population genetic theory to modern genomic methods, while using visually appealing traits (colour patterns), this study provides a simple yet compelling example to bring to a classroom.

      In general, I think that the conclusions of the study, in terms of the evolutionary history of the locus, the dominance relationships between P. polytes alleles, and the inference of a selective sweep in spite of contemporary balancing selection, are strongly supported; the data set is impressive and the analyses are all rigorous. I nonetheless think that there are a few ways in which the current presentation of these data could lead to confusion, and should be clarified and potentially also expanded.

      We thank the reviewer for the kind and encouraging assessment of our work.

      (1) The study is presented as addressing a paradox related to the evolution of phenotypic novelty in "highly constrained genetic architectures". If I understand correctly, these constraints are assumed to arise because the dsx inversion acts as a barrier to recombination. I agree that recombination in the mimicry locus is reduced and that recombination can be a source of phenotypic novelty. However, I'm not convinced that the presence of a structural variant necessarily constrains the potential evolution of novel discrete phenotypes. Instead, I'm having a hard time coming up with examples of discrete phenotypic polymorphisms that do not involve structural variants. If there is a paradox here, I think it should be more clearly justified, including an explanation of what a constrained genetic architecture means. I also think that the Discussion would be the place to return to this supposed paradox, and tell us exactly how the observations of exon swaps and the genetic characterization of the different mimicry alleles help resolve it.

      The paradox that we refer to here is essentially the tension between evolving new, genetically regulated adaptive traits and maintaining the existing adaptive trait(s) at their fitness peak. While one mechanism to achieve this could be differential structural rearrangement at the chromosomal level, it could also arise through alternative alleles or splice variants of a key gene (e.g., caste determination in Cardiocondyla ants) or through differential regulation of expression (e.g., the spatial regulation of melanization in Nymphalid butterflies by the ivory lncRNA). In each of these cases, a new mutation would have to give rise to a new phenotype without diluting the existing adaptive traits when it arises. We focused on structural variants because that is the case in our study system; however, the point we were making refers to the evolution of novel traits in general. We will add a section to the revised Discussion to address this.

      (2) While Haldane's sieve is clearly demonstrated in the P. polytes lineage (with cyrus, polytes, and romulus alleles), there is another allele trio (cyrus, polytes, and theseus) for which Haldane's sieve could also be expected. However, the chronological order in which polytes and theseus evolved remains unresolved, precluding a similar investigation of sequential dominance. Likewise, the locus that differentiates polytes from theseus is unknown, so it's not currently feasible to identify a signature of positive selection shared by P. javanus and P. alphenor at this locus. I, therefore, think that it is premature to conclude that the evolution of these mimicry polymorphisms generally follows Haldane's sieve; of two allele trios, only one currently shows the expected pattern.

      We agree with the reviewer that the genetic basis of f. theseus requires further investigation. f. theseus occupies the same level on the dominance hierarchy of dsx alleles as f. polytes (Clarke and Sheppard, 1972) and the allelic variant of dsx present in both these female forms is identical, so there exists just one trio of alleles of dsx. Based on this evidence, we cannot comment on the origin of forms theseus and polytes. They could have arisen at the same time or sequentially. Since our paper is largely focused on the sequential evolution of dsx alleles through Haldane’s sieve, we have included f. theseus in our conclusions. We think that it fits into the framework of Haldane’s sieve due to its genetic dominance over the non-mimetic female form. However, this aspect needs to be explored further in a more specific study focusing on the characterization, origin, and developmental genetics of f. theseus in the future.

      Reviewer #2 (Public review):

      Summary:

      Deshmukh and colleagues studied the evolution of mimetic morphs in the Papilio polytes species group. They investigate the timing of origin of haplotypes associated with different morphs, their dominance relationships, associations with different isoform expressions, and evidence for selection and recombination in the sequence data. P. polytes is a textbook example of a Batesian mimic, and this study provides important nuanced insights into its evolution, and will therefore be relevant to many evolutionary biologists. I find the results regarding dominance and the sequence of events generally convincing, but I have some concerns about the motivation and interpretation of some other analyses, particularly the tests for selection.

      We thank the reviewer for these insightful remarks.

      Strengths:

      This study uses widespread sampling, large sample sizes from crossing experiments, and a wide range of data sources.

      We appreciate this point. This strength has indeed helped us illuminate the evolutionary dynamics of this classic example of balanced polymorphism.

      Weaknesses:

      (1) Purpose and premise of selective sweep analysis

      A major narrative of the paper is that new mimetic alleles have arisen and spread to high frequency, and their dominance over the pre-existing alleles is consistent with Haldane's sieve. It would therefore make sense to test for selective sweep signatures within each morph (and its corresponding dsx haplotype), rather than at the species level. This would allow a test of the prediction that those morphs that arose most recently would have the strongest sweep signatures.

      Sweep signatures erode over time - see Figure 2 of Moest et al. 2020 (https://doi.org/10.1371/journal.pbio.3000597), and it is unclear whether we expect the signatures of the original sweeps of these haplotypes to still be detectable at all. Moest et al show that sweep signatures are completely eroded by 1N generations after the event, and probably not detectable much sooner than that, so assuming effective population sizes of these species of a few million, at what time scale can we expect to detect sweeps? If these putative sweeps are in fact more recent than the origin of the different morphs, perhaps they would more likely be associated with the refinement of mimicry, but not necessarily providing evidence for or against a Haldane's sieve process in the origin of the morphs.

      Our original plan was to test for signatures of sweeps in individual morphs, but the very small sample sizes for individual morphs in some species made this analysis difficult. We agree that signatures of selective sweeps cannot give us an estimate of the timescale of a sweep; they simply indicate that there may have been a sweep in a certain genomic region. Therefore, with the selective sweep data alone, we cannot determine whether these sweeps occurred during the refinement of mimicry or with the origin of the mimetic phenotype itself, and we have made no interpretations regarding the timescales or causal events of the sweeps. Additionally, we discuss that the results we obtained for individual alleles may reflect either the point of origin of mimetic resemblance or the course of perfecting that resemblance, although we cannot currently differentiate between the two (lines 320 to 333).

      (2) Selective sweep methods

      A tool called RAiSD was used to detect signatures of selective sweeps, but this manuscript does not describe what signatures this tool considers (reduced diversity, skewed frequency spectrum, increased LD, all of the above?). Given the comment above, would this tool be sensitive to incomplete sweeps that affect only one morph in a species-level dataset? It is also not clear how RAiSD could identify signatures of selective sweeps at individual SNPs (line 206). Sweeps occur over tracts of the genome and it is often difficult to associate a sweep with a single gene.

      RAiSD (https://www.nature.com/articles/s42003-018-0085-8) detects selective sweeps using the μ statistic, a composite score that combines the site frequency spectrum (SFS), linkage disequilibrium (LD), and genetic diversity along a chromosome. The tool is quite sensitive and is able to detect soft sweeps. RAiSD can take a VCF file of SNP data as input and uses an SNP-driven sliding-window approach to scan the genome for signatures of sweeps. Using an SNP file instead of runs of sequence avoids repeated calculations in regions that are sparse in variants, thereby reducing execution time. Owing to the nature of this input, the μ statistic was calculated per site. We then annotated the SNPs according to the genes in which they occur and found that all species showing mimicry had at least one site with a signature of a sweep within the dsx locus.

      (3) Episodic diversification

      Very little information is provided about the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED) and Mixed Effects Model of Evolution (MEME), and what hypothesis the authors were testing by applying these methods. Although it is not mentioned in the manuscript, a quick search reveals that these are methods to study codon evolution along branches of a phylogeny. Without this information, it is difficult to understand the motivation for this analysis.

      We thank the reviewer for bringing this to our notice. We will add a few lines to the Methods describing the hypothesis we were testing and the motivation behind this analysis. We will also cite a previous study from our group that used these and other methods to study the molecular evolution of dsx across insect lineages.

      (4) GWAS for form romulus

      The authors argue that the lack of SNP associations within dsx for form romulus is caused by poor read mapping in the inverted region itself (line 125). If this is true, we would expect strong association in the regions immediately outside the inversion. From Figure S3, there are four discrete peaks of association, and the location of dsx and the inversion are not indicated, so it is difficult to understand the authors' interpretation in light of this figure.

      We do indeed observe the highest association in the regions flanking dsx in our GWAS. This is difficult to show in the figure because the genome is not assembled at the chromosome level. However, the association peaks occur on scf 908437033 at positions 2192979, 1181012 and 1352228 (Fig. S3c, Table S3), while dsx is located between 1938098 and 2045969. We will add the position of dsx to the figure legend in the revised manuscript.

      (5) Form theseus

      Since there appears to be only one sequence available for form theseus (actually it is said to be "P. javanus f. polytes/theseus"), is it reasonable to conclude that "the dsx coding sequence of f. theseus was identical to that of f. polytes in both P. javanus and P. alphenor" (Line 151)? Looking at the Clarke and Sheppard (1972) paper cited in the statement that "f. polytes and f. theseus show equal dominance" (line 153), it seems to me that their definition of theseus is quite different from that here. Without addressing this discrepancy, the results are difficult to interpret.

      Among the P. javanus individuals we sampled, we obtained just one individual with f. theseus and the H^P allele. However, from a previously published study (Zhang et al. 2017), we were able to add nine more individuals of this form (Fig. S4b and S7). Although we did not show these individuals in Fig 3 (which was based on PCR amplification and sequencing of individual exons of dsx), all analyses of sequence data were performed on 10 theseus individuals in total. In Zhang et al., the authors observed what we now know are species-specific differences, rather than allele-specific differences, when comparing theseus and polytes dsx alleles. Our observations were consistent with these findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Question 1: The experiment that utilizes lactose or glucose supplementation to infer the importance of carbohydrate recognition by galectin-9 cannot be interpreted unequivocally owing to the growth-enhancing effect of lactose supplementation on Mtb during liquid culture in vitro.

      Thank you for this very constructive comment. We repeated the experiments, lowering the concentration of lactose or AG from 10 μg/mL to 1 μg/mL. At this low concentration, lactose or AG had a negligible effect on Mtb growth, but still reversed the inhibitory effect of galectin-9 on mycobacterial growth (revised Fig. 2A, C). Therefore, we consider that supplementation with lactose or AG reverses galectin-9-mediated inhibition of Mtb growth largely through carbohydrate recognition rather than through a growth-enhancing effect.

      Question 2: Similar to the comment above, the apparent dose-independent effect of galectin-9 on Mtb growth in vitro is difficult to reconcile with the interpretation that galectin is functioning as claimed.

      We thank the reviewer for the correction. As the reviewer pointed out, galectin-9 inhibits Mtb growth in a dose-independent manner. We have corrected the claim in the revised manuscript (Line 114).

      Question 3: The claimed differences in galectin-9 concentration in sera from tuberculin skin test (TST)-negative or TST-positive non-TB cases versus active TB patients are not immediately apparent from the data presented.

      We appreciate your concern. The previous samples were from a cohort established at the Max Planck Institute for Infection Biology. We have now measured galectin-9 in sera from another, independent cohort of active TB patients and healthy donors in China, and found a higher abundance of galectin-9 in serum from TB patients than in serum from healthy donors (revised Fig. 1E).

      Question 4: Neither fluorescence microscopy nor electron microscopy analyses are supported by high-quality, interpretable images which, in the absence of supporting quantitative data, renders any claims of anti-AG mAb specificity (fluorescence microscopy) or putative mAb-mediated cell wall swelling (electron microscopy) highly speculative.

      We appreciate your concern. We have improved the immunofluorescence procedure and obtained high-quality, interpretable images with quantitative data (revised Fig. 4F). For the electron microscopy analyses, we have added clearer labels indicating the cell wall in the revised manuscript (revised Fig. 7C).

      Question 5: Finally, the absence of any discussion of how anti-AG antibodies (similarly, galectin-9) gain access to the AG layer in the outer membrane of intact Mtb bacilli (which may additionally possess an extracellular capsule/coat) is a critical omission - situating these results in the context of current knowledge about Mtb cellular structure (especially the mycobacterial outer membrane) is essential for plausibility of the inferred galectin-9 and anti-AG mAb activities.

      Indeed, AG is masked by mycolic acids in the outer layer of the Mtb cell wall. As discussed in the Discussion of the previous manuscript (line 285), we speculate that during Mtb replication, cell wall synthesis is active and AG becomes exposed, thereby facilitating its binding to galectin-9 or anti-AG antibodies and leading to Mtb growth arrest. It is therefore quite possible that galectin-9 or anti-AG antibodies target replicating Mtb.

      Reviewer #2 (Public Review):

      Question 1: In light of other observations that cleaved galectin-9 levels in the plasma are a biomarker for severe infection (Padilla A et al Biomolecules 2021 and Iwasaki-Hozumi H et al. Biomolecules 2021), it is difficult to reconcile the authors' interpretation that the elevated gal-9 in active TB patients (Figure 1E) contributes to the maintenance of latent infection in humans. The authors should consider incorporating these observations in the interpretation of their own results.

      Thank you for these very insightful comments. We observed elevated levels of galectin-9 in the serum of active TB patients, consistent with reports indicating that cleaved galectin-9 levels in serum serve as a biomarker for severe infection (Iwasaki-Hozumi et al., 2021; Padilla et al., 2020). We consider that the elevated serum galectin-9 in active TB may reflect the host immune response to Mtb infection, but that the magnitude of this increase is not sufficient to control Mtb infection and maintain latent infection. This is similar to other protective immune factors such as interferon gamma, which is also elevated in active TB (El-Masry et al., 2007; Hasan et al., 2009). We have included this discussion in the revised manuscript (line 298).

      Question 2: The anti-AG titers were measured only in individuals with active TB (Figure 3C), generally thought to be a less protective immunological state. The speculation that individuals with anti-AG titers have some protection is unfounded. Further, only 2 mAbs were tested to demonstrate restriction of Mtb in culture. It is possible that clones of different affinities for AG present within a patient's polyclonal AG-antibody responses may or may not display a direct growth restriction pressure on Mtb in culture. The authors should soften the claims about the presence of AG-titers in TB patients being indicative of protection.

      We appreciate your concern. As per your suggestion, we have softened the claim to: “We speculate that during Mtb infection, anti-AG IgG antibodies are induced, which potentially contribute to protection against TB by directly inhibiting Mtb replication albeit seemingly in vain.”

      References

      El-Masry, S., Lotfy, M., Nasif, W.A., El-Kady, I.M., and Al-Badrawy, M. (2007). Elevated serum level of interleukin (IL)-18, interferon (IFN)-gamma and soluble Fas in patients with pulmonary complications in tuberculosis. Acta microbiologica et immunologica Hungarica 54, 65-77.

      Hasan, Z., Jamil, B., Khan, J., Ali, R., Khan, M.A., Nasir, N., Yusuf, M.S., Jamil, S., Irfan, M., and Hussain, R. (2009). Relationship between circulating levels of IFN-gamma, IL-10, CXCL9 and CCL2 in pulmonary and extrapulmonary tuberculosis is dependent on disease severity. Scandinavian journal of immunology 69, 259-267.

      Iwasaki-Hozumi, H., Chagan-Yasutan, H., Ashino, Y., and Hattori, T. (2021). Blood Levels of Galectin-9, an Immuno-Regulating Molecule, Reflect the Severity for the Acute and Chronic Infectious Diseases. Biomolecules 11.

      Padilla, S.T., Niki, T., Furushima, D., Bai, G., Chagan-Yasutan, H., Telan, E.F., Tactacan-Abrenica, R.J., Maeda, Y., Solante, R., and Hattori, T. (2020). Plasma Levels of a Cleaved Form of Galectin-9 Are the Most Sensitive Biomarkers of Acquired Immune Deficiency Syndrome and Tuberculosis Coinfection. Biomolecules 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. This raises the question as to whether differences are due to one method being more reliable than another or whether they are also identifying different underlying biological differences. It is not clear for example whether the agreement between the two methods is better or worse than between two human scorers, which generally serve as a gold standard to validate novel methods. The authors provide some insight into differences between the methods that could account for differences in results. However, given that the fractal method is automatic it would be important to clearly identify criteria for recordings in which it will produce similar results to the classical method.

      Thank you for these insightful suggestions. In the revised Manuscript, we have added a number of additional analyses that provide a quantitative comparison between the classical and fractal cycle approaches aiming to identify the source of the discrepancies between classical and fractal cycle durations. Likewise, we assessed the intra-fractal and intra-classical method reliability as outlined below.

      Reviewer #1 (Recommendations For The Authors):

      One of the challenges in interpreting the results of the manuscript is understanding whether the differences between the two methods are due to a genuine difference in what these two methods are quantifying or simply noise/variability in each method. If the authors could provide some more insight into this, it would be a great help in assessing their findings and I think bolster the applicability of their method.

      (1) Method reliability: The manuscript clearly shows that cycle length is robustly correlated between fractal and classical in multiple datasets; however, it is hard to assign a meaningful interpretation to the correlation value (i.e., R = 0.5) without some reference point. This could be provided by looking at the intra-method correlation of cycle lengths. In the case of classical scoring, inter-scorer results could be compared; if the R-value here is significantly higher than 0.5, it would suggest genuine differences between the methods. In the case of fractal scoring, inter-electrode results could be compared, or results with slight changes to the peak prominence threshold or smoothing window.

      In the revised Manuscript, we performed the following analyses to show the intra-method reliability:

      a) Classical cycle reliability: For the revised Manuscript, an additional scorer independently defined classical sleep cycles for all datasets and marked sleep cycles with skipped REM sleep. We also performed automatic sleep cycle detection using the R “SleepCycles” package by Blume & Cajochen (2021). We have added a new Table S8 to Supplementary Material 2 that shows the averaged cycle durations and cycle numbers obtained by the two human scorers and the automatic algorithm, as well as the inter-scorer agreement rate. To the Supplementary Excel file, we have added a new sheet named “Classical method reliability” that reports classical cycle durations for each participant and each dataset as defined by the two human scorers and the algorithm.

      We found that the correlation coefficients between the two human scorers ranged from 0.69 to 0.91 across datasets (in the literature, r > 0.7 is conventionally considered strong), and were thus higher than the correlation coefficients between fractal and classical cycle durations, which ranged from 0.41 to 0.55 (r in the range 0.3 to 0.7 is considered moderate). The correlation coefficients between the human raters and the automatic algorithm were markedly lower, ranging from 0.30 to 0.69 (moderate) across datasets, and thus lay within the range of the correlation coefficients between fractal and classical cycle durations. This analysis is reported in Supplementary Material 2, section “Intra-classical method reliability” and Table S8.

      b) Fractal cycle reliability: To assess intra-fractal method reliability (revised Supplementary Material 2), we correlated the durations of fractal cycles calculated as defined in the main text, i.e., using a minimum peak prominence of 0.94 z and a smoothing window of 101 thirty-second epochs, with those calculated using minimum peak prominences ranging from 0.86 to 1.20 z in steps of 0.04 z and smoothing windows ranging from 81 to 121 thirty-second epochs in steps of 10 epochs (Table S7). Fractal cycle durations calculated using adjacent minimum peak prominences (i.e., differing by 0.04 z) showed r > 0.92, while those calculated using adjacent smoothing windows (i.e., differing by 10 epochs) showed r > 0.84. In addition, we correlated fractal cycle durations defined using different channels and found correlation coefficients between 0.66 and 0.67 (Table S1). Thus, most of the correlations performed to assess intra-fractal method reliability (r > 0.6) were higher than those obtained to assess inter-method reliability, i.e., correlations between fractal and classical cycle durations (r = 0.41 to 0.55). This analysis is reported in Supplementary Material 2, section “Intra-fractal method reliability” and Table S7. We have also added a new sheet named “Fractal method reliability”, which reports the actual values for the abovementioned parameters, to the Supplementary Excel file. For a discussion of potential sources of differences, see below.
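      To make the two tuning parameters in this reliability analysis concrete, here is a minimal Python sketch of peak-based cycle detection. It is an illustration under stated assumptions, not the authors' implementation: it assumes the fractal time series is sampled in 30-second epochs, is smoothed with a moving average before z-scoring, and that one cycle spans two consecutive peaks whose prominence reaches the threshold. All function names and the synthetic signal are hypothetical.

```python
import math
from statistics import mean, pstdev

EPOCH_MIN = 0.5  # one epoch = 30 s (assumed sampling of the fractal time series)


def moving_average(x, window):
    """Moving average over full windows only (no padding), so edges are not distorted."""
    return [sum(x[i:i + window]) / window for i in range(len(x) - window + 1)]


def zscore(x):
    """Standardize a series to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]


def prominence(x, i):
    """Height of peak i above the higher of its two bases.

    Each base is the lowest point reached when walking away from the peak
    until a strictly higher sample (or the edge of the series) is met.
    """
    left_min = right_min = x[i]
    j = i - 1
    while j >= 0 and x[j] <= x[i]:
        left_min = min(left_min, x[j])
        j -= 1
    j = i + 1
    while j < len(x) and x[j] <= x[i]:
        right_min = min(right_min, x[j])
        j += 1
    return x[i] - max(left_min, right_min)


def find_peaks(x, min_prominence):
    """Indices of local maxima whose prominence is at least min_prominence."""
    return [
        i for i in range(1, len(x) - 1)
        if x[i - 1] < x[i] >= x[i + 1] and prominence(x, i) >= min_prominence
    ]


def fractal_cycle_durations(series, min_prominence=0.94, window=101):
    """Cycle durations in minutes, assuming one cycle spans two consecutive peaks."""
    smoothed = zscore(moving_average(series, window))
    peaks = find_peaks(smoothed, min_prominence)
    return [(b - a) * EPOCH_MIN for a, b in zip(peaks, peaks[1:])]


# Synthetic 8-h "night": 960 epochs with a 90-min (180-epoch) oscillation.
night = [math.sin(2 * math.pi * t / 180) for t in range(960)]
print(fractal_cycle_durations(night))  # [90.0, 90.0, 90.0]
```

      Re-running such a function over a grid of min_prominence and window values and correlating the resulting durations is one way to carry out the kind of sensitivity check summarized in Table S7.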

      (2) Origin of method differences: The authors outline a few possible sources of discrepancies between the two methods (peak vs REM end, skipped REM cycle detection...) but do not quantify these contributions. It would be interesting to identify some factors that could predict for either a given night of sleep or dataset whether it is likely to show a strong or weak agreement between methods. This could be achieved by correlating measures of the proposed differences ("peak flatness", fractal cycle depth, or proportion of skipped REM cycles) with the mismatch between the two methods.

      In the revised Manuscript, we have quantified a few possible sources of discrepancies between the durations of fractal vs classical cycles and added a new section named “Sources of fractal and classical cycle mismatches” to the Results as well as new Tables 5 and S10 (Supplementary Material 2). Namely, we correlated the difference in classical vs fractal sleep cycle durations on the one side, and either the amplitude of fractal descent/ascent (to reflect fractal cycle depth), duration of cycles with skipped REM sleep/TST, duration of wake after sleep onset/TST or the REM episode length of a given cycle (to reflect peak flatness) on the other side. We found that a higher difference in classical vs fractal cycle duration was associated with a higher proportion of wake after sleep onset (r = 0.226, p = 0.001), shallower fractal descents (r = 0.15, p = 0.002) and longer REM episodes (r = 0.358, p < 0.001, n = 417 cycles, Table S10 in Supplementary Material 2). The rest of the assessed parameters showed no significant correlations (Table S10). We have added a new sheet named “Fractal-classical mismatch” that reports the actual values for the abovementioned parameters to the Supplementary Excel file.  

      (3) Skipped REM cycles: the authors underline that the fractal method identified skipped REM cycles. It seems likely that manual identification of skipped REM cycles is particularly challenging (ie we would expect this to be a particular source of error between two human scorers). If this is indeed the case, it would be interesting to discuss, since it would highlight an advantage of their methodology that they already point out (l644).

      In the revised Manuscript, we now report the inter-scorer agreement on cycles with skipped REM sleep: 61%, i.e., 32 percentage points lower than the detection rate of our fractal cycle algorithm (93%). These findings are reported in the “Skipped cycles” section of the Results and in Table S9 of Supplementary Material 2. We also discuss them in the Discussion:

      “Our algorithm detected skipped cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower. We infer that the fractal cycle algorithm detects skipped cycles because the lightening of sleep that replaces a REM episode in a skipped cycle is often expressed as a local peak in the fractal time series.” Discussion, section “Fractal and classical cycles comparison”, paragraph 5.
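      Two different quantities are compared here: an agreement rate between two raters and the algorithm's detection rate. Because both are percentages, their difference (93 − 61 = 32) is in percentage points; expressed relative to 93%, the gap is about 34%. A minimal sketch of both computations, using hypothetical rater labels (True = cycle with skipped REM):

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of cycles on which two raters agree (True = skipped REM)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def gaps(high_pct, low_pct):
    """Return (difference in percentage points, relative difference in %)."""
    points = high_pct - low_pct
    return points, 100.0 * points / high_pct


print(percent_agreement([True, False, True, True], [True, True, True, False]))  # 50.0
points, relative = gaps(93, 61)
print(points, round(relative, 1))  # 32 34.4
```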

      Minor comments:

      - In the subjects where the number of fractal and classical cycles did not match, how large was the difference (ie just one extra cycle or more)? Correlating cycle numbers could be one way to quantify this.

      In the revised Manuscript, we report the requested information for the participants without a one-to-one match (46% of all participants) as follows:

      “In the remaining 46% of the participants, the difference between the fractal and classical cycle numbers ranged from -2 to 2, with an average of -0.23 ± 1.23 cycles. This subgroup had 4.6 ± 1.2 fractal cycles per participant, while the number of classical cycles was 4.9 ± 0.7 cycles per participant. The correlation coefficient between the fractal and classical cycle numbers was 0.280 (p = 0.006), and that between the cycle durations was 0.278 (p = 0.006).” Results, section “Correspondence between fractal and classical cycles”, last paragraph.
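      The summary statistics in the quoted passage (range and mean ± SD of the per-participant difference in cycle counts) can be computed as below. This sketch uses made-up counts and assumes the sample SD and a fractal-minus-classical sign convention, which the negative mean (-0.23) and the group means (4.6 vs 4.9) suggest.

```python
import statistics


def cycle_count_differences(fractal_counts, classical_counts):
    """Range, mean and sample SD of per-participant (fractal - classical) cycle counts."""
    diffs = [f - c for f, c in zip(fractal_counts, classical_counts)]
    return {"range": (min(diffs), max(diffs)),
            "mean": statistics.mean(diffs),
            "sd": statistics.stdev(diffs)}


summary = cycle_count_differences([4, 5, 6, 3], [5, 5, 4, 5])  # hypothetical counts
print(summary["range"], summary["mean"])  # (-2, 2) -0.25
```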

      - When discussing the skipped REM cycles (l467), the authors explain: "For simplicity and between-subject consistency, we included in the analysis only the first cycles". I'm not sure I understood this, could they clarify to which analysis they are referring to?

      In the revised Manuscript, we performed this analysis twice, once using only the first cycles and once using all cycles, and have rephrased the passage as follows:

      “We tested whether the fractal cycle algorithm can detect skipped cycles, i.e., cycles in which an anticipated REM episode is skipped (possibly due to too high homeostatic pressure). We performed this analysis twice. First, we counted all skipped cycles (except the last cycle of each night, which might lack a REM episode for other reasons, e.g., because the participant woke up or was woken up). Second, we counted only the first classical cycles (i.e., the first of the 4 – 6 cycles that each participant had per night, Fig. 3 A – B), as these cycles coincide with the highest NREM pressure. An additional reason to disregard skipped cycles observed later in the night was to achieve higher between-subject consistency, as later skipped cycles were observed in only a small number of participants.” Results, section “Skipped cycles”, first paragraph.
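      The two counting rules can be made explicit with a small sketch. Representing each night as a list of per-cycle REM flags is an assumption for illustration, and the example nights are hypothetical.

```python
def count_skipped_cycles(nights, first_only=False):
    """Count classical cycles without a REM episode.

    nights: one list per participant; True means the cycle contains REM.
    The last cycle of each night is never counted, because it may lack REM
    for unrelated reasons (e.g., the participant woke up).
    """
    total = 0
    for cycles in nights:
        eligible = cycles[:-1]          # drop the final cycle of the night
        if first_only:
            eligible = eligible[:1]     # keep only the first cycle
        total += sum(1 for has_rem in eligible if not has_rem)
    return total


nights = [[False, True, True, False],       # hypothetical participants
          [True, False, True],
          [False, True, True, True, True]]
print(count_skipped_cycles(nights), count_skipped_cycles(nights, first_only=True))  # 3 2
```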

      - The inclusion of all the hypnograms as a supplementary is a great idea to give the reader concrete intuition of the data. If the limits of the sleep cycles for both methods could be added it would be very useful.

      Supplementary Material 1 has been updated such that each graph has a mark showing the onsets of fractal and classical sleep cycles, including classical cycles with skipped REM sleep.

      - The difference in cycle duration between adults and children seems stronger / more reliable for the fractal cycle method, particularly in the histogram (Figure 3C). Is this difference statistically significant?

      In the revised Manuscript, we have added a multivariate analysis of variance comparing the F-values, adjusted R-squared and partial eta squared of the two approaches. The findings are as follows:

      “To compare the fractal approach with the classical one, we performed a Multivariate Analysis of Variance with fractal and classical cycle durations as dependent variables, the group as an independent variable and the age as a covariate. We found that fractal cycle durations showed higher F-values (F(1, 43) = 4.5 vs F(1, 43) = 3.1), adjusted R squared (0.138 vs 0.089) and effect sizes (partial eta squared 0.18 vs 0.13) than classical cycle durations.” Results, Fractal cycles in children and adolescents, paragraph 3.

      There have been some recent efforts to define sleep cycles in an automatic way using machine learning approaches. It could be interesting to mention these in the discussion and highlight their relevance to the general endeavour of automatizing the sleep cycle identification process.

      In the Discussion of the revised Manuscript, we have added a passage on existing automatic sleep cycle detection algorithms:

      “Although there has recently been a significant surge in sleep analyses incorporating various machine learning techniques and deep neural network architectures, this research line has mainly focused on the automatic classification of sleep stages and disorders, largely ignoring sleep cycles. Here, as a reference method, we used one of the very few available algorithms for sleep cycle detection (Blume & Cajochen, 2021). Automatically identified classical sleep cycles correlated only moderately with those detected by human raters (r’s = 0.3 – 0.7 in different datasets). These coefficients lie within the range of the coefficients between fractal and classical cycle durations (r = 0.41 – 0.55, moderate) and outside the range of the coefficients between classical cycle durations detected by two human scorers (r’s = 0.7 – 0.9, strong, Supplementary Material 2, Table S8).” Discussion, section “Fractal and classical cycles comparison”, paragraph 4.

      Reviewer #2 (Public Review):

      One weakness of the study, from my perspective, was that the IRASA fits to the data (e.g. the PSD, such as in Figure 1B), were not illustrated. One cannot get a sense of whether or not the algorithm is based entirely on the fractal component or whether the oscillatory component of the PSD also influences the slope calculations. This should be better illustrated, but I assume the fits are quite good.

      Thank you for this suggestion. In the revised Manuscript, we have added a new figure (Fig.S1 E, Supplementary Material 2), illustrating the goodness of fit of the data as assessed by the IRASA method.
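      A common way to summarise how well a fractal (aperiodic) spectrum follows a power law is the R² of a straight-line fit in log–log space, whose slope is the spectral exponent. The sketch below assumes the fractal component has already been separated from the oscillatory one (e.g., by IRASA) and uses an illustrative 1–30 Hz grid; it is not necessarily the goodness-of-fit measure shown in Fig.S1 E.

```python
import numpy as np


def power_law_fit(freqs_hz, fractal_power):
    """Fit log10(P) = intercept + slope * log10(f); return (slope, intercept, R^2)."""
    log_f = np.log10(freqs_hz)
    log_p = np.log10(fractal_power)
    slope, intercept = np.polyfit(log_f, log_p, 1)
    residuals = log_p - (intercept + slope * log_f)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((log_p - log_p.mean()) ** 2)
    return float(slope), float(intercept), float(r_squared)


freqs = np.linspace(1, 30, 117)       # illustrative 0.25-Hz grid (assumed range)
power = 3.0 * freqs ** -2.0           # an ideal 1/f^2 spectrum
slope, intercept, r2 = power_law_fit(freqs, power)
print(round(slope, 3), round(r2, 3))
```

      With real data, an R² well below 1 would indicate that residual oscillatory peaks or a spectral knee are distorting the slope estimate.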

      The cycles detected using IRASA are called fractal cycles. I appreciate the use of a simple term for this, but I am also concerned whether it could be potentially misleading? The term suggests there is something fractal about the cycle, whereas it's really just that the fractal component of the PSD is used to detect the cycle. A more appropriate term could be "fractal-detected cycles" or "fractal-based cycle" perhaps?

      We agree that these cycles are not fractal per se. In the Introduction, when we mention them for the first time, we name them “fractal activity-based cycles of sleep” and immediately add “or fractal cycles for short”. In the revised version, we reintroduce the full term at the beginning of each major section and in the Abstract. Nevertheless, given that the term “fractal cycles” is used 88 times, we use the short name after these reminders to facilitate readability. We hope that this makes clear that the cycles are not fractal per se, reducing possible confusion while keeping the manuscript concise.

      The study performs various comparisons of the durations of sleep cycles evaluated by the IRASA-based algorithm vs. conventional sleep scoring. One concern I had was that it appears cycles were simply identified by their order (first, second, etc.) but were not otherwise matched. This is problematic because, as evident from examples such as Figure 3B, sometimes one cycle conventionally scored is matched onto two fractal-based cycles. In the case of the Figure 3B example, it would be more appropriate to compare the duration of conventional cycle 5 vs. fractal cycle 7, rather than 5 vs. 5, as it appears is currently being performed.

      In cases where the number of fractal cycles differed from the number of classical cycles (34 to 55% of participants in different datasets, as in the example in Fig.3B), we did not perform one-to-one matching of cycles. Instead, we averaged the durations of the fractal and classical cycles within each participant and only then correlated them (Fig.2C). For the subset of participants (45 – 66% in different datasets) with a one-to-one match between fractal and classical cycles, we performed an additional correlation without averaging, i.e., we correlated the durations of individual fractal and classical cycles (Fig.4S of Supplementary Material 2). This is stated in the Methods, section Statistical analysis, paragraph 2.
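      The two correlation schemes differ only in how fractal and classical durations are paired. A sketch, assuming that a “one-to-one match” simply means equal cycle counts (the actual matching criterion may be stricter); the durations are hypothetical, and `np.corrcoef` can then be applied to either pair of lists.

```python
import numpy as np


def participant_level_pairs(fractal, classical):
    """Mean cycle duration per participant; usable when cycle counts differ."""
    return ([float(np.mean(f)) for f in fractal],
            [float(np.mean(c)) for c in classical])


def cycle_level_pairs(fractal, classical):
    """Individual cycle pairs, only from participants with equal cycle counts."""
    xs, ys = [], []
    for f, c in zip(fractal, classical):
        if len(f) == len(c):
            xs.extend(f)
            ys.extend(c)
    return xs, ys


fractal = [[90, 100], [80, 85, 95]]      # hypothetical durations (min)
classical = [[95, 105], [100, 90]]
print(participant_level_pairs(fractal, classical)[1])  # [100.0, 95.0]
print(cycle_level_pairs(fractal, classical))           # ([90, 100], [95, 105])
```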

      There are a few statements in the discussion that I felt were not well supported. L629: about the "little biological foundation" of categorical definitions, e.g. for REM sleep or wake? I cannot agree with this statement as written. Also about "the gradual nature of typical biological processes". Surely the action potential is not gradual, and there are many other examples of all-or-none biological events.

      In the revised Manuscript, we have removed these statements from both Introduction and Discussion.

      The authors appear to acknowledge a key point, which is that their methods do not discriminate between awake and REM periods. Thus, their algorithm essentially detected cycles of slow-wave sleep alternating with wake/REM. Judging by the examples provided, this appears to account for both the correspondence between fractal-based and conventional cycles and their disagreements during the early part of the sleep cycle. While this point is acknowledged in the discussion section around L686, I am surprised that the authors then argue against this correspondence on L695. I did not find the "not-a-number" controls to be convincing. No examples were provided of such cycles, and it's hard to understand how positive z-values of the slopes are possible without the presence of some wake unless N1 stages are sufficient to provide a detected cycle (in which case the argument still holds, except that it is alternations between slow-wave sleep and N1 that could drive the detection).

      In the revised Manuscript, we have removed the “NaN analysis” from both Results and Discussion and replaced it with the correlation between the difference in duration between classical and fractal cycles and the proportion of wake after sleep onset. The finding is as follows:

      “A larger difference between the durations of the classical and fractal cycles was associated with a higher proportion of wake after sleep onset in 3/5 datasets as well as in the merged dataset (Supplementary Material 2, Table S10).” Results, section “Fractal cycles and wake after sleep onset”, last two sentences. This is also discussed in Discussion, section “Fractal cycles and age”, paragraph 1, last sentence. 

      To me, it seems important to make clear whether the paper is proposing a different definition of cycles that could be easily detected without considering fractals or spectral slopes, but simply adjusting what one calls the onset/offset of a cycle, or whether there is something fundamentally important about measuring the PSD slope. The paper seems to be suggesting the latter but my sense from the results is that it's rather the former.

      Thank you for this important comment. Overall, our paper suggests that the fractal approach might reflect the cycling nature of sleep in a more precise and sensitive way than classical hypnograms. Importantly, neither fractal nor classical methods can shed light on the mechanism underlying sleep cycle generation due to their correlational approach. Despite this, the advantages of fractal over classical methods mentioned in our Manuscript are as follows:

      (1) Fractal cycles are based on a real-valued metric with known neurophysiological functional significance, which provides a biological foundation and a more gradual depiction of nocturnal changes than the abrupt changes inherent to hypnograms, which use rather arbitrarily assigned categorical values (e.g., wake=0, REM=-1, N1=-2, N2=-3 and SWS=-4, Fig.2 A).

      (2) Fractal cycle computation is automatic and thus objective, whereas classical sleep cycle detection is usually based on visual inspection of hypnograms, which is time-consuming, subjective and error-prone. The few available automatic algorithms for sleep cycle detection correlated only moderately with classical cycles detected by human raters (r’s = 0.3 – 0.7 in different datasets here).

      (3) Defining the precise end of a classical sleep cycle with skipped REM sleep, which is common in children, adolescents and young adults, is often difficult and arbitrary when using a hypnogram. The fractal cycle algorithm detected such cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower.

      (4) The fractal analysis showed a stronger effect size, higher F-value and R-squared than the classical analysis for the cycle duration comparison in children and adolescents vs young adults. The first and second fractal cycles were significantly shorter in the pediatric compared to the adult group, whereas the classical approach could not detect this difference.

      (5) Fractal – but not classical – cycle durations correlated with the age of adult participants.

      These bullets are now summarized in Table 5 that has been added to the Discussion of the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.

      Strengths:

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.

      Weaknesses:

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results.

      We thank the reviewers for their insightful comments and constructive suggestions. Below are the revisions we plan to make:

      (1) Experimental Procedure: We will provide a more detailed description of the stimuli and comprehension tests in the revised manuscript. Additionally, we will upload the corresponding audio files and transcriptions as supplementary data to ensure full transparency. 

      (2) Statistics/Analyses: In response to the reviewer's suggestions, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state similar to the baseline state described by Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states showed network activity levels that deviated significantly from zero. Furthermore, we regenerated the null distribution for behavior-brain state correlations using a circular shift approach, and the results remain largely consistent with our previous findings. We have also made other adjustments to the analyses and introduced some additional analyses, as per the reviewer's recommendations. These changes will be incorporated into the revised manuscript.

      (3) Interpretation/Rationale: We will expand on the interpretation of the relationship between state occurrence and semantic coherence. Specifically, we will highlight that higher semantic coherence may enable the brain to more effectively accumulate information over time. State #2 appears to be involved in the integration of information over shorter timescales (hundreds of milliseconds), while State #3 is engaged in longer timescales (several seconds). 

      Reviewer #2 (Public review):

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.
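      The “transition hub” description rests on empirical transition probabilities between decoded states. A minimal sketch, assuming a decoded state sequence is already available from the fitted HMM (labels 0, 1, 2 stand for States #1–#3; the sequence below is hypothetical). Excluding self-transitions separates the switching pattern from state persistence.

```python
import numpy as np


def transition_matrix(states, n_states, ignore_self=False):
    """Row-normalised transition probabilities from a decoded state sequence."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        if ignore_self and a == b:
            continue
        counts[a, b] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # rows with no outgoing transitions stay all-zero instead of dividing by zero
    return np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)


decoded = [0, 0, 1, 2, 2, 1, 0, 1]   # hypothetical decoded path (0 = State #1, ...)
print(transition_matrix(decoded, 3, ignore_self=True))
```

      In this toy sequence, both State #1 and State #3 move only to State #2, which is the pattern the reviewer describes as a hub.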

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      We have regenerated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence. 
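      A circular-shift null keeps each time course's autocorrelation intact while breaking its alignment with the stimulus feature. A minimal single-participant sketch, assuming a one-sided test on Pearson r and a zero shift excluded from the null; how the authors aggregate across participants is not shown here, and the data below are synthetic.

```python
import numpy as np


def circular_shift_pvalue(state_course, feature, n_perm=1000, seed=0):
    """Observed Pearson r and one-sided p-value under a circular-shift null."""
    x = np.asarray(state_course, float)
    y = np.asarray(feature, float)
    observed = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    shifts = rng.integers(1, len(x), size=n_perm)  # shifts 1..len(x)-1, never 0
    null = np.array([np.corrcoef(np.roll(x, s), y)[0, 1] for s in shifts])
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)


rng = np.random.default_rng(42)
x = rng.standard_normal(500)                 # synthetic state-expression course
y = x + 0.5 * rng.standard_normal(500)       # synthetic, strongly related feature
r, p = circular_shift_pvalue(x, y, n_perm=999, seed=1)
print(round(r, 2), p)
```

      The "+1" in numerator and denominator counts the observed statistic as one member of the null, so the smallest attainable p-value is 1/(n_perm + 1).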

      We note that in other studies examining the relationship between brain activity and word embedding features, group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can explain only a limited portion of the overall variance in brain activity fluctuations.

      We will update Figure 3 and relevant supplementary figures to reflect the new null distribution generated via circular shift. Furthermore, we will expand the discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data were consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations.

      We identified matching states across analyses based on the similarity of the activity patterns of the nine networks. For each candidate state identified in the other analyses, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and assigned it to the state it most closely resembled. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of the speech comprehension task with the resting and incomprehensible speech conditions, there was some degree of overlap or "confusion." In Figure 5A, two candidate states showed the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on this ranking of similarity. The same strategy was applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite state space is not an intrinsic, task-free property. To make the labelling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
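      One way to implement this labelling rule is a greedy assignment over the candidate-by-reference correlation matrix: pairs are taken in order of decreasing correlation, so when two candidates prefer the same reference state, the stronger match keeps it and the other falls back to its best remaining option. This is a sketch of our reading of the rule, not the authors' code; the activity patterns below are random stand-ins for the nine network values.

```python
import numpy as np


def match_states(candidates, references):
    """Greedily label candidate states with reference states by correlation.

    candidates, references: arrays of shape (n_states, n_networks).
    Returns ({candidate_index: reference_index}, correlation matrix).
    """
    n_c = len(candidates)
    corr = np.corrcoef(candidates, references)[:n_c, n_c:]
    order = np.argsort(corr, axis=None)[::-1]   # flattened indices, best first
    labels, used = {}, set()
    for flat in order:
        i, j = (int(k) for k in np.unravel_index(flat, corr.shape))
        if i not in labels and j not in used:
            labels[i] = j
            used.add(j)
    return labels, corr


rng = np.random.default_rng(1)
refs = rng.standard_normal((3, 9))                              # stand-in reference states
cands = refs[[2, 0, 1]] + 0.01 * rng.standard_normal((3, 9))    # shuffled, slightly noisy
labels, corr = match_states(cands, refs)
print(labels)
```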

      In the revised manuscript, we will illustrate in detail how the correspondence of states across analyses was established.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes.

      The temporal window hypothesis indeed provides a better explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to the short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation. 

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.

    1. Author response:

      Joint Public Review:

      Strengths:

      The insulin-dependent signaling in the central nervous system is relatively understudied. This explorative study delves into several interesting and clinically relevant possibilities, examining how insulin-dependent signaling and its crosstalk with WNK kinases might affect brain circuits involved in memory formation and/or anxiety. Therefore, these findings might inspire follow-up studies performed in disease models for disorders that exhibit impaired glucose metabolism, deficient memory, or anxiety, such as Diabetes mellitus, Alzheimer's disease, or most psychiatric disorders.

      The graphical presentation of the figures is of high quality, which helps the reader to obtain a good overview and easily understand the experimental design, results, and conclusions.

      The behavioral studies are well conducted and provide valuable insights into the role of WNK kinases in glucose metabolism and their effect on learning and memory. Additionally, the authors evaluate the levels of basal and induced anxiety in Figures 1 and 2, enhancing our understanding of how WNK signaling might engage in cognitive function and anxiety-like behavior, particularly in the context of altered glucose metabolism.

      We thank the reviewers for recognizing the strengths of our study.

      Weaknesses:

      The study used the WNK463 inhibitor as the only tool to manipulate WNK1-4 activity. This inhibitor seems selective; however, it has been reported to inhibit the individual WNK kinases with different efficiencies (e.g. PMID: 31017050, PMID: 36712947). Additionally, the authors neither analyze nor report the expression profiles or activity levels of WNK1, WNK2, WNK3, and WNK4 within the relevant brain regions (i.e. hippocampus, cortex, amygdala). Combined, these weaknesses raise concerns about the direct involvement of WNK kinases within the selected brain regions and behavioral circuits. It would be beneficial if the authors provided gene profiling for WNK1, 2, 3, and 4 (e.g. using the Allen Brain Atlas). To confirm the observations, the authors should either add results from other WNK inhibitors or, preferably, analyze knock-down or knock-out animals/tissue targeting the single kinases.

      We thank the reviewers for the suggestions. To address this criticism, and as recommended, we plan to include gene profiling for WNK1-4 in the brain from the Allen Brain Atlas. Additionally, we plan to include the effect of WNK1 knockdown on pAKT levels in immortalized SH-SY5Y cells.

      The authors do not report any data on whether the global inhibition of WNKs affects insulin levels. Since the authors wish to demonstrate the synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition, such data are missing.

      To address this critique, we plan to include plasma insulin levels upon global inhibition of WNKs using WNK463 in C57BL/6J mice.

      The study discovered that the Sortilin receptor binds to OSR1, leading the authors to speculate that Sortilin may be involved in the insulin-dependent GLUT4 surface trafficking. However, the authors do not provide any evidence supporting Sortilin's involvement in insulin- or WNK-dependent GLUT4 trafficking. Thus, this conclusion should be qualified, rephrased, or additional data included.

      We thank the reviewers for suggesting experiments that will significantly enhance the clarity of our conclusions. We plan to include immunofluorescence staining data for sortilin localization in SH-SY5Y cells treated with DMSO, insulin and/or WNK463. These data will indicate whether WNK463 treatment affects the localization of sortilin in the Golgi network, which previous studies have shown to affect sortilin-dependent GLUT4 trafficking.

    1. Author response:

      We would like to thank the reviewers for their positive evaluation of our work and for comments that have inspired useful discussion. We will provide an in-depth response once one of the key authors has returned from parental leave (in a few months); below, we share our initial thoughts:

      Both reviewers asked to see more gaze data to understand how eye movements in patients with achromatopsia might drive our results. We will expand our analyses of eye tracking data and discuss the implications in more depth, but would like to note that our key findings (no change in signal coverage in the foveal rod-scotoma projection zone in achromats, and changes in connective fields) are both robust to eye movement, and unlikely to be driven by gaze differences. Where this is less clear (i.e., population Receptive Field eccentricities are shifted outwards and increased in size), we have highlighted this and avoided drawing strong conclusions. 

      Reviewer 1 questioned why smaller connective fields (CFs) were observed in achromats, suggesting that their flatter V1 eccentricity tuning should predict larger CFs. It is not straightforward to predict how V1's population receptive field (pRF) tuning profile shapes V3's sampling extent, as CFs are driven, but not dictated, by V1: they combine and integrate V1 signals. As we are dealing with an atypically developed visual system, assumptions about expected relationships are complicated further. We believe that the aspect of the pRF data most relevant to interpreting V3 CF extent is the ratio between V1 and V3 pRF sizes. Our outcomes show that pRF sizes in achromats, while larger in V1, are more normalized in V3, predicting more local V3 sampling from V1. This is what our quantifications of CF size show across two independent measures with different stimuli. We will provide further data to address reviewer 1's various queries about the potential causes of the pRF eccentricity shifts in achromats, the relationship between pRFs and CFs, and methodological details of CF fits.

      We thank the reviewers again for their insightful comments and look forward to providing more comprehensive responses to their queries, substantiated with data, as soon as possible.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed very few entries. For example, in the 5US group we found RE+/+ spines in only 3 of the 6 analyzed animals. We believe that this observation is not due to technical problems, as the mCherry virus transduction required to find RE+/+ spines was similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the density of RE+/+ spines to be calculated per animal, animals without RE+/+ spines contribute no entries for spine area or PSD-95 mean gray value if N = animal number. Hence, we decided to use N = animals to calculate spine and bouton densities, and N = dendritic spines/boutons to calculate the other spine/bouton parameters.
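
      To illustrate why N shrinks when per-spine measures are first averaged within animals, here is a minimal Python sketch. It assumes each measurement is stored as a hypothetical (animal_id, value) pair; the animal IDs and spine areas are invented for illustration and are not data from this study.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(spines):
    """Average a per-spine measurement within each animal.

    `spines` is a list of (animal_id, value) pairs, one per analysed
    RE+/+ spine. Animals with no RE+/+ spines never appear in the input,
    so they contribute no value and the effective N shrinks.
    """
    by_animal = defaultdict(list)
    for animal_id, value in spines:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical 5US group: 6 animals analysed, RE+/+ spines found in only 3.
animals_analysed = ["m1", "m2", "m3", "m4", "m5", "m6"]
spine_areas = [("m1", 0.21), ("m1", 0.35), ("m4", 0.18), ("m6", 0.40), ("m6", 0.22)]

means = per_animal_means(spine_areas)
print(f"N (spines) = {len(spine_areas)}")
print(f"N (animals with data) = {len(means)} of {len(animals_analysed)}")
```

With these invented values, five spines collapse to three per-animal means, so a per-animal test would have N = 3 rather than N = 6.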

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: For assessment of learning we used the percentage of time animals spent freezing (% freezing). Freezing behavior was defined as a complete lack of movement, except respiration. To assess within-session learning (working memory), we compared pre- and post-US freezing frequency (the first 148 s vs the last 30 s) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within-session contextual fear extinction, we ran a two-way ANOVA testing the effects of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory, we compared the first 5 minutes of the Extinction session (day 2) and the Test session (day 3).

      As suggested by the reviewer, we also added data for all six 5-minute bins of the Extinction sessions.
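
      The 5-minute binning described above can be sketched as follows, assuming freezing is available as one True/False score per second (a hypothetical export format; the actual VideoFreeze output format is not assumed here). The session data are invented and only illustrate how a 30-minute Extinction session yields six 5-minute bins.

```python
def percent_freezing_per_bin(freezing, bin_s=300):
    """Return % freezing for consecutive bins of `bin_s` seconds.

    `freezing` holds one boolean per second (True = frozen in that second).
    A trailing partial bin is dropped so every bin has equal length.
    """
    n_bins = len(freezing) // bin_s
    return [
        100.0 * sum(freezing[i * bin_s:(i + 1) * bin_s]) / bin_s
        for i in range(n_bins)
    ]


# Hypothetical 30-min Extinction session sampled once per second,
# with freezing declining after the first 10 minutes.
session = ([True] * 240 + [False] * 60) * 2 + ([True] * 120 + [False] * 180) * 4
bins = percent_freezing_per_bin(session)
print(bins)  # [80.0, 80.0, 40.0, 40.0, 40.0, 40.0]
```

Under these assumptions, the first bin of the day-2 Extinction session and the first bin of the day-3 Test session would be the two values compared for long-term extinction memory.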

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H). 

      The chemogenetic experiment shown in Figure 7 includes all four experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data show a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these mice showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript. We just conclude that RE→CA1 participates in the CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms underlying the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid, and the methodology is appropriate. The strength of this work is that it nicely builds on existing literature and provides new molecular insight into a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text: 

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Figs. 1 and 2). In these cases, the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy of PSDs in CA1 (Figures 3, 4 and 7). In these cases, we assessed the integrated mean gray value of the PSD-95 signal per dendritic spine (Figures 3 and 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during boutons analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (PSD-95 is a synaptic scaffold used as a marker of excitatory synapses located on both excitatory and inhibitory neurons; Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer the Figure 2C panel was moved to Figure 3 (now it is Fig 3A).

      (2) In the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 μM seems pretty low in comparison with previous studies. The CNO source is missing.

      This section has been rewritten to be more clear. The concentration of CNO was chosen based on the previous studies (Stachniak et al., 2014).

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tips were located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side over 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline (0.2 μl per side) injections 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. Cannula placement was verified histologically, and only data from animals with correct cannula implants were included in statistical analyses.
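
      Because the reviewer compared the 3 μM concentration with previous studies, it may help to express the infused amount per hemisphere, since other studies often report doses as mass. The sketch below does this arithmetic in Python; the molecular weight of clozapine N-oxide (about 342.8 g/mol, formula C18H19ClN4O) is an added assumption, not a value from the manuscript.

```python
# Amount of CNO delivered per hemisphere by the infusion described above.
CNO_CONC_M = 3e-6          # 3 μM, expressed in mol/L
VOLUME_L = 0.2e-6          # 0.2 μl per side, expressed in L
CNO_MW_G_PER_MOL = 342.8   # assumed molecular weight of clozapine N-oxide

moles = CNO_CONC_M * VOLUME_L                 # mol per side
picomoles = moles * 1e12
nanograms = moles * CNO_MW_G_PER_MOL * 1e9    # g -> ng

print(f"{picomoles:.2f} pmol per side")  # 0.60 pmol per side
print(f"{nanograms:.2f} ng per side")    # 0.21 ng per side
```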

      (3) More details of what software/algorithm was used to score freezing should be included. 

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: Next, sections were incubated overnight at 4°C with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS, and incubated at room temperature for 90 minutes with a secondary antibody conjugated to Alexa Fluor 647 (1:500, Invitrogen, A31571).

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, they show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced PSD-95 protein levels in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We agree with the reviewer that such an experiment would be interesting. However, it could also be confusing, as we could not distinguish whether possible behavioral effects are related to state-dependent aspects of CFE or to impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see effects of CNO on CFE in the control groups (Figure 6D and H). We therefore believe that it is unlikely that CNO induces state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups for drawing conclusions about synaptic plasticity related to a particular behavior are groups that differ by just one factor/experience. For CFE, these are mice sacrificed immediately before and after the CFE session (Figures 2 and 3); on the other hand, to draw conclusions about the effects of re-exposure to the neutral context, mice sacrificed before and after the second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in either case. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown in Figures 2 and 3 to check whether CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions. 

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tips were located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side over 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline (0.2 μl per side) injections 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. Cannula placement was verified histologically, and only data from animals with correct cannula implants were included in statistical analyses.

    1. Author response:

      We are grateful to the reviewers for their thoughtful and constructive feedback on our manuscript. Based on the Public Reviews, we will address the concerns raised by each reviewer through a combination of new analyses, clarifications, and expanded discussion as outlined below:

      Reviewer #1:

      (1) Integration of Positive Selection Results:

      We will enhance the integration of positive selection analyses throughout the manuscript. Specifically, we will discuss how the positively selected sites in primates, including site 193, inform IFIT1 function. We will expand the discussion to explain how PAML, FUBAR, and MEME complement each other and why MEME did not detect site 193 in primates. Additionally, we will provide a rationale for focusing on the three sites identified in primates and address the overlap with bat orthologs.

      (2) Expression Levels and Antiviral Activity:

      We acknowledge the variability in IFIT1 ortholog expression levels. To address this, we will quantify and normalize protein expression to GAPDH across all orthologs, allowing for a more accurate comparison of antiviral activity. We will revise the text to clarify that species-specific differences in viral suppression may be influenced by expression levels.

      (3) Clarification of Terminology and Data Interpretation:

      We will refine our description of the antiviral effects observed for SINV in Figure 4E. We will also revise statements related to protein expression in the relevant sections to improve accuracy.

      (4) Cohesion of Data:

      We will work to more tightly connect the evolutionary analysis with the functional virology data, framing the manuscript around how positive selection shapes IFIT1 function across species. 

      Reviewer #2:

      (1) Recombination Analysis of IFIT1:

      We will conduct a recombination analysis using GARD from the HyPhy package to ensure that the signatures of positive selection are not confounded by recombination between IFIT1 and IFIT1B. 

      (2) Clarification of IFIT1 Homologs Studied:

      We will provide additional details on how IFIT1 orthologs were selected, including addressing the relationship between IFIT1 and IFIT1B. We will support this by presenting additional sequence comparisons to demonstrate the orthology of the proteins studied.

      (3) Chimpanzee IFIT1 Loss of Function:

      We will revise the discussion of chimpanzee IFIT1 to better reflect the data. 

      (4) Presentation of Antiviral Specificity Data:

      We will include a supplementary table listing the percentage of infection normalized to control by VSV and VEEV for each ortholog to allow for clearer comparisons.
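
      A minimal sketch of the normalization such a table could contain, assuming control cells (for example, cells not expressing an IFIT1 ortholog) are set to 100% infection. The control definition, the function name, and the numbers are hypothetical and not taken from the study.

```python
def percent_infection_normalized(infected_ortholog, infected_control):
    """% infection in ortholog-expressing cells relative to control cells (= 100%)."""
    if infected_control <= 0:
        raise ValueError("control infection must be positive")
    return 100.0 * infected_ortholog / infected_control


# Hypothetical % infected cells for one ortholog and its control; not study data.
control = {"VSV": 40.0, "VEEV": 25.0}
ortholog = {"VSV": 10.0, "VEEV": 20.0}

table = {
    virus: percent_infection_normalized(ortholog[virus], control[virus])
    for virus in control
}
print(table)  # {'VSV': 25.0, 'VEEV': 80.0}
```

Normalizing each virus to its own control puts VSV and VEEV on a common scale, so an ortholog's relative restriction of the two viruses can be compared directly.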

      Additionally, we will provide an alternative visualization to better compare the data sets. 

      Reviewer #3:

      (1) Alternative Hypotheses for IFIT1 Antiviral Activity such as IFIT1-IFIT interactions:

      We will expand the discussion to consider alternative hypotheses, including the potential for IFIT1 activity to be regulated through interactions with other IFIT family members. In particular, we will address how IFIT1-IFIT interactions may be broadly applicable to our findings with IFIT1 orthologs. In addition, we will clarify that we do not conclude that residues 362/4/6 are the sole drivers of antiviral specificity across the orthologs tested in this study.

      (2) Generalization of Findings Across Orthologs:

      We acknowledge that the functional importance of residues 362/4/6 may not be generalizable across all orthologs. We will discuss this limitation more explicitly in the manuscript, while also expanding on how these findings apply specifically to primate IFIT1 orthologs.

      We believe that these revisions will address the key concerns raised by the reviewers and strengthen the manuscript. We look forward to submitting the revised version for further consideration.

    1. Author Response:

      We are grateful to the reviewers for their encouraging comments and constructive suggestions. These suggestions will be valuable to improve the revised manuscript.

      Reviewer 1:

      PD-1 signaling is suppressive to the establishment of cytokine-producing effector cells in general. However, as the reviewer pointed out, one of the results in Fig. 2H showing a decrease of IFN-gamma-producing cells is against this trend. The data indicate percentages of cytokine-producing cells, which are not always consistent with the absolute number of activated T cells. Nonetheless, we plan additional experiments in order to address the question.

      For the PD-1YFYF experiments in Figs. 3-5, there were moderate changes in cytokine production between wild-type and mutant PD-1. We performed gene transduction into newly prepared T cells in each experiment. In addition, to monitor the immunosuppressive effect of PD-1 agonist antibodies, these T cells were stimulated using PD-L1-deficient APCs. Therefore, we think these differences in cytokine levels most likely reflect technical variation rather than a specific function of PD-1YFYF.

      Anti-PD-L1 mAb was used for optimal blockade of the PD-1/PD-L1 interaction, and the antibody concentration (5 μg/ml) is within the normal range for this purpose. We used variable concentrations of OVA peptide to set up experiments with different intensities of TCR stimulation, because TCR signal intensity has been shown to affect CD4+ T cell differentiation into Th1 and Th2 cells. We lowered the peptide concentration to test the effect of PD-1 signals under suboptimal TCR stimulation.

      Reviewer 2:

      Antigen-specific T cells from immunized mice are not ideal for Th differentiation studies because T cells activated in response to the antigen might have already undergone functional differentiation in vivo. Incorporating the reviewer’s suggestion, we will test alternative approaches, including the use of human CD4+ T cells.

      For the allergy model, we will expand the analysis for inflammatory effectors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The main research question could be defined more clearly. In the abstract and at some points throughout the manuscript, the authors indicate that the main purpose of the study was to assess whether the allocation of endogenous attention requires saccade planning [e.g., ll.3-5 or ll.247-248]. While the data show a coupling between endogenous attention and saccades, they do not point to a specific direction of this coupling (i.e., whether endogenous attention is necessary to successfully execute a saccade plan or whether a saccade plan necessarily accompanies endogenous attention).

      Thanks for the suggestion. We have modified the text in the abstract and at various points in the text to make it clearer that the study investigates the relationship between attention and saccades in one particular direction: first attentional deployment, then saccade planning.

      Some of the analyses were performed only on subgroups of the participants. The reporting of these subgroup analyses is transparent and data from all participants are reported in the supplementary figures. Still, these subgroup analyses may make the data appear more consistent, compared to when data is considered across all participants. For instance, the exogenous capture in Experiments 1 and 2 appears much weaker in Figure 2 (subgroup) than Figure S3 (all participants). Moreover, because different subgroups were used for different analyses, it is often difficult to follow and evaluate the results. For instance, the tachometric curves in Figure 2 (see also Figure 3 and 4) show no motor bias towards the cue (i.e., performance was at ~50% for rPTs <75 ms). I assume that the subsequent analyses of the motor bias were based on a very different subgroup. In fact, based on Figure S2, it seems that the motor bias was predominantly seen in the unreliable participants. Therefore, I often found the figures that were based on data across all participants (Figures 7 and S3) more informative to evaluate the overall pattern of results.

      Indeed, our intent was to dissociate the effects on saccade bias and timing as clearly as possible, even if that meant having to parse the data into subgroups of participants for different analyses. We do think conceptually this is the better strategy, because the bias and timing effects were distinct and not strongly correlated with specific participants or task variants. For instance, the unreliable participants were somewhat more consistently biased in the same direction, but the reliable participants also showed substantial biases, so the difference in magnitude was relatively modest. This can be more easily appreciated now that the reliable and unreliable participants are indicated in Figures 3 and 5. The impact of the bias is also discussed further in the last paragraphs of the Results, which note that the bias was not a reliable predictor of overall success during informed choices.
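
      For readers unfamiliar with tachometric curves, the sketch below computes fraction correct as a function of raw processing time (rPT) with a sliding window. It assumes trial-level (rPT in ms, correct) pairs; the window half-width and the trial values are arbitrary illustrations, not the authors' settings or data. Performance near 0.5 at short rPTs corresponds to guessing, as in the reviewer's description of Figure 2.

```python
def tachometric_curve(trials, centers, half_width=12.5):
    """Fraction correct as a function of raw processing time (rPT).

    `trials` is a list of (rpt_ms, correct) pairs. For each bin center,
    trials with |rPT - center| <= half_width are pooled; a window that
    contains no trials yields None.
    """
    curve = []
    for c in centers:
        outcomes = [ok for rpt, ok in trials if abs(rpt - c) <= half_width]
        curve.append(sum(outcomes) / len(outcomes) if outcomes else None)
    return curve


# Hypothetical trials: guesses at short rPTs, informed choices at long rPTs.
trials = [(50, True), (55, False), (60, True), (45, False),
          (200, True), (210, True), (190, True)]
curve = tachometric_curve(trials, centers=[50, 200])
print(curve)  # [0.5, 1.0]
```

Evaluating the curve separately for subsets of trials or participants is what makes subgroup comparisons like those discussed above possible, at the cost of fewer trials per window.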

      Reviewer #3 (Public Review):

      (1) In this experimental paradigm, participants must decide where to saccade based on the color of the cue in the visual periphery (they should have made a prosaccade toward a green cue and an antisaccade away from a magenta cue). Thus, irrespective of whether the cue signaled that a prosaccade or an antisaccade was to be made, the identity of the cue was always essential for the task (as the authors explain on p. 5, lines 129-138). Also, the location where the cue appeared was blocked, and thus known to the participants in advance, so that endogenous attention could be directed to the cue at the beginning of a trial (e.g., p. 5, lines 129-132). These aspects of the experimental paradigm differ from the classic prosaccade/antisaccade paradigm (e.g. Antoniades et al., 2013, Vision Research). In the classic paradigm, the identity of the cues does not have to be distinguished to solve the task, since there is only one stimulus that should be looked at (prosaccade) or away from (antisaccade), and whether a prosaccade or antisaccade was required is constant across a block of trials. Thus, in contrast to the present paradigm, in the classic paradigm, the participants do not know where the cue is about to appear, but they know whether to perform a prosaccade or an antisaccade based on the location of the cue.

      The present paradigm keeps the location of the cue constant in a block of trials by intention, because this ensures that endogenous attention is allocated to its location and is not overpowered by the exogenous capture of attention that would happen when a single stimulus appeared abruptly in the visual field. Thus, the reason for keeping the location of the cue constant seems convincing. However, I wondered what consequences the constant location would have for the task representations that persist across the task and govern how attention is allocated. In the classic paradigm, there is always a single stimulus that captures attention exogenously (as it appears abruptly). In a prosaccade block, participants can prioritize the visual transient caused by the stimulus, and follow it with a saccade to its coordinates. In an antisaccade block, following the transient with a saccade would always be wrong, so that participants could try to suppress the attention capture by the transient, and base their saccade on the coordinates of the opposite location. Thus, in prosaccade and antisaccade blocks, the task representations controlling how visual transients are processed to perform the task differ. In the present task, prosaccades and antisaccades cannot be distinguished by the visual transients. Thus, such a situation could favor endogenous attention and increase its influence on saccade planning, even though saccade planning under more naturalistic conditions would be dominated by visual transients. I suggest discussing how this (and vice versa the emphasis on visual transients in the classic paradigm) could affect the generality of the presented findings (e.g., how does this relate to the interpretation that saccade plans are obligatorily coupled to endogenous attention? See, Results, p. 10, lines 306-308, see also Deubel & Schneider, 1996, Vision Research).

      Great discussion point. There are indeed many ways to set up an experiment where one must either look to a relevant cue or look away from it. Furthermore, it is also possible to arrange an experiment where the behavior is essentially identical to that in the classic antisaccade task without ever introducing the idea of looking away from something (Oor et al., 2023). More important than the specific task instructions or the structure of the event sequence, we think the fundamental factors that determine behavior in all of these cases are the magnitudes of the resulting exogenous and endogenous signals, and whether they are aligned or misaligned. Under urgent conditions, consideration of these elements and their relevant time scales explains behavior in a wide variety of tasks (see Salinas and Stanford, 2021). Furthermore, a recent study (Zhu et al., 2024) showed that the activation patterns of neurons in monkey prefrontal cortex during the antisaccade task can be accurately predicted from their stimulus- and saccade-related responses during a simpler task (a memory guided saccade task). This lends credence to the idea that, at the circuit level, the qualities that are critical for target selection and oculomotor performance are the relative strengths of the exogenous and endogenous signals, and their alignment in space and time. If we understand what those signals are, then it no longer matters how they were generated. The Discussion now includes a paragraph on this issue.

      (2) Discussion (p. 16, lines 472-475): The authors suppose that "It is as if the exogenous response was automatically followed by a motor bias in the opposite direction. Perhaps the oculomotor circuitry is such that an exogenous signal can rapidly trigger a saccade, but if it does not, then the corresponding motor plan is rapidly suppressed regardless of anything else.". I think this interesting point should be discussed in more detail. Could it also be that instead of suppression, other currently active motor plans were enhanced? Would this involve attention? Some attention models assume that attention works by distributing available (neuronal) processing resources (e.g., Desimone & Duncan, 1995, Annual Review of Neuroscience; Bundesen, 1990, Psychological Review; Bundesen et al., 2005, Psychological Review) so that the information receiving the largest share of resources results in perception and is used for action, but this happens without the active suppression of information.

      The rebound seen after the exogenously driven changes is certainly interesting, and we agree that it could involve not only the suppression of a specific motor plan but also enhancement of another (opposite) plan. However, we think that, given the lack of prior data with the requisite temporal precision, further elaboration of this point would just be too speculative in the context of the point that we are trying to make, which is simply that the underlying choice dynamics are more rapid and intricate than is generally appreciated.

      (3) Methods, p. 19, lines 593-596: It is reported that saccades were scored based on their direction. I think more information should be provided to understand which eye movements entered the analysis. Was there a criterion for saccade amplitude? I think it would be very helpful to provide data on the distributions of saccade amplitudes or on their accuracy (e.g. average distance from target) or reliability (e.g. standard deviation of landing points). Also, it is reported that some data was excluded from the analysis, and I suggest reporting how much of the data was excluded. Was the exclusion of the data related to whether participants were "reliable" or "unreliable" performers?

      The reported results are based on all saccades (detected according to a velocity threshold) that were produced after the go signal and in a predominantly horizontal direction (within ± 60° of the cue or non-cue), which were the vast majority (> 99%). Indeed, most saccades were directed to the choice targets, with 95% of them within ± 14.2° of the horizontal plane. The excluded (non-scored) trials were primarily fixation breaks plus a small fraction of trials with blinks, which compromised saccade determination. There was no explicit amplitude criterion; applying one (for instance, excluding any saccades with amplitude < 2°) produced minimal changes to the data. Overall, saccade amplitudes were distributed unimodally with a median of 7.7° and a 95% confidence interval of [3.7°, 9.7°], whereas the choice targets were located at ± 8° horizontally. This is now reported in the Methods.
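
      The direction-based scoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes saccade displacement vectors (dx, dy, in degrees) have already been detected with a velocity threshold (not shown), that the cue and non-cue lie on the horizontal meridian, and the function names (score_saccade, is_correct) are hypothetical. The optional min_amplitude argument mimics the 2° amplitude check mentioned in the response; by default no amplitude criterion is applied.

```python
import math

def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two directions, in degrees (0-180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def score_saccade(dx, dy, cue_side, window_deg=60.0, min_amplitude=None):
    """Classify a saccade as directed toward the 'cue' or 'non-cue', or None.

    dx, dy: saccade displacement in degrees of visual angle (x positive = right).
    cue_side: +1 if the cue is on the right, -1 if it is on the left.
    min_amplitude: optional amplitude criterion in degrees; None applies no
    criterion (hypothetical default, mirroring the reported analysis).
    """
    if min_amplitude is not None and math.hypot(dx, dy) < min_amplitude:
        return None
    direction = math.degrees(math.atan2(dy, dx))  # 0 = rightward, 180 = leftward
    cue_dir = 0.0 if cue_side > 0 else 180.0
    noncue_dir = 180.0 - cue_dir  # the opposite horizontal direction
    if angular_distance(direction, cue_dir) <= window_deg:
        return "cue"
    if angular_distance(direction, noncue_dir) <= window_deg:
        return "non-cue"
    return None  # e.g., a mostly vertical saccade: not scored

def is_correct(label, trial_type):
    """trial_type 'pro': look at the cue; 'anti': look at the non-cue."""
    if label is None:
        return None
    return label == ("cue" if trial_type == "pro" else "non-cue")
```

      A ±60° window around each horizontal direction leaves a 60° band around the vertical where saccades are not scored, which is why nearly horizontal saccades to the choice targets are essentially always classified.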

      As for data exclusion, analyses were based on urgent trials (gap > 0); non-urgent trials (gap < 0) were excluded from the calculation of the tachometric curves simply because they might correspond to a slightly different regime (go signal after cue onset) and to long processing times in the asymptotic range (rPT of 200–300 ms) or beyond, which are less informative. However, including them made no appreciable difference to the results. No data were excluded based on participant performance or identity; all psychometric analyses were carried out after the selection of trials based on the scoring criteria described above. This is now stated in the Methods.
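
      To make the trial selection concrete, here is a hedged sketch of how a tachometric curve (fraction correct as a function of rPT) might be computed from urgent trials only. It assumes the definition rPT = RT - gap, a fixed 25-ms bin width, and trials stored as (RT, gap, correct) tuples; these are illustrative choices, not details taken from the Methods, and tachometric_curve is a hypothetical name.

```python
from collections import defaultdict

def tachometric_curve(trials, bin_ms=25, urgent_only=True):
    """trials: iterable of (rt_ms, gap_ms, correct) tuples.

    Returns a sorted list of (bin_center_ms, fraction_correct, n_trials).
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [n_correct, n_total]
    for rt, gap, correct in trials:
        if urgent_only and gap <= 0:
            continue  # non-urgent trials (go signal after cue onset) are skipped
        rpt = rt - gap  # raw processing time (assumed definition)
        b = int(rpt // bin_ms)
        counts[b][0] += int(bool(correct))
        counts[b][1] += 1
    return [((b + 0.5) * bin_ms, c / n, n) for b, (c, n) in sorted(counts.items())]

# Hypothetical trials: the last one has a negative gap and is excluded
trials = [(300, 200, True), (310, 200, False), (450, 200, True), (400, -50, True)]
print(tachometric_curve(trials))  # [(112.5, 0.5, 2), (262.5, 1.0, 1)]
```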

      (4) Results, p. 9, lines 262-266: Some data analyses are performed on a subset of participants that met certain performance criteria. The reasons for this data selection seem convincing (e.g. to ensure empirical curves were not flat, line 264). Nevertheless, I suggest explaining and justifying this step in more detail. In addition, if not all participants achieved acceptable performance and data quality, this could also speak to the experimental task and its difficulty. Thus, I suggest discussing the potential implications of this, in particular how this could affect the studied mechanisms, and whether it could limit the presented findings to a special group within the studied population.

      The ideal (i.e., best) analysis for determining the cost of an antisaccade for each individual participant (Fig. 4c) was based on curve fitting and required task performance to rise consistently above chance at long rPTs in both pro and anti trials. This is why the mentioned conditions on the fits were imposed, as is now explained in the text. This ideal analysis was not viable for all tachometric curves, and not necessarily because of task difficulty; high variability or strong bias in a particular experiment/condition could also prevent a reliable fit. It is true that the task was somewhat difficult, but this manifested in various ways across the dataset, so attempting a clean-cut classification of participants based on “difficulty” may be neither easy nor all that informative (as can be gleaned from Fig. S1). There was simply a range of success levels, as one might expect from any task that requires nontrivial cognitive processing. Also note that no participants were excluded outright from the analysis. Thus, at the mentioned point in the text, we simply note that a complementary analysis, presented later, includes all participants and all conditions and provides a highly consistent result (namely, Fig. 7e). Then, in the last section of the Results, where Fig. 7 is presented, we point out that there is considerable variance in performance at long rPTs, and that it relates to both the bias and the difficulty of the task across participants.
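
      As a simplified stand-in for the curve-fitting procedure (which is not reproduced here), a rise point can be approximated by linearly interpolating where an empirical tachometric curve first rises through a fixed accuracy level, for instance the 75%-correct mark. The function name, threshold, and the two curves in the usage example are hypothetical; as the response explains, such a comparison is only meaningful when both curves rise steadily above chance.

```python
def crossing_point(rpt, accuracy, threshold=0.75):
    """Return the rPT at which a tachometric curve first rises through
    `threshold`, by linear interpolation between adjacent bins, or None."""
    points = list(zip(rpt, accuracy))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < threshold <= y1:
            return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)
    return None  # the curve never rises through the threshold

# Hypothetical bin centers (ms) and fraction correct for pro and anti trials
bins = [100, 125, 150, 175, 200]
pro = [0.50, 0.55, 0.70, 0.90, 0.95]
anti = [0.50, 0.50, 0.55, 0.70, 0.90]
cost = crossing_point(bins, anti) - crossing_point(bins, pro)
print(round(cost, 2))  # 25.0
```

      Returning None for a flat curve mirrors the practical problem described above: without a clear rise, no rise point can be assigned.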

      Reviewer #1 (Recommendations For The Authors):

      (1) I have some questions related to the initial motor bias:

      a) Based on Figure S3, which shows the tachometric curves using data from all participants, there only seems to be a systematic motor bias in Experiments 1 and 3 but no bias in Experiments 2 and 4. It is unclear to me why this is different from the data shown in Figure 7.

      For the bars in Fig. 7, accuracy (% correct) was computed for each participant and then averaged across participants, whereas for the data in Fig. S3, trials were first pooled across participants and then accuracy was computed for each rPT bin. The different averaging methods produce slightly different results because some participants had more trials in the guessing range than others, and different biases.  
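
      The difference between the two averaging methods can be seen with a toy example (hypothetical trial counts, not study data). When participants contribute different numbers of trials to a given rPT bin, averaging per-participant accuracies weights every participant equally, whereas pooling trials first weights each participant by their trial count.

```python
def mean_of_participant_accuracies(counts):
    """Compute % correct per participant, then average across participants."""
    accs = [100.0 * c / n for c, n in counts.values() if n > 0]
    return sum(accs) / len(accs)

def pooled_accuracy(counts):
    """Pool trials across participants first, then compute % correct."""
    total_correct = sum(c for c, _ in counts.values())
    total_trials = sum(n for _, n in counts.values())
    return 100.0 * total_correct / total_trials

# Hypothetical (n_correct, n_trials) in one rPT bin for two participants
counts = {"P1": (30, 40), "P2": (2, 10)}
print(mean_of_participant_accuracies(counts))  # 47.5
print(pooled_accuracy(counts))                 # 64.0
```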

      b) Based on Figure 7 (and Figure S3), there was no motor bias in Experiment 4. Based on the correlations between motor bias and time difference between pro and antisaccades, I would expect that the rise points between pro and antisaccades would be more similar in this Experiment. Was this the case?

      No. Figs. 3c and S3d show that the rise times of pro and anti trials for Experiment 4 still differ by about 30 ms (around the 75% correct mark), and the rest of the panels in those figures show that the difference is similar for all experiments. What happens is that Figs. 7 and S3 show that on average the bias is zero for Experiment 4, but that does not mean that the average difference in rise times is zero because there is an offset in the data (correlation is not the same as regression). The most relevant evidence is in Fig. 6c, which shows that, for an overall bias of zero, one would still expect a positive difference in rise times of about 25–30 ms. This figure now includes a regression line, and the corresponding text now explains the relationship between bias and rise times more clearly. Thanks for asking; this is an important point that was not sufficiently elaborated before.
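
      The distinction between correlation and regression can be illustrated with made-up numbers (not the study's data). Below, bias and rise-time difference are perfectly correlated, yet the fitted line still predicts a positive rise-time difference at zero bias because of its intercept. The plain least-squares fit is a generic sketch; the response does not specify how the regression line in Fig. 6c was fitted.

```python
from statistics import fmean

def least_squares(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

bias = [-10, 0, 10, 20]   # hypothetical motor bias (%)
diff = [20, 27, 34, 41]   # hypothetical anti - pro rise-time difference (ms)
slope, intercept = least_squares(bias, diff)
r = pearson_r(bias, diff)
# r = 1 says nothing about the offset; the intercept is the predicted
# rise-time difference when bias is zero.
print(round(slope, 3), round(intercept, 3), round(r, 3))  # 0.7 27.0 1.0
```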

      c) If I understand correctly, the initial motor bias was predominantly observed in participants who were classified as 'unreliable performers' (comparing Figure S2 and Figure 2). Was there a correlation between the motor bias and overall success in the task? In other words: Was a strong motor bias generally disadvantageous?

      Good question. Participants classified as ‘unreliable’ were somewhat more consistently biased in the same direction than those classified as ‘reliable’, but the distinction in magnitude was not large. This can be better appreciated now in Fig. 5 by noting the mix of black (reliable) and gray (unreliable) labels along the x axes. The unreliable participants were also, by definition, less accurate in their asymptotic performance in at least one experiment (Fig. S1). In general, however, this classification was used simply to distinguish more clearly the two main effects in the data (timing cost and bias). In fact, the motor bias was not a reliable predictor of performance during informed choices: across all participants, the mean accuracy in the asymptotic range (rPT > 200 ms) had a weak, non-significant correlation with the bias (ρ = ‒0.07, p = 0.7). So, no, the motor bias did not incur an obvious disadvantage in terms of overall success in the task. Its more relevant effect was the asymmetry in performance that it promoted between pro- and antisaccade trials (Fig. 6c). This is now explained at the end of the Results.

      (2) One of the key analyses of the current study is the comparison of the rPT required to make informed pro and antisaccades (ll.246 ff). I think it would be informative for readers to see the results of this analysis separately for all four experiments. For instance, based on Figure 4a and b, it looks like the rise points were actually very similar between pro and antisaccades in Experiment 1.

      We agree that the ideal analysis would be to compute the performance rise point for pro- and antisaccade curves for each experiment and each participant, but as is now noted in the text, this requires a steady and substantial rise in the tachometric curve, which is not always obtained at such a fine-grained level; the underlying variability can be glimpsed from the individual points in Fig. 7a, b. Indeed, in Fig. 4a, b the mean difference between pro and anti rise points appears small for Experiment 1 — but note that the two panels include data from only partially overlapping sets of participants; the figure legend now makes this more clear. Again, this is because the required fitting procedure was not always reliable in both conditions (pro and anti) for a given subject in a given experiment. Thus, panels a and b cannot be directly compared. The key results are those in Fig. 4c, which compare the rise points in the two conditions for the same participants (11 of them, for which both rise points could be reliably determined). In that case the mean difference is evident, and the individual effect consistent for 9 of the 11 participants (as now noted).

      A similar comparison for Experiment 1 or 2 individually would include fewer data points and lose statistical power. However, on average, the results for Experiments 1 and 2 (separately) were indeed very similar; in both cases, the comparison between pro and anti curves pooled across the same qualifying participants as in Fig. 4c produced results that were nearly identical to those of Fig. 4d (as can be inferred from Fig. 2a, b). Furthermore, results for the four individual experiments pooled across all participants are presented in Figure S3, which shows delayed rises in antisaccade performance consistent with the single participant data (Fig. 4c).

      (3) Figure 3: It would be helpful to indicate the reliable performers that were used for Figure 3a in the bar plots in Figure 3b. Same for Figures 3c and d.

      Done. Thanks for the suggestion.

      (4) Introduction: The literature on the link between covert attention and directional biases in microsaccades seems relevant in the context of the current study (e.g., Hafed et al., 2002, Vision Res; Engbert & Kliegl, 2003, Vision Res; Willett & Mayo, 2023, Proc Natl Acad Sci USA).

      Yes, thanks for the suggestion. The introduction now mentions the link between attentional allocation and microsaccade production.

      (5) ll.395ff & Figure 7f: Please clarify whether data were pooled across all four experiments for this analysis.

      Yes, the data were pooled, but a positive trend was observed for each of the four experiments individually. This is now stated.

      (6) ll.432-433: There is evidence that the attentional locus and the actual saccade endpoint can also be dissociated (e.g., Wollenberg et al., 2018, PLoS Biol; Hanning et al., 2019, Proc Natl Acad Sci USA).

      True. We have rephrased accordingly. Thanks for the correction.

      (7) ll.438-440: This sentence is difficult to parse.

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written and compelling. The biggest issue for me was keeping track of the specifics of the individual experiments. I think some small efforts to reinforce those details along the way would help the reader. For example, in the Figure 3 legend, I found the parenthetical phrase "(high luminance cue, low luminance non-cue)" immensely helpful. It would be helpful and trivial to add the corresponding phrase after "Experiment 4" in the same legend.

      Thanks for the suggestion. Legends and/or labels have been expanded accordingly in this and other figures.

      Line 314: "..had any effect on performance,..." Should there be a callout to Figure 2 here?

      Done.

      It wasn't clear to me why the specific high and low luminance values (48 and 0.25) were chosen. I assume there was at least some quick perceptual assessment. If that's the case or if the values were taken from prior work, please include that information.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      Minor points. Please note that the comments made in the public review above are not repeated here.

      (1) Introduction, p. 2, lines 41-45: It is mentioned that the effects of covert attention or a saccade can be quite distinct. I suggest specifying in what way.

      Done.

      (2) Introduction, p. 2, lines 46-47: It is said that the relation between attention and saccade planning was still uncertain and then it is stressed that this was the case for more natural viewing conditions. However, the discussed literature and the experimental approach of the current study still rely on experimental paradigms that are far from natural viewing conditions. Thus, I suggest either discussing the link between these paradigms and natural viewing in more detail or leaving out the reference to natural viewing at this point (I think the latter suggestion would fit the present paper best).

      We followed the latter suggestion.

      (3) Introduction (e.g. p. 3, lines 55-58): The authors discuss the effects that sustaining fixation might have on attention and eye movements. Recently, it has been found that maintaining fixation can ameliorate cognitive conflicts that involve spatial attention (Krause & Poth, 2023, iScience). It seems interesting to include this finding in the discussion, because it supports the authors' view that it is necessary to study fixation and eye movements rather than eye movements alone to uncover their interplay with attention and decision-making.

      Thanks for the reference. The reported finding is certainly interesting, but we find it somewhat tangential to the specific point we make about strong fixation constraints — which is that they suppress internally driven motor activity, including biases, that are highly informative of the relationship between attention and saccade planning (lines 466‒472, 541‒561). Whether fixation state has other subtle consequences for cognitive control is an intriguing, important issue, for sure. But we would rather maintain the readers’ focus on the reasons why less restrictive fixation requirements are relevant for understanding the deployment of attention.

      (4) Results, p. 9, lines 264-266: It is reported that "The rise points were statistically the same across experiments for both prosaccades (p=0.08, n=10, permutation test)...", but the p-value seems quite close to significance. I suggest mentioning this and phrasing the sentence a bit more carefully.

      We now refer to the rise points as “similar”.

      (5) Figure 7 a-d: It might help readers who first skim through the figures before reading the text to use other labels for the bins on the x-axis that spell out the name of the phase in the trial. It might also help to visualize the bins on the plot of a tachometric function (in this case, changing the labels could be unnecessary).

      Thanks for the suggestion. We added an inset to the figure to indicate the correspondence between labels and time bins more intuitively.

      (6) Methods, p. 18, lines 566-567: On some trials, participants received an auditory beep as a feedback stimulus. As this could induce a burst of arousal, I wondered how it affected the subsequent trials.

      This is an interesting issue to ponder. We agree that, in principle, the beep could have an impact on arousal. However, what exactly would be predicted as a consequence? The absence of a beep is meant to increase the urgency of the participant, so some effect of the beep event on RT would be expected anyway as per task instructions. Thus, it is unclear whether an arousal contribution could be isolated from other confounds. That said, three observations suggest that, at most, an independent arousal effect would be very small. First, we have performed multisensory experiments (unpublished) with auditory and visual stimuli, and have found that it is difficult to obtain a measurable effect of sound on an urgent visual choice task unless the experimental conditions are particularly conducive; namely, when the visual stimuli are dim and the sound is loud and lateralized. None of these conditions applies to the standard feedback beep. Second, because most trials are on time, the meaningful feedback signal is conveyed by the absence of the beep. But this signal to alter behavior (i.e., respond sooner) has zero intensity and is therefore unlikely to trigger a strong exogenous, automatic response. Finally, in our data, we can separate the trials that followed a beep (the majority) from those that did not (a minority). In doing so, we found no differences in perceptual performance, and only minor differences in RT that were identical for pro- and antisaccade trials. All this suggests to us that it is very unlikely that the feedback significantly alters arousal on specific trials in a way that impacts the tachometric curve (a contribution to general arousal across blocks or sessions is possible, of course, but would be of little consequence to the aims of the study).

      (7) Methods, p. 18, lines 574-577: I suggest referring to the colors or the conditions in the text as it was done in the experiments, just to prevent readers being confused before reading the methods.

      We appreciate the thought, but think that the study is easier to understand by pretending, initially, that the color assignments were fixed. This is a harmless simplification. Mentioning the actual color assignments early on would be potentially more confusing and make the description of the task longer and more contrived.

      (8) Methods, p. 18, Table 1: Given that the authors had a spectrophotometer, I suggest providing (approximate) measurements for the stimulus colors in addition to the luminance (i.e. not just RGB values).

      Unfortunately, we have since switched the monitor in our setup, so we don’t have the exact color measurements for the stimuli used at the time. We will keep the suggestion in mind for future studies though.

      References

      Oor EE, Stanford TR, Salinas E (2023) Stimulus salience conflicts and colludes with endogenous goals during urgent choices. iScience 26:106253.

      Salinas E, Stanford TR (2021) Under time pressure, the exogenous modulation of saccade plans is ubiquitous, intricate, and lawful. Curr Opin Neurobiol 70:154-162.

      Zhu J, Zhou XM, Constantinidis C, Salinas E, Stanford TR (2024) Parallel signatures of cognitive maturation in primate antisaccade performance and prefrontal activity. iScience. doi: https://doi.org/10.1016/j.isci.2024.110488.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for insightful feedback on how we could improve the manuscript. We have revised the manuscript and addressed the points raised.

      Regarding the technical issues raised about the quality of the patch-clamp recordings (Reviewer 2), we acknowledge that the upper limit of the access resistance cutoff should be lower and that the accepted change in access resistance should be within 10–20%. To this end, we have revised the manuscript to detail the quality metrics used more accurately. The access resistance for the neurons in paired recordings was below 40 MΩ (similar to the metric used by Kolb et al. 2019), and if the access resistance rose above 50 MΩ, we stopped recording from that neuron. Furthermore, neurons with access resistance above 50 MΩ were included in the histogram to show the total number of neurons patched, not only those used in paired recordings. Because this was done with an automated robotic system, each neuron still underwent an initial voltage-clamp and current-clamp protocol before the pipette released it and patched another cell. To Reviewer 2's point, this patch-walk protocol could also be implemented with manual recording approaches, and this point has been included in the revised manuscript.
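
      The inclusion and stopping criteria stated above can be expressed as a small check. This is a hypothetical sketch, not the authors' robot software: it assumes access resistance is sampled repeatedly during a recording, applies the below-40 MΩ entry criterion and the 50 MΩ stopping rule from the text, and optionally a maximum relative change (e.g., 0.2 for the 20% bound discussed by Reviewer 2). The name usable_samples is illustrative.

```python
def usable_samples(ra_mohm, start_max=40.0, stop_above=50.0, max_rel_change=None):
    """Return how many leading samples of an access-resistance series (MΩ)
    are usable; 0 means the cell fails the entry criterion."""
    if not ra_mohm or ra_mohm[0] >= start_max:
        return 0  # access resistance must start below start_max
    ra0 = ra_mohm[0]
    for i, ra in enumerate(ra_mohm):
        if ra > stop_above:
            return i  # recording stopped once access exceeded the limit
        if max_rel_change is not None and abs(ra - ra0) / ra0 > max_rel_change:
            return i  # optional stricter criterion on relative change
    return len(ra_mohm)

print(usable_samples([32.0, 35.0, 44.0, 51.0, 47.0]))  # 3
```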

      Regarding the spatial restrictions (Reviewer 3), we agree that the average intersomatic distance is higher than ideal. This was likely due to failed patch attempts; for instance, if one pipette successfully achieved the whole-cell configuration and the other pipette had several sequential failed patch attempts, the intersomatic distance (ISD) would increase with each failed attempt because of the user-selected order of cells. Ideally, the pipettes would walk across a slice with low ISD if the whole-cell success rate were closer to 100%. To overcome this challenge in future work, automated cell identification and tracking could allow the path planning to be updated continuously after each patch attempt. Given the whole-cell success rate of a given electrophysiologist, we believe that later versions of the automated robot could include route-planning algorithms to minimize the distance between neurons. Alternatively, this patch-walk system could also be integrated into manual recording approaches to improve connectivity yields.
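
      One simple route-planning heuristic of the kind alluded to above is to choose, after each failed attempt, the untried candidate soma closest to the cell already held by the other pipette, rather than following a fixed user-selected sequence. This is an illustrative sketch only; the authors do not specify an algorithm, and the function name and coordinates are hypothetical.

```python
import math

def next_target(anchor, candidates, tried):
    """Pick the untried candidate soma closest to the anchor cell (the cell
    already held in whole-cell by the other pipette).

    anchor: (x, y) in µm; candidates: list of (x, y); tried: set of indices.
    Returns an index into candidates, or None if all have been tried.
    """
    untried = [i for i in range(len(candidates)) if i not in tried]
    if not untried:
        return None
    # Ties are broken by index so the choice is deterministic
    return min(untried, key=lambda i: (math.dist(anchor, candidates[i]), i))

anchor = (0.0, 0.0)                                    # hypothetical held cell
candidates = [(50.0, 0.0), (20.0, 0.0), (35.0, 10.0)]  # hypothetical somata (µm)
tried = set()
while (i := next_target(anchor, candidates, tried)) is not None:
    tried.add(i)  # in practice, stop as soon as an attempt succeeds
    print(i, round(math.dist(anchor, candidates[i]), 1))
```

      Because the next target is always re-ranked by distance to the held cell, a failed attempt no longer pushes subsequent attempts further away along a predetermined list.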

      For the point raised about morphological identification, we believe that, while important, morphological identification is outside the scope of this project. Future work will include neuronal reconstruction. Regarding the other points, we will amend the manuscript to highlight other key metrics, such as the maximum time we could hold a neuron in the whole-cell configuration. Additionally, we agree with Reviewer 3 that some of the current language may cause confusion, and we will amend it accordingly.

      To all the reviewers, thank you for your time, understanding, and the opportunity to improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      As the scientific community identifies increasing numbers of genetic variants that cause rare human diseases, a challenge is how the field can most quickly identify pharmacological interventions to address known deficits. The authors point out that defining phenotypic outcomes required for drug screen assays is often challenging, and emphasize how invertebrate models can be used for quick ID of compounds that may address genetic deficits. A major contribution of this work is to establish a framework for potential intervention drug screening based on quantitative imaging of morphology and mobility behavior, using methods that the authors show can define subtle phenotypes in a high proportion of disease gene knockout mutants. 

      Overall, the work constitutes an elegant combination of previously developed high-volume imaging with highly detailed quantitative phenotyping (and some paring down to specific phenotypes) to establish proof of principle on how the combined applications can contribute to screens for compounds that may address specific genetic deficits, which can suggest both mechanism and therapy. 

      In brief, the authors selected 25 genes for which loss of function is implicated in human neuromuscular disease and engineered deletions in the corresponding C. elegans homologs. The authors then imaged morphological features and behaviors prior to, during, and after blue light stimuli, quantitating features and clustering outcomes as they elegantly developed previously (PMID 35322206; 30171234; 30201839). In doing so, phenotypes in 23/25 tested mutants could be separated enough to distinguish WT from mutant, and half of those with adequate robustness to permit high-throughput screens, an outcome that supports the utility of general efforts to identify phenotypes in C. elegans disease orthologs using this approach. A detailed discussion of 4 ciliopathy gene defects and NALCN-related channelopathy mutants reveals both expected and novel phenotypes, validating the basic approach to modeling vetted targets and underscoring that quantitative imaging approaches reiterate known biology. The authors then screened a library of nearly 750 FDA-approved drugs for the capacity to shift the unc-80 NALCN channel-disrupted phenotype closer to the wild type. Top "mover" compounds shift the outcome toward the wild type in the experimental phenotype space and also reveal how "side effects" can be evaluated to prioritize compounds that confer the fewest changes of other parameters away from the center.

      Strengths: 

      Although the imaging and data analysis approaches have been reported before and the screen is limited in scope and intervention exposure, it is important that the authors effectively combine the individual elements of the approach to demonstrate how quantitative imaging phenotypes can be integrated with C. elegans genetics to accelerate the identification of potential modulators of disease (easily extendable to other goals). The generation of deletion alleles and documentation of their associated phenotypes (available in the supplemental data) provide potentially useful reagents/data to the field. The capacity to identify "over-shooting" of compound applications, with suggestions for scaling back, and to sort efficacious interventions to minimize other changes to behavioral and physical profiles is a strong contribution.

      Weaknesses: 

      The work does not have major weaknesses, although it may be possible to expand the discussion to increase utility in the field: 

      (1) Increased discussion of the challenges and limitations of the approach may help others adapt and apply it successfully in the field.

      It is quite possible that morphological and behavioral phenotypes have nothing to do with disease mechanisms and rather reflect secondary outcomes, such that positive hits will address "off-target" consequences. 

      This is possible and can only be determined with human data. We now address this possibility in the Discussion.

      The deletion approach is adequately justified in the text, but the authors may make the point somewhere that screening target outcomes might be enhanced by the inclusion of engineered alleles that match the human disease condition. Their work on sod-1 alleles (PMID 35322206) might be noted in this discussion. 

      We agree and now mention this work in the discussion. We are currently working on a collection of strains with patient-specific mutations.

      Drug testing here involved a strikingly brief exposure to a compound, which has implications for how a given drug might act in adult animals. The authors might comment more extensively on extended treatments that include earlier life stages or more extended targeting. Presumably, different exposure periods and durations could be administered, but if the authors are aware of challenges associated with more prolonged applications, larger scales, etc., it would be useful to note them.

      More prolonged applications are definitely possible. We chose short treatments for this screen to model the potential for changing neural phenotypes once developmental effects of the mutation have already occurred. We now briefly discuss this choice and the potential of longer treatments in the discussion.

      (2) More justification of the shift to only a few target parameters for judging compound effectiveness. 

      - In the screen in Figure 4D and the text around line 313, 3 selected core features of the unc-80 mutant (fraction pausing in response to blue light, speed, and curvature) were used to avoid the high replicate requirements for identifying subtle phenotypes. Although this strategy was successful, as reported in Figure 5, the pared-down approach seems a bit at odds with the emphasis on the range of features that can be compared between mutant and WT with the authors' powerful image analysis. Adding details about the reduced statistical power upon multiple comparisons, with a concrete example calculated, might help interested scientists better assess how to apply this tool in experimental design.

      To empirically test the effect of including more features on the subsequent screen, we have repeated the analysis using increasing numbers of features. In a new supplementary figure we find increasing the number of features reduces our power to detect rescue. At 256 features, we would not be able to detect any compounds that rescued the disease model phenotype.
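
      For readers who want the kind of concrete calculation the reviewer asked for, the following sketch shows how a multiple-comparison correction raises the bar for detection as features are added. It assumes a Bonferroni correction and a two-sided, two-sample z-test (normal approximation) with a hypothetical 10 replicates per group; the response does not state which correction or test was used, so these are illustrative choices. The feature counts 3, 256, and 8289 are those mentioned in this exchange.

```python
from statistics import NormalDist

def bonferroni_alpha(alpha, n_features):
    """Per-feature significance level under a Bonferroni correction."""
    return alpha / n_features

def min_detectable_effect(n_features, n_per_group, alpha=0.05, power=0.8):
    """Smallest standardized effect (Cohen's d) detectable by a two-sided,
    two-sample z-test at the Bonferroni-corrected level (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - bonferroni_alpha(alpha, n_features) / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * (2.0 / n_per_group) ** 0.5

# Hypothetical design: 10 replicates per group
for m in (3, 256, 8289):
    print(m, bonferroni_alpha(0.05, m), round(min_detectable_effect(m, 10), 2))
```

      The minimum detectable effect grows with the number of features tested, which is consistent with the empirical finding that rescue becomes harder to detect as more features are included.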

      (3) More development of the side-effect concept. The side effects analysis is interesting and potentially powerful. Prioritization of an intervention because of minimal perturbation of other phenotypes might be better documented and discussed a bit further; how reliably does the metric of low side effects correlate with drug effectiveness? 

      Ultimately this can only be determined with clinical trial data on multiple drugs, but there are currently no therapeutic options for UNC80 deficiency in humans. We have included some extra discussion of the side effect concept.

      Reviewer #2 (Public Review): 

      Summary and strengths: 

      O'Brien et al. present a compelling strategy to both understand rare disease that could have a neuronal focus and discover drugs for repurposing that can affect rare disease phenotypes. Using C. elegans, they optimize the Brown lab worm tracker and Tierpsy analysis platform to look at the movement behaviors of 25 knockout strains. These gene knockouts were chosen based on a process to identify human orthologs that could underlie rare diseases. I found the manuscript interesting and a powerful approach to making genotype-phenotype connections using C. elegans. Given the rate at which rare Mendelian diseases are found and candidate genes suggested, human geneticists need to consider orthologous approaches to understand the disease and seek treatments on a rapid time scale. This approach is one such way. Overall, I have a few minor suggestions and some specific edits. 

      Weaknesses: 

      (1) Throughout the text on figures, labels are nearly impossible to read. I had to zoom into the PDF to determine what the figure was showing. Please make text in all figures a minimum of 10-point font. Similarly, the Figure 2D point type is impossible to read. Points should be larger in all figures. Gene names should be in italics in all figures, following C. elegans convention. 

      We have updated all figures with larger labels and, where necessary, split figures to allow for better readability. We’ve also corrected italicisation.

      (2) I have a strong bias against the second point in Figure 1A. Sequencing of trios, cohorts, or individuals NEVER identifies causal genes in the disease. This technique proposes a candidate gene. Future experiments (oftentimes in model organisms) are required to make those connections to causality. Please edit this figure and parts of the text. 

      We have removed references to causation. We were thinking of cases where a known variant is found in a patient where causality has already been established rather than cases of new variant discovery.

      (3) How were the high-confidence orthologs filtered from 767 to 543 (lines 128-131)? Also, the choice of the final list of 25 genes is not well justified. Please expand more about how these choices were made. 

      We now explain the extra keyword filtering step. For the final filtering step, we simply examined the list and chose 25. There is therefore little justification to provide and we acknowledge these cannot be seen as representative of the larger set according to well-defined rules. The choice was based on which genes we thought would be interesting using their descriptions or our prior knowledge (“subjective interestingness” in the main text).

      (4) Figures 3 and 4, why show all 8289 features? It might be easier to understand and read if only the 256 Tierpsy features were plotted in the heat maps. 

      In this case, we included all features because they were all tested for differences between mutants and controls. Using all features consistently in every fingerprint ensures that any feature we highlight in a box plot can also be located in the corresponding fingerprint.

      (5) The unc-80 mutant screen is clever. In the feature space, it is likely better to focus on the 256 less-redundant Tierpsy features instead of just a number of features. It is unclear to me how many of these features are correlated and not providing more information. In other words, the "worsening" of less-redundant features is far more of a concern than the "worsening" of 1000 correlated features. 

      This is a good point. We’ve redone the analysis using the Tierpsy 256 feature set and included this as a supplementary figure. We find that the same trend exists when looking at this reduced feature set.

      Reviewer #3 (Public Review): 

      In this study, O'Brien et al. address the need for scalable and cost-effective approaches to finding lead compounds for the treatment of the growing number of Mendelian diseases. They used state-of-the-art phenotypic screening based on an established high-dimensional phenotypic analysis pipeline in the nematode C. elegans. 

      First, a panel of 25 C. elegans models was created by generating CRISPR/Cas9 knock-out lines for conserved human disease genes. These mutant strains underwent behavioral analysis using the group's published methodology. Clustering analysis revealed common features for genes likely operating in similar genetic pathways or biological functions. The study also presents results from a more focused examination of ciliopathy disease models. 

      Subsequently, the study focuses on the NALCN channel gene family, comparing the phenotypes of mutants of nca-1, unc-77, and unc-80. This initial characterization identifies three behavioral parameters that exhibit significant differences from the wild type and could serve as indicators for pharmacological modulation. 

      As a proof-of-concept, O'Brien et al. present a drug repurposing screen using an FDA-approved compound library, identifying two compounds capable of rescuing the behavioral phenotype in a model with UNC80 deficiency. The relatively short time and low cost associated with creating and phenotyping these strains suggest that high-throughput worm tracking could serve as a scalable approach for drug repurposing, addressing the multitude of Mendelian diseases. Interestingly, by measuring a wide range of behavioural parameters, this strategy also simultaneously reveals deleterious side effects of tested drugs that may confound the analysis. 

      Considering the wealth of data generated in this study regarding important human disease genes, it is regrettable that the data is not actually made accessible. This diminishes the study's utility. It would have a far greater impact if an accessible and user-friendly online interface were established to facilitate data querying and feature extraction for specific mutants. This would empower researchers to compare their findings with the extensive dataset created here. Otherwise, one is left with a very limited set of exploitable data. 

      We have now made the feature data available on Zenodo (https://doi.org/10.5281/zenodo.12684118) as a matrix of feature summaries and individual skeleton timeseries data (the feature matrix makes it more straightforward to extract the data from particular mutants for reanalysis). We have also created a static html version of the heatmap in Figure 2 containing the entire behavioural feature set extracted by Tierpsy. This can be opened in a browser and zoomed for detailed inspection. Mousing over the heatmap shows the names of features at each position making it easier to arrive at intuitive conclusions like ‘strain A is slow’ or ‘strain B is more curved’.

      Another technical limitation of the study is the use of single alleles. Large deletion alleles were generated by CRISPR/Cas9 gene editing. At first glance, this seems like a good idea because it limits the risk that background mutations, present in chemically-generated alleles, will affect behavioral parameters. However, these large deletions can also remove non-coding RNAs or other regulatory genetic elements, as found, for example, in introns. Therefore, it would be prudent to validate the behavioral effects by testing additional loss-of-function alleles produced through early stop codons or targeted deletion of key functional domains. 

      We have added a note in the main text on limitations of deletion alleles. We like the idea of making multiple alleles in future studies, especially in cases where a project is focussed on just one or a few genes.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors): 

      Note that none of the above suggestions or the one immediately below are considered mandatory. 

      One additional minor point: The dual implication of mevalonate perturbations for NALCN deficiencies is striking. At the same time, the mevalonate pathway is critical for embryo viability, among other things, which prompts questions about how reproductive physiology is integrated into this screening approach. It appears that sterilization protocols are not used to prepare the animals screened, but it would be useful to know whether there is a signature associated with drug-induced sterility that might help identify one potential common, uninteresting outcome of compound treatments in general. In this work, the screen treatment lasts only 4 hours, which is probably too short to compromise reproduction, but as noted above, users are likely to want to expose test subjects for much longer than 4 hours.

      This is an interesting point. In its current form our screen doesn’t assess reproductive physiology. This is something that we will consider in ongoing projects.

      Figures 

      Figure 1D might be omitted or moved to supplement. 

      We have removed Figure 1D and converted Figure 1E into a standalone table (Table 1) to improve readability.

      In the Figure 2D key, the size differences between the prestim, bluelight, and poststim symbols are hard to make out; more distinctive symbols should be used.

      We have increased the size of the symbols so that the key is easier to read.

      Line 412 unc-25 should be in italics 

      Corrected

      Reviewer #2 (Recommendations For The Authors): 

      Specific edits: 

      All of the errors below have been corrected.

      Line 47, "loss of function" should be hyphenated because it is a compound adjective that modifies mutations. 

      Line 50, "genetically-tractable" should not be hyphenated because it is not a compound adjective. It is an adverb-adjective pair. Line 102 has the same grammatical issue. 

      Line 85, "rare genetic diseases" do not "affect nervous system function". The disease might have deficits in this function, but the disease does not do anything to function. 

      Line 86, it should be mutations not mutants. Mutations are changes to DNA. Mutants are individuals with mutations. 

      Throughout, wild-type should be hyphenated when it is used as a compound adjective. 

      Figure 4, asterisks is spelled incorrectly. 

      Reviewer #3 (Recommendations For The Authors): 

      - As stated in the public review, the utility of the study is limited by the lack of access to the complete dataset. The wealth of data produced by the study is one of its major outputs. 

      We have made the data publicly available on Zenodo. We appreciate the request.

      - Describe the exact break-points of the different alleles, because it was not readily feasible to derive them from the gene fact sheets provided in the supplementary materials. 

      We have now provided the start position and total length of deletion for each gene in the gene fact sheets.

      - Figure 1C: what does "Genetic homology"/"sequence identity" refer to? How were these values calculated? 

      UNC-49 is clearly not 95% identical to vertebrate GABAR subunits at the protein level. 

      We have changed the axis label to “BLAST % Sequence Identity” to clarify that these values are calculated from BLAST sequence alignments on the WormBase and Alliance of Genome Resources webpages.

      - Figure 1E: The data presented in Figure 1E appear somewhat unreliable. For example, a cursory check showed: 

      (1) Wrong human ortholog: unc-49 is a GABA receptor, not a glycine receptor as indicated in the second column. 

      (2) Wrong disease association: dys-1 is not associated with Bardet-Biedl syndrome; overall, the data in the table do not seem to fully match the HPO database. 

      (3) Inconsistent disease association: why don't the avr-14 and glc-2 (and even unc-49) profiles overlap, given that they have overlapping sets of human orthologs?

      Thank you for catching this! We have corrected gene names which were mistakenly pasted. We have also made this a standalone table (Table 1) for improved readability.

      - Error in the legend to Figure 4I: in "with ciliopathies and N2", "ciliopathies" should be "NALCN disease". 

      - Error at line 301: "Figures 2E-H" should be "Figures 4E-H". 

      Corrected.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues identify, through whole-exome sequencing, biallelic variants of DNAH3 in four unrelated Han Chinese infertile men that contribute to abnormal sperm flagellar morphology and ultrastructure. To investigate the importance of DNAH3 in male infertility, the authors generated crispant Dnah3 knockout (KO) male mice. They observed that KO mice are also infertile, showing a severe reduction in sperm movement with abnormal IDA (inner dynein arm) and mitochondrion structure. Moreover, loss of functional DNAH3 decreased the expression of IDA-associated proteins in the spermatozoa of patients and KO mice, which contributes to the disruption of sperm motility. Interestingly, the infertility of patients and KO mice is rescued by intracytoplasmic sperm injection (ICSI). Taken together, the authors propose that DNAH3 is a novel pathogenic gene for asthenoteratozoospermia and male infertility.

      Strengths:

      This work investigates the role of DNAH3 in sperm motility and male infertility. By using gold-standard molecular biology techniques, the authors demonstrate with exquisite resolution the importance of DNAH3 in sperm morphology, showing strong evidence of its role in male infertility. Overall, this is a very interesting, well-written, and appealing article. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.

      Weaknesses:

      The paper is solid, and in its current form, I have not detected relevant weaknesses.

      We thank the reviewer very much for these comments.

      Reviewer #2 (Public Review):

      Wang et al. investigated the role of dynein axonemal heavy chain 3 (DNAH3) in male infertility. They found that variants of DNAH3 were present in four infertile men, and that DNAH3 deficiency in sperm affects sperm motility. Additionally, they showed that Dnah3 knockout male mice are infertile. Furthermore, they demonstrated that DNAH3 influences inner dynein arms by regulating several DNAH proteins. Importantly, they showed that intracytoplasmic sperm injection (ICSI) can rescue the infertility in Dnah3 knockout mice and in two patients with DNAH3 variants.

      Strengths:

      The conclusions of this paper are well-supported by data.

      Weaknesses:

      The sample/patient size is small; however, the findings are consistent with those of a recent study on DNAH3 in male infertility involving 432 patients.

      We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.

      A cohort of 587 unrelated infertile men with asthenoteratozoospermia was recruited to investigate the potential genetic etiology using WES. In addition to the DNAH3 mutations identified in four patients, we identified mutations in several other genes previously reported by our group, including CFAP65 (Zhang et al., 2019. PMID: 31571197), DNAH8 (Yang et al., 2020. PMID: 32681648), DNAH12 (Li et al., 2022. PMID: 34791246), FISIP2 (Zheng et al., 2023. PMID: 35654582), CEP128 (Zhang et al., 2022. PMID: 35296684), CEP78 (Zhang et al., 2022. PMID: 36206347), CT55 (Zhang et al., 2023. PMID: 36481789), SPATA20 (Wang et al., 2023. PMID: 36415156), TENT5D (Zhang et al., 2024. PMID: 38228861), CFAP52 (Jin et al., 2023. PMID: 38126872), CEP70 (Ruan et al., 2023. PMID: 36967801) and PRSS55 (Liu et al., 2022. PMID: 35821214), as well as other, as yet unreported, variants.

      Reviewer #3 (Public Review):

      Summary:

      (1) To further explore the genetic basis of asthenoteratozoospermia, the authors performed whole-exome sequencing analyses among infertile males affected by asthenoteratozoospermia. Four unrelated Han Chinese patients were found to carry biallelic variations of DNAH3, a gene encoding IDA-associated protein.

      (2) To verify the function of IDA associated protein DNAH3, the authors generated a Dnah3-KO mouse model and revealed that the loss of DNAH3 leads to severe male infertility as a result of the severe reduction in sperm movement with the abnormal IDA and mitochondrion structures.

      (3) Mechanistically, they confirmed decreased expression of IDA-associated proteins (including DNAH1, DNAH6 and DNALI1) in the spermatozoa from patients with DNAH3 mutations and Dnah3-KO male mice.

      (4) Then, they also found that male infertility caused by DNAH3 deficiency could be rescued by intracytoplasmic sperm injection (ICSI) treatment in humans and mice.

      Strengths:

      (1) In addition to existing research, the authors provided novel variants of DNAH3 as important factors leading to asthenoteratozoospermia. This further expands the spectrum of pathogenic variants in asthenoteratozoospermia.

      (2) By mechanistic studies, they found that DNAH3 deficiency led to decreased expression of IDA-associated proteins, which may be used to explain the disruption of sperm motility and reduced fertility caused by DNAH3 deficiency.

      (3) Then, successful ICSI outcomes were observed in patients with DNAH3 mutations and Dnah3 KO mice, which will provide an important reference for genetic counselling and clinical treatment of male infertility.

      We are very grateful for the reviewer's careful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      I have carefully read the revised versions of this manuscript, and I would like to thank the authors for addressing all my previous concerns.

      I have no additional comments or suggestions.

      We thank the reviewer for reviewing our revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Statistical analyses should be provided alongside the quantification (Fig S1B, S7C).

      As suggested by the reviewer, we have added statistical analyses of the corresponding quantification to the legends of Figure S1 and Figure S7.

      (2) The numbers of sperms counted in Fig S1A should be listed.

      In response to the reviewer's valuable suggestion, we have listed the proportions of the different morphological defects in the sperm tails of the patients in Figure S1A.

      (3) Due to the high similarities in experimental design, data and conclusions between the current study and previously published work by Meng et al. (2024), as well as the very similar titles of the two studies, it is crucial to emphasize the differences in the Discussion section.

      Many thanks for the reviewer's kind suggestions on our revised manuscript.

      Employing whole-exome sequencing (WES) on infertile men to identify candidate variants, followed by in-silico and functional analysis of these variants, and generating mouse models using CRISPR-Cas9 technology, has proven to be an efficient and widely used approach for uncovering the causative genes of male infertility associated with sperm defects. Both our study and the recent work by Meng et al. utilized this approach to verify whether DNAH3 mutations are a cause of asthenoteratozoospermia. Additionally, we have also updated the title of our study to: 'DNAH3 deficiency causes flagellar inner dynein arm loss and male infertility in humans and mice'.

      Meng et al. reported DNAH3 mutations in patients with asthenoteratozoospermia, revealing multiple morphological defects in the sperm tail. Moreover, the flagellar axonemes of these patients showed evident ultrastructural abnormalities, characterized by a disrupted '9+2' arrangement and the notable absence of IDAs. Additionally, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities. While the '9+2' microtubule arrangement in the flagella of their Dnah3 KO mice remained intact, the IDAs on the microtubules were partially absent. In our study, we observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. Both studies suggest that DNAH3 plays a crucial role in human and mouse male reproduction.

      However, there are notable differences between the two studies. First, the phenotypes of the Dnah3 KO mice differed slightly. Meng et al. generated two Dnah3 KO mouse models (KO1 and KO2), both of which exhibited significantly higher sperm motility and progressive motility than the KO mice in our study, in which nearly all sperm were completely immotile. Furthermore, their Dnah3 KO2 mice even displayed motility comparable to that of WT mice and retained partial fertility. We speculate that these differences may be attributable to variations in mouse genetic background or to the presence of a truncated DNAH3 protein resulting from the specific knockout strategies. Second, we conducted additional research and uncovered novel findings. We showed that male infertility caused by DNAH3 mutations follows an autosomal recessive inheritance pattern, as confirmed by Sanger sequencing of the patients' parents. We also characterized the dynamic expression and localization of DNAH3 during spermatogenesis in humans and mice by immunofluorescence staining. We further found that DNAH3 deficiency had no impact on ciliary development in the oviduct or on oogenesis in mice, and female fertility was normal. Moreover, in the absence of DNAH3 in both humans and mice, the expression of IDA-associated proteins, including DNAH1, DNAH6 and DNALI1, was decreased, whereas the expression of ODA-associated proteins was unaffected, indicating that DNAH3 is involved in sperm axonemal development, specifically through its role in the assembly of IDAs. Collectively, our study corroborates the findings of Meng et al. and provides additional unique insights into the critical role of DNAH3 in human and mouse spermatogenesis.

      We have added this discussion to lines 275-306.

      Reviewer #3 (Recommendations for The Authors):

      I have no more recommendations for the authors.

      We thank the reviewer for reviewing our revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1. The experiments are carefully designed and the data are convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      The authors have addressed most of my previous concerns.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on revised version:

      The authors demonstrate a correlation between overexpression of ATG6 and higher stability and activity of NPR1. They claim a novel activity of ATG6 in the nucleus.

      Overall, the experimental scope of the study is solid, however, the over-interpretation of the results substantially reduces the significance and value of this study for the target plant immunity readership.

      Thank you very much for your constructive and insightful comments, and for acknowledging the experimental scope of this study. We have made every effort to address the over-interpretation of the results, as per your comments, so that the text is more accurate and concise. In the revised version, the modified content is highlighted in blue to clearly indicate the changes made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of my concerns. I have no further comments.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Recommendations For The Authors):

      As I previously commented, in fig. 2a and c, the discrepancy between levels of atg6-mcherry in microscope image vs WB has to be explained. The explanation provided by the authors is incomplete and may mislead. The most likely reason for the difference is that the fluorescence signal in fig. 2a is predominantly from free mCherry, rather than the atg6-mcherry fusion. This has to be included in the main text to avoid misleading the reader.

      Thank you very much for your constructive and insightful comments. In response, we have incorporated the necessary explanations into the revised manuscript (lines 160-164).

      In fig. 1B, the PD fraction has to show the size range of free GST. Also, please use "anti" to indicate that these are immunoblots.

      Thank you for pointing this out. In the revised manuscript, we have indicated the size range of free GST and used "anti:GST and anti:His" to indicate that these are immunoblots.

      In fig 1C, the WB has to show the free GFP band in the input and IP fractions together with NPR1, rather than in separate blots.

      Thank you for bringing this to our attention. Fig. 1c has been replaced, and the updated image now shows the free GFP band in the input and IP fractions together with NPR1-GFP.

      In fig. 1d, the BiFC signal has to be quantified from multiple images across the biological repeats. Also, there's no significance in showing the chlorophyll autofluorescence. What is the purpose of this? They need to use a nuclear marker instead.

      Thank you for your suggestion. Based on your input, we utilized ImageJ software to quantify the YFP fluorescence signal. A total of n = 15 independent images were analyzed, and the corresponding results have been added to Figure 1e. Monitoring chlorophyll autofluorescence serves as a useful background signal, aiding in the distinction between the fluorescence signal of the target protein and background noise. This approach helps reduce potential signal overlap or interference during the experiment, thereby enhancing the reliability of the results.

      Please provide a sequence alignment with multiple ATGs to show the conservation of the presumed bipartite NLS. This information has to be included in the main data.

      Thank you very much for your constructive and insightful comments. We analyzed the putative nuclear localization signal (NLS) in the ATG6 protein sequence using the online INSP (Identification of Nuclear Signal Peptide) prediction software (http://www.csbio.sjtu.edu.cn/bioinf/INSP/). The prediction indicated a potential nuclear localization sequence, "FLKEKKKKK", within the ATG6 protein, spanning from the 217th to the 223rd amino acid. Additionally, we used INSP to investigate the nuclear localization sequences of various ATG proteins (TaATG6a [1], TaATG6b [1], TaATG6c [1], SlATG8h [2]) that have previously been reported to localize to the nucleus. This analysis revealed a relatively conserved NLS motif, "E/K-K/E-K-K-L/K-K", in these ATG proteins. In line with your suggestion, the results of this sequence comparison have been incorporated into the revised manuscript as Figure 2c, and the corresponding results are described in the text (lines 146-156).
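      To illustrate how the consensus notation "E/K-K/E-K-K-L/K-K" can be read (one position per dash, alternative residues separated by a slash), the short Python sketch below turns it into a regular expression and scans the predicted ATG6 NLS string "FLKEKKKKK" quoted above. The helper names motif_to_regex and find_motif are hypothetical and for illustration only; this is not the INSP predictor or the alignment procedure used for Figure 2c, and the reported match positions are 0-based offsets within the 9-residue string, not ATG6 residue numbers.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert a dash-separated consensus motif such as 'E/K-K/E-K-K-L/K-K'
    into a regular expression with one character class per position."""
    positions = motif.split("-")
    return "".join("[" + p.replace("/", "") + "]" for p in positions)


def find_motif(sequence: str, motif: str):
    """Return (start, matched_substring) for every match, allowing overlaps.

    A zero-width lookahead is used so that overlapping windows are reported;
    start positions are 0-based offsets into `sequence`."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


consensus = "E/K-K/E-K-K-L/K-K"
predicted_nls = "FLKEKKKKK"  # putative ATG6 NLS reported by INSP in the response

print(motif_to_regex(consensus))             # [EK][KE][K][K][LK][K]
print(find_motif(predicted_nls, consensus))  # [(2, 'KEKKKK'), (3, 'EKKKKK')]
```

      Because the motif is short and lysine-rich, the predicted sequence satisfies it in two overlapping windows, which is why the lookahead form is used rather than a plain findall.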

      Fig. 3d and f, how many blots are used for this quantification? Please include all the individual analyzed blots in the supplementary data. In addition, if you present such quantification with error bars, then statistical analysis is required.

      Thank you for pointing this out. In Figure 3d, three independent blots were utilized for this quantification. In Figure 3f, two independent blots were used. The individual analyzed blots have been included in the supplementary Figure 7. We also conducted a statistical analysis as shown in Fig 3d and f, with a detailed description included in the legend section (lines 858 and 861).

      In fig. 4, please indicate what is the normalizing gene. Also, what are the error bars?

      Thank you for pointing this out. In Fig. 4, values are means ± SD (n = 3 biological replicates), and the AtActin gene was used as the internal control. We have added these details to the figure legend.

      In fig. 4b the labeling is missing.

      Thank you for bringing this to our attention. We have included the labeling for Fig. 4 in the revised manuscript.

      Lines 236-239: this statement contradicts the data in fig. 5b: the levels of NPR1-GFP are actually reduced in the presence of atg6 at 24h. So, this result has to be described more accurately by stating that the increase is transient, and it is evident more at 8h, but not at 20-24h.

      Thank you very much for your constructive and insightful comments. We have revised the description of this section to provide a more accurate account of the results (lines 253-258).

      Reference

      (1) Yue J, Sun H, Zhang W, et al. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol. 2015;15:95.

      (2) Li F, Zhang M, Zhang C, et al. Nuclear autophagy degrades a geminivirus nuclear protein to restrict viral infection in solanaceous plants. New Phytol. 2020;225:1746-1761.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated whether the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it has never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants by placing a linear polarizer over the homing path of freely navigating ants, rotated 45 degrees relative to the moon's natural polarization pattern. They recorded the homing direction of each ant before it entered the area covered by the polarizer, under the polarizer, and again after it left that area. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees relative to their homing direction under the natural polarization pattern, and change it back after leaving the area covered by the polarizer. These results are repeatable throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction depends on the length of their home vector, just as it does for the solar polarization pattern. 

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually are.

      We have rewritten these sections and added a panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants, while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night, though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt to using lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      We have added a paragraph discussing this distinction between mechanisms and the limits of the current data set in disentangling them. This is certainly an interesting topic for follow-up work.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for our response to this and the next reviewer comment.

      N.B. The Watson-Williams tests for differences in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2 - 1)(n1 - R1)] / [(n1 - 1)(n2 - R2)] (where n is the sample size and R the length of the resultant vector), or its inverse, whichever is > 1. 
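      To make the bracketing of that F statistic concrete, here is a minimal sketch in Python. The helper names `resultant_length` and `ww_variance_f` are hypothetical (they are not from the manuscript or from any circular-statistics package), R is taken as the length of the resultant vector of each sample, and headings are assumed to be in degrees. This test is usually presented as an approximation that behaves best when both samples are fairly concentrated, so it is worth checking that condition against a circular-statistics text before relying on it.

```python
import math


def resultant_length(angles_deg):
    """Length R of the resultant vector for a sample of angles in degrees."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)


def ww_variance_f(sample1, sample2):
    """F statistic for unequal circular dispersion, as written in the review:
    F = [(n2 - 1)(n1 - R1)] / [(n1 - 1)(n2 - R2)], or its inverse if that is < 1.

    Returns (F, numerator df, denominator df). Needs at least two angles per
    sample and some spread in each (n - R > 0), otherwise it can divide by zero.
    """
    n1, n2 = len(sample1), len(sample2)
    r1, r2 = resultant_length(sample1), resultant_length(sample2)
    f = ((n2 - 1) * (n1 - r1)) / ((n1 - 1) * (n2 - r2))
    if f >= 1:
        return f, n1 - 1, n2 - 1
    # Invert so the more dispersed sample sits in the numerator; swap df to match.
    return 1 / f, n2 - 1, n1 - 1


if __name__ == "__main__":
    tight = [40, 45, 50, 44, 47, 43]  # illustrative headings, not study data
    loose = [10, 60, 35, 80, 20, 55]
    print(ww_variance_f(tight, loose))
```

      A p-value would then come from the upper tail of an F distribution with the returned degrees of freedom (for example via `scipy.stats.f.sf`, if SciPy is available). Returning the degrees of freedom together with F keeps them matched to whichever sample ends up in the numerator.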

      We have looked at the amount of variance from the mean heading direction in terms of both the shifts and the reorientations and found no significant difference in variance between any of the relevant conditions. It is possible (indeed likely) that with a higher n we might find these differences, but with the current data set we cannot make statistical statements regarding degradations in navigational precision.  

      As an additional analysis to address the Watson-Williams test's sensitivity to changes in variance, we have added a var test for each comparison, a well-established test for comparing variances. None of these were significantly different, suggesting that the differences observed in the Watson-Williams tests are due to changes in the mean vector rather than in the dispersion of the distribution. We have added this test to the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut in several places. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these repetitions, apart from the bookending mentions in the Abstract and Discussion.

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text for clarity. We no longer state that the ambient pattern was changed.

      (3) Line 377: I do not see the link between the sentence and Figure 7.

      We have changed where in the Discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought some readers might find it easier to see the shifts as a dial representation, with the shift magnitude converted to 0-100%, rather than as the shifts in Figure 5. This makes it something like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story, but a reader with little background in directional statistics might find Figure 7 more intuitive. That, at least, was the intent. 

      If it becomes a sticking point, then we can remove the upper portion.  

      Reviewer #2 (Recommendations For The Authors): 

      Minor corrections and queries 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added references describing the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the Methods to clearly state the experimental rationale and to describe the methodology more clearly.

      We have also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have revised it for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2°± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      We have replaced Phi with b° and c° in the figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation was erroneously copied from the shift data (it is identical in both the +45 shift and reorientation for Figure 4A). This has now been corrected.

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this was incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      We have changed this to make clear that the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to make clear that these are ants tested with different vector lengths (~6 m vs. ~2 m) at the same location. Hopefully this is much clearer; if these portions remain confusing, I think a full renaming of the conditions would be in order. Names such as ‘long vector’ and ‘short vector’ would help, but they would not describe the true purpose of the test, which is to control for location; hence the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make further changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See the current change to the Results as an example: ‘Both foragers with a long ~6m remaining vector (Halfway Release), or a short ~2m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection threshold for absolute light intensity, its polarization sensitivity and its spectral sensitivity. Moonlight is less UV rich than direct sunlight and the spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      We have rewritten the section on polarised light to be clearer, drawing on optics and light physics for background. 

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce and define the term e-vector. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun's or moon's position with the maximum degree of polarisation located 90° from the source. Hence when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple with uniform direction of polarisation approximately parallel to the north-south axis (Dacke et al., 1999, 2003; Reid et al. 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Testing diurnal bull ants is also a very good suggestion.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers at least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum for a single researcher walking to and from the field site, given the lack of public transport between 1 and 4am. But yes, for future work a camera-based confirmation system would be easier. 

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration-derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the following full description of the mirroring applied to counterclockwise shifts:

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we mirrored each shift in the −45° conditions, allowing shift magnitude comparisons within each condition. Each shift was mirrored across the 0° to 180° plane and then compared to the corresponding unaltered +45° condition.’
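      The mirroring step reduces to simple angle arithmetic. The sketch below assumes, since the text does not state the convention, that shifts are measured clockwise in degrees from 0°, so that reflecting across the 0° to 180° axis is the same as negating each angle modulo 360°. The helpers `mirror_across_0_180` and `signed_shift` are hypothetical illustrations, not the authors' analysis code.

```python
def mirror_across_0_180(angle_deg):
    """Reflect an angle across the 0-180 degree axis (negate it modulo 360).

    Under a clockwise-positive convention, a counterclockwise shift of x degrees
    (stored as 360 - x) maps to a clockwise shift of x degrees, and vice versa.
    """
    return (-angle_deg) % 360


def signed_shift(angle_deg):
    """Express an angle in the range (-180, 180]; positive means clockwise."""
    a = angle_deg % 360
    return a - 360 if a > 180 else a


if __name__ == "__main__":
    minus45_shifts = [315, 330, 320]  # illustrative values, i.e. -45, -30, -40
    mirrored = [signed_shift(mirror_across_0_180(a)) for a in minus45_shifts]
    print(mirrored)  # now directly comparable with +45 condition shifts
```

      Mirroring rather than taking absolute values keeps the operation defined on the full circle, so the mirrored −45° samples can be passed to the same circular comparison used for the +45° samples.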

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol (U+00BA) used here, a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense, as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when less of the moon is illuminated and its pattern is potentially harder to detect.

      I have reworked this line in the text, as I do not think there is much evidence for misalignment; it might be more precise to say that periods during the night when the moon is not visible may adversely affect the path integrator estimate, though the full impact of this gap in celestial cues, and whether other cues also play a role, is currently unknown.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment 

      This is a valuable study in the Jurkat T cell line that calls attention to the phosphorylation of formin-like 1 β and its role in polarization of CD63-positive extracellular vesicles (referred to as exosomes). The evidence presented in the Jurkat model is solid, but concerns have been raised about the statistical analysis and more details would be required to fully assess the significance of the results. For example, ANOVA is the method described, but it requires large amounts of normally distributed data in multiple groups and cannot be used to make pairwise comparisons within groups, which would require a post-hoc method (which is not discussed). In addition, the data showing formin-like 1 β in primary human T cells without and with a CAR are provided without quantification and don't investigate any of the novel claims, so they don't address the relevance of formin-like 1 β beyond the Jurkat model. Nonetheless, the consistent trends in the body of the study provide solid support for the claims.

      We acknowledge this general statement on statistics. We have now discussed and provided more details on the post hoc method (Tukey), as new Supplementary data S13 (p-values from Tukey's post hoc test applied to the one-way ANOVA for all pairwise comparisons). Additionally, we have now provided quantitative data on the percentage of primary cells with and without CAR that show FMNL1 accumulations at the immune synapse (Suppl. Fig. S7). Regarding the data in primary human T cells, we have already changed the title of the manuscript to strictly adjust it to the main body of the data and our conclusions in the well-established Jurkat synapse model. We also want to emphasize that we have not intended to extrapolate the relevance of our data regarding FMNL1 and exosomes beyond the Jurkat model. Thus, we have included some additional sentences and/or nuances in the Discussion to soften our statements in this regard (e.g. “…..provided that the FMNL1 effect on exosome secretion in Jurkat cells can be extended to primary T lymphocytes”) and to clarify this important point.

      Reviewer 1:

      (1) The main findings have been obtained in clones of Jurkat cells. They have not been confirmed in primary T cells. The only experiment performed in primary cells is shown in Figure S7 (primary human T lymphoblasts) for which only the distribution of FMNL1 is shown without quantification. No results presenting the effect of FMNL1 KO and expression of mutants in primary T cells are shown.

      The referee is right regarding the extension of exosome secretion studies to primary human T lymphocytes. Unfortunately, it is well known that primary T lymphocytes are extremely difficult to transfect. Moreover, expression from our large bicistronic plasmids (>15 kb) is very inefficient, compounded by the challenge of expressing large proteins such as the 180 kDa YFP-FMNL1 chimeric variants. The convergence of all these undesirable factors severely hampers these studies, and we have been unable to consistently achieve sufficient transfection efficiency to perform these experiments. However, the role of FMNL1 in MTOC/MVB polarization in Jurkat cells, confirmed in this manuscript, has already been extended to primary CD8+ T cell clones (DOI: 10.1016/j.immuni.2007.01.008). Given that exosome secretion requires MTOC/MVB polarization in both Jurkat cells and primary T lymphoblasts (10.1038/cdd.2010.184, 10.3389/fimmu.2019.00851), FMNL1 may also control exosome secretion in primary T cells, although formal demonstration will require further research.

      A new sentence has been included in the Discussion to address this important point. Regarding the second request, we have quantified the images mentioned in Suppl. Fig. S7, and the percentages of fixed T cells showing FMNL1 accumulations at the immune synapse are included in the figure legend.

      (2) In-depth analysis of the defect in actin remodeling (quantification of the images, analysis of some key actors of actin remodeling) is still lacking. Only F-actin is shown; no attempt has been made to look more precisely at actors of actin remodeling.

      The referee is right. Since we have obtained new results on the role of FMNL1 in actin remodeling, we have focused on this formin, which is already a key actor in this process. In this context, we have previously shown that the formin Dia1, another major actor of actin remodeling in T lymphocytes along with FMNL1 (DOI: 10.1016/j.immuni.2007.01.008), does not undergo phosphorylation upon PKC activation (Suppl. Fig. 5 in https://doi.org/10.1080/20013078.2020.1759926). Since our aim was to unravel the PKC-mediated pathway controlling actin remodeling, we did not pursue further studies on Dia1. Therefore, we have included a new sentence to emphasize the specific role of FMNL1 phosphorylation, but not Dia1, in this regard. Nonetheless, future studies aimed at identifying new important players in this or related pathways could offer significant insights.

      (3) The defect in the secretion of extracellular vesicles is still very preliminary. Examples of STED images given by the authors are nice, yet no quantification is performed.

      The referee is right regarding this point and we acknowledge this comment. Accordingly, we have now quantified the STED images and provided numerical data on the percentages of cells exhibiting the observed phenotypes (see the figure legend for Fig. 10).

      (4) Results shown in Figure S12 on the colocalization of proteins phosphorylated on Ser/Thr are still not convincing. It seems indeed that "phospho-PKC" is labeling more preferentially the CMAC positive cells (Raji) than the Jurkat T cells. It is thus particularly difficult to conclude on the colocalization and even more on the recruitment of phosphorylated-FMNL1 at the IS. Thus, these experiments are not conclusive and cannot be the basis even for their cautious conclusion: "Although all these data did not allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limit of confocal and STED microscopes, the results are compatible with the idea that both endogenous FMNL1 and YFP-FMNL1bWT are specifically phosphorylated at the cIS".

      The referee may be correct regarding the detail of the "phospho-PKC" labeling. However, it cannot be overlooked that Raji cells also contain proteins that are or may be potential PKC substrates. As a matter of fact, Raji cells also express FMNL1. In addition, MHCII triggering in B cells induces PKC activation (https://doi.org/10.1002/eji.200323351). Regarding which cell type is preferentially labeled, this is a variable topic depending on the analyzed synapse. 

      It is true that there are likely several PKC substrates, in both Jurkat and Raji cells, but our point is that one of these substrates either colocalizes with FMNL1 or is FMNL1 itself. We do not claim at any point that FMNL1 is the only PKC substrate in either Jurkat or Raji cells. 

      Apparently, the referee has either overlooked our results or we did not emphasize them sufficiently. Our results validated the PKC substrate antibody on both endogenous phospho-FMNL1 and phospho-YFP-FMNL1β by WB (Fig. 3). Moreover, the phospho-PKC antibody does not recognize the YFP-FMNL1β S1086A or S1086D variants (Fig. 3). Last, but not least, when endogenous FMNL1 is interfered in the Jurkat cell, the phospho-PKC signal does not colocalize with FMNL1, but it strongly colocalizes at the synapse with YFP-FMNL1βWT expressed in the Jurkat cell (Fig. S11). Indeed, YFP-FMNL1β belonged to the Jurkat cell. Taken together, these results demonstrate: 1. the specificity of the phospho-PKC antibody; 2. that the phospho-PKC antibody recognizes phosphorylated YFP-FMNL1β but not its non-phosphorylatable mutant variants; 3. that the colocalization of phospho-PKC with anti-FMNL1 is specific. We have included some sentences to clarify these points and to avoid possible misunderstandings by readers. We thank the referee for this clarifying point, and we firmly believe our cautious conclusion is correct, although we have tuned it to consider the possibility that a different PKC substrate could be closely associated with FMNL1, producing the observed colocalization: “Although all these data do not yet allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limits of super resolution microscopy and the possibility that another PKC substrate may be associated to FMNL1 or very close to FMNL1, in a strictly S1086-dependent manner”.

      To clear any doubt regarding which cell is labelled with phospho-PKC, we have changed the lower panels in Suppl. Fig. S12, and it is now more evident that FMNL1 and phospho-PKC belong to the Jurkat cell.

      The study would benefit from a more careful statistical analysis. The dot plots showing polarity are presented for one experiment. Yet, the distribution of the polarity is broad. Results of the 3 independent experiments should be shown and a statistical analysis performed on the independent experiments.

      The referee is right, and we have now included further post hoc analyses (Tukey) in Suppl. Fig. S13. Tukey's test values were included for all the dot plot figures. We have not included all the plots from the 3 independent experiments, since the manuscript already contains 10+12 multi-panel figures and is already very large. However, we have stated in the figure legends that the plots shown are representative of the data obtained from 3 independent experiments. The referee's consideration regarding the broad distribution of polarity data is correct. We included a sentence in this regard in the first version of the manuscript, which may have been overlooked: “Remarkably, one important feature of the IS consists of both the onset of the initial cell-cell contacts and the establishment of a mature, fully productive IS, are intrinsically stochastic, rapid and asynchronous processes (87, 88) (43). Thus, the score of the PI corresponding to the distance of MTOC/MVB with respect the IS (42) may be contaminated by background MTOC/MVB polarization, in great part due to the stochastic nature of IS formation (87)”.

    1. Author response:

      • The study does not clearly establish the relationship between Type 1 IFN and cancer therapy, and more robust data are needed to support the claim that tumor growth inhibition occurs via Type 1 IFN upregulation following ORMDL3 knockdown.

      We thank the reviewer for raising this concern. In Figure 6 we detected the expression of IFNB1 and ISGs in MC38 and LLC tumors upon ORMDL3 knockdown. We also used IHC to explore the abundance of RIG-I and ORMDL3 in these tumors. In addition, in Figure S5 we performed western blots to detect the expression of RIG-I with or without ORMDL3 knockdown. All these results support our hypothesis that ORMDL3 is a negative regulator of interferon via modulating RIG-I abundance.

      • There is ambiguity regarding whether ORMDL3 has a positive or negative role in the Type 1 IFN pathway, especially given conflicting findings in the literature that link higher ORMDL3 levels to increased Type 1 IFN expression.

      We appreciate the reviewer’s concern. In our system and experiments, we validated that ORMDL3 is a negative regulator of interferon, although there is also literature that links higher ORMDL3 levels to an increased type-I IFN response. ORMDL3 has been reported to be associated with rhinovirus-induced childhood asthma (Nature. 2007;448(7152):470-473; N Engl J Med. 2013 Apr 11;368(15):1398-407), and ORMDL3 levels are positively associated with rhinovirus abundance (N Engl J Med. 2013 Apr 11;368(15):1398-407). There are reports indicating that ORMDL3 supports the replication of rhinovirus (for example, Am J Respir Cell Mol Biol. 2020 Jun;62(6):783-792). This phenomenon is consistent with our finding that higher ORMDL3 expression leads to lower interferon production, which facilitates viral replication. We believe that the differing conclusions of these studies reflect differences in experimental conditions and stimuli. In our research, we provided comprehensive studies at the molecular, cellular, and animal levels to support the conclusion that ORMDL3 is a negative regulator of type-I interferon.

      • The use of certain experimental models, such as HEK293T cells (which are not typical Type 1 IFN producers), raises concerns about the validity and generalizability of the results. Further clarity is needed regarding the rationale for using the same tag in overexpression experiments.

      We thank the reviewer for this suggestion. Besides HEK293T cells, in Figures 1C and 1D we also overexpressed ORMDL3 in A549 cells and BMDMs and stimulated them with polyI:C or polyG:C. Our results showed that ORMDL3 particularly inhibits RLR signaling. Additionally, in Figure 3H we found that endogenous RIG-I expression decreased when ORMDL3 was overexpressed in BMDMs. Regarding the use of the same tag in the overexpression experiments, we plan to repeat these experiments with different tags to validate our results.

      • The manuscript contains several inconsistencies and lacks detailed explanations of critical areas, such as the mechanism by which ORMDL3 facilitates USP10 transfer to RIG-I despite no direct interaction between ORMDL3 and RIG-I.

      Some ER-mitochondria contact (ERMC) proteins mediate the interaction between the ER and mitochondria. ORMDL3 localizes to the ER and has been reported to be associated with calcium transport. Meanwhile, calcium transfer between the ER and mitochondria plays an important role in protein synthesis. It is possible that some ERMC proteins mediate the interaction between ORMDL3 and MAVS. In addition, we validated that ORMDL3 interacts with USP10 (Figure 5B). Although ORMDL3 and RIG-I do not interact directly, we propose a mechanistic model in which ORMDL3 and MAVS recruit USP10 and RIG-I, respectively, to ERMCS, so that USP10 can form a complex with RIG-I (Figure 5C) and regulate RIG-I stability upon RNA sensing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

      The Introduction (page 4) now explains the differences between bowtie2, SHRiMP and mimseq in a little more detail. The Results (page 9) briefly summarise the differences between the tRNA-Seq methods, and the Results (page 14) clarify how Decision and Salmon work.

      Reviewer 2:

      (1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

      Results page 6 gives a more precise explanation of the D parameter.

      (2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

      I don't think an optimal setting can be determined in general. It will depend on the species, the frequency of misincorporations due to modifications (which is specific to each tRNA-Seq protocol) and how long one is willing to let bowtie2 continue searching for a better match. The point of Figure 1a is that D needs to be increased if L is decreased and an error is allowed in the seed. I think the sentence describing Figure 1a in the Results is the appropriate way to express this without committing to a single ‘optimal’ parameterisation: ‘We observed that when an error in the seed is allowed, as the seed length is decreased, there needs to be a concomitant increase in effort expended to allow bowtie2 more opportunities to find the best possible alignment, especially with respect to the Transcript ID’.
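      As an illustration of the trade-off described above, the relevant bowtie2 options can be assembled into a command line as in the sketch below. It assumes that "an error in the seed" corresponds to bowtie2's -N 1 and that L and D map to bowtie2's -L and -D options; L=10 and D=100 are used only because they are the values mentioned in the reviewer's question, not a recommendation. The helper name, index name and file names are hypothetical.

```python
import shlex
from typing import List


def bowtie2_trna_args(index: str, reads: str, sam_out: str,
                      seed_len: int = 10, effort: int = 100,
                      seed_mismatches: int = 1) -> List[str]:
    """Build a bowtie2 argument list (hypothetical helper, illustrative values).

    seed_len        -> -L (seed length; shorter is more sensitive but slower)
    effort          -> -D (consecutive failed seed extensions before giving up)
    seed_mismatches -> -N (mismatches allowed in the seed; bowtie2 accepts 0 or 1)
    """
    if seed_mismatches not in (0, 1):
        raise ValueError("bowtie2 -N accepts only 0 or 1")
    if seed_len < 1 or effort < 1:
        raise ValueError("seed length and effort must be positive integers")
    return [
        "bowtie2",
        "-x", index,          # bowtie2 index built from the tRNA reference
        "-U", reads,          # unpaired reads
        "-N", str(seed_mismatches),
        "-L", str(seed_len),
        "-D", str(effort),
        "-S", sam_out,        # SAM output file
    ]


if __name__ == "__main__":
    # Print a shell-ready command; it could be run with subprocess.run(args).
    args = bowtie2_trna_args("trna_index", "reads.fastq", "aligned.sam")
    print(shlex.join(args))
```

      Running the script prints bowtie2 -x trna_index -U reads.fastq -N 1 -L 10 -D 100 -S aligned.sam. Following the reasoning above, lowering seed_len while keeping seed_mismatches=1 would call for a correspondingly larger effort value.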

      (3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

      Figure 1A is based on simulation of full-length reads with only sequencing errors, i.e. not from any particular tRNA-Seq method. This is stated in the results text and I’ve clarified it in the figure legend.

      (4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.

      I’m using Salmon in ‘alignment-mode’, taking the alignments from bowtie2. I’ve clarified this in results page 14.

    1. Author response:

      We thank both reviewers for their thorough and insightful feedback, which will contribute to improving our manuscript. In summary, the key concerns raised include the potential induction of GLV volatiles due to plant handling, limitations in the design of the "wind tunnel" bioassay, and the need for a deeper analysis of specific volatile compounds that contribute to the success of push-pull systems. We are happy to revise the entire manuscript according to all comments of the reviewers. This includes clarification of our methodology and providing a more reflective discussion on how physical stress might have influenced volatile emissions. Additionally, we will conduct new experiments with a modified bioassay setup to address concerns about directional cues and airflow control, minimizing cross-contamination. While the identification of individual compounds was beyond the scope of this study, we acknowledge its importance and propose it as a direction for future research.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage, for example, one of the coauthor's own paper Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) so that readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of GLV (green leaf volatiles) emission, particularly regarding physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of GLVs. We will ensure the revised manuscript reflects this nuanced interpretation. However, we will also explain more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause. We think that this is also clear in the manuscript. However, we plan to revise relevant passages throughout the manuscript to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we will explain better that the volatile profiles comprise a majority of non-GLV compounds. As shown in figure 1, the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are non-GLV monoterpenes, sesquiterpenes, or aromatic compounds. We will also note that the experimental plants used in the study were grown in insect proof screenhouses and were checked for any insect damage before volatile collection and bioassay.

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      (2a) The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see e.g. lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and e.g. terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes. We will update the discussion to provide better context based on existing literature regarding the volatile release under stress conditions. We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      (2b) It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Given the number of samples we could handle within our cost and workload constraints, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than of maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We will point out this justification for our focus on Desmodium in the manuscript. Additionally, we will suggest in the discussion that future studies should measure volatile profiles from maize and intercrop legumes alongside Desmodium and border grass in push-pull fields.

      (2c) To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      Again, we very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive", or to cite other studies showing bioactivity, where we have not demonstrated bioactivity ourselves.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We agree that our study is limited to its specific aims. Therefore, we think the revisions will make these more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium with which FAW had no tactile contact (taste, structure, etc.), FAW chose equally. Similarly in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found either between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element to it by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This is misleading readers into thinking that these interactions were significant and in fact confirming this in the discussion. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we will ensure a clear distinction between statistically significant findings and non-significant trends and remove any language that may imply stronger support for the odor hypothesis than what the data show.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in the submitted manuscript in lines 275–279. Furthermore, we include detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we will point out for our specific case. We will also mention that this common caveat does not invalidate such experimental designs when replication and randomization are practiced, and that insects are assumed to be able to select suitable oviposition sites despite such confounding factors under realistic conditions. We will also mention explicitly that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      (5a) The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel, wind should be presented as a directional cue). Only around 1.5 l/min enters the wind tunnel in a volume of approximately 90 l, which does not create any directional flow. (Solution: change 'wind tunnel' throughout the text to a dual choice setup/assay.)

      We agree with these criticisms and will change the terminology accordingly. We also plan to conduct an additional experiment with a no-choice arena that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) will be tested separately, with only one treatment conducted per evening to avoid cross-contamination.

      (5b) There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We will add this to the discussion. The newly planned assays also address this concern by using a setup with laminar flow.

      (5c) Too low a flow (1.5 l per minute) implies largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We will add these limitations to the discussion and will address these concerns with new experiments (see answer 5a).
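      The point about mixing in comment (5c) can be made concrete with a standard first-order washout model. Assuming a perfectly mixed chamber of volume V with a constant clean-air inflow Q, the fraction of the original air remaining after t minutes is exp(-Q*t/V). The sketch below uses the approximate figures given in the comments (about 90 l and 1.5 l/min) purely for illustration; perfect mixing is an idealization rather than a measurement of the actual setup, and the function names are hypothetical.

```python
import math

# Illustrative values from the review comments (approximate, not measured).
VOLUME_L = 90.0      # flight-chamber volume in litres
FLOW_L_MIN = 1.5     # total clean-air inflow in litres per minute


def residual_fraction(minutes: float, flow: float = FLOW_L_MIN,
                      volume: float = VOLUME_L) -> float:
    """Fraction of the original chamber air left after `minutes`,
    assuming perfect mixing (first-order washout: exp(-Q*t/V))."""
    return math.exp(-flow * minutes / volume)


def minutes_until(fraction: float, flow: float = FLOW_L_MIN,
                  volume: float = VOLUME_L) -> float:
    """Minutes until only `fraction` of the original air remains (same model)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return -(volume / flow) * math.log(fraction)


if __name__ == "__main__":
    print(f"nominal turnover time V/Q: {VOLUME_L / FLOW_L_MIN:.0f} min")
    print(f"air left after one 5-min trial: {residual_fraction(5):.1%}")
    print(f"air left after one turnover: {residual_fraction(60):.1%}")
    print(f"time to 5% residual: {minutes_until(0.05):.0f} min")
```

      With these inputs the script reports a nominal turnover time of 60 min, about 92% of the original chamber air still present after a single 5-minute trial, about 37% after one full turnover, and roughly 180 minutes before the original air is diluted to 5%.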

      (5d) The stimulus air enters through a tube (what type of tube, diameter, length, etc.?) containing pressurized air (how was the air obtained into bags? What type of bag, and how is it sealed?) and effluxes directly into the flight chamber (how, via a nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, and in particular, how were the bags sealed airtight around the plants?

      We will add the missing information to the methods and provide details about types of bags, manufacturers, and pre-treatments. In short, Teflon tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      (5e) The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement; we should assume this is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We will include a more detailed description of the plant handling and bagging processes to the methods to clarify how the plants were treated during all assays reported in the submitted manuscript and the newly planned assays. This will address concerns about the possible influence of plant stress, such as GLV emission due to bagging, on the results. We politely disagree that the maize plants were damaged and the Desmodium plants not representative of those encountered in the field. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence has so far not been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justifies a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table. We will revise the figure and associated text in the introduction to highlight its relevance for the current study and to reduce redundant information.

      Reviewer #2 (Public review):

      Based on the controversy over whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on the oviposition behavior of FAW. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, a serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moths were confirmed. However, it would be useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) It would be good to add significance symbols to Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We will clarify this in the figure legend and provide a cross-reference to Table 3 for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We kindly request further clarification on which figure needs improvement, and we will make adjustments accordingly to ensure that all figures are easily comprehensible for readers.

      (4) It would be good to discuss in more depth the functions of the important volatile compounds identified here, comparing them with results from previous studies.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we will indicate specifically which of the volatiles we identified overlap with those previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Removing claims of causality: To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly:

      "Electrophysiological dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication".

      Control analyses directly comparing AI and IFG: As per the reviewer’s suggestion, we have carried out additional control analyses by directly comparing the net inward/outward balance between the AI and the IFG. Our analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall phases, a pattern that was replicated across all four experiments. 

      These findings further highlight the unique role of the AI as a key hub in coordinating network interactions during episodic memory formation and retrieval, distinguishing it from a key anatomically adjacent prefrontal region implicated in cognitive control.

      We have incorporated these results into the manuscript (see new Figure S6 and updated Results section). 

      Control analyses directly comparing task with resting state: As per the reviewer’s suggestion, we compared the AI's net outflow during task periods to resting state, finding significantly higher outflow during both encoding and recall across all experiments (ps < 0.05). These results provide further evidence for an enhanced role of AI net directed information flow to the DMN and FPN during memory processing compared to the resting state.

      We have incorporated these results into the manuscript (see new Figure S9 and updated Results section). 

      Control analysis using every region of the brain outside the considered networks: We appreciate the reviewer's suggestion to conduct additional control analyses. However, we have concerns about implementing this approach for several reasons:

      (1) Hypothesis-driven research: Our study was designed based on a strong hypothesis derived from prior fMRI studies, which have consistently shown that the salience network (SN), anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the default mode network (DMN) and frontoparietal network (FPN) across diverse cognitive tasks.

      (2) Risk of p-hacking: Running analyses on a large number of brain regions outside our networks of interest without a priori hypotheses could lead to p-hacking, a practice strongly criticized in the scientific community, including by eLife editors (Makin & Orban de Xivry, 2019). Such an approach could potentially yield spurious results and undermine the validity of our findings.

      (3) Principled control region selection: Our choice of the inferior frontal gyrus (IFG) as a control region was hypothesis-driven, based on its (a) anatomical adjacency to the AI, (b) involvement in cognitive control functions, including response inhibition, and (c) frequent coactivation with the AI in fMRI studies.

      (4) Robustness of current findings: Our PTE analysis involving the IFG, along with the additional control analyses requested by the reviewer (comparing the task-related net balance of the AI with the IFG and with resting state, see response to reviewer comment 2.1), strongly support a key role for the AI in orchestrating large-scale network dynamics during memory processes.

      (5) Specificity of findings: The contrast between AI and IFG results demonstrates that our observed patterns are not general to all task-active regions but are specific to the AI's role in network coordination. 

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results. 

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      These revisions, combined with our rigorous methodologies and comprehensive analyses, provide compelling support for the central claims of our manuscript. We believe these changes significantly enhance the scientific contribution of our work.

      Our point-by-point responses to the reviewers' comments are provided below.

      Reviewer 1:

      (1.1) Because phase-transfer entropy is referenced as a "causal" analysis in this investigation (PTE), I believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and imprecise use of "causation" may be confusing. The authors have defined in the revised Introduction what their definition of "causality" is within the context of this investigation. 

      We appreciate the reviewer's feedback regarding the terminology used in our manuscript. To avoid confusion, we have now removed claims of causality from our manuscript and changed its title accordingly. 

      Reviewer 2:

      (2.1) Clarifying the new control analyses. The authors have been responsive to our feedback and implemented several new analyses. The use of a pre-task baseline period and a control brain region (IFG) definitely helps to contextualize their results, and the findings shown in the revision do suggest that (1) relative to a pre-task baseline, directed interactions from the AI are stronger and (2) relative to a nearby region, the IFG, the AI exhibits greater outward-directed influence. 

      However, it is difficult to draw strong quantitative conclusions from the analyses as presented, because they do not directly statistically contrast the effect in question (directed interactions with the FPN and DMN) between two conditions (e.g. during baseline vs. during memory encoding/retrieval). As I understand it, in their main figures the authors ask, "Is there statistically greater influence from the AI to the DMN/FPN in one direction versus another?" And in the AI they show greater "outward" PTE than "inward" PTE from other networks during encoding/retrieval. The balance of directed information favors an outward influence from the AI to DMN/FPN. 

      But in their new analyses, they simply show that the degree of "outward" PTE is greater during task relative to baseline in (almost) all tasks. I believe a more appropriately matched analysis would be to quantify the inward/outward balance during task states, quantify the inward/outward balance during rest states, and then directly statistically compare the two. It could be that the relative balance of directed information flow is nonsignificantly changed between task and rest states, which would be important to know. 

      We thank the reviewer for this suggestion. We have now run an additional analysis directly comparing the inward/outward balance during task versus rest states. To quantify this balance, we calculated the net outflow as the difference between the total outgoing and total incoming information (PTE(out)–PTE(in)). This analysis revealed that net outflow during task periods is significantly higher than during rest, during both encoding and recall, and across the four experiments (ps < 0.05). These results provide further evidence for enhanced net directed information flow from the AI to the DMN and FPN during memory processing compared to the resting state. These new results have now been included in the revised manuscript (page 12). 
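      As a minimal sketch of the net-outflow measure described above (an illustration, not the authors' analysis pipeline), assume directed PTE values are stored in a hypothetical dictionary keyed by (from_region, to_region) pairs. Net outflow for a source region is then the summed PTE from that region to each target minus the summed PTE from each target back to it. The paired t statistic is included only to show how per-participant task and rest values could be contrasted; the response does not state which test was used, and the toy numbers are invented.

```python
from math import sqrt
from statistics import mean, stdev


def net_outflow(pte, source, targets):
    """Net outflow of `source`: PTE(out) - PTE(in).

    `pte` is a hypothetical mapping from (from_region, to_region) pairs to
    directed PTE values. PTE(out) sums PTE(source -> t) over all targets t;
    PTE(in) sums PTE(t -> source). A positive result means the source sends
    more directed information than it receives.
    """
    pte_out = sum(pte[(source, t)] for t in targets)
    pte_in = sum(pte[(t, source)] for t in targets)
    return pte_out - pte_in


def paired_t(cond_a, cond_b):
    """Paired t statistic for per-participant values in two conditions.

    Illustrative only: the test actually used is not specified in the text.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy values for one participant (invented for illustration, not study data).
toy_pte = {
    ("AI", "DMN"): 0.6, ("DMN", "AI"): 0.4,
    ("AI", "FPN"): 0.5, ("FPN", "AI"): 0.3,
}
print(round(net_outflow(toy_pte, "AI", ["DMN", "FPN"]), 3))  # 0.4
```

      Calling the same function with source="IFG" would give the quantity used for the comparison between the AI and the IFG; per-participant net-outflow values from two conditions (e.g. task and rest) could then be passed to paired_t.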

      Likewise, a similar principle applies to their IFG analysis. They show that the IFG tends to have an "inward" balance of influence from the DMN/FPN (the opposite of the AI's effect), but this does not directly answer whether the AI occupies a statistically unique position in terms of the magnitude of its influence on other regions. More appropriate, as I suggest above, would be to quantify the relative balance of inward/outward influence, both for the IFG and the AI, and then directly compare those two quantities. (Given the inversion of the direction of effect, this is likely to be a significant result, but I think it deserves a careful approach regardless.) 

      We appreciate the reviewer's suggestion and have directly compared the net inward/outward balance between the AI and the IFG. Specifically, we compared the net outflow (PTE(out)–PTE(in)) of the AI with that of the IFG. This analysis revealed that the net outflow for the AI is significantly higher than for the IFG during both encoding and recall, and across the four experiments. These findings further highlight a key role for the AI in orchestrating large-scale network dynamics during memory processes. The AI's pattern of directed information flow stands in contrast to that of the IFG, despite their anatomical proximity and shared involvement in cognitive control processes. This dissociation underscores the specificity of the AI's function in coordinating network interactions during memory formation and retrieval. These new results have now been included in our revised manuscript (page 11). 

      (2.2) Consider additional control regions. The authors justify their choice of IFG as a control region very well. In my original comments, I perhaps should have been more clear that the most compelling control analyses here would be to subject every region of the brain outside these networks (with good coverage) to the same analysis, quantify the degree of inward/outward balance, and then see how the magnitude of the AI effect stacks up against all possible other options. If the assertion is that the AI plays a uniquely important role in these memory processes, showing how its influence stacks up against all possible "competitors" would be a very compelling demonstration of their argument. 

      We thank the reviewer for this suggestion. However, please note that running a large number of random analyses by including a large number of brain regions (every region of the brain outside these networks) and comparing their dynamics to the AI without a hypothesis or solid principle amounts to p-hacking, which has been strongly criticized by the eLife editors (Makin & Orban de Xivry, 2019). Our study was driven by a solid hypothesis based on prior fMRI studies showing that the SN, anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the DMN and FPN across diverse cognitive tasks (Bressler & Menon, 2010; Cai et al., 2016; Cai, Ryali, Pasumarthy, Talasila, & Menon, 2021; Chen, Cai, Ryali, Supekar, & Menon, 2016; Kronemer et al., 2022; Raichle et al., 2001; Seeley et al., 2007; Sridharan, Levitin, & Menon, 2008). Moreover, our selection of the IFG as a control region was likewise strongly hypothesis-driven, owing to its anatomical adjacency to the AI, its involvement in a wide range of cognitive control functions including response inhibition (Cai, Ryali, Chen, Li, & Menon, 2014), and its frequent co-activation with the AI in fMRI studies. Furthermore, the IFG has been associated with controlled retrieval of memory (Badre, Poldrack, Paré-Blagoev, Insler, & Wagner, 2005; Badre & Wagner, 2007; Wagner, Paré-Blagoev, Clark, & Poldrack, 2001), making it a compelling region for comparison. Our findings from the PTE analysis involving the IFG, together with the additional control analyses requested by the reviewer (directly comparing the task-related net balance of the AI with that of the IFG and with the resting state; please see our response to reviewer comment 2.1), strongly highlight a key role of the AI in orchestrating large-scale network dynamics during memory processes. 

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results.

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      (2.3) Reporting of successful vs. unsuccessful memory results. I apologize if I was not clear in my original comment (2.7, pg. 13 of the response document) regarding successful vs. unsuccessful memory. The fact that no significant difference was found in PTE between successful/unsuccessful memory is a very important finding that adds valuable context to the rest of the manuscript. I believe it deserves a figure, at least in the Supplement, so that readers can visualize the extent of the effect in successful/unsuccessful trials. This is especially important now that the manuscript has been reframed to focus more directly on claims regarding episodic memory processing; if that is indeed the focus, and their central analysis does not show a significant effect conditionalized on the success of memory encoding/retrieval, it is important that readers can see these data directly.

      As per the reviewer’s suggestion, we have now included figures showing the results of the successful versus unsuccessful comparison in the Supplementary Materials of the revised manuscript (Figures S10, S11).   

      (2.4) Claims regarding causal relationships in the brain. I understand that the authors have defined "causal" in a specific way in the context of their manuscript; I do believe that as a matter of clear and transparent scientific communication, the authors nonetheless bear a responsibility to appreciate how this word may be erroneously interpreted/overinterpreted and I would urge further review of the manuscript to tone down claims of causality. Reflective of this, I was very surprised that even as both reviewers remarked on the need to use the word "causal" with extreme caution, the authors added it to the title in their revised manuscript.

      We thank the reviewer for this suggestion. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      References 

      Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47(6), 907-918. doi:10.1016/j.neuron.2005.07.023

      Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45(13), 2883-2901. doi:10.1016/j.neuropsychologia.2007.06.015

      Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and principles. Trends in Cognitive Sciences, 14(6), 277-290. doi:10.1016/j.tics.2010.04.004

      Cai, W., Chen, T., Ryali, S., Kochalka, J., Li, C. S., & Menon, V. (2016). Causal Interactions Within a Frontal-Cingulate-Parietal Network During Cognitive Control: Convergent Evidence from a Multisite-Multitask Investigation. Cerebral Cortex, 26(5), 2140-2153. doi:10.1093/cercor/bhv046

      Cai, W., Ryali, S., Chen, T., Li, C. S., & Menon, V. (2014). Dissociable roles of right inferior frontal cortex and anterior insula in inhibitory control: evidence from intrinsic and task-related functional parcellation, connectivity, and response profile analyses across multiple datasets. Journal of Neuroscience, 34(44), 14652-14667. doi:10.1523/jneurosci.3048-14.2014

      Cai, W., Ryali, S., Pasumarthy, R., Talasila, V., & Menon, V. (2021). Dynamic causal brain circuits during working memory and their functional controllability. Nature Communications, 12(1), 3314. doi:10.1038/s41467-021-23509-x

      Chen, T., Cai, W., Ryali, S., Supekar, K., & Menon, V. (2016). Distinct Global Brain Dynamics and Spatiotemporal Organization of the Salience Network. PLOS Biology, 14(6), e1002469. doi:10.1371/journal.pbio.1002469

      Kronemer, S. I., Aksen, M., Ding, J. Z., Ryu, J. H., Xin, Q., Ding, Z., . . . Blumenfeld, H. (2022). Human visual consciousness involves large scale cortical and subcortical networks independent of task report and eye movement activity. Nature Communications, 13(1), 7342. doi:10.1038/s41467-022-35117-4

      Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175. doi:10.7554/eLife.48175

      Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences, 98(2), 676-682. doi:10.1073/pnas.98.2.676

      Seeley, W. W., Menon, V., Schatzberg, A. F., Keller, J., Glover, G. H., Kenna, H., . . . Greicius, M. D. (2007). Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control. Journal of Neuroscience, 27(9), 2349-2356. doi:10.1523/JNEUROSCI.5587-06.2007

      Sridharan, D., Levitin, D. J., & Menon, V. (2008). A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proceedings of the National Academy of Sciences, 105(34), 12569-12574. doi:10.1073/pnas.0800005105

      Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron, 31(2), 329-338. doi:10.1016/s0896-6273(01)00359-2

    1. Author response:

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation. We would like to point out that such non-spherical synaptic vesicles have also been previously reported in cryoET of primary neurons grown on EM grids (Tao et al., J. Neurosci., 2018). Indeed, there is no reason per se to suppose that synaptic vesicles are always spherical, and many diverse families of proteins that shape membrane curvature are expressed at the synapse (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful preparation of such samples, by Zuber et al. (Proc. Natl. Acad. Sci., 2005), who prepared cryo-sections (CEMOVIS) of in vitro brain cultures, predates the cryoEM resolution revolution brought about by electron-counting detectors. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. More recently, cryoFIB-SEM liftout has been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015), and this method has only recently become available to more laboratories. Both techniques introduce damage, as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor-quality tomograms from our analysis, which is described in our Methods section.

      The reviewer asks what the physiological effect of adding 20% w/v ~40,000 Da dextran is. This is a reasonable concern, since dextran could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier work (Zuber et al., 2005) showed that this concentration of dextran did not damage cell membranes or have any detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membranous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for future investigation by readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; an analysis of the intermembrane distance and its link to potentiation, essentially updating distances reported from plastic-embedded specimens; and the finding that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature that has served as a hallmark for identifying excitatory synapses in plastic-embedded specimens. 

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic-embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by the purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication, which will enable neurobiologists and researchers from other fields to carry out data evaluation in their own areas of expertise, since tomography is still a specialized skill. The authors collected and reconstructed over 100 excellent tomograms of synapses, which provide a wealth of information for future studies.

      (5) The authors have identified ionotropic receptor positions and shown that the receptors are linked to actin filaments and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures, but none have been overlaid on any of the tomograms in the submitted videos. Being able to view segmentations and tomograms together as videos would help a broad audience evaluate the data and extract more information from these tomograms. Deposition of the segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript and for suggesting overlaying segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

    1. Author response:

      Response to Reviewer #1:

      “Claiming a possible therapeutic role for this gene is a bit far-fetched at the present state of the art”.

      We agree that while the therapeutic relevance of Svep1 is not clear at this point, this potential is always something we consider in interpreting our data.

      Response to Reviewer #2:

      a. “The weakness of this paper is that it does not present a convincing explanation for how Svep1 regulates any of the phenotypes described. In this regard, a demonstration of a genetic interaction between Svep1 and FGF9 mutants or a careful characterization of a tissue-specific knockdown of Svep1, could be insightful. In addition, a comparison of the phenotype of Svep1 mutants and the phenotypes of other mutants affecting ECM components would be worthwhile”. 

      We agree that additional experiments are needed to determine how exactly Svep1 contributes to the phenotypes described. While our preliminary data point to an interaction of Svep1 and Fgf9, we agree that additional data are needed to prove that such interaction is a primary driver of the phenotypes observed.

      b. “A minor weakness is that the title of the paper is not fully supported by the data presented. While the defects in the morphogenesis of the distal lung in Svep1 mutants presage a defect in alveolar differentiation, this cannot be formally demonstrated since the animals die soon after birth”

      The reviewer is correct that we cannot formally demonstrate this in the current model. The profound defects observed in Svep1 mutants lead to early death, making it challenging to study the full process of alveolarization. However, it is important to note that lung morphogenesis is a continuous process in which earlier developmental phases lay the groundwork for subsequent stages. During the branching phase, the fate of alveolar cell types is established, while the saccular stage serves as a critical foundation for alveolar development, when alveolar cells begin to differentiate. We believe that the significant abnormalities in cellular differentiation observed prior to the bulk of alveolarization indicate likely defects in the later stages of alveolar differentiation. Therefore, while the model limits our ability to directly assess alveolarization, we anticipate that defects in cellular differentiation will continue to manifest beyond the saccular stage in Svep1 mutant mice.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Zhang et al. analyzed 17 specimens of Cindarella eucalla with 3D technology and discussed the anatomical findings, the relationship to other artiopods, and the ecology of the animal. The results are excellent and the findings are very interesting. However, the discussion needs to be extended, as the point the authors are trying to make is not always clear. I also recommend some restructuring of the discussion. Overall this is an important manuscript, and I'm looking forward to reading the edited version.

      Strengths:

      The analyses and the 3D data are excellent and provide new information.

      Weaknesses:

      The discussion - the authors provide information for the findings, but do not discuss them in detail. More information is needed.

      We are committed to enhancing the quality of our manuscript further and, in response to your comments, will implement the following improvements:

      (1) Comparative Analysis of Eyes: We will expand our discussion to include a detailed comparative analysis of the eyes of Cindarella eucalla with those of other artiopods (e.g. Xandarellids, trilobites, living insects), focusing on morphology, size, and other relevant characteristics.

      (2) Segmental Mismatch Discussion: We will provide an in-depth exploration of the specifics and significance of the segmental mismatch to offer a clearer understanding of its implications. We will also compare the characteristics of this mismatch in our focal species with those observed in extant arthropods, such as spiders and myriapods. This comparison will be further enriched by integrating our phylogenetic analysis, thereby providing a broader evolutionary context.

      (3) Methodological Clarity: We will provide more detailed information on the parameters used for the analyses in the Methods section, especially the phylogenetic sections and the X-ray tomography section.

      (4) Phylogenetic Analysis: We will engage in a more in-depth discussion of certain characters (e.g. anterior sclerite, hypostome, endopodite, segmental mismatch, etc.) within our phylogenetic analyses to clarify their relevance and contribution to our findings.

      Reviewer #2 (Public review):

      Summary:

      Zhang et al. present very well-illustrated specimens of the artiopodan Cindarella eucalla from the Chengjiang Biota. Multiple specimens are shown with preserved appendages, which is rare for artiopodans and will greatly help our understanding of this taxon. The authors use CT scanning to reveal the ventral organization of this taxon. The description of the taxon needs some modification, specifically expansion of the gut and limb morphology. The conclusion that Cindarella was a fast-moving animal is very weak; comparisons with extant fast animals and possibly FEA analyses are necessary to support such a claim. Although the potential insights provided by such well-preserved fossils could be valuable, the claims made are tenuous given the evidence presented herein.

      Strengths:

      The images produced through CT scanning specimens reveal the very fine detail of the appendages and are well illustrated. Specimens preserve guts and limbs, which are informative both for the phylogenetic position and ecology of this taxon. The limbs are very well preserved, with protopodite, exopodite, and endopodites visible. Addressing the weaknesses below will make the most of this compelling data that demonstrates the morphology of the limbs well.

      Weaknesses:

      Although this paper includes very well-illustrated fossils, including new information on the eyes, guts, and limbs of Cindarella, the data are not fully explained, and the conclusions are weakly supported.

      The authors suggest the preservation of complex ramifying diverticula, but these should be better illustrated, and the discussion of the gut diverticula should be longer, especially as gut morphology can provide insights into the feeding strategy.

      The conclusion that Cindarella eucalla was a fast-moving sediment feeder in a muddy environment is not well supported. These claims seem to be made without any evidence to support them. The authors should add a new section in the discussion focused on feeding ecology where they explicitly compare the morphology to suspension-feeding artiopodans to justify whether it fed that way or not. To further explore feeding, the protopodite morphology needs to be more carefully described and compared to other known taxa. The function of endites on the endopodite to stir up sediment for particle feeding in a muddy environment would also need to be more thoroughly discussed and compared with modern analogs. The impact of their findings is not highlighted in the discussion, which is currently more of a review of what has been previously said and should focus more on what insights are provided by the great fossils illustrated by the authors.

      The authors argue that their data support fast escaping capabilities, but it is not clear how they reached that conclusion from the available data. Is there a way this can be further evaluated? The data are impressive, so including comparisons with extant taxa that display fast escaping strategies would make the authors' case more compelling. The authors also claim that the limbs of Cinderella are strong; again, this conclusion is unclear. Comparison with the limbs of other taxa to show their robustness would be useful. To test how these limbs cope with the force and strain applied to them by a sudden burst of movement, the authors could conduct Finite Element Analyses.

      Here are the key points we plan to address:

      (1) Gut and Limb Morphology: We will expand our description of the gut and limb morphology of C. eucalla, providing a more detailed comparison and analysis. This will include a revised discussion on the function and ecological implications of these features.

      (2) Fast-Moving Animal Claim: We acknowledge your concern about the conclusion that C. eucalla was a fast-moving animal. We will conduct a more detailed comparison between C. eucalla and other Cambrian artiopods and living arthropods, focusing on morphological and functional aspects. We will also reconsider our claim and be more cautious in our conclusions. If the comparison proves insufficient, we will remove this assertion from the manuscript. However, we may not conduct a Finite Element Analysis, as a comprehensive and careful analysis would be a substantial project in its own right.

      (3) Sediment Feeding in a Muddy Environment: We will revise the section discussing the feeding ecology of C. eucalla. We will enhance this section by comparing the morphology of C. eucalla to that of suspension-feeding artiopods, which will help to substantiate our claims. Additionally, we will expand the discussion to include a more detailed examination of endites, gnathobases, and other relevant anatomical structures.

      (4) Impact of Findings: We will endeavor to highlight the impact of our findings in the discussion, focusing on the insights provided by the well-preserved fossils illustrated in our study.

      Reviewer #3 (Public review):

      This paper provides an interesting description of the ventral parts of the Cambrian xandarellid Cinderella eucalla, derived from exceptionally preserved specimens of the Chengjiang Biota. These morphological data are useful for our broad understanding of, and future research on, Xandarellida, and are generally well represented in the description and accompanying figures. The strengths of this work rest in this morphological description of exceptional fossil material, which is generally well supported. In addition, the authors put this description in the context of the morphology of other xandarellids and Cambrian arthropod groups; most of these parallels are useful and reasonably supported, though in several places homology is assumed without current evidence. The manuscript goes on to use these morphological data and comparisons to other groups (particularly trilobites) to make suggestions about the ecology of Cinderella eucalla and other xandarellids. The majority of my comments on this work relate to this latter aim: the ecological conclusions are generally derived through morphological comparisons, where a specific morphology has been suggested as an adaptation to a particular ecological function in another extinct arthropod group. However, the original suggestions of ecological function are untested, and so remain hypotheses. Despite this, they are frequently presented as truisms to enable ecological conclusions to be drawn for Cinderella eucalla. I have listed my comments and queries on the study below for the authors to address or respond to, and I hope they are useful to the authors.

      Comments:

      Several ecological and functional-morphology conclusions are stated too strongly to be considered sufficiently supported by the evidence given. These relate both to the description of C. eucalla and to comparisons with other extinct arthropod taxa (notably trilobites). Many of these latter statements are assumptions of functional morphology and should not be repeated as truisms; rather, they represent suggested functions and ecologies based on the known morphological descriptions. This issue occurs throughout the article and, for me, is the primary concern.

      We plan to address the following points upon revision:

      (1) Homology Assumptions: You pointed out that we have assumed homology in certain instances without sufficient evidence. We will revise the manuscript to include a more detailed analysis of the anterior sclerite and exite, considering phylogenetic relationships and morphological comparisons to provide a more robust discussion.

      (2) Ecological and Functional Morphology: We acknowledge that our conclusions regarding ecological function were presented with too much certainty. We will adopt a more cautious approach in our discussion, ensuring that our ideas are clearly labeled as hypotheses and are supported by comparison with relevant studies on Cambrian artiopods and extant arthropods, including fluid dynamics, functional morphology, etc. We will re-evaluate the ecological function section, and if it does not add value and clarity to the manuscript (that is, if our speculations do not contribute to the understanding of the specimen or may lead to misunderstandings), we will remove the relevant parts. We believe these changes will reflect a more cautious and rigorous approach to the ecological and functional interpretations of C. eucalla.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers and editor for their positive assessment of our work. For the Version of Record, we have made small revisions addressing the remaining concerns of reviewer #3. We have also reformatted the supplementary material to conform to eLife’s style.

      While the manuscript was under review, we discussed our work with Bill Bialek, who suggested clarifying the effect of cell rearrangements on genetic patterns. Using the tracked cell trajectories, we found that the highly coordinated intercalations in the germ band preserve the relative AP positions of cells. We have added an Appendix subsection (Appendix 1.5) explaining this finding and highlighting its relevance in a short paragraph added to the discussion.

      Reviewer #2

      Main comment from 1st review:

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Comments on the revised version:

      My main concern was that the authors did not use the analysis of mutant contexts such as Snail and Twist to confirm their predictions. They made a series of modifications, clarifying their conclusions. In particular, they now include an analysis of a Snail mutant and show that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished, supporting the idea that isogonal strain could be used as an indicator of external forces (Fig. 7 and S6).

      They further discuss their results in the context of published work on mutant backgrounds (fog, torso-like, scab, corkscrew, ksr) in which midgut invagination is disrupted and the germ band buckles, and propose that this supports the importance of internal versus external forces driving GBE.

      Overall, these modifications, in addition to clarifications in the text, clearly strengthen the manuscript.

      We thank the reviewer for assessing our manuscript again and are happy to hear that they find the added data on the snail mutant convincing and that our revised manuscript is stronger.

      Reviewer #3

      In their article "The Geometric Basis of Epithelial Convergent Extension", Brauns and colleagues present a physical analysis of Drosophila axis extension that couples in toto imaging of cell contours (previously published dataset), force inference, and theory. They seek to disentangle the respective contributions of active vs passive T1 transitions in the convergent extension of the lateral ectoderm (or germband) of the fly embryo.

      The revision made by the authors has greatly improved their work, which was already very interesting, in particular the use of force inference throughout intercalation events to identify geometric signatures of active vs passive T1s, and the tension/isogonal decomposition. The new analysis of the Snail mutant adds a lot to the paper and makes their findings on the criteria for T1s very convincing.

      Regarding the tissue-scale issues raised during the first round of review: although I do not find the new arguments fully convincing (see below), the authors put a lot of effort into discussing the role of the adjacent posterior midgut (PMG) in extension, which is already great. This will certainly provide interested readers with enough material and references to dive into that question.

      We appreciate the referee’s positive assessment of our manuscript and their careful reading and constructive feedback. In particular, we are happy to hear that the referee finds our added data on the snail mutant very convincing and finds that the extended discussion on the role of the PMG is helpful. We address the remaining concerns in our detailed response below.

      I still have some issues with the authors' interpretation of the role of the PMG and of what actually drives the extension. Although it is clear that T1 events in the germ band are driven by active local tension anisotropy (which the authors show, but which was already well established), this does not show that the tissue extension itself is powered by these active T1s. Their analysis of "fence" movies from Collinet et al 2015 (Tor mutants and Eve RNAi) is not fully convincing. Indeed, as the authors point out themselves, there is no flow in Tor mutant embryos, even though tension anisotropy is preserved. They argue that in Tor embryos the absence of PMG movement leaves no room for the germband to extend properly, thus impeding the flow. That suggests that the PMG acts as a barrier in Tor mutants. What is it attached to, then?

      We thank the referee for pointing out this omission: the PMG is attached to the vitelline membrane in the scab domain (Munster et al. Nature 2019) and is also obstructed from moving by more anteriorly lying tissue (amnioserosa). It therefore acts as an obstacle to GBE if it fails to invaginate (e.g., in a Tor embryo). We have clarified this in the discussion of the Tor mutants.

      The authors also argue that the posterior flow is reduced in "fenced" Eve RNAi embryos (which have less/no tension anisotropy), to justify their claim that it is the anisotropy that drives extension. However, previous data, including some of the authors' (Irvine and Wieschaus, 1994 - Fig 8), show that the first, rapid phase of germband extension is left completely unaffected in Eve mutants (that lack active tension anisotropy). Although intercalation in Eve mutants is not quantified in that reference, this was later done by others, showing that it is strongly reduced.

      The quantification of GBE in Irvine and Wieschaus 1994 was based on the position of the PMG from bright-field imaging, making it hard to distinguish the contributions of the ventral furrow, PMG, and germ band, particularly during the early phase of GBE when all these processes happen simultaneously. More detailed quantifications based on PIV analysis of in toto light-sheet imaging show significantly reduced tissue flow in eve mutants after the completion of ventral furrow invagination (Lefebvre et al., eLife 2023). That the initial fast flow is driven by ventral furrow invagination, not by the PMG, is apparent from twist/snail embryos, where the initial phase is significantly slower (Lefebvre et al., eLife 2023, Gustafson et al., Nat Comms 2022). We have added these references to the re-analysis and discussion of the Collinet et al 2015 experiments.

      Similarly, the Cyto-D phenotype from Clement et al 2017, in which intercalation is also strongly reduced, also displays normal extension.

      We agree that a careful quantification of tissue flow in Cyto-D-treated embryos would be interesting. Whether they show normal extension is not clear from the Clement et al. 2017 paper, as no quantification of total tissue flow is performed and no statements regarding extension are made there.

      Reviewer #3 (Recommendations For The Authors):

      • A lot of typos / grammar mistakes / repetitions are still found here and there in the paper. Authors should plan a careful re-reading prior to final publication.

      We have carefully checked the manuscript and fixed the typos and grammar mistakes.

      • I failed to point to a very relevant reference in the previous round of review, which I think the authors should cite and comment: A review by Guirao & Bellaiche on the mechanics of intercalation in the fly germband, which notably discusses the passive/active and stress-relaxing/stress-generating nature of T1s. (Guirao and Bellaiche, Current opinions in cell biology 2017), in particular figures 1 and 2.

      We thank the referee for pointing us to this relevant reference which we now cite in the introduction.

      • Any new arguments/discussion the authors see fit to include in the paper to comment on the Eve/Tor phenotypes. As far as I am concerned, I am not fully convinced at the moment (see review), but I think the paper has other great qualities and findings, and now (since the first round of review) sufficiently discusses that particular matter. I leave it up to the authors how much (more) they want to delve into this in their final version!

      We have added clarifications and references to the discussion of the Eve/Tor phenotypes.


      The following is the authors’ response to the original reviews.

      Public Review:

      Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective contributions of active and passive forces to cell shape changes during germ band elongation. Using a novel quantification tool for local tension, their results suggest that epithelial convergent extension results from internal forces.

      Reading this summary, and the eLife assessment, we realized that we failed to clearly communicate important aspects of our findings in the first version of our manuscript. We therefore decided to largely restructure and rewrite the abstract and introduction to emphasize that:

      ● Our analysis method identifies active vs passive contributions to cell and tissue shape changes during epithelial convergent extension

      ● In the context of Drosophila germ band extension, this analysis provides evidence for a major role for internal driving forces rather than external pulling force from neighboring tissue regions (posterior midgut), thus settling a question that has been debated due to apparently conflicting evidence from different experiments.

      ● Our findings have important implications for local, bottom-up self-organization vs top-down genetic control of tissue behaviors during morphogenesis.

      Strengths:

      The approach developed here, tension-isogonal decomposition, is original, and the authors demonstrate that comprehensive data on tissue mechanics can be extracted from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      We think this statement is somewhat ambiguous and does not summarize our findings precisely. A more precise statement would be that high isogonal strain identifies regions of passive deformation, which is caused by adjacent active regions.

      They define a parameter (the LTC parameter) that encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approaches to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      We fully agree that a full tissue scale model is crucial to support the claims about tissue scale self-organization we make in the discussion. However, the full analysis of such a model is beyond the scope of the present manuscript. We have therefore split off that analysis into a companion manuscript (Claussen et al. 2023). In this paper, we show that the key results of the tissue-scale analysis of the Drosophila embryo, in particular the order-to-disorder transition associated with slowdown of tissue flow, are reproduced and rationalized by our model.

      We now refer more closely to this companion paper to point the reader to the results presented there.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      This is an excellent suggestion. We have included an analysis of a recording of a Snail mutant, which is presented in the new Figures 4 and S6. As predicted, we find that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished. Further, in the absence of isogonal deformation, T1 transitions indeed occur at a critical tension of approx. √2, as predicted by our model. Both of these results provide important experimental evidence for our model and for isogonal strain as a reliable indicator of external forces.

      (2) While the formalism is very elegant and convincingly makes sense of the data presented in the paper, it is not clear whether the claims are compatible with previous experimental observations. In particular, several papers (including Collinet et al. NCB 2015 and Clement et al. Curr Biol 2017) have reported that perturbing the initial Myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. The Tor phenotype (no extension despite Myosin anisotropy) and the Eve/Runt phenotype (extension without Myosin anisotropy) should be analyzed or discussed, since they seem to contradict an extension mostly driven by Myosin anisotropy.

      We are happy to read that the referees find our approach elegant and convincing. The referees correctly point out that we have failed to clearly communicate how our findings connect to the existing literature on Drosophila GBE. Indeed, the conflicting results reported in the literature on what drives GBE – internal forces (myosin anisotropy) or external forces (pulling by the posterior midgut) – were a motivation for our study. We have extensively rewritten the introduction, results section (“Isogonal strain identifies regions of passive tissue deformation”), and discussion (“Internal and external contributions to germ band extension”) in response to the referee’s request.

      In brief, distinguishing active internal vs passive external driving of tissue flow has been a fundamental open question in the literature on morphogenesis. Our tension-isogonal decomposition now provides a way to answer this question on the cell scale, by identifying regions of passive deformation due to external forces. As we now explain more clearly, our analysis shows that germ band extension is predominantly driven by internal tension dynamics, and not pulling forces from the posterior midgut.

      We put this cell-scale evidence into the context of previous tissue-scale experimental observations in genetic mutants (fog, torso-like, scab, corkscrew, ksr) in which posterior midgut invagination is disrupted (Muenster et al. 2019, Smits et al. 2023). In these mutants, the germ band buckles, forming ectopic folds, or twists into a corkscrew shape as it extends, pointing towards a buckling instability characteristic of internally driven extensile flows.

      To address the apparently conflicting evidence from Collinet et al. 2015, we carried out a quantitative re-analysis of the data presented in that reference (see new SI section 3 and Fig. S11). The results support the conclusion that the majority of GBE flow is driven internally, thus resolving the apparent conflict.

      Lastly, as far as we understand, Clement et al. 2017 appears to be compatible with our picture of active T1 transitions. Clement et al. report that the actin cortex, when loaded by external forces, behaves visco-elastically with a relaxation time of the order of minutes, in line with our model for emerging interfaces post T1.

      We again thank the referees for prompting us to address these important issues and believe that including their discussion has significantly strengthened our manuscript.

      Recommendations for the authors:

      Minor points:

      - Fig. 2: The authors should state in the main text at which scale the inverse problem is solved (the intercalating quartet, if I understood the methods correctly), and they should explain and justify this choice (why not compute the inverse at a larger scale?).

      We have rephrased the first sentence of the section “Cell scale analysis” to clarify that we use local tension inference. This local inference is informative about the relative tension of one interface to its four neighbors. The focus on this local level is justified because we are interested in local cell behaviors, namely rearrangements. Tension inference is also most robust on the local level, since this is where force balance, the underlying physical determinant of the link between mechanics and geometry, resides. In global tension inference, spurious large scale gradients can appear when small deviations from local force balance accumulate over large distances. We have added a paragraph in SI Sec. 1.4 to explain these points.

      - Fig. 2: How should one interpret the observation that tension after passive intercalation (amnioserosa) is higher than before? In Fig. 2E, tension has not yet converged; what happens after 20 minutes?

      Recall that the inferred tension is the total tension on an interface. While on contracting interfaces, the majority of this tension will be actively generated by myosin motors, on extending interfaces there is also a contribution carried by passive crosslinkers. The passive tension can be effectively viewed as viscous dissipation on the elongating interface as crosslinkers turn over (Clement et al. 2017). Note that this passive tension is explicitly accounted for in the model presented in Fig. 5. Notably, it is crucial for the T1 process to resolve in a new extending junction. In the amnioserosa, the tension post T1 remains elevated because the amnioserosa is continually stretched by the convergence of the germ band. The tension hence does not necessarily converge back to 1. However, our estimates for the tension after 20 mins post T1 are very noisy because most of the T1s happen relatively late in the movie (past the 25 min mark) and therefore there are only a few T1s where we can track the post-T1 dynamics for more than 20 mins.

      We have added a brief explanation of the high post-T1 tension at the end of the section entitled “Relative tension dynamics distinguishes active and passive intercalations”. Further, we have moved up the section describing the minimal model right after the analysis of the relative tension during intercalations. We believe that this helps the reader better understand these findings before moving on to the tension-isogonal decomposition which generalizes them to the tissue scale.

      - Pages 7-8 / Figure 3: It is unclear how exactly the decomposition into (1) physical shape, (2) tension shape, and (3) isogonal shape works. A more detailed explanation and a clearer illustration of what a quartet is, including its labels, could help.

      We have added a more detailed explanation in the main text. See our response to the longer question regarding this point below.

      - What exactly defines the boundary curve in Fig. 3E? How is it computed?

      We have added a sentence in the caption for Fig. 3E explaining that the boundary curve is found by solving Eq. (1) with l set to zero for the case of a symmetric quartet. We have also added a brief explanation immediately below Eq. (1) pointing out that this equation defines the T1 threshold in the space of local tensions T_i in terms of the isogonal length l_iso.

      - The authors should consider incorporating some details described in the SI file into the main text to clarify some points, as long as the accessible style of the manuscript can be kept. The points mentioned below may also be clarified in the SI document. The specific points that could be elaborated are: Pages 7-8 / Figure 3: It is unclear how exactly the decomposition into (1) physical shape, (2) tension shape, and (3) isogonal shape works. A more detailed explanation and a clearer illustration of what a quartet is, including its labels, could help. The mapping to Maxwell-Cremona space is fine, but which subset is the quartet? For a set of 4 cells with two shared vertices and a junction, aren't there 5 different tension vectors? Are we talking about two closed force triangles? Separately, how exactly do you decompose the deformation (of 4 full cell shapes or a subset?) into isogonal and non-isogonal parts? What is the least-squares fit done over, and is this system underdetermined? Is this statistically averaged, or computed per quartet and then averaged?

      We thank the referees for pointing us to unclear passages in our presentation. We hope that our revisions have resolved the referees’ questions. As described above, we have clarified the tension-isogonal decomposition in the main text. We have also revised the corresponding SI section (1.5) to address the above questions. A sketch of the quartet with labels is found in SI Fig. S7A, which we now refer to explicitly in the main text.

      We always consider force-balance configurations, i.e. closed force triangles. Therefore in the “kite” formed by two adjacent tension triangles, only three tension vectors are independent.

      The decomposition of deformation is performed as follows. For each of the four cells, the center of mass c_i is calculated. Next, tension inference is performed to find the two tension triangles with tension vectors T_ij. There are now three independent centroidal vectors c_j - c_i and three corresponding independent tension vectors T_ij. We define the isogonal deformation tensor I_quartet as the tensor that maps the centroidal vectors to the tension vectors. In general, this mapping cannot be satisfied exactly, because I_quartet has only three independent components, while there are six equations (two components for each of the three vector pairs). The system is therefore overdetermined, and I_quartet is obtained as its least-squares solution.
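      Because six equations constrain only three unknowns, the fit can be illustrated with a short least-squares sketch. This is not the authors' implementation: it assumes that I_quartet is a symmetric 2x2 tensor (which has exactly three independent components), that the tension vectors have already been rotated into the same frame as the centroidal vectors, and that an ordinary, unweighted least-squares fit is used. The function name fit_isogonal_tensor is hypothetical.

```python
import numpy as np


def fit_isogonal_tensor(centroid_vecs, tension_vecs):
    """Least-squares fit of a symmetric 2x2 tensor I with I @ c_k ~ t_k.

    centroid_vecs, tension_vecs: array-likes of shape (3, 2) holding the
    three independent centroidal vectors c_j - c_i and the matching tension
    vectors T_ij (assumed to be expressed in the same frame).
    Returns the fitted tensor and the root-mean-square residual.
    """
    C = np.asarray(centroid_vecs, dtype=float)
    T = np.asarray(tension_vecs, dtype=float)
    rows, rhs = [], []
    for (cx, cy), (tx, ty) in zip(C, T):
        # Parametrise I = [[a, b], [b, d]]; the unknowns are (a, b, d).
        rows.append([cx, cy, 0.0])  # x-component: a*cx + b*cy = tx
        rows.append([0.0, cx, cy])  # y-component: b*cx + d*cy = ty
        rhs.extend([tx, ty])
    A, y = np.array(rows), np.array(rhs)
    # Six equations, three unknowns: solve in the least-squares sense.
    (a, b, d), *_ = np.linalg.lstsq(A, y, rcond=None)
    I = np.array([[a, b], [b, d]])
    rms = np.sqrt(np.mean((A @ np.array([a, b, d]) - y) ** 2))
    return I, rms


# Usage: three centroidal vectors that close a triangle (they sum to zero)
# and tension vectors generated from a known symmetric tensor.
C = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
I_true = np.array([[1.5, 0.2], [0.2, 0.7]])
I_fit, rms = fit_isogonal_tensor(C, C @ I_true)
```

      When the tension vectors are an exact symmetric-linear image of the centroidal vectors, the fit recovers the tensor with zero residual; any mismatch (for example, a rotational or shear component that a symmetric tensor cannot represent) shows up as a nonzero residual rather than being forced into the fit.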

      The plots in Fig. 3C, C’ are obtained by performing this decomposition for each intercalating quartet individually. The data is then aligned in time and ensemble averages are calculated for each timepoint.

      For tissue-scale analysis in Fig. 6, the decomposition is performed for individual vertices (i.e. the corresponding centroidal and tension triangles) and then averaged locally to find the isogonal strain fields shown in Fig. 6B, B’.

      - Line 468: "Therefore, tissue-scale anisotropy of active tension is central to drive and orient convergent-extension flow [10, 57, 59, 60]." The authors almost never mention the contribution of the PMG to tissue extension, yet it is known to be crucial (convergent extension in Tor mutants is strongly affected). Please discuss this point further.

      The referees raise an important point: as discussed in our response to major point (2), we now explicitly discuss the role of internal (active tension) and external (PMG pulling) forces during germ band extension. Please see our response to major point (2) for the changes we made to the manuscript to address this.

      In particular, we now explain that in mutants where PMG invagination is impaired (fog, torso-like, torso, scab, corkscrew), the germ band buckles out of plane or extends in a twisted, corkscrew fashion (Smits et al. 2023). This shows that the germ band generates extensile forces largely internally. In torso mutants, the now-stationary PMG acts as a barrier that blocks GBE; the germ band buckles in response.

      The role of PMG invagination hence lies not in creating pulling forces to extend the germ band, but rather in “making room” to allow for its orderly extension. As shown by the genetic mutants just discussed, the synchronization of PMG invagination and GBE is crucial for successful gastrulation.

      - Typos:

      Line 74: how are intercalations are

      Line 84: vertices vertices

      Line 233: very differently

      Line 236: are can

      Line 390: energy which is the isogonal mode must

      Line 1585: reveals show

      Line 603: area

      Line 618: in terms of on the

      We have fixed these typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees from simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, namely that non time-reversible models should be incorporated in the model selection process for these data sets.

      Non-reversible models should be incorporated in the model selection process not because they perform significantly better, but because they do not perform worse than the reversible models, and because the true biochemical processes of nucleotide substitution support non-reversibility.
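      For readers less familiar with the distinction, time reversibility has a precise meaning for a nucleotide substitution rate matrix Q: with stationary base frequencies pi, the model is reversible when detailed balance, pi_i Q_ij = pi_j Q_ji, holds for every pair of nucleotides. The Python sketch below is illustrative only and is not the authors' code; the rate values are arbitrary, and the helper names (stationary_distribution, is_time_reversible, with_diagonal) are hypothetical. It checks detailed balance for a GTR-style matrix, which is reversible by construction, and for an unrestricted matrix with 12 free off-diagonal rates that carries a cyclic bias.

```python
import numpy as np


def stationary_distribution(Q):
    """Stationary distribution pi of a rate matrix Q (pi @ Q = 0, sum(pi) = 1)."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def is_time_reversible(Q, atol=1e-10):
    """True if Q satisfies detailed balance: pi_i * Q_ij == pi_j * Q_ji."""
    pi = stationary_distribution(Q)
    flux = pi[:, None] * Q  # flux[i, j] = pi_i * Q_ij
    return bool(np.allclose(flux, flux.T, atol=atol))


def with_diagonal(R):
    """Turn a matrix of off-diagonal rates into a rate matrix (rows sum to 0)."""
    Q = np.array(R, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q


# GTR-style matrix: Q_ij = S_ij * pi_j with symmetric exchangeabilities S.
pi = np.array([0.1, 0.2, 0.3, 0.4])
S = np.array([[0, 1, 2, 1], [1, 0, 1, 3], [2, 1, 0, 1], [1, 3, 1, 0]], dtype=float)
Q_gtr = with_diagonal(S * pi[None, :])

# Unrestricted matrix (12 free rates) with a cyclic bias A->C->G->T->A.
R = np.ones((4, 4))
for i in range(4):
    R[i, (i + 1) % 4] = 3.0
Q_nonrev = with_diagonal(R)

print(is_time_reversible(Q_gtr))     # True
print(is_time_reversible(Q_nonrev))  # False
```

      In the cyclic example the stationary frequencies are uniform, yet the net flux A->C differs from C->A, so the process has a preferred direction in time; no choice of symmetric exchangeabilities can reproduce that, which is what a non-reversible model adds over GTR.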

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al. revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new; previous works have already shown that non-reversible, and also covarion, substitution models can fit real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True.

      Major comments

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising but I do not think so, I think the results would be surprising if the reversible models provide a better fitting.

      I think the introduction of the manuscript should be expanded with more information about non-reversible models and the diverse previous studies that already evaluated them. Also, I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprising finding is that NREV12 performed better than NREV6 for double-stranded DNA viruses: given the biochemical processes discussed in the introduction, we expected NREV6 to perform better.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction, https://academic.oup.com/mbe/article/37/7/2110/5810088 https://www.nature.com/articles/s41467-019-08822-w https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction, https://www.sciencedirect.com/science/article/pii/S0378111923001774 https://academic.oup.com/sysbio/article/53/2/278/1690801 https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that: The lack of available data regarding the proportions of viral life cycles during which genomes exist in single- and double-stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified, particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 with respect to describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data was downloaded from Los Alamos HIV database. I am wondering if there were any criterion for selecting the sequences or if just all the sequences of the database for every studied virus category were analysed. Also, was any quality filter applied? How gaps and ambiguous nucleotides were considered? Notice that these aspects could affect the fitting of the models with the data.

      We selected a varying number of sequences from the database for each virus type studied. We performed quality filtering in AliView by re-aligning the sequences for each virus type.

      How the non-reversible model and the data are compared considering the non-reversible substitution process? In particular, given an input MSA, how to know if the nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is not a reference (i.e., wild type) sequence? All the sequences are mutants and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. Maybe one could consider that the most abundant state is the wild type state but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True.

      Reviewer #2 (Public Review):

      The authors evaluate whether non time reversible models fit better data presenting strand-specific substitution biases than time reversible models. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non time-reversibility. Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice. However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that solves the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 leads to inferred trees that are closer to the true generating tree than those inferred under GTR, this shows that the best-fit model in this case, NREV12, leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples when the differences between pairs are normally distributed, and the wRF distances meet these conditions.

      The paired t-test incorporates the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
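      The paired comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes the per-alignment wRF distances for the GTR and NREV12 inferences are available as two matched lists, computes the paired t statistic (with n - 1 degrees of freedom) from the per-alignment differences, and also counts how many alignments each model wins, which is the pairwise tally Reviewer #2 asks for. The function name and the five input values are hypothetical. A p-value would then come from the t distribution, for example via SciPy's scipy.stats.ttest_rel.

```python
import math
import statistics


def paired_wrf_comparison(wrf_gtr, wrf_nrev12):
    """Paired t statistic plus per-alignment win counts (hypothetical helper).

    Index i must refer to the same simulated alignment analysed under both
    models; a smaller wRF distance means the inferred tree is closer to the
    true tree. Returns (t, nrev12_better, gtr_better, ties), where t has
    n - 1 degrees of freedom.
    """
    if len(wrf_gtr) != len(wrf_nrev12) or len(wrf_gtr) < 2:
        raise ValueError("need two matched samples of equal length >= 2")
    # Positive difference: the NREV12 tree is closer to the true tree.
    diffs = [gtr - nrev for gtr, nrev in zip(wrf_gtr, wrf_nrev12)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(n))
    nrev12_better = sum(d > 0 for d in diffs)
    gtr_better = sum(d < 0 for d in diffs)
    return t_stat, nrev12_better, gtr_better, n - nrev12_better - gtr_better


# Hypothetical wRF values for five alignments (illustrative only, not study data).
t, nrev_wins, gtr_wins, ties = paired_wrf_comparison(
    [0.12, 0.15, 0.10, 0.18, 0.14],
    [0.10, 0.13, 0.11, 0.15, 0.12],
)
```

      Reporting the win counts alongside t shows how consistently one model outperforms the other, which a difference in means alone does not reveal.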

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done.

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed? could the use of that model bias posterior results in terms of favoring results based on such a model?

      We state on page ….. that the GTR+G model was used to reconstruct the tree. Yes, the use of the GTR+G model could bias the posterior results, as we state on page ….

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We state that we used the weighted Robinson-Foulds metric (wRF), as implemented in the R package phangorn (Schliep, 2011).
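      The weighted RF distance is commonly defined as the sum, over all bipartitions (splits) present in either tree, of the absolute difference between the corresponding branch lengths, with a split missing from one tree contributing a length of zero. A minimal Python sketch of that definition follows. It assumes each tree has already been converted into a dictionary mapping one side of each split to its branch length; the helper names are hypothetical, it is not phangorn's implementation, and whether terminal branches are included is left to the caller.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # any fixed taxon works; min() just makes the choice deterministic
    return frozenset(taxa) - side if ref in side else side


def weighted_rf(splits_a, splits_b, taxa):
    """Sum over all splits of |branch length in A - branch length in B|.

    splits_a / splits_b map one side of each bipartition (a frozenset of
    taxon names) to its branch length; a split absent from a tree counts
    as length 0.
    """
    a, b = {}, {}
    for src, dst in ((splits_a, a), (splits_b, b)):
        for side, length in src.items():
            dst[canonical_split(side, taxa)] = length
    return sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in set(a) | set(b))


# Four-taxon example using internal splits only (hypothetical branch lengths).
taxa = {"A", "B", "C", "D"}
tree_1 = {frozenset({"A", "B"}): 0.3}  # split AB|CD
tree_2 = {frozenset({"A", "C"}): 0.2}  # split AC|BD, conflicts with tree_1
tree_3 = {frozenset({"C", "D"}): 0.1}  # same split as tree_1, shorter branch
d_12 = weighted_rf(tree_1, tree_2, taxa)  # both splits unshared: 0.3 + 0.2
d_13 = weighted_rf(tree_1, tree_3, taxa)  # shared split: |0.3 - 0.1|
```

      Unlike the unweighted RF distance, which only counts unshared splits, this measure also penalizes shared splits whose branch lengths differ, which is why branch-length estimation affects it.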

      Despite a minority, several datasets fitted better with a reversible model than with a non-reversible model. I think that should be clearly indicated.

      In addition, in my opinion the AIC does not penalize the number of model parameters enough and favors the non-reversible models over the reversible models, but this is only my opinion based on the definition of AIC and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted.

      When comparing phylogenetic trees I was wondering if one should consider the effect of the estimation method and quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. This does not matter for the unweighted RF distance, but it does for the weighted RF distance. When building the trees with RAxML, only nodes with high support are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-test compares the wRF distances between the true tree and the trees inferred from the simulated alignments under the GTR model versus the wRF distances between the true tree and the trees inferred under the NREV12 model.

      The influence of the NREV12 model on the reconstructed tree may not be significantly higher at large levels of nucleotide diversity because, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection among reversible and non-reversible models can be performed using both HyPhy and IQ-TREE, and we have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires the substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      Our previous studies used the names NREV6 and NREV12, and we would like to keep this naming consistent across our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included to the discrete Gamma rate categories [1]

      We included the extensive description to help readers who are less familiar with these models, particularly because our names for the models differ from those used in other papers.

      As recommended, we have added a reference for the discrete gamma rate categories (Yang, 1994).

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We will submit all the needed datasets. The RF-distance results for the simulated data are available and will be submitted; however, we cannot add them to the main document because this would create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous preprint of this article [2] does not mention AIC-c, but it explicitly includes the formulas for AIC that do not take the sample size into account and reports the same results as this manuscript, which indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. Two points may be noted here. First, some authors count tree branch lengths as free model parameters and others do not, and this paper does not specify how the model parameters are counted. Second, AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is more frequently selected than NREV6 and GTR.
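      For reference, the standard definitions of the criteria mentioned in these comments are sketched below, where \hat{L} is the maximized likelihood, k the number of free parameters and n the sample size. How k is counted (e.g. whether branch lengths are included) and what is taken as n (often the number of alignment sites) are conventions assumed here rather than taken from the manuscript.

```latex
\begin{aligned}
\mathrm{AIC}   &= 2k - 2\ln\hat{L} \\
\mathrm{AIC_c} &= \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \\
\mathrm{BIC}   &= k\ln n - 2\ln\hat{L}
\end{aligned}
```

      Because the AIC-c correction term grows as n approaches k + 1 and vanishes as n grows, AIC-c penalizes extra parameters more strongly than AIC for short alignments and converges to AIC for long ones.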

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the whiskers plots are not useful. Scatterplots would display the results better.

      The boxplots are meant to offer a simplified view of the results, since the paired t-tests perform all of the comparisons. As recommended, we will provide the scatterplots as supplementary information so that readers can see the full detailed plots.

      Some references are missing

      Missing references have been added.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper seeks to understand the upstream regulation and downstream effectors of glycolysis in retinal progenitor cells, using mouse retinal explants as the main model system. The paper presents evidence that high glycolysis in retinal progenitor cells is required for their proliferation and timely differentiation into photoreceptors. Retinal glycolysis increases after the deletion of Pten. The authors suggest that high glycolysis controls cell proliferation and differentiation by promoting intracellular alkalinization, beta-catenin acetylation and stabilization, and consequent activation of the canonical Wnt pathway.

      Strengths:

      (1) The experiments showing that PFKFB3 overexpression is sufficient to increase the proliferation of retinal progenitors (which are already highly dividing cells) and photoreceptor differentiation are striking and the result is unanticipated. It suggests that glycolytic flux is normally limiting for proliferation in embryos.

      In our BrdU birthdating experiment, we showed that PFKB3 expression drives the precocious differentiation of retinal progenitor cells (RPCs) into photoreceptors. However, we did not determine if there is an associated change in the number of dividing RPCs. To examine the proliferative status of PFKB3-overexpressing RPCs, we will perform short-term BrdU labeling to measure the number of RPCs in S-phase of the cell cycle. Additionally, we will count the number of RPCs expressing pHH3, a mitotic marker, and Ki67, a marker of cycling cells in all cell cycle phases.

      (2) Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).

      We thank the reviewer for these positive comments on our work. We will move the acetate data to the main figure as requested.

      Weaknesses:

      (1) Epistatic experiments to test if changes in pH mediate the effects of glycolysis on photoreceptor differentiation, or if Wnt activation is the main downstream effector of glycolysis in controlling differentiation are not presented.

      Traditionally, epistasis is tested using double knock-out (DKO) studies with null mutant alleles. If two genes operate in the same pathway, the downstream phenotype prevails, whereas phenotypic worsening is observed if two genes act in parallel pathways. Our data suggest the following order of events: Pten↓ → glycolysis↑ → intracellular pH↑ → Wnt signaling↑ → photoreceptor differentiation. In this model, Wnt signaling is the downstream-most effector. To test our epistatic model, we will assess RPC proliferation and the differentiation of Crx+ photoreceptor precursors with the following assays:

      (1) To confirm that Wnt signaling acts downstream of Pten, we will generate DKOs of Pten and Ctnnb1, a downstream effector of Wnt signaling. We know that fewer photoreceptors are generated in single Pten-cKO and Ctnnb1-cKO retinas, with a disruption of the outer nuclear layer only in Ctnnb1-cKOs. If Pten and Wnt act in the same pathway, Pten;Ctnnb1 DKOs will resemble single Ctnnb1-cKOs.

      (2) While epistasis is traditionally examined using genetic mutants, we will perform proxy experiments using pharmacological agents. To test whether Wnt activation acts downstream of a pH increase, we will activate Wnt signaling with recombinant Wnt3a at high and low pH. While low pH inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should promote differentiation even at low pH. Conversely, we will alter pH in the presence of a Wnt inhibitor, FH535, which should block the positive effects of high pH on photoreceptor differentiation.

      (3) To test whether Wnt activation acts downstream of glycolysis to increase photoreceptor differentiation, we will apply recombinant Wnt3a to retinal explants while simultaneously inhibiting glycolysis with 2DG.  While 2DG inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should still be able to promote differentiation. 

      (4) To test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas, we will treat wild-type and Pten-cKO retinas with the Wnt inhibitor FH535 and/or the glycolytic inhibitor 2DG.

      (2) It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.

      We agree with the reviewer that metabolism likely changes ex vivo compared to in vivo. However, we did not perform stable isotope tracing experiments to directly examine glycolytic flux in this study. While outside the scope of the current study, this type of analysis is an important future direction that we will bring up in the discussion.

      (3) The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.

      We mined a scRNA-seq dataset to show that Pgk1, a rate-limiting enzyme for glycolysis, is specifically elevated in early-stage RPCs versus later stage. We have since analysed additional glycolytic pathway genes, and observed a similar enrichment of Pfkl, Eno1 and Slc16a3 transcripts in early RPCs, while other genes were equally expressed in both early and late RPCs.

      To functionally demonstrate that there are differences in glycolysis between early and late RPCs, we will use CD133 to sort RPCs at E15 (early) and P0 (late). We will perform qPCR on sorted cells to validate the transcriptional differences in glycolytic gene expression. Additionally, we will perform two proxy measures of glycolysis: 1) We will measure lactate levels in sorted RPCs at both stages, and 2) We will use a Seahorse assay and assess ECAR in sorted RPCs at both stages.

      (4) The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.

      The pleiotropic actions of Wnt signaling on cell proliferation and differentiation are well known, even shifting from pro-proliferative to anti-proliferative depending on tissue or cell type. It is thus not surprising that different studies found unique effects of pH and glycolysis on β-catenin modifications and the activation of downstream signaling. Thus, as suggested by the reviewer, the difference between our data and other studies could be attributed to tissue and organism. In our revision, we will more fully assess our findings in the context of published studies, as recommended by the reviewer.

      To summarize our data, in the developing retina, we found that non-phosphorylated β-catenin protein levels increase in Pten-cKO retinas in vivo, while conversely, non-phosphorylated β-catenin protein levels decrease upon 2DG treatment and at low pH 6.5 in vitro.

      The Oginuma et al. 2020 (Nature 584: 98-101) study was performed on the chick tailbud and investigated lineage decisions by neuromesodermal progenitors in the presomitic mesoderm. In this context, WNT activity, glycolysis and pHi all decline in tandem, complementary to our findings. However, Oginuma et al. found that while phosphorylated and non-phosphorylated β-catenin levels do not vary, K49 β-catenin acetylation is reduced at low pHi. In their system, K49 β-catenin acetylation is associated with a switch in cell fate choice from neural to mesodermal in the chick tailbud. We will now assess this modification.

      Hauck et al. 2021 (Cell Death & Differentiation 28:1398-1417) found that by mutating Pkm, a rate-limiting glycolytic enzyme, β-catenin can more efficiently shuttle to the nucleus to activate Wnt signaling and promote cardiomyocyte proliferation. This study highlights the importance of examining β-catenin protein levels in both cytoplasmic and nuclear fractions. They also examined transcriptional targets of Wnt signaling, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3, which we will also now assess.

      In a separate study in cancer cells, high pH leads to increased expression of Ccnd1, a β-catenin target gene, and promotes proliferation (Koch et al. 2020. Nat Metab. 2:1212-1222). These findings are consistent with our demonstration that β-catenin levels are stabilized at pH 8, and RPC proliferation is enhanced. A separate study by Melnik et al 2018 (Cell Discovery 4:37) performed in cancer cells found that acidification induced by metformin indirectly suppresses Wnt signaling by activating the DDIT3 transcriptional repressor, consistent with our data showing low pH suppresses β-catenin stability. Melnik et al also used Mcl inhibitors, as we did in our study, and showed that this treatment blocked Wnt signaling. While we did not look at the impact of CNCn on Wnt signaling, we did see a decline in proliferation, as expected if Wnt levels are low. The relationship between CNCn and Wnt activity will now be assessed.

      The one study that fits less well is from Czowski and White (BioRxiv), where they found that higher pH levels decrease β-catenin levels in the cytoplasm, nucleus and junctional complexes in MDCK cells. In this study, the authors altered pH using inhibitors for a sodium-proton exchanger and a sodium bicarbonate transporter. The Oginuma paper instead used the ionophores nigericin and valinomycin to equilibrate intracellular pHi to media pH, which we will now incorporate into our study.

      In summary, to more comprehensively examine the link between Pten loss, glycolytic activity, pHi and Wnt signaling, we will examine levels of phosphorylated, non-phosphorylated and K49 acetylated β-catenin after each manipulation (i.e., Pten loss, pH manipulations, CNCn treatment, glycolysis inhibition, acetate treatments). For pH manipulations, we will use nigericin and valinomycin to equilibrate pH. These studies will be performed on cytoplasmic and nuclear fractions from CD133+ MACS-enriched RPCs, to add cell type and stage specificity to our study. We will also use qPCR to examine Wnt signaling genes, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3.

      (5) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      See response to point 3.

      (6) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We thank the reviewer for this excellent suggestion. We will examine the impact of 2DG on the differentiation of other retinal cell types, including bipolar and amacrine cells and Müller glia. For technical reasons, we will exclude ganglion cells, which die in culture and cannot be examined in explants, and horizontal cells, which are a rare cell type and hence difficult to quantify accurately.

      (7) Are the prematurely-born cells caused by PFKFB3 overexpression photoreceptors as assessed by morphology or markers (in addition to position)?

      We will immunostain treated retinas with additional cell-type specific markers to examine rod and cone photoreceptor numbers and morphologies.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hanna et al., addresses the question of energy metabolism in the retina, a neuronal tissue with an inordinately high energy demand. Paradoxically, the retina appears to employ to a large extent glycolysis to satisfy its energetic needs, even though glycolysis is far less efficient than oxidative phosphorylation (OXPHOS). The focus of the present study is on the early development of the retina and the retinal progenitor cells (RPCs) that proliferate and differentiate to form the seven main classes of retinal neurons. The authors use different genetic and pharmacological manipulations to drive the metabolism of RPCs or the retina towards higher or lower glycolytic activity. The results obtained suggest that increased glycolytic activity in early retinal development produces a more rapid differentiation of RPCs, resulting in a more rapid maturation of photoreceptors and photoreceptor segment growth. The study is significant in that it shows how metabolic activity can determine cell fate decisions in retinal neurons.

      Strengths:

      This study provides important findings that are highly relevant to the understanding of how early metabolism governs the development of the retina. The outcomes of this study could also be relevant for human diseases that affect early retinal development, including retinopathy of prematurity, where increased oxygenation likely causes a disturbance of energy metabolism.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      The restriction to only relatively early developmental time points makes it difficult to assess the consequences of the different manipulations on the (more) mature retina. Notably, it is conceivable that early developmental manipulations, while producing relevant effects in the young post-natal retina, may "even out" and may no longer be visible in the mature, adult retina.

      While we agree that it would be interesting to observe the long-term consequences of our manipulations, we are limited by our retinal explant model, which can at best be cultured for 2 weeks in vitro. Additional limitations include the lack of photoreceptor outer segment development in our in vitro model. However, we can perform more extensive analyses of our genetic models in vivo (i.e., Pten-cKO, cyto-PFKB3-GOF, Ctnnb1-cKO). For these lines, we will focus on more in-depth analyses of photoreceptor differentiation and outer segment maturation using additional markers and one later stage of development.

      Reviewer #3 (Public review):

      Summary:

      This study examines the metabolic regulation of progenitor proliferation and differentiation in the developing retina. The authors observe dynamic changes in glycolytic gene expression in retinal progenitors and use various strategies to test the role of glycolysis. They find that elevated glycolysis in Pten-cKO retinas results in alteration of RPC fate, while inhibition of glycolysis has converse effects. They specifically test the role of elevated glycolysis using dominant active cytoPFKB3, which demonstrates the selective effects of elevated glycolysis on progenitor proliferation and rod differentiation. They then show that elevated glycolysis modulates both pHi and Wnt signaling, and provide evidence that these pathways impact proliferation and differentiation of progenitors, particularly affecting rod photoreceptor differentiation.

      Strengths:

      This is a compelling and rigorous study that provides an important advance in our understanding of metabolic regulation of retina development, addressing a major gap in knowledge. A key strength is that the study utilizes multiple genetic and pharmacological approaches to address how both increased or decreased glycolytic flux affect retinal progenitor proliferation and differentiation. They discover elevated Wnt signaling pathway genes in Pten cKO retina, revealing a potential link between glycolysis and Wnt pathway activation. Altogether the study is comprehensive and adds to the growing body of evidence that regulation of glycolysis plays a key role in tissue development.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      (1) Following the expression of cytoPFKB3, which results in increased glycolytic flux, BrdU labeling was performed at E12.5 and increased numbers of labeled cells were detected in the outer nuclear layer. However, whether these are cones or rods is not established. The rest of the analysis is focused on the precocious maturation of rhodopsin-labeled outer segments, and the major conclusions emphasize rod photoreceptor differentiation. Therefore, it is unclear whether there is an effect on cone differentiation in either Pten cKO or cytoPFKB3 transgenic retinas. It is also not established whether rods are born precociously. Presumably, this would be best detected by BrdU labeling at later embryonic stages.

      We agree with the reviewer that we should expand our study to also examine cone differentiation and outer segment maturation, which we will now do by adding markers to our analysis.

      (2) The authors find that there is upregulation of multiple Wnt pathway components in Pten cKO retina. They further show that inhibiting Wnt signaling phenocopies the effects of reducing glycolysis. However, they do not test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas. Thus the argument that Wnt is a key downstream effector pathway regulating rod photoreceptor differentiation is weak.

      See our response to Reviewer #1, point 1.

      (3) The use of sodium acetate to force protein acetylation is quite non-specific and will have effects beyond beta-catenin acetylation (which the authors acknowledge). Thus it is a stretch to state that "forced activation of beta-catenin acetylation" mimics the impact of Pten loss/high glycolytic activity in RPCs since the effects could be due to acetylation of other proteins.

      As outlined in our response to Reviewer #1, point 4, we will now assess K49 β-catenin acetylation levels, as conducted by Oginuma et al. This analysis will allow us to determine whether β-catenin acetylation is altered with manipulations of Pten, glycolysis, pH or acetate treatments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary

      This study has as its goal to determine how the structure and function of the circuit that stabilizes gaze in the larval zebrafish depends on the presence of the output cells, the motor neurons. A major model of neural circuit development posits that the wiring of neurons is instructed by their postsynaptic cells, transmitting signals retrogradely on which cells to contact and, by extension, where to project their axons. Goldblatt et al. remove the motor neurons from the circuit by generating null mutants for the phox2a gene. The study then shows that, in this mutant that lacks the isl1-labelled extraocular motor neurons, the central projection neurons have 1) largely normal responses to vestibular input; 2) normal gross morphology; 3) minimally changed transcriptional profiles. From this, the authors conclude that the wiring of the circuit is not instructed by the output neurons, refuting the major model.

      Strengths

      I found the manuscript to be exceptionally well-written and presented, with clear and concise writing and effective figures that highlight key concepts. The topic of neural circuit wiring is central to neuroscience, and the paper's findings will interest researchers across the field, and especially those focused on motor systems. 

      The experiments conducted are clever and of a very high standard, and I liked the systematic progression of methods to assess the different potential effects of removing phox2a on circuit structure and function. Analyses (including statistics) are comprehensive and appropriate and show the authors are meticulous and balanced in most of the conclusions that they draw. Overall, the findings are interesting, and with a few tweaks, should leave little doubt about the paper's main conclusions. 

      We are grateful for the Reviewer’s enthusiasm for our manuscript and recognition of the advance to the vestibular and motor systems fields. We particularly appreciate their suggestions for experiments to improve the characterization of our phox2a mutant line. We hope the Reviewer finds that the results of the added experiment adequately address the points they raised.

      Weaknesses/Recommendations

      (1) The main point is the incomplete characterisation of the effects of removing phox2a on the extra-ocular motor neurons. Are these cells no longer there, or are they there but no longer labelled by isl1:GFP? If they are indeed removed, might they have developed early on, and subsequently lost? These questions matter as the central focus of the manuscript is whether the presence of these cells influences the connectivity and function of their presynaptic projection neurons. Therefore, for the main conclusions to be fully supported by the data, the authors would need to test whether 1) the motor neurons that otherwise would have been labelled by the isl1:GFP line are physically no longer there; 2) that this removal (if, indeed, it is that) is developmental. If these experiments are not feasible, then the text should be adjusted to take this into account. 

      Show (e.g., with DAPI or some other staining) whether there are still cells where you would have expected to see nIII/nIV extraocular motor neurons. If this is done in a developmental timeline both main "concerns" are addressed in one go. If this doesn't work for some reason, then I'd suggest adjusting the discussion section to note this caveat. I realise it is commonplace in zebrafish and rodent papers to equate the two, but it should also be considered that the isl1:GFP does not report which cells are isl1+ 100% faithfully. 

      We thank the Reviewer for their suggestion. We’ve included the results of this experiment in (new) Supplemental Figure 1 and have updated the Results accordingly (text lines 69-72). 

      Briefly: We performed fluorescent in situ hybridization for vachta, a marker of cholinergic motor neurons, at 2 dpf, when nIII/nIV differentiation is complete and before synaptogenesis with either their pre- or postsynaptic partners. We also included a DAPI stain. We find that while loss of phox2a does not physically remove neurons from the region that contains nIII/nIV motor neurons, neurons in this region no longer express vachta. The presence of neurons at an early stage (2 dpf) that have lost expression of both a transcription factor (isl1) and a motor neuron marker (vachta) supports our contention that, while these cells are present, they should not be considered motor neurons.

      While the reviewer did not suggest it directly, we note that there is a more laborious way to determine “what happens to cells that would have been phox2a+ but no longer express phox2a?” Specifically, one could target a reporter transgene to the endogenous phox2a locus on the phox2a mutant background. Regrettably, generating such a knock-in reporter is difficult and success is far from assured.

      Previously (Greaney et al. 2017, 10.1002/cne.24042), we compared expression patterns in nIV to those observed after retro-orbital dye fills. We never saw neurons labeled by dye that were not also GFP+. However, it was not possible to perform a similar analysis for nIII, so we acknowledge the limits of the isl1:GFP reporter.

      (2) A further point to address is the context of the manipulation. If the phox2a removal does indeed take out the extra-ocular motor neurons, what percentage of postsynaptic neurons to the projection neurons are still present?

      In other words, how does the postsynaptic nMLF output relate to the motor neurons? If, for instance, the nMLF (which, as the authors state, are likely still innervated by the projection neurons) are the main output of the projection neurons, then this would affect the interpretation of the results.

      Is there quantitative information on the projection neuron outputs to address the second point (i.e., how much of the projection neurons' output is the extra-ocular motor neurons)? If not, it should be discussed how this could affect the conclusions. 

      Qualitatively, projection neurons form more robust arbors to the nMLF than to their nIII/nIV partners (see: Schoppik et al. 2017, DOI: 10.1523/JNEUROSCI.1711-17.2017). We expect this is proportional to the size of each downstream target.

      The Reviewer makes an interesting point here. These projection neurons innervate several downstream nuclei that could potentially influence their development; we’ve considered this in the Discussion based on existing literature and in the context of our own findings. A precise dissection of each target population’s contribution would be interesting and important for larger questions about neural circuits for balance (see Sugioka et al. 2023, 10.1038/s41467-023-36682-y). However, we feel this analysis is outside our study’s scope, given that our aim here was to evaluate a standing hypothesis restricted to the contribution of nIII/nIV motor neurons.

      Less important, but still useful: 

      - Figure 4C/D: I found these panels difficult to interpret. Perhaps split them up so each panel does a little less heavy lifting? Do the main panels in C show all axons? Where are the "two remaining nIII/nIV neurons" in D? 

      We’ve split the panels in 4C as suggested and adjusted the caption text in 4D to clarify the “remaining neurons” were simply not eliminated following phox2a knockout. We presume they are instead phox2b+. 4C shows all axons labeled by our transgenic line that follow the medial longitudinal fasciculus.  

      Extremely minor: 

      - line 28: "tantamount" --> "paramount"? 

      - some figure legends say DeltaFF, instead of DeltaF/F 

      - line 192: "the any" 

      These have been corrected; we thank the reviewer for their attention to detail. 

      Reviewer 2:

      Summary

      This study was designed to test the hypothesis that motor neurons play a causal role in circuit assembly of the vestibulo-ocular reflex circuit, which is based on the retrograde model proposed by Hans Straka. This circuit consists of peripheral sensory neurons, central projection neurons, and motor neurons. The authors hypothesize that loss of extraocular motor neurons, through CRISPR/Cas9 mutagenesis of the phox2a gene, will disrupt sensory selectivity in presynaptic projection neurons if the retrograde model is correct. 

      Strengths

      The work presented is impressive in both breadth and depth, including the experimental paradigms. Overall, the main results were that the loss of function paradigm to eliminate extraocular motor neurons did not 1) alter the normal functional connections between peripheral sensory neurons and central projection neurons, 2) affect the position of central projection neurons in the sensorimotor circuit, or 3) significantly alter the transcriptional profiles of central projection neurons. Together, these results strongly indicate that retrograde signals from motor neurons are not required for the development of the sensorimotor architecture of the vestibulo-ocular circuit. 

      We are grateful for the excellent summary of our manuscript and support for our aim, which was indeed to evaluate Hans Straka’s model for the development of the vestibulo-ocular reflex circuit.  

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions

      The results of this study showed that extraocular motor neurons were not required for central projection neuron specification in the vestibulo-ocular circuit, which countered the prevailing retrograde hypothesis proposed for circuit assembly. A concern is that the results presented may be limited to this specific circuit and may not be generalizable to other circuit assemblies, even to other sensorimotor circuits.

      Impact

      As mentioned above, this study offers valuable new insights into the developmental organization of the vestibulo-ocular circuit. However, different circuits likely utilize various mechanisms (extrinsic, intrinsic, or both) to establish proper functional connectivity. So, although the results shown here begin to explain the developmental organization of the vestibulo-ocular circuit, they are not likely to be generalizable to other circuits, though this remains to be seen. At a minimum, this study provides a starting point for the examination of patterning of connections in this and other sensorimotor circuits.

      Weaknesses/Recommendations

      A concern is that the results presented may be limited to this specific circuit and may not be generalizable to other circuit assemblies, even to other sensorimotor circuits. However, different circuits likely utilize various mechanisms (extrinsic, intrinsic, or both) to establish proper functional connectivity. So, although the results shown here begin to explain the developmental organization of the vestibulo-ocular circuit, they are not likely to be generalizable to other circuits, though this remains to be seen.

      We agree with the Reviewer that, of course, a diverse array of developmental mechanisms shapes sensorimotor circuit architecture. However, prior findings in the spinal cord (Wang & Scott 2000, Sürmeli et al. 2011, Bikoff et al. 2016, Sweeney et al. 2018, Shin et al. 2020) support our primary conclusion that motor neurons are dispensable for specification of premotor partners. The Recommendation ends with “though this remains to be seen.” We infer that the Reviewer does not have a counterexample at hand for a circuit where motor neurons determine the fate of their partners. Therefore, the preponderance of evidence argues that our work is likely to generalize to other circuits. However, we acknowledge the limitations of our work and we have tempered any claims to generality in the text.

      Lines 156-57: The authors should consider and discuss explicitly the potential for compensatory mechanisms in the CRISPR/Cas9 mutants that may permit synaptogenesis of the projection neurons even though their motor neuron (MN) partners are absent.

      We agree with the Reviewer that careful consideration of compensation is merited when using mutants. There are two synapses that the comment might refer to: those between projection neurons and motor neurons, and those between sensory afferents and projection neurons. Projection neurons fail to form any synapses at the region that would contain their motor neuron (nIII/nIV) partners (see Fig. 4C), so there is no question of compensation there. Figure 1B shows that there is no phox2a expression in sensory or central projection neurons. Consequently, even if there were a gene that perfectly compensated for the loss of phox2a, it wouldn’t be active in sensory or central projection neurons. We therefore do not believe that compensatory expression of other genes plays any role here.

      Line 162: Is this an accurate global statement or should it be restricted to the evidence provided in this report?

      We’ve clarified this line, which summarizes findings described in previous results sections of this report.

      Reviewer 3:

      Summary

      In this manuscript by Goldblatt et al. the authors study the development of a well-known sensorimotor system, the vestibulo-ocular reflex circuit, using Danio rerio as a model. The authors address whether motor neurons within this circuit are required to determine the identity, upstream connectivity and function of their presynaptic partners, central projection neurons. They approach this by generating a CRISPR-mediated knockout line for the transcription factor phox2a, which specifies the fate of extraocular muscle motor neurons. After showing that phox2a knockout ablates these motor neurons, the authors show that functionally, morphologically, and transcriptionally, projection neurons develop relatively normally.

      Overall, the authors present a convincing argument for the dispensability of motor neurons in the wiring of this circuit, although their claims about the generalizability of their findings to other sensorimotor circuits should be tempered. The study is comprehensive and employs multiple methods to examine the function, connectivity and identity of projection neurons.

      We appreciate the Reviewer’s support for our manuscript and have implemented their thoughtful suggestions on how to improve the clarity and presentation of our conclusions. We acknowledge the shared consideration with Reviewer 2 as to the generalizability of our findings, and have tempered the language in our revision. 

      In the introduction the authors set up the controversy on whether or not motor neurons play an instructive role in determining "pre-motor fate". This statement is somewhat generic and a bit misleading as it is generally accepted that many aspects of interneuron identity are motor neuron-independent. The authors might want to expand on these studies and better define what they mean by "fate", as it is not clear whether the studies they are citing in support of this hypothesis actually make that claim. 

      We appreciate the Reviewer’s attention to this important consideration. We agree that there are numerous, often ambiguous, ways to define cell fate. We’ve modified our manuscript to read “…for and against an instructive role in establishing connectivity” (line 19) to reflect that connectivity is the most pertinent readout of cell fate in (most) studies cited there, as well as in our model system (lines 25-26: “Subtype fate, anatomical connectivity, and function are inextricably linked: directionally-tuned sensory neurons innervate nose-up/nose-down subtypes of projection neurons, which in turn innervate specific motor neurons…”). We’ve expanded on the prior studies mentioned above in relevant sections of the Results and Discussion.

      Although it appears unchanged from their images, the authors do not explicitly quantitate the number of total projection neurons in phox2a knockouts. 

      We have added this quantification (text lines 95-96); the number of projection neurons per hemisphere is unchanged in control and mutant larvae.

      For figures 2C and 3C, please report the proportion of neurons in each animal, either showing individual data points here or in a separate supplementary figure; and please perform and report the results of an appropriate statistical test. 

      Generally, we agree that per-animal sampling can provide important metrics. We’ve added a line in the appropriate Methods section with the mean and standard deviation of the number of neurons sampled per animal for each genotype (lines 408-410). However, our extensive prior work using this transgenic line (Goldblatt et al. 2023, DOI: 10.1016/j.cub.2023.02.048) argued that a per-animal breakdown can be misleading. Due to occasional technical aberrations, variation in transgenic line expression, and limitations of our registration method, we cannot sample 100% of the projection nucleus (~50 neurons/hemisphere) in each animal. Likewise, the topography of the nucleus in WT animals, both for up/down subtypes (Fig. 2) and impulse-responsive/unresponsive neurons (Fig. 3), means that subtypes may not be proportionally sampled on a per-animal basis. While such problems would likely resolve if we took data from ~50-75 animals for each condition, at ~2 animals/day and 1-2 experimental days/week on shared instrumentation, the throughput simply isn’t there. We therefore believe the data are best represented as an aggregate.

      In the topographical mapping of calcium responses (figures 2D, E and 3D), the authors say they see no differences but this is hard to appreciate based on the 3D plotting of the data. Quantitating the strength of the responses across the 3-axes shown individually and including statistical analyses would help make this point, especially since the plots look somewhat qualitatively different. 

      We have added a supplemental table (new Table 2) with statistical comparisons of projection neuron topography (both to tonic and impulse stimuli) across genotypes for additional clarification. Quantitatively, we find that differences in projection neuron position (max observed: approx. 5 microns) are within the limits of our expected error in registering neurons across larvae to a standardized framework, given the small size of the nucleus (approx. 40 microns in each spatial axis) and each individual neuron (approx. 5 microns in diameter).

      The transcriptional analysis is very interesting, however, it is not clear why it was performed at 72 hpf, while functional experiments were performed at 5 days. Is it possible that early aspects of projection neuron identity are preserved, while motor neuron-dependent changes occur later? The authors should better justify and discuss their choice of timepoint. 

      As suggested, we have updated the manuscript to justify the choice of timepoint (text lines 176-177). We agree with the Reviewer that choosing the “right” timepoint for transcriptional analysis is key. The comment underscores the challenge of balancing the time elapsed since neurogenesis (24-54 hpf), during which potential fate markers could change, against the timecourse of synaptogenesis (2-4 dpf) and functional maturation (5 dpf). We hypothesized that selecting an intermediate timepoint (72 hpf, during peak synaptogenesis) would enable the highest resolution of both fate markers expressed at the end of neurogenesis (54 hpf) and wiring specificity molecules. As further justification, we point the Reviewer to recent studies in comparable systems proposing that subtype diversity is most resolvable during synaptogenesis (see: Ozel et al. 2022, DOI: 10.1038/s41586-020-2879-3 and Li et al. 2017, DOI: 10.1016/j.cell.2017.10.019). However, we acknowledge that the ideal experiment would have been a transcriptional timecourse that would have directly addressed the question.

      The inclusion of heterozygotes as controls is problematic, given that the authors show there are notable differences between phox2a+/+ and phox2a+/- animals; pooling these two genotypes could potentially flatten differences between controls and phox2a-/-. 

      We agree that this is an important limitation on our interpretations and have noted this more explicitly in the appropriate Results section (line 204).  

      Projection neurons appear to be topographically organized and this organization is maintained in the absence of motor neurons. Are there specific genes that delineate ventral and dorsal projection neurons? If so, the authors should look at those as candidate genes as they might be selectively involved in connectivity. Showing that generic synaptic markers (Figure 4E) are maintained in the entire population is not convincing evidence that these neurons would choose the correct synaptic partners.

      We agree with the Reviewer that Figure 4E is limited and that the most convincing molecular probe would be against a subtype-specific marker gene, ideally the one(s) that establish subtype-specific connectivity. To date, few such markers have been identified in any system, and, to the best of our knowledge, no reported markers differentiate dorsal (nose-up) from ventral (nose-down) projection neurons. We are currently evaluating candidates, but will not include that data here until the relevant genes are established as veridical subtype markers with defined roles in subtype fate specification and connectivity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors describe the participation of the Hes4-BEST4-Twist axis in controlling the process of epithelial-mesenchymal transition (EMT) and the advancement of colorectal cancers (CRC). They assert that this axis diminishes the EMT capabilities of CRC cells through a variety of molecular mechanisms. Additionally, they propose that reduced BEST4 expression within tumor cells might serve as an indicator of an adverse prognosis for individuals with CRC.

      Strengths:

      • Exploring the correlation between the Hes4-BEST4-Twist axis, EMT, and the advancement of CRC is a novel perspective and gives readers a fresh standpoint.

      • The whole transcriptome sequence analysis (Figure 5) showing low expression of BEST4 in CRC samples will be of broad interest to cancer specialists as well as cell biologists, although further corroborative data are essential to strengthen these findings (see Weaknesses).

      Weaknesses:

      (1) The authors employed three kinds of CRC cell lines, but not untransformed cells such as intestinal epithelial organoids which are commonly used in recent research.

      We sincerely thank the reviewer for raising this issue. We acknowledge that intestinal epithelial organoids would be a valuable model for this study and will consider establishing this system in future research; however, doing so falls outside the scope of our current work.

      (2) The authors use three different human CRC cell lines with a lack of consistency in the selection of them. Please clarify 1) how these lines are different from each other, 2) why they pick up one or two of them for each experiment. To be more convincing, at least two lines should be employed for each in vitro experiment.

      We apologize for any confusion caused to the reviewer. In our study, we used the HCT116 and Caco2 cell lines to investigate the effects of BEST4 overexpression on the biological behavior of CRC cells and its involvement in EMT. HCT116, a human CRC cell line, was selected because its BEST4 expression is relatively low compared with other CRC cell lines. Caco2, in contrast, is a human colon adenocarcinoma cell line that closely resembles differentiated intestinal epithelial cells and forms microvilli. Given that BEST4 serves as a marker of intestinal epithelial cells, these two cell lines were chosen to investigate the in vitro effects of BEST4 overexpression on the proliferation, colony-forming ability, invasion, and migration of colon cancer cells, and on the expression of downstream EMT-related markers. Similarly, we selected the HCT-15 cell line, derived from human CRC, for BEST4 knockout because its BEST4 expression is comparatively high among CRC cell lines. We used CRISPR/Cas9 gene editing to knock out BEST4 rather than shRNA to downregulate its expression, which limited our selection to a single cell line.

      (3) The authors demonstrated associations between BEST4 and cell proliferation/viability as well as migration/invasion, using CRC cell lines, but it should be noted that these findings do not indicate a tumor-suppressive role of BEST4 as mentioned in line 120. Furthermore, while the authors propose that "BEST4 functions as a tumor suppressor in CRC" in line 50, there seem to be no supporting data to suggest that BEST4 is a tumor suppressor gene.

      We apologize for these inaccurate expressions, and we have made the necessary modifications to the corresponding parts in the manuscript.

      (4) The HES4-BEST4-Twist1 axis likely plays a significant role in CRC progression via EMT but not CRC initiation. Some sentences could lead to a misunderstanding that the axis is important for CRC initiation.

      We apologize for these inaccurate expressions, and we have made the necessary modifications to the corresponding parts in the manuscript.

      (5) The authors mostly focus on the relationship of the HES4-BEST4-Twist1 axis with EMT, but their claims sometimes appear to deviate from this focus.

      We apologize for confusing the reviewer. The objectives of our study are as follows: (1) to establish the role of BEST4 in CRC growth both in vitro and in vivo; (2) to determine the underlying molecular mechanisms by which BEST4 interacts with Hes4 and Twist1, thereby regulating EMT; and (3) to investigate the correlation between BEST4 expression and prognosis of CRC. We have made the necessary modifications to the corresponding parts in the manuscript.

      (6) Some experiments do not appear to have a direct relevance to their claims. For example, the analysis using the xenograft model in Figure 2E-J is not optimal for analyzing EMT. The authors should analyze metastatic or invasive properties of the transplanted tumors if they intend to provide some supporting evidence for their claims.

      We sincerely thank the reviewer for raising this issue. During EMT, epithelial cells acquire a spindle-shaped, fibroblast-like morphology and other mesenchymal characteristics that give them invasive and migratory abilities; in parallel, expression switches from the epithelial markers E-cadherin and ZO-1 to the mesenchymal marker vimentin (Dongre and Weinberg, 2019). The whole process is regulated by transcription factors of the Snail family and by Twist1 (Dongre and Weinberg, 2019). We used the xenograft model with BEST4 overexpression to analyze tumor tissue lysates, which revealed that BEST4 upregulated E-cadherin and downregulated vimentin and Twist1 (Figure 2I). These findings indicate that BEST4 inhibits EMT in vivo. Deletion of BEST4 may therefore enable these cells to acquire invasive and migratory abilities, leading to metastasis in vivo. We subsequently evaluated the effect of BEST4 on metastatic potential in a CRC liver metastasis model by intrasplenically injecting BALB/c nude mice with HCT-15 cells carrying a BEST4 knockout (BEST4gRNA), a wild-type control (gRNA), or a knockout with rescue (BEST4-Rescued). We observed a twofold increase in liver metastatic nodules in the absence of BEST4 compared with the control group (Fig. 2J-L). Although further in vivo experiments are required for confirmation, our results suggest a potential role for BEST4 in counteracting EMT induction in vivo.

      (7) In Figure 4H, ZO-1 and E-cad expression looks unchanged in the BEST4 KD.

      We sincerely thank the reviewer for raising this issue. We have made the necessary modifications to the corresponding sections of the manuscript and quantified all Western blot data, including those presented in the supplementary file, to assess whether the differences are statistically significant.

      (8) The in vivo and in vitro data supporting the whole transcriptome sequence analysis (Figure 5) is mostly insufficient. Including the following experiments will substantiate their claims: 1) BEST4 and HES4 immunostaining of human surgical tissue samples, 2) qPCR data of HES4, Twist1, Vimentin, etc. as shown in Figure 5C, 5D.

      We sincerely thank the reviewer for raising this issue.

      (1) Due to the substandard quality of the available BEST4 antibodies, we opted to evaluate the clinical significance of BEST4 in CRC by assessing mRNA levels rather than protein levels by immunohistochemistry (IHC). After testing multiple antibodies for western blotting, only one (1:800; LsBio, LS-C31133) accurately indicated BEST4 protein expression, although it still showed some non-specific bands. Consequently, we transfected an HA-tagged BEST4 plasmid into the CRC cell line and used HA as a marker of BEST4 expression. Unfortunately, none of the antibodies tested for IHC were suitable, as they failed to distinguish accurately between positive and negative staining for BEST4 and showed substantial non-specific staining (data not shown). The difficulty in detecting BEST4 protein in colorectal cancer tissues may be attributable to its low expression levels. Our findings are consistent with the Human Protein Atlas database (https://www.proteinatlas.org/ENSG00000142959-BEST4/pathology), which also reports no detectable BEST4 protein expression in colorectal cancer tissues by IHC.

      (2) qPCR data on E-cadherin, Twist1, and Vimentin mRNA expression in CRC tissue have already been published in other studies (Christou et al., 2017; Lazarova and Bordonaro, 2016; Zhu et al., 2015). These studies found that E-cadherin is downregulated, while Twist1 and Vimentin are upregulated, in CRC tissue compared with adjacent normal tissue. Analysis of mRNA expression data from colorectal cancer and normal samples in the publicly available TCGA and GTEx databases also revealed significant downregulation of Hes4 expression in colorectal cancer tissues, which will be the objective of our next study.

      (9) Some statements are inconsistent, probably owing to grammatical errors. (For example, some high/low designations may be reversed in lines 234-244.)

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Using in vitro and in vivo approaches, the authors first demonstrate that BEST4 inhibits intestinal tumor cell growth and reduces their metastatic potential, possibly via downstream regulation of TWIST1.

      They further show that HES4 upregulates BEST4 expression, interacting directly with both the BEST4 promoter region and the BEST4 protein. The authors further expand on this with results showing that negative regulation of TWIST1 by HES4 requires BEST4 protein, with BEST4 required for TWIST1 association with HES4. Reduction of BEST4 expression was shown in CRC tumor samples, with correlation of BEST4 mRNA levels with different clinicopathological factors such as sex, tumor stage, and lymph node metastasis, suggesting a tumor-suppressive role of BEST4 in intestinal cancer.

      Strengths:

      • Good quality western blot data.

      • Multiple approaches were used to validate the findings.

      • Logical experimental progression for readability.

      • Human patient data / In vivo murine model / Multiple cell lines were used, which supports translatability / reproducibility of findings.

      We sincerely thank Reviewer #2 for the constructive feedback on this work.

      Weaknesses:

      (1) Interpretation of figures and data (unsubstantiated conclusions).

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      (2) Figure quality.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced when the manuscript was converted to PDF on the publisher's website. The figures have been re-uploaded, and we have again confirmed that each image has a resolution exceeding 300 dpi.

      (3) Figure legends lack information.

      Sincere thanks for catching this issue. We have provided detailed figure legends including supplementary figure legends on pages 36-43 of the manuscript. We have rechecked this section and made improvements and additions.

      (4) Lacking/shallow discussion.

      We apologize for the limited discussion. We have expanded and improved several parts of the discussion.

      (5) Requires more information for reproducibility regarding materials and methods.

      Sincere thanks for catching this issue. We have provided detailed information for reproducibility regarding materials and methods on pages 18-29; 43-47 of the manuscript. We have rechecked this section and made improvements and additions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      We sincerely thank Reviewer #1 for constructive feedback on this work.

      Minor comments:

      (1) Line 73: "Variant 4" is not precise. The term "variant" should refer to a mutation in the gene or a different transcript.

      We apologize for using an inaccurate expression. We have now changed Variant 4 to Bestrophin 4.

      (2) Line 78. Is it correct that BEST4+ cells coexist with Hes4+ cells?

      According to a previous study published in Nature (Parikh et al., 2019), BEST4+ cells originate from the absorptive lineage and express the transcription factor Hes4. In addition, we observed nuclear co-localization of BEST4 and Hes4 in HCT116 cells by immunofluorescence staining (Figure 3E).

      (3) Line 85. The reason "Best4 may be associated with Twist1" is unclear.

      We apologize for the lack of clarity in our previous statement. In a recent analysis utilizing single cell RNA-sequencing, it was discovered that a subset of mature colonocytes expresses BEST4 (Parikh et al., 2019). Additionally, this subset coexists with hairy/enhancer of split 4 positive (Hes4+) cells (Parikh et al., 2019). Previous research has demonstrated the antagonistic role of Hes4 in regulating Twist1 through protein-protein interaction, which governs the differentiation of bone marrow stromal/stem cell lineage (Cakouros et al., 2015). Based on these findings, we speculate that there may be an interactive regulation between BEST4/Hes4/Twist1, potentially influencing the process of cell polarity during epithelial-mesenchymal transition in colorectal cancer. We have made corrections to this section in the manuscript.

      (4) Line 87. Grammatical error (Establishing the role BEST4).

      We apologize for the grammatical error of this section. We have rectified the issue in the manuscript.

      (5) Please clarify why the authors do not show any data for BEST4-overexpressing Caco2 cells in Figure 2.

      We apologize for omitting these data from Figure 2. They have now been added.

      (6) In line 145, the authors did not show any tumorigenic properties.

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      (7) Figure 3 shows that 1) HES4 regulates BEST4 promoter activity, and 2) HES4 and BEST4 colocalize in nuclei, but these are very different biological processes. Please clarify how these two relate to each other.

      Trajectory analysis has identified colonocytes expressing the basic helix-loop-helix (bHLH) transcription factor Hairy/enhancer of split 4 (Hes4+) within the BEST4-expressing (BEST4+) colonic epithelial lineage. Although these are very different biological processes, the recent identification of a heterogeneous BEST4+ and Hes4+ subgroup in the human colonic epithelial lineage (Parikh et al., 2019) led us to consider their potential role in regulating CRC progression. We first observed upregulation of both endogenous BEST4 mRNA and protein levels in Hes4-overexpressing cells compared with the control transfectant, indicating that Hes4 is a potential upstream activator of BEST4. We then confirmed, by Co-IP, ChIP-qPCR and dual-luciferase assay, respectively, that Hes4 interacts with BEST4, binds directly to the BEST4 upstream promoter at the 1470-1569 bp region, and enhances its promoter activity. Moreover, the two proteins co-localized in the nucleus, as shown by immunofluorescence staining after transient co-transfection of Hes4 and BEST4 into HCT116 cells, indicating that Hes4 and BEST4 interact at both the transcriptional and protein levels (Figure 3; Figure 3-supplemental figure 1).

      (8) In lines 182-185, please clarify why BEST4 mediates the inhibition of Twist1 promoter activity by Hes4.

      Hes4 directs the lineage commitment of human bone marrow stromal/stem cells in part by downregulating Twist1 (Cakouros et al., 2015). Given the direct interaction between BEST4 and Hes4 that we observed in HCT116 cells, it is plausible that these proteins act through Twist1 to regulate EMT. In the present study, we found that Twist1 colocalized with BEST4 in the nucleus, and that their interaction destabilized Twist1 and significantly inhibited EMT induction. Hes4 had the same effect, but this effect required BEST4 as an intermediary. Although the mechanistic basis of these interrelationships remains to be elucidated, the present study identifies a Hes4-BEST4-Twist1 axis that governs CRC development, at least partly by counteracting EMT induction.

      (9) In line 205, please rephrase "BEST4-mediated upstream Hes4" to be clearer.

      We apologize for this confusing presentation. We have made corrections to this section in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely thank Reviewer #2 for the constructive feedback on this work.

      Major Comments:

      (1) The general quality of the figures requires improvement (text in some figures is illegible, and the resolution of the images is low), and the text requires more proofreading for clarity. In addition, the resolution of the histology in Fig 2K does not allow a proper evaluation of the data.

      We apologize for the poor quality of the figures. The figure resolution was drastically reduced when the manuscript was converted to PDF on the publisher's website. The figures have been re-uploaded, and we have again confirmed that each image has a resolution exceeding 300 dpi. In addition, Figure 2K has been further enhanced and expanded.

      (2) While the authors show that the HES4/BEST4 complex interacts with the TWIST1 protein, they do not expand on the mechanisms underpinning the post-translational or transcriptional regulation of TWIST1. We would like the authors to prove or further speculate on the mechanisms behind this regulation in the discussion.

      Our present study showed that BEST4 inhibited EMT in conjunction with downregulation of Twist1 in both the HCT116 and Caco2 CRC cell lines. A previous study showed an antagonistic role of Hes4 in regulating Twist1 via protein-protein interaction, which controls bone marrow stromal/stem cell lineage differentiation (Cakouros et al., 2015). We therefore speculate that Hes4, BEST4 and Twist1 regulate one another interactively and thereby influence cell polarity during EMT in CRC. In the present study, we found that BEST4 mediates the Hes4-dependent inhibition of Twist1 at both the transcriptional and protein levels. Twist1 colocalized with BEST4 in the nucleus, and their interaction destabilized Twist1 and significantly inhibited EMT induction. Hes4 had the same effect, but this effect required BEST4 as an intermediary. The present study identifies a Hes4-BEST4-Twist1 axis that governs CRC development, at least partly by counteracting EMT induction. We agree that further studies are needed to elucidate the mechanisms underlying these interrelationships, which are beyond the scope of the current work.

      (3) The authors need to show or argue why TWIST1 is necessary for the observed phenotypes, e.g. metastasis/proliferation.

      We apologize for not articulating this point clearly. EMT transforms epithelial cells into cells with a spindle-shaped, fibroblast-like morphology and mesenchymal characteristics, enabling them to acquire invasive and migratory abilities; during this process, expression switches from the epithelial markers E-cadherin and ZO-1 to the mesenchymal marker vimentin (Dongre and Weinberg, 2019). In CRC diagnosed at an advanced stage, EMT may occur as the tumor metastasizes to distant organs (Pastushenko and Blanpain, 2019; Sunlin Yong, 2021; Yeung and Yang, 2017; Zhang et al., 2021). The whole process is regulated by transcription factors of the Snail family and by Twist1 (Dongre and Weinberg, 2019). Twist1, a basic helix-loop-helix transcription factor, reprograms EMT by repressing the expression of E-cadherin and ZO-1 (Nagai et al., 2016; Yang et al., 2004) while inducing several mesenchymal markers, typically vimentin (Bulzico et al., 2019; Meng et al., 2018; Nagai et al., 2016; Yang et al., 2004); Twist1 is a pivotal predictor of CRC progression (Vesuna et al., 2008; Yang et al., 2004; Yusup et al., 2017; Zhu et al., 2015). Overexpression of Twist1 significantly enhances the migration and invasion of colorectal cancer cells and is closely associated with metastasis and poor prognosis in patients with colorectal cancer (Yusup et al., 2017; Zhu et al., 2015). We have supplemented and improved the relevant parts of the introduction and discussion.

      (4) The authors sufficiently prove that HES4/BEST4 regulates TWIST1 downregulation, however, we believe the findings are not enough to show *direct* regulation (refer also to line 273). At least rephrasing the conclusions would be adequate, also while referring to the working model depicted in Fig. 5G.

      We apologize for this inaccurate presentation. Although the interaction may not be direct, our co-immunoprecipitation (Co-IP) results showed that Twist1 and BEST4 associate, and the two proteins colocalized in the nucleus (Figure 4D; Figure 4-supplemental figure 1A). Furthermore, their interaction destabilized Twist1 and significantly inhibited EMT induction. We have made corrections to this section in the manuscript.

      (5) The discussion is very short and not satisfactory; is BEST4 an evolutionary conserved protein (besides the channel region)? Any speculation on which domain(s) is(are) important for the interaction with HES4 and TWIST1? How do the findings in the current study compare with recent, potentially contradicting data indicating a pro-tumorigenic function of BEST4 for CRC, including its upregulation (and not downregulation) in malignant intestinal tissues, and activation of PI3K/AKT signaling (PMID: 35058597)?

      We apologize for the limited discussion and have supplemented and improved several parts of it. The bestrophins are a highly conserved family of integral membrane proteins initially discovered in Caenorhabditis elegans (Sonnhammer and Durbin, 1997). Homologous sequences are found in animals, fungi and prokaryotes but are absent from protozoans and plants (Hagen et al., 2005). Conservation is concentrated in the N-terminal 350–400 amino acids, which contain an invariant arginine-phenylalanine-proline (RFP) motif of unknown function (Milenkovic et al., 2008). Mutations in this region can cause vitelliform macular dystrophy. By contrast, the C-terminus is a potential site of protein modification and function (Marmorstein et al., 2002; Miller et al., 2019). To our knowledge, the functional roles of the individual domains of BEST4 have not yet been studied. Although the domain(s) crucial for the interaction with HES4 and TWIST1 remain to be determined, our findings demonstrate that Hes4 binds directly to the upstream promoter region of BEST4 at 1470-1569 bp, thereby enhancing its promoter activity. These results provide a basis for future research.

      We sincerely thank the reviewer for bringing this publication to our attention. We have read this study carefully and would like to point out a few things.

      a) Firstly, that study reported that BEST4 expression is upregulated in clinical CRC samples, which contradicts the results of other published studies as well as our own. RNA-seq of tissue samples from 95 human individuals representing 27 different tissues, performed to determine the tissue specificity of all protein-coding genes, indicated that the BEST4 gene is predominantly expressed in the colon (Fagerberg et al., 2014). In addition, BEST4 was reported to be expressed exclusively by human absorptive cells and to be induced during human absorptive cell differentiation (Ito et al., 2013). More recently, research from Simmons's group published in Nature used single-cell profiling of healthy human colonic epithelial cells to show that human absorptive colonocytes distinctly express BEST4, which is dysregulated in colorectal cancer patients (Parikh et al., 2019). Furthermore, analysis of RNA-seq data from colorectal cancer and normal samples in the publicly available TCGA and GTEx databases also revealed significant downregulation of BEST4 expression in colorectal cancer tissues, consistent with our findings. Together, these studies demonstrate a close relationship between BEST4 and normal human colon function and provide evidence for its loss in colorectal cancer patients.

      b) Their study showed increased BEST4 protein levels in colorectal cancer patients by western blot. However, the antibody they used (Abcam, ab188823) is suitable only for IHC-P and not for western blotting. In our study, we also used western blotting to detect BEST4 expression in colorectal cancer tissues and adjacent normal tissues, and found decreased BEST4 protein levels in colorectal cancer patients. The antibody we used was specifically designed for western blot detection (1:800; LsBio, LS-C31133; https://www.lsbio.com/antibodies/best4-antibody-n-terminus-wb-western-ls-c31133/29602).

      c) The study also demonstrated upregulation of BEST4 protein levels in colorectal cancer patients by immunohistochemistry (IHC). However, IHC data on BEST4 in colorectal cancer tissues from publicly available protein expression databases such as the Human Protein Atlas show minimal BEST4 protein in colorectal cancer tissues (https://www.proteinatlas.org/ENSG00000142959-BEST4/pathology), which contradicts their findings but aligns with our own observations.

      d) Single-cell genomics analysis reports that OTOP2 and BEST4 are expressed only in a subset of normal colorectal epithelial cells and not in the rest (Parikh et al., 2019). BEST4 and Otopetrin 2 (OTOP2), which encodes a proton-selective ion channel protein, have been reported to be distinctly expressed in normal absorptive colonocytes and to colocalize with each other (Drummond et al., 2017; Ito et al., 2013; Parikh et al., 2019). OTOP2 has recently been shown to inhibit CRC development through regulation by wild-type p53 (Qu et al., 2019), whereas the role of BEST4 in CRC is less well studied; this suggests that BEST4 may also inhibit colorectal cancer. Their gene set enrichment analysis (GSEA) revealed significant enrichment of gene signatures associated with oncogenic signaling and metastasis, such as the PI3K/Akt signaling pathway, in patients with higher BEST4 expression compared with those with lower BEST4 expression. However, our GSEA showed no significant enrichment of the PI3K/Akt signaling pathway in patients with higher BEST4 expression. Moreover, in contrast to their findings, our BEST4-overexpressing cell line did not show a significant increase in phosphorylated Akt levels. In conclusion, our findings align with previous literature and public database analyses, providing evidence for the downregulation of BEST4 in colorectal cancer tissues and for its potential tumor-suppressive role. The discrepancies with other studies may be attributable to differences in experimental models, protocols, sample preparation or experimental conditions.

      Minor Comments:

      (1) Western blot data should be quantified.

      We thank the reviewer for raising this point. We have quantified all of the western blot data and included the results in the supplementary file.

      (2) Errors in labelling figures in the text should be corrected (Line 214 and more).

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      (3) The authors used the human HES4 gene, which is indicated with the incorrect nomenclature. The gene and protein nomenclature should be correctly used.

      We apologize for these mistakes. We have made corrections to this section in the manuscript.

      (4) Methods and Materials for certain assays should be further clarified; e.g transwell migration/invasion assays (reference to previous publication? transwell inserts used, etc.)

      We sincerely thank the reviewer for catching this issue. We have improved and updated the respective sections.

      (5) Figure 2K: Quality of histology is insufficient.

      We apologize for the poor quality of the figures. The quality of Figure 2K was further enhanced and expanded.

      (6) Figure 2K: Can the authors speculate on whether there is any increase in proliferation through BEST4-ko in HCT15 cells (with overexpression of BEST4 leading to reduced proliferation) and how this may impact the metastatic assay or engraftment/seeding onto the liver?

      Our in vitro experiments demonstrated that ablation of BEST4 in HCT-15 cells increased cell proliferation, clonogenesis, migration and invasion (Figure 1 and Figure 1-supplemental figure 1). These findings suggest that BEST4 knockout may also promote tumor proliferation in vivo; however, further research is required to confirm this. EMT transforms epithelial cells into cells with a spindle-shaped, fibroblast-like morphology and mesenchymal characteristics, enabling them to acquire invasive and migratory abilities (Dongre and Weinberg, 2019). In CRC diagnosed at an advanced stage, EMT may occur as the tumor metastasizes to distant organs (Pastushenko and Blanpain, 2019; Sunlin Yong, 2021; Yeung and Yang, 2017; Zhang et al., 2021). Our study demonstrated that BEST4 inhibits EMT in CRC both in vitro and in vivo. Conversely, ablation of BEST4 promotes EMT by upregulating EMT-related genes, thereby facilitating the metastasis of colorectal cancer cells to the liver.

      (7) Figure 2L: Authors should indicate in the figure that the BEST4-rescued is at 0 (and not blank).

      We sincerely thank the reviewer for catching this issue. We have made corrections to this section in the manuscript.

      (8) Figure 3B: Authors should introduce the usage of the new LS174T cell line in the text.

      We sincerely thank the reviewer for catching this issue. The human colorectal cancer cell line LS174T was selected for Hes4 knockdown because it expresses higher levels of Hes4 than other CRC cell lines. We have made corrections to this section in the manuscript.

      (9) Figure 3F: Why is there less FLAG in the input, compared to the IP?

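      We sincerely thank the reviewer for catching this issue. Following the manufacturer's protocols, 20 µg of cell lysate was used for the input, whereas 500 µg was used for the IP. Because the IP concentrates FLAG-tagged protein from a much larger amount of lysate, the FLAG signal is stronger in the IP lane than in the input lane.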
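      As a rough worked example of this loading difference, assuming (this is not stated in the response) equal gel loading and near-complete recovery of FLAG-tagged protein by the IP, the input lane represents only a small fraction of the lysate subjected to IP:

```latex
\frac{m_{\text{input}}}{m_{\text{IP}}} = \frac{20\ \mu\text{g}}{500\ \mu\text{g}} = 0.04 = 4\%
```

      Under these assumptions, the IP lane could contain up to about 25 times (500/20) more FLAG-tagged protein than the input lane, which is consistent with the stronger FLAG signal after IP.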

      (10) Figure 5F-G: the quality of the figure is not good enough for interpretation.

      Again, we apologize for the poor quality of the images caused by the manuscript conversion. We have made corrections to this section in the manuscript.

      (11) Table 1: Conclusions made by the authors are wrong (lines 237-239); instead "high BEST4 expression more prevalent in females" and "low BEST4 expression more prevalent among CRC patients with advanced tumor stage". And how are low and high BEST4 expressions defined (the same applies to the data in Fig. 5F)?

      We apologize for these mistakes. We used cutoff-high (50%) and cutoff-low (50%) values, i.e. a median split, to divide patients into high-expression and low-expression cohorts. We have made corrections to this section in the manuscript.
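      A 50% cutoff-high/cutoff-low split amounts to dividing the cohort at the median expression value. The minimal Python sketch below illustrates that idea only; the function name `split_by_median`, the tie rule (samples exactly at the median go to the low group) and the example values are hypothetical and are not taken from the study's analysis.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    Hypothetical helper: samples exactly at the median are assigned to the
    'low' group; the original analysis may handle ties differently.
    """
    cutoff = median(expression.values())
    return {
        sample: ("high" if value > cutoff else "low")
        for sample, value in expression.items()
    }


if __name__ == "__main__":
    # Hypothetical expression values (arbitrary units), not study data.
    demo = {"P1": 2.1, "P2": 0.4, "P3": 5.7, "P4": 1.3}
    print(split_by_median(demo))
```

      With an even number of samples, `statistics.median` averages the two middle values, so each group receives half of the cohort when there are no ties.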

      (12) In all Figure legends, there should be an indication of the type of statistical tests that were applied, as well as information on the number of independent experiments that were performed and provided the same results

      We sincerely thank the reviewer for catching this issue. The statistical tests applied are described in the Materials and Methods (Statistical analysis) section, and the number of independent experiments is provided in the figure legends.

      Reference

      Bulzico, D., Pires, B.R.B., PAS, D.E.F., Neto, L.V., and Abdelhay, E. (2019). "Twist1 Correlates With Epithelial-Mesenchymal Transition Markers Fibronectin and Vimentin in Adrenocortical Tumors". Anticancer research 39, 173-175. 10.21873/anticanres.13094.

      Cakouros, D., Isenmann, S., Hemming, S.E., Menicanin, D., Camp, E., Zannetinno, A.C., and Gronthos, S. (2015). "Novel basic helix-loop-helix transcription factor hes4 antagonizes the function of twist-1 to regulate lineage commitment of bone marrow stromal/stem cells". Stem Cells Dev 24, 1297-1308. 10.1089/scd.2014.0471.

      Christou, N., Perraud, A., Blondy, S., Jauberteau, M.O., Battu, S., and Mathonnet, M. (2017). "E-cadherin: A potential biomarker of colorectal cancer prognosis". Oncol Lett 13, 4571-4576. 10.3892/ol.2017.6063.

      Dongre, A., and Weinberg, R.A. (2019). "New insights into the mechanisms of epithelial-mesenchymal transition and implications for cancer". Nature reviews. Molecular cell biology 20, 69-84. 10.1038/s41580-018-0080-4.

      Drummond, C.G., Bolock, A.M., Ma, C., Luke, C.J., Good, M., and Coyne, C.B. (2017). "Enteroviruses infect human enteroids and induce antiviral signaling in a cell lineage-specific manner". Proceedings of the National Academy of Sciences of the United States of America 114, 1672-1677. 10.1073/pnas.1617363114.

      Fagerberg, L., Hallstrom, B.M., Oksvold, P., Kampf, C., Djureinovic, D., Odeberg, J., Habuka, M., Tahmasebpoor, S., Danielsson, A., Edlund, K., et al. (2014). "Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics". Mol Cell Proteomics 13, 397-406. 10.1074/mcp.M113.035600.

      Hagen, A.R., Barabote, R.D., and Saier, M.H. (2005). "The bestrophin family of anion channels: identification of prokaryotic homologues". Molecular membrane biology 22, 291-302. 10.1080/09687860500129711.

      Ito, G., Okamoto, R., Murano, T., Shimizu, H., Fujii, S., Nakata, T., Mizutani, T., Yui, S., Akiyama-Morio, J., Nemoto, Y., et al. (2013). "Lineage-specific expression of bestrophin-2 and bestrophin-4 in human intestinal epithelial cells". PLoS One 8, e79693. 10.1371/journal.pone.0079693.

      Lazarova, D.L., and Bordonaro, M. (2016). "Vimentin, colon cancer progression and resistance to butyrate and other HDACis". Journal of cellular and molecular medicine 20, 989-993. 10.1111/jcmm.12850.

      Marmorstein, L.Y., McLaughlin, P.J., Stanton, J.B., Yan, L., Crabb, J.W., and Marmorstein, A.D. (2002). "Bestrophin interacts physically and functionally with protein phosphatase 2A". The Journal of biological chemistry 277, 30591-30597. 10.1074/jbc.M204269200.

      Meng, J., Chen, S., Han, J.X., Qian, B., Wang, X.R., Zhong, W.L., Qin, Y., Zhang, H., Gao, W.F., Lei, Y.Y., et al. (2018). "Twist1 Regulates Vimentin through Cul2 Circular RNA to Promote EMT in Hepatocellular Carcinoma". Cancer research 78, 4150-4162. 10.1158/0008-5472.Can-17-3009.

      Milenkovic, V.M., Langmann, T., Schreiber, R., Kunzelmann, K., and Weber, B.H. (2008). "Molecular evolution and functional divergence of the bestrophin protein family". BMC evolutionary biology 8, 72. 10.1186/1471-2148-8-72.

      Miller, A.N., Vaisey, G., and Long, S.B. (2019). "Molecular mechanisms of gating in the calcium-activated chloride channel bestrophin". eLife 8. 10.7554/eLife.43231.

      Nagai, T., Arao, T., Nishio, K., Matsumoto, K., Hagiwara, S., Sakurai, T., Minami, Y., Ida, H., Ueshima, K., Nishida, N., et al. (2016). "Impact of Tight Junction Protein ZO-1 and TWIST Expression on Postoperative Survival of Patients with Hepatocellular Carcinoma". Digestive diseases (Basel, Switzerland) 34, 702-707. 10.1159/000448860.

      Parikh, K., Antanaviciute, A., Fawkner-Corbett, D., Jagielowicz, M., Aulicino, A., Lagerholm, C., Davis, S., Kinchen, J., Chen, H.H., Alham, N.K., et al. (2019). "Colonic epithelial cell diversity in health and inflammatory bowel disease". Nature 567, 49-55. 10.1038/s41586-019-0992-y.

      Pastushenko, I., and Blanpain, C. (2019). "EMT Transition States during Tumor Progression and Metastasis". Trends in cell biology 29, 212-226. 10.1016/j.tcb.2018.12.001.

      Qu, H., Su, Y., Yu, L., Zhao, H., and Xin, C. (2019). "Wild-type p53 regulates OTOP2 transcription through DNA loop alteration of the promoter in colorectal cancer". FEBS open bio 9, 26-34. 10.1002/2211-5463.12554.

      Sonnhammer, E.L., and Durbin, R. (1997). "Analysis of protein domain families in Caenorhabditis elegans". Genomics 46, 200-216. 10.1006/geno.1997.4989.

      Sunlin Yong, Z.W., Tang Yuan, Chuang Cheng, Dan Jiang (2021). "Comparison of MMR protein and Microsatellite Instability Detection in Colorectal Cancer and Its Clinicopathological Features Analysis". Journal of Medical Research 50, 61-66. 10.11969/j.issn.1673-548X.2021.05.015

      Vesuna, F., van Diest, P., Chen, J.H., and Raman, V. (2008). "Twist is a transcriptional repressor of E-cadherin gene expression in breast cancer". Biochem Biophys Res Commun 367, 235-241. 10.1016/j.bbrc.2007.11.151.

      Yang, J., Mani, S.A., Donaher, J.L., Ramaswamy, S., Itzykson, R.A., Come, C., Savagner, P., Gitelman, I., Richardson, A., and Weinberg, R.A. (2004). "Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis". Cell 117, 927-939. 10.1016/j.cell.2004.06.006.

      Yeung, K.T., and Yang, J. (2017). "Epithelial-mesenchymal transition in tumor metastasis". Molecular oncology 11, 28-39. 10.1002/1878-0261.12017.

      Yusup, A., Huji, B., Fang, C., Wang, F., Dadihan, T., Wang, H.J., and Upur, H. (2017). "Expression of trefoil factors and TWIST1 in colorectal cancer and their correlation with metastatic potential and prognosis". World journal of gastroenterology 23, 110-120. 10.3748/wjg.v23.i1.110.

      Zhang, N., Ng, A.S., Cai, S., Li, Q., Yang, L., and Kerr, D. (2021). "Novel therapeutic strategies: targeting epithelial-mesenchymal transition in colorectal cancer". The Lancet. Oncology 22, e358-e368. 10.1016/s1470-2045(21)00343-0.

      Zhu, D.J., Chen, X.W., Zhang, W.J., Wang, J.Z., Ouyang, M.Z., Zhong, Q., and Liu, C.C. (2015). "Twist1 is a potential prognostic marker for colorectal cancer and associated with chemoresistance". American journal of cancer research 5, 2000-2011.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy, since Ca2+ activity was observed in their photometry recording. This suggests that the DBH promoter could randomly label some non-NE neurons. Is the DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      In Figure 5, we found that the VLPO receives noradrenergic projections from the LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to record selectively from neuronal projections.

      We appreciate the reviewer's insightful suggestion to elaborate on the DBH promoter, and we have now expanded our discussion of DBH (pg. 18): “DBH (dopamine beta-hydroxylase), located in the inner membrane of noradrenergic and adrenergic neurons, is an enzyme that catalyzes the conversion of dopamine to norepinephrine and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) confirmed by DBH immunolabeling that the probe specifically labeled noradrenergic neurons. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023), and DBH-Cre mice are widely used to label noradrenergic neurons specifically (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). Because it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO, we used the more specific DBH promoter. The LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figures 2 and 8) and rAAV-DBH-EGFP-5'miR-30a-shRNA(GABAA receptor)-3’-miR30a-WPRES (Figure 9) into the LC. The results showed that the DBH promoter specifically labeled noradrenergic neurons in the LC, and off-target labeling outside the LC was almost absent.”

      As suggested, we have quantified the labeling efficiency of both DBH and TH-cre throughout the revised manuscript (Fig.2D; Fig.3D, N-O; Fig.4E-F, J, L; Fig.5E, L; Fig.6L, S, X; Fig.7G).

      A similar issue arises with the chemogenetic activation in Fig. 5L-R: the authors used TH-cre and a DIO-Gq virus to label VLPO neurons. Were they labeling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As addressed in our response to Comment #1, we agree that it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we injected a mixture of DBH-Cre-AAV and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry viruses into the bilateral LC, and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry virus into the bilateral VLPO. Moreover, we quantified the labeling efficiency of DBH in the LC to demonstrate that this promoter specifically labels NE neurons (Fig. 5). Importantly, these corrections did not alter the outcomes of our study: both optogenetic and chemogenetic activation of LC-NE terminals in the VLPO effectively promoted recovery from midazolam (Fig. 5G, N).

      Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      Our study primarily aimed to explore the role of the LC-VLPO NEergic neural circuit in modulating midazolam recovery. We acknowledge that our evidence regarding LC NE downstream neurons in the VLPO, derived from activation of LC-NE terminals and pharmacological intervention in the VLPO (Fig. 5, Fig. 6, Fig. 8, Fig. 9), is limited. Accordingly, we now present the VLPO's role as a promising direction for future research in the limitations section of our revised manuscript: “This study shows that the LC-VLPO NEergic neural circuit plays an important role in modulating midazolam recovery. However, the specificity of LC NE downstream neurons in the VLPO is not addressed in this paper and is our next research direction. VLPO neurons and their downstream regulatory mechanisms may also involve neurotransmitter systems other than the NE system, and these deeper and more complex mechanisms need further investigation.”

      In Figure 1A-D, in measuring the dose-dependent effect of Mida on LORR, was only one batch of testing performed? If more than one batch of mice was used, error bars should be presented in 1B. Also, the rationale for testing TH expression levels after Mida is not clear. Is the change in TH expression level specifically related to NE activation? If so, they should cite references.

      As recommended, we have added error bars and modified the graph of the LORR rate in the revised manuscript (Fig. 1A-B; Fig. 9G-H).

      We agree that the use of TH as a marker of NE activation is controversial, so in the revised manuscript we directly measured central norepinephrine content to reflect the change in NE activity after midazolam administration (Fig. 1D).

      Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time = 0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting 20 s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D states that it is the signal at the BLA; was this also recorded from the BLA?

      In this study, we used fiber photometry, a fluorescence imaging method based on changes in calcium, to record calcium signals. The fluorescence signal is divided into segments according to behavior, and each segment is aligned to a specific behavioral event, which is defined as time = 0. The mean calcium fluorescence in a 1.5 s or 1 s window before the event is taken as the baseline fluorescence intensity (F0). The difference between the fluorescence intensity during the behavior (F) and F0 is then divided by the difference between F0 and the offset value; that is, ΔF/F0 = (F − F0)/(F0 − offset) represents the change in calcium fluorescence when the event occurs. The results are commonly displayed as two kinds of graphs, heat maps and event-related peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points of wakefulness, midazolam injection, LORR and RORR were each selected as time = 0, whereas in Fig. 4, RORR was selected as time = 0. The 20 s trace length was chosen to span a complete Ca2+ signal. We have described the Ca2+ recording experiment in more detail in the figure legends and Methods section of our revised manuscript.
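      The baseline normalization described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function name `peri_event_dff`, the NumPy implementation, the sampling rate `fs` and the post-event window length are hypothetical, while the pre-event baseline window and the formula ΔF/F0 = (F − F0)/(F0 − offset) follow the description in the response.

```python
import numpy as np


def peri_event_dff(signal, event_idx, fs, baseline_s=1.5, post_s=10.0, offset=0.0):
    """Hypothetical helper: event-aligned dF/F0 for one fiber-photometry trace.

    signal     -- raw fluorescence samples
    event_idx  -- sample index of the behavioral event (defines time = 0)
    fs         -- sampling rate in Hz (assumed known)
    baseline_s -- length of the pre-event baseline window (1 s or 1.5 s)
    post_s     -- length of the post-event window to return (assumption)
    offset     -- offset value subtracted from F0 in the denominator
    """
    signal = np.asarray(signal, dtype=float)
    n_base = int(round(baseline_s * fs))
    n_post = int(round(post_s * fs))
    if event_idx < n_base or event_idx + n_post > signal.size:
        raise ValueError("event is too close to the edge of the recording")

    # F0: mean fluorescence in the window immediately before the event.
    f0 = signal[event_idx - n_base:event_idx].mean()

    # Segment from the start of the baseline window to the end of the post window.
    segment = signal[event_idx - n_base:event_idx + n_post]
    t = np.arange(-n_base, n_post) / fs  # seconds relative to the event
    dff = (segment - f0) / (f0 - offset)
    return t, dff
```

      Averaging the returned `dff` traces across events gives a peri-event plot, and stacking them row by row gives a heat map.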

      Regarding the BLA, we sincerely apologize for our carelessness: the signal was recorded from the LC rather than the BLA. We have carefully checked for and corrected similar errors in the revised manuscript.

      Reviewer 2:

      In figure legends, abbreviations used in the figures should be defined as far as possible, for example "LORR" in Figure 1.

      As suggested, we have defined the abbreviations used in the figures as far as possible in the revised manuscript.

      Additional recommendations:

      The main conceptual issue in the paper is the inflation of the conclusion regarding the mechanism of sedation induced by midazolam. The authors did not reveal the full mechanism but rather the relative contribution of the NE system. Several conclusions in the text should be edited to take this into account, starting with the title. I think the following examples are more appropriate: 'NE contribution to rebooting unconsciousness caused by midazolam' or 'NE contribution to reverse the sedation induced by midazolam'.

      As suggested, we have moderated the assertions about the mechanism of sedation induced by midazolam in several conclusions, starting with the title (Lines 1, 125, 150, 169, 202, 237, 482), to present a more measured interpretation in the manuscript.

      Lines 178-179: the authors state 'these suggest that intranuclear ... suppresses recovery from midazolam administration'. In fact, this intervention prolonged or postponed recovery from midazolam.

      In our revised manuscript, we have corrected this inappropriate term (Line 178).

      The pharmacology section (page 12), which aims to pinpoint which NE receptor is implicated, would suffer from specificity issues.

      In relation to the specificity issue, the focus on the VLPO might be rational, but again other areas are most likely involved given the pharmacological actions of midazolam.

      In the revised manuscript, we have discussed the specificity issues concerning the NE receptors and the brain areas involved in midazolam-induced alteration of consciousness: “In addition, given the pharmacological actions of midazolam, other areas may also be involved. Current studies suggest that the neural network involved in the recovery of consciousness consists of the prefrontal cortex, basal forebrain, brain stem, hypothalamus and thalamus. The role of these regions in midazolam recovery remains to be further investigated. Therefore, we will apply more specific experimental methods to determine the importance of the LC-VLPO NEergic neural circuit and related NE receptors in midazolam recovery, and conduct further studies on other relevant brain regions, hoping to more fully elucidate the mechanism of midazolam recovery in the future.”

      Line 274: the authors used 'inhibitory EEG activity'. What does it mean? A description of which rhythm-related power density is affected would be more objective.

      Example of conclusion inflation: in line 477, the word 'contributes' is better than 'mediates' if the specificity issue is taken into account.

      As suggested, we have improved the wording in our revised manuscript (pg. 13-14).

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. Published 2023 Feb 27. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. Published 2018 Jan 18. doi:10.1038/s41467-017-02648-0

    1. Author response:

      We thank the reviewers for their thoughtful and critical comments. We will revise and improve the manuscript according to the public reviews. In particular, we will:

      (1) provide a broader perspective on the potential clinical implications of our experiments regarding the mechanisms and the treatment of coma and disorders of consciousness. In particular, we will address how the reported increase in dynamical features associated with consciousness, even without behavioral signs, might be relevant to characterize patients with a motor-cognitive dissociation.

      (2) use the term "tDCS" to describe the technique we used in the paper instead of "HD-tDCS" to avoid any potential confusion. We understand that "HD-tDCS", which we used in our paper to refer to high-density tDCS (small electrodes), may be confused with high-definition tDCS, which is more commonly used in the literature to designate a 4x1 tDCS montage with smaller high-definition electrodes. We will also provide the full characteristics of the carbon electrodes we used for stimulation.

      (3) clarify the stimulation sites and provide structural MRI images showing the accurate localization of the stimulating electrodes.

      (4) clarify the fMRI data analyses we performed and provide a schematic illustration of the analysis process.

    1. Author response:

      We are pleased that the reviewers found our study thought-provoking and appreciate the care they have taken in providing constructive feedback. Focusing on the main issues raised by the reviewers, we provide here a provisional response to the Public Comments and outline our revision plan.

      A) Reviewers 1 and 2 were concerned that our task and analyses were limited by the fact that we only tested the model based on biases in movement direction (angular biases) and did not examine biases in movement extent (radial biases).

      While we think the angular biases provide a sufficient test to compare the set of models presented in the paper, we appreciate that there was a missed opportunity to also look at movement extent.  Looking at predictions concerning both movement direction and extent would provide a stronger basis for model comparison. To this end, we will take a two-step approach:

      (1) Re-analysis of existing datasets from experiments that involve a pointing task (movements terminate at the target position) rather than a shooting task (movements terminate further than the target distance).  We will conduct a model comparison using these data. 

      (2) If we are unable to obtain a suitable dataset or datasets because we cannot access individual data or there are too few participants, we will conduct a new experiment using a pointing task.  We will use these new data to evaluate whether the transformation model can accurately predict biases in both movement direction and extent.

      We will incorporate those new results in our revision.

      B) Reviewer 3 noted that model fitting was based on group average data. They questioned if this was representative across individuals and how well the model would account for individual patterns of reach biases.

      To address this issue, we propose to do the following:

      (1) We will first fit the model to individual data in Exp 1 and assess whether a two-peak function, the signature of the transformation model, is characteristic of most of the fits. We recognize that the results at the individual level may not support the model. This could occur because the model is not correct. Alternatively, the model could be correct but difficult to evaluate at the individual level for several reasons. First, the data set may be underpowered at the individual level. Second, motor biases can be idiosyncratic (e.g., within-subject correlation is greater than between-subject correlation), a point we noted in the original submission. Third, as observed in previous studies, transformation biases also show considerable individual variability (Wang et al., 2020); as such, even if the model is correct, a two-peaked function may not hold for all individuals.

      (2) If the individual variability is too large to draw meaningful conclusions, we will conduct a new experiment in which we measure motor and proprioceptive biases. Our plan would be to collect a large data set from a limited number of participants.  These data should allow us to evaluate the models on an individual basis, including using each participant’s own transformation/proprioceptive bias function to predict their motor biases.

      C) The reviewers have comments regarding the assumptions and form of the different models. Reviewer 3 questioned the visual bias model presented in the paper, and Reviewers 2 and 3 suggested additional visual bias/ biomechanical models to consider.

      We agree that what we call a visual bias effect is not confined to the visual modality: it is observed when the target is presented visually or proprioceptively, and it is manifest in reaching movements, saccades, and key presses used to adjust a dot to match the remembered target (Kosovicheva & Whitney, 2017; Yousif et al. 2023). As such, the bias may reflect a domain-general distortion in the representation of goals within polar space. We refer to this component as a "visual bias" because it is associated with the representation of the visual target in the reaching task.

      We do think the version of the visual bias model in the original submission is reasonable given that the bias pattern has been observed in perceptual tasks with stimuli that were very similar to ours (e.g., Kosovicheva & Whitney, 2017). We have explored other perceptual models in evaluating the motor biases observed in Experiment 1. For example, several models discuss how visual biases may depend on the direction of a moving object or the orientation of an object (Wei & Stocker, 2015; Patten, Mannion & Clifford, 2017). However, these models failed to account for the motor biases observed in our experiments, an unsurprising outcome since the models were not designed to capture biases in perceived location. There are also models of visual bias associated with viewing angle (e.g., based on retina/head position). Since we allow free viewing, these biases are unlikely to make substantive contributions to the biases observed in our reaching tasks.

      Given that some readers are likely to share the reviewers’ concerns on this issue, we will extend our discussion to describe alternative visual models and provide our arguments about why these do not seem relevant/appropriate for our study.

      In terms of biomechanical models, we plan to explore at least one alternative model, the MotorNet Model (https://elifesciences.org/articles/88591). This recently published model combines a six-muscle planar arm model with artificial neural networks (ANNs) to generate a control policy. The model has been used to predict movement curvature in various contexts.  We will focus on its utility to predict biases in reaching to visual targets.

      D) Reviewer 1 had concerns with how we measured the transformation bias. In particular, they asked why the data from Wang et al (2020) are used as an estimate of transformation biases, and not as the joint effects of visual and proprioceptive biases in the sensed target and hand location, respectively.

      We define transformation error as the misalignment between the visual target and the hand position. We quantify this transformation bias by referencing studies that used a matching task in which participants match their unseen hand to a visual target, or vice versa. Errors observed in these tasks are commonly attributed to proprioceptive bias, although they could also reflect a contribution from visual bias. We utilized the same data set to simulate both the transformation bias model and the proprioceptive bias model.

      Although it may seem that we are simply renaming concepts, the concept of transformation error addresses biases that arise during motor planning. For the proprioceptive bias model, the bias only influences the perceived start position but not the goal since proprioception will influence the perceived position of the target before the movement begins. In contrast, the transformation bias model proposes that movements are planned toward a target whose location is biased due to discrepancies between visual and proprioceptive representations.

      The question then arises whether measurements of proprioceptive bias also reflect a transformation bias. We believe that the transformation bias is influenced by proprioceptive feedback, or at the very least, proprioceptive and transformation bias share a common source of error and thus, are highly correlated. We will revise the Introduction and Results sections to more clearly articulate these relationships and assumptions.

      E) Reviewer 3 asked whether the oblique effect in visual perception could account for our motor bias.

      The potential link between the oblique effect and the observed motor bias is an intriguing idea, one that we had not considered. However, after giving this some thought, we see several arguments against the idea that the oblique effect accounts for the pattern of motor biases.

      First, by the oblique effect, variance is greater for diagonal orientations compared to Cartesian orientations. These differences in perceptual variability can explain the bias pattern in visual perception through a Bayesian efficient coding model (Wei & Stocker, 2015). We note that even though participants showed large variability for stimuli at diagonal orientations, the bias for these stimuli was close to zero. As such, we do not think it can explain the motor bias function given the large bias for targets at locations along the diagonal axes.

      Second, the reviewer suggested an "oblique effect" within the motor system, proposing that motor variability is greater for diagonal directions due to increased visual bias. If this hypothesis is correct, a visual bias model should account for the motor bias observed, particularly for diagonal targets. In other words, when estimating the visual bias from a reaching task, a similar bias pattern should emerge in tasks that do not involve movement. However, this prediction is not supported in previous studies. For example, in a position judgment task that is similar to our task but without the reaching response, participants exhibited minimal bias along the diagonals (Kosovicheva & Whitney, 2017).

      Despite our skepticism, we will keep this idea in mind during the revision, investigating variability in movement across the workspace.

    1. Author response:

      Reviewer #1 (Public Review):

      In this work, the authors aimed to understand how titins derived from different nuclei within the syncytium are organized and integrated after cell fusion during skeletal muscle development and remodeling. The authors developed mCherry titin knock-in mice with the fluorophore mCherry inserted into titin's Z-disk region to track the titin during cell fusion. The results suggested that titin exhibited homogenous distribution after cell fusion. The authors also probed on how titin behaves during muscle injury by implantation of titin-eGFP myoblasts into adult mCherry-titin mice. Interestingly, titin is retained at the proximal nucleus and does not diffuse across the whole syncytium in this system. The findings of the study are novel and interesting. The experimental approaches are appropriate. The results are described well. However, the manuscript needs revisions to enhance its clarity.

      (1) In this work, the authors have not described the statistical analysis appropriately. In most of the figures, significance levels are not described. The information on the biological and technical replicates is missing in almost all the figures. This information is critical for understanding the strength of the experimentation.

      Thank you for this feedback; we have added the missing information to the figure legends.

      (2) The in vivo experiments are underpowered. The authors have used only 3 animals in the cardiotoxin injury experiment and eliminated another 3 animals from the analysis. How did they determine insufficient myoblast integration?

      The experimental design was targeted at using transplantation of myoblasts into skeletal muscle to obtain information on the ability of transplanted cells to fuse with cells in the injured area – and if those myoblasts could provide titin protein beyond the confinement of the transplanted cells (as would be expected after cell fusion). The goal was not to optimize cell transplantation with improved force generation of lesioned muscle. For this, we agree, the experiments would be underpowered.

      Here, we use a different approach and successfully demonstrate the integration of titin protein from transplanted cells into sarcomeres of host muscle fibers. Only 5 animals per group were approved by the local authorities, consistent with the scope of our hypothesis that cell fusion contributes titin beyond the transplanted cell, with the 3R guidelines, and with the need to address our research question in as few animals as possible. We estimated that at least 3 animals per implantation group were needed and included 2 additional animals to compensate in case of insufficient myoblast integration (no detection of GFP+ cells). The resulting n=3 and n=4 animals provided enough fusion events to show that even after 3 weeks, titin protein remains confined to the vicinity of the nuclei it originated from, which addresses our hypothesis: if titin were homogeneously distributed after cell fusion, we would have expected red and green striations throughout the fiber. This was not the case. In 8 out of 8 fused cells, green and red titin molecules were segregated, as depicted in Figure 6 and Figure S5.

      (3) Similarly, the in vitro imaging experiments, especially the in vitro titin mobility assays used only 3 cells (Fig 2b) or 6-9 cells (Fig 2c-2e). The number of cells imaged is insufficient to derive a valid conclusion. What is the variability in the results between cells? Whether all the cells behave similarly in titin mobility assays?

      Our description of the replicates for Figure 2 was insufficient. The quantification in Fig. 2b-e comprises a total of 9 cells from 3 independent experiments (3 cells per experiment). For Fig. 2d, one outlier (Grubbs test) was excluded for the GFP signal. For Fig. 2e, we only included cells that could be fitted with a two-phase association curve, which resulted in 6 cells for the GFP signal and 7 cells for the mCherry signal.
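      A two-phase association model is commonly written as Y(t) = Y0 + SpanFast·(1 − e^(−K_fast·t)) + SpanSlow·(1 − e^(−K_slow·t)), with half-lives ln 2 / K. The sketch below fits this form to a recovery curve with SciPy's `curve_fit`. It is an assumption about the parameterization, not the authors' fitting procedure, and the function names and starting values are hypothetical. A fit that fails to converge returns `None`, mirroring the exclusion of cells that could not be fitted for Fig. 2e.

```python
import numpy as np
from scipy.optimize import curve_fit


def two_phase_association(t, y0, span_fast, k_fast, span_slow, k_slow):
    """Sum of two exponential recovery phases starting from y0 at t = 0."""
    return (y0
            + span_fast * (1.0 - np.exp(-k_fast * t))
            + span_slow * (1.0 - np.exp(-k_slow * t)))


def fit_frap_recovery(t, y, p0=None):
    """Hypothetical helper: fit one recovery curve.

    Returns a dict with spans and half-lives, or None if the fit fails.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if p0 is None:
        # Crude starting guesses (assumption): split the total rise evenly
        # between a fast and a slow phase with rates 10/T and 1/T.
        rise = y.max() - y[0]
        p0 = (y[0], rise / 2, 10.0 / t[-1], rise / 2, 1.0 / t[-1])
    # Spans and rate constants are constrained to be non-negative; y0 is free.
    lower = [-np.inf, 0.0, 0.0, 0.0, 0.0]
    try:
        params, _ = curve_fit(two_phase_association, t, y, p0=p0,
                              bounds=(lower, np.inf))
    except RuntimeError:
        # No convergence: such a cell would be left out of the fast-phase analysis.
        return None
    y0, s1, k1, s2, k2 = params
    if k1 < k2:  # ensure the "fast" phase has the larger rate constant
        s1, k1, s2, k2 = s2, k2, s1, k1
    return {
        "y0": y0,
        "span_fast": s1, "half_life_fast": np.log(2) / k1,
        "span_slow": s2, "half_life_slow": np.log(2) / k2,
    }
```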

      (4) Figure 1c-e, Figure 2a, Figure 3, Figure 4, Figure 5, Figure 6- please describe the replicates and also if possible, quantify the data and present them as separate figures.

      1) Figure 1d (formerly 1c) validates that titin is properly integrated into the sarcomere and that the mCherry signal localizes to the Z-disk, overlapping with actinin. This is qualitative rather than quantitative information, and it is replicated and confirmed in Figure 2. Figure 1e (formerly 1d) is a representative image for the quantification in 1f (formerly 1e), which comprises 3 biological replicates (= cells) with 3 technical replicates (= Z-disks) each; each time point was significantly different (p < 0.001, two-way ANOVA).

      2) Figure 2a: representative image (with the corresponding profile) for the quantification in 2b (9 biological replicates (= cells) measured on 3 different experiment days; see answers to points 1-3).

      3) Representative images: cells were seeded on several coverslips and fusion was initiated. This was done on 4 occasions (= technical replicates) with different stainings (see supplement), and more than 30 images were taken in total, with at least 5 images per staining. The images of different fusion stages were then classified based on the distribution of the differentially labeled titin.

      4) a-c: representative images showing two independent fusion events; fusion experiments were performed on 4 days, with a total of 13 fusion events captured (6 involving only immature cells, 7 involving one mature cell). For the quantification in d and e, very small (< 1,000 μm²) and very large (> 10,000 μm²) syncytia were excluded to minimize the effect of large size differences between syncytia, leaving 5 immature and 4 mature fusion events for comparative analysis.

      5) The smFISH experiment was repeated on 2 days, and 6 images of fusion events were acquired. Because these were at different stages of fusion and 4 elements contributed to the images (mCherry RNA, GFP RNA, mCherry-titin protein, GFP-titin protein), they were difficult to compare. However, we added the quantification to Fig. S4 (b and c) and a corresponding paragraph to the Results. The overlap region appears to be smaller for the RNA than for the protein signal.

      6) Representative images with n = 6 biological replicates (mice) for the CTX + cells group (main experimental group; 3 of these were excluded due to insufficient myoblast integration), n = 4 for the cells-only control and n = 1 for the CTX-only control, based on the 3R regulation of animal experiments. From each mouse (n = 11), the contralateral TA muscle was also harvested to serve as an uninjured control without cell transplantation.

      (5) Figure 2- the authors excluded samples with an obvious decrease in cell quality during imaging from the analysis. How do the authors assess the cell quality? Simply by visual examination? Or were the samples that did not show fluorescence recovery eliminated? I am wondering what percentage of cells showed poor cell quality. How do they avoid the bias? I recommend that the authors include these cells also for the analysis of data presented in Figures 2b, 2c, and 2f.

      Cells were not excluded based on their recovery status, but only if they showed signs of cell death (collapse of sarcomere structures, membrane blebbing, etc.). All cells that stayed alive during imaging showed fluorescence recovery. Cells with only slower or incomplete recovery were not excluded from the analysis. One cell was excluded from the comparison of exchange half-life (Fig. 2d) because it was a significant outlier. For Figure 2e (fast phase), only cells to which a two-phase association curve could be fitted were included.
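      The Grubbs test mentioned for Fig. 2d can be illustrated with a textbook two-sided, single-outlier formulation at α = 0.05. This is a sketch for readers, not the authors' statistics software; the function name, the default α and the two-sided variant are assumptions, since the response does not state which variant was applied.

```python
import numpy as np
from scipy import stats


def grubbs_outlier(values, alpha=0.05):
    """Hypothetical helper: two-sided Grubbs test for a single outlier.

    Returns the index of the most extreme value if it is a significant
    outlier at level alpha, otherwise None.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("the Grubbs test needs at least 3 values")
    sd = x.std(ddof=1)  # sample standard deviation
    if sd == 0:
        return None
    deviations = np.abs(x - x.mean())
    idx = int(np.argmax(deviations))
    g = deviations[idx] / sd
    # Critical value derived from the t distribution with n - 2 degrees of freedom.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return idx if g > g_crit else None
```

      The test removes at most one value per application; repeating it on the remaining data is a separate design decision.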

      (6) It is unclear how the authors identified the different stages of cell fusion in the microscopy images i.e. early fusion, distribution, and complete distribution.

      Early fusion was defined as the stage at which two cells had connected their membranes but the differentially labeled titin had not yet mixed. Distribution was defined as the stage at which titin mixing had started but was not yet complete.

    1. Author response:

      Reviewer #1 (Public Review):

      Lactobacillus plantarum is a beneficial bacterium renowned for its positive physiological effects and probiotic functions. Fu et al. conducted an investigation into the involvement of this bacterium in host purine metabolism. Initially, they employed microbiomics to analyze changes in L. plantarum within a hyperuricemia model, followed by isolation of the bacterium from this model. The gene map associated with purine nucleoside metabolism was determined through whole-genome analysis. Metabolic shifts in L. plantarum under nucleoside-enriched conditions were assessed using HPLC and metabolomics, while underlying mechanisms were explored through gene knockout experiments. Finally, the efficacy of L. plantarum was validated in hyperuricemia models involving goslings and mice. The authors presented their findings coherently and logically, addressing key questions using appropriate methodologies and yielding significant and innovative results. The authors demonstrated that host-derived Lactobacillus plantarum alleviates host hyperuricemia by influencing purine metabolism. However, their study primarily focused on this bacterium without delving deeper into the mechanisms underlying hyperuricemia beyond verification through two models. Nevertheless, these findings are sufficient to support their conclusion effectively. Additionally, further research is warranted to investigate the metabolites of Lactobacillus plantarum.

      We appreciate the reviewers' suggestions. We have studied Lactobacillus plantarum in detail, focusing specifically on its role in the purine nucleoside metabolism of the host, confirmed through in vitro and in vivo experiments. Our key finding demonstrates how Lactobacillus plantarum contributes to this process. We also examined the expression of hepatic uric acid synthesis proteins and renal uric acid excretion proteins related to alleviating host hyperuricaemia (Figure 9). While discussing the metabolites of Lactobacillus plantarum may fall outside the scope of this article, we plan to investigate this further. Our goal is to identify a signature metabolite via in vitro and in vivo studies and explore how it may help reduce hyperuricaemia in the host.

      Reviewer #2 (Public Review):

      Summary:

      Purine nucleoside metabolism in intestinal flora is integral to the purine nucleoside metabolism in the host. This study identified the iunH gene in Lactobacillus plantarum that regulates its purine nucleoside metabolism. Oral gavage of Lactobacillus plantarum and subsequent analysis showed it maintains homeostasis of purine nucleoside metabolism in the host.

      Strengths:

      This study presents sufficient evidence for the role of Lactobacillus plantarum in alleviating hyperuricaemia, combining microbiomics, whole genomics, in vitro bacterial culture, and metabolomics. These results suggest the iunH gene of Lactobacillus plantarum is crucial in host purine nucleoside metabolism. The experimental design is robust, and the data are of high quality. This study makes significant contributions to the fields of hyperuricaemia, purine nucleoside metabolism, and Lactobacillus plantarum investigation.

      We appreciate the reviewers' encouraging feedback.

      Weaknesses:

      A key limitation of this manuscript is the absence of an in-depth study on the alleviation metabolism of Lactobacillus plantarum. Notable questions include: What overall metabolic changes occur in a purine nucleoside-enriched environment? How do the metabolites of Lactobacillus plantarum vary? Do these metabolites influence host purine nucleoside metabolism? These areas merit further investigation.

      Thank you! The Supplementary Material link includes intracellular and extracellular metabolomics data for Lactobacillus plantarum, detailing the overall metabolic changes. We agree with the reviewer that the effect of these metabolites on host purine nucleoside metabolism is worth investigating. However, it is not explored in depth in this paper, which focuses on changes in the purine nucleosides themselves. We plan to explore this topic further in future research.

      Reviewer #3 (Public Review):

      Fu et al. present a multi-model study using goose and mouse that investigates the protective effects of Lactobacillus plantarum against hyperuricaemia. They highlight this strain's significance and clarify its role in responding to intestinal nucleoside levels and affecting uric acid metabolism through modulation of host signaling pathways.

      Strengths:

      (1) Fu et al. created two animal models for validation, yielding more reliable and extensive data. In addition, the in vitro findings were verified repeatedly using multiple methods, which makes them convincing.

      (2) This study integrates microbiomics, whole genomics, in vitro bacterial culture, and metabolomics, providing a wealth of data and valuable insights for future research.

      We thank the reviewer for their encouraging assessment.

      Weakness:

      Fu et al. clearly described the role of Lactobacillus plantarum, but it is also important to explore its other mechanisms influencing uric acid metabolism in the host. While changes in hepatic and renal uric acid metabolism were confirmed, the gut's role in this process deserves investigation, particularly regarding whether Lactobacillus plantarum or its metabolites act within the gut. The authors have effectively conveyed the story outlined in the article's title, and the remainder can be explored later. In addition, further discussion is needed to highlight how this strain of Lactobacillus plantarum differs from other Lactobacillus strains, or how its functions differ from those reported in the literature.

      Thank you! We fully acknowledge the importance of investigating the role of the gut in this process, especially whether Lactobacillus plantarum or its metabolites act within the gut, which would be an interesting topic for a follow-up study. We also agree that it is crucial to highlight how this Lactobacillus plantarum strain differs from other strains and from those reported in the literature regarding its innovative functions, as discussed in detail in lines 343 to 376. Previous studies indicate that Lactobacillus plantarum can reduce hyperuricaemia, but its specific uric acid-lowering mechanism and the process of nucleoside degradation remain unclear. We investigated the nucleoside hydrolysis function of Lactobacillus plantarum, identified key genes, and validated their roles by gene knockout. Our findings suggest that host-derived Lactobacillus plantarum plays an antagonistic role against hyperuricaemia.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Theoretical principles of viscous fluid mechanics are used here to assess likely mechanisms of transport in the ER. A set of candidate mechanisms is evaluated, making good use of imaging to represent ER network geometries. Evidence is provided that the contraction of peripheral sheets provides a much more credible mechanism than the contraction of individual tubules, junctions, or perinuclear sheets.

      The work has been conducted carefully and comprehensively, making good use of underlying physical principles. There is a good discussion of the role of slip; sensible approximations (low volume fraction, small particle size, slender geometries, pragmatic treatment of boundary conditions) allow tractable and transparent calculations; clear physical arguments provide useful bounds; stochastic and deterministic features of the problem are well integrated.

      We thank the reviewer for their positive assessment of our work.

      There are just a couple of areas where more discussion might be warranted, in my view.

      (1) The energetic cost of tubule contraction is estimated, but I did not see an equivalent estimate for the contraction of peripheral sheets. It might be helpful to estimate the energetic cost of viscous dissipation in generated flows at higher frequencies.

      This is a good point. We have now included an energetic cost estimate for the contractions of peripheral sheets in the revised manuscript.

      The mechanism of peripheral sheet contraction is unclear: do ATP-driven mechanisms somehow interact with thermal fluctuations of membranes?

      The new energetic estimates in the revision might help constrain possible hypotheses for the mechanism(s) driving peripheral sheet contraction, and suggest if a dedicated ATP-driven mechanism is required.

      (2) Mutations are mentioned in the abstract but not (as far as I could see) later in the manuscript. It would be helpful if any consequences for pathologies could be developed in the text.

      We are grateful for this suggestion. The need to rationalise pathology associated with the subtle effects of mutations of ER-morphogens is indeed pointed out as one factor motivating the study of the interplay between ER structure and performance. In the revised manuscript, we have included a brief discussion potentially linking the malfunction of ER morphogens to luminal transport, referencing freshly published findings.

      Reviewer #2 (Public Review):

      Summary:

      This study explores theoretically the consequences of structural fluctuations of the endoplasmic reticulum (ER) morphology called contractions on molecular transport. Most of the manuscript consists of the construction of an interesting theoretical flow field (physical model) under various hypothetical assumptions. The computational modeling is followed by some simulations.

      Strengths:

      The authors are focusing their attention on testing the hypothesis that a local flow in the tubule could be driven by tubular pinching. We recall that trafficking in the ER is considered to be mostly driven by diffusion at least at a spatial scale that is large enough to account for averaging of any random flow occurring from multiple directions [note that this is not the case for plants].

      We thank the reviewer. Indeed, the trafficking in the ER was historically presumed to be driven by passive diffusion but this has been challenged by recent findings suggesting that the transport may also involve an active super-diffusional component (the short-lasting flows). These findings include: the dependence of ER luminal transport on ATP-derived energy observed in the historical and recent publications cited here; fast and directional single-particle motion; and a linear scaling of photoactivated signal arrival times with distance. On a larger scale, indeed, the motion can be seen as a faster effective diffusion, as there is no persistent circulatory directionality of the currents.
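      The claim that short-lived, non-persistent flows look like a faster effective diffusion on larger scales can be illustrated with a toy model. The sketch below is an illustration under stated assumptions, not the model used in the paper: a particle in 1D diffuses with coefficient D and, at Poisson-distributed times, is displaced by ±jump_len with a random sign. The function name `effective_diffusion` and all parameter values (D = 0.6 µm^2/s, 1 µm jumps, one event per second, 20 s) are hypothetical choices. Because the jump directions are uncorrelated, their contribution to the mean-square displacement grows linearly in time, so the long-time behaviour is diffusive with a larger coefficient.

```python
import math
import random


def effective_diffusion(D, jump_len, rate, T, n_particles=5000, seed=0):
    """Estimate a long-time effective diffusion coefficient in 1D (toy model).

    Each particle diffuses with coefficient D and, at Poisson-distributed
    times (rate events per second), is displaced by +/- jump_len with a
    random sign: a crude, hypothetical stand-in for short-lived flows
    with no persistent direction.
    Returns (simulated estimate, analytic value).
    """
    rng = random.Random(seed)
    msd = 0.0
    for _ in range(n_particles):
        # Brownian part: over time T the displacement is Normal(0, 2*D*T).
        x = rng.gauss(0.0, math.sqrt(2.0 * D * T))
        # Flow-event part: step through Poisson arrival times in [0, T].
        t = rng.expovariate(rate)
        while t <= T:
            x += jump_len if rng.random() < 0.5 else -jump_len
            t += rng.expovariate(rate)
        msd += x * x
    msd /= n_particles
    d_sim = msd / (2.0 * T)
    # Independent +/- jumps add rate * T * jump_len**2 to the variance.
    d_theory = D + rate * jump_len ** 2 / 2.0
    return d_sim, d_theory


d_sim, d_theory = effective_diffusion(D=0.6, jump_len=1.0, rate=1.0, T=20.0)
print(round(d_theory, 3))  # 1.1
print(round(d_sim, 2))     # should be close to d_theory
```

Real ER flows are neither instantaneous nor restricted to one dimension, so this only shows the qualitative point: without persistent directionality, active jumps raise the apparent diffusion coefficient rather than producing net drift.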

      Weaknesses:

      The manuscript extensively details the construction of the theoretical model, occupying a significant portion of the manuscript. While this section contains interesting computations, its relevance and utility could be better emphasized, perhaps warranting a reorganization of the manuscript to foreground this critical aspect.

      Overall, the manuscript appears highly technical with limited conclusive insights, particularly lacking predictions confirmed by experimental validation. There is an absence of substantial conclusions regarding molecular trafficking within the ER.

      We sought to balance the theoretical/computational details of our model with the biophysical conclusions drawn from its predictions. Given the model's complexity and novelty, it was essential to elucidate the theoretical underpinnings comprehensively, in order to allow others to implement it in the future with additional, or different, parameters. To maintain clarity and focus in the main text, we have judiciously relegated extensive technical details to the methods section or supplementary materials, and divided the text into stand-alone section headings allowing the reader to skip through to conclusions.

      The primary focus of our manuscript is to introduce and explore, via our theoretical model, the interplay between ER structure dynamics and molecular transport. Our approach, while in silico, generates concrete predictions about the physical processes underpinning luminal motion within the ER. For instance, our findings challenge the previously postulated role of small tubular contractions in driving luminal flow, instead highlighting the potential significance of local flat ER areas—empirically documented entities—for facilitating such motion.

      Furthermore, by deducing what type of transport may or may not occur within the range of possible ER structural fluctuations, our model offers detailed predictions designed to bridge the gap between theoretical insight and experimental verification. These predictions detail the spatial and temporal parameters essential for effective transport, delineating plausible values for these parameters. We hope that the model’s predictions will invite experimentalists to devise innovative methodologies to test them. We have introduced text edits to the revised version to clarify the reviewer’s point as per the detailed comments below.

      Recommendations for the authors:

      Editor comments (Recommendations For The Authors):

      The two reviewers have different opinions about the strengths and weaknesses of this work. The editors do believe that this work is a valuable contribution to the field of ER dynamics and transport, and could stimulate experiments.

      We thank both reviewers and the editors for the time and care they have invested in reviewing our manuscript.

      Nevertheless, discussing further the role of diffusion vs. advection in ER luminal transport, including conflicting values of measured diffusion coefficients, would be valuable. For instance, it is possible that the active contraction-driven mechanism results in an effective diffusion over a long time, which could be quantified and compared to experiments.

      In our study we focus on tubule-scale transport because the statistics of transport at this scale have been measured and the origin of the observed transport is an outstanding problem. We already know from Holcman et al. (2018) that transport at the tubule scale involves an active, possibly advective, component beyond passive molecular diffusion. Although we do touch briefly on a network-scale phenomenon in our section on mixing/content homogenisation, our main focus is on understanding tubule-scale transport. Although we agree that a substantial exploration of effective diffusion over a network scale would be of value and increase the breadth of our paper, we feel that this is beyond the scope of the current paper. We believe the “conflicting” diffusion coefficients in fact characterise motion at different time and length scales: the global diffusion coefficient pointed out to us in the reviews may pertain to network-scale effective diffusion over long time scales, which is different from the Brownian motion on the scale of tubules/tubular junctions relevant to our in silico model.

      Reviewer #1 (Recommendations For The Authors):

      I congratulate the authors on their work and do not have any substantial further recommendations, beyond two minor points.

      (1) Before (13), say "Using the expression (7) for Q_2, ..."

      (2) Typo on p.25: "principal" rather than "principle" (two instances)

      We thank the reviewer for spotting these and have addressed both points.

      Reviewer #2 (Recommendations For The Authors):

      Here are some specific comments:

      (1) Insufficient Influence of ER Tubule Contraction:

      The conclusion regarding weak fluid flows generated by ER tubule contractions may seem obvious. It would be more intriguing if the authors explored conditions necessary to achieve faster flows, such as those around 20 µm/s, within tubules.

      We agree these are important conditions to explore, and they are covered extensively in Fig. 4e-f, which shows that contraction sites spanning the length of entire tubules and occurring at 5 and 10 times the experimentally measured rates produce mean average edge traversal speeds of ~30 and 60 µm/s, respectively. These pinch parameters seemed unlikely and motivated exploring other conceivable scenarios.

      (2) Limited Impact of ER Network Geometry:

      The comparison across different ER network structures seems insufficiently documented. A comparison between distal and proximal ER from the nucleus could provide deeper insights.

      We have added text in the new paragraph 4 of the introduction to better articulate the core principles of the ER’s structural elements. As established by historical EM and light microscopy, the ER is universally composed of tubules, with 3-way junctions, and small (peripheral) or large perinuclear sheets. We establish that the specific shaping of these elements influences the nanofluidics we investigate here. While the proportion of these elements may vary across different cell types and cellular regions, the fundamental structure, and therefore the impact on local mobility remains consistent. Our categorisation of the ER into its elements reflects these ubiquitous components, allowing us to analyse the impact of shaping at the relevant scale, covering the perinuclear and peripheral ER.

      (3) Ineffectiveness of Tubule Junction Contraction:

      The study's negative result on ER tubule junction contraction's impact on molecular exchange may not capture broad interest without experimental validation. Conducting experiments to test this hypothesis could strengthen the study.

      We agree that experimental testing of this prediction in the future, when appropriate tools become available to correlate molecular motion speed and fast contractions of nanoscopic tubular junctions, will be needed for its validation.

      (4) Potential Role of Peripheral Sheets:

      While the speculation on the contraction of peripheral ER sheets is intriguing, further experimental investigation is warranted to validate this hypothesis, especially considering the observed slow diffusion in ER sheets.

      We agree with the reviewer that our study is theoretical in nature and that further experimental investigation is necessary before we can draw a definitive conclusion on peripheral sheets.

      In summary, while the study underscores the complexity of ER morphology dynamics and its implications for molecular transport, its novelty and broad implications seem limited. Given its reliance on computational simulations and dense theoretical language, submission to a computational journal could be more appropriate. In addition, given there is an absence of substantial conclusions regarding molecular trafficking within the ER, publication in a specialized journal of fluid mechanics or physics may be appropriate.

      Comments:

      - The manuscript is hard to read. There is no smooth transition from Figure 1 to Figure 2.

      To smoothen the transition, we edited the text at the beginning of results and added a reference there to the introductory Fig. 1.

      - Figure 8 serves no purpose. To make the text easier, C0, C1, C2... should be presented in Figure 2 and merged with Figure 10 with a table summarizing the information of these networks. It is not clear why 5 networks are needed. They look similar. Could you add the number of nodes per network?

      We have now merged Fig 8 and Fig 10 from the previous version into one figure (which is now Fig 9). We have also added information about the number of nodes and added a sentence in the manuscript to clarify that it showcases the source data used to model/reconstruct realistic ER structures.

      - Figure 13: seems out of context. What is the message? The ER does not show any large flow; from early FRAP and recent photoactivation data, the material seems to diffuse over long distances made up of a few tubules.

      Fig 13 (now Fig 12 in the revised version) does not illustrate any flow. Its purpose is to illustrate the computational methodology used to simulate flows and transport due to contraction of perinuclear sheets. (Note that we have spotted and fixed a small but important typo in the caption: “peripheral” → “perinuclear”.) It is worth noting that FRAP provides a relative estimate of mobility but contains no information as to the mode of motion. Whether the motion is diffusive or otherwise must be presumed in FRAP analyses, and this presumption can then be used to extract metrics such as the diffusion coefficient. Photoactivation analyses suffer from the same limitation, but analysis of how photoactivated signal arrival times scale with distance was recently suggested as a workaround. These measurements suggest superdiffusive ER transport (https://doi.org/10.1016/j.celrep.2024.114357). Although a different approach to photoactivation signal analysis, used in a recent preprint, suggests that at long distances transport can be approximated as diffusion (https://doi.org/10.1101/2023.04.23.537908), improved measurements in the future will be needed to address the apparent discrepancies.
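      The arrival-time workaround rests on a simple scaling argument: for pure diffusion the time to reach a distance d grows as d^2/D, whereas for directed (advective) transport it grows as d/v, and superdiffusive spreading falls in between. A minimal sketch, assuming synthetic power-law data rather than measurements, estimates the exponent as the slope of a least-squares fit of log(time) against log(distance). The helper name `scaling_exponent` and the input distances are hypothetical; D = 0.6 µm^2/s and v = 20 µm/s are used only as illustrative magnitudes, and d^2/(2D) is a 1D scaling estimate, not an exact first-arrival time.

```python
import math


def scaling_exponent(distances, arrival_times):
    """Least-squares slope of log(arrival time) versus log(distance).

    An exponent near 2 matches pure diffusion (t ~ d**2 / D);
    an exponent near 1 matches directed transport (t ~ d / v).
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(t) for t in arrival_times]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, illustrative arrival times only (not measured data).
d = [1.0, 2.0, 4.0, 8.0]  # distances, µm (hypothetical)
D = 0.6                   # µm^2/s, illustrative diffusion coefficient
v = 20.0                  # µm/s, illustrative advective speed
diffusive = [x ** 2 / (2 * D) for x in d]
advective = [x / v for x in d]
print(round(scaling_exponent(d, diffusive), 6))  # 2.0
print(round(scaling_exponent(d, advective), 6))  # 1.0
```

With real photoactivation data the arrival times are noisy and the fitted exponent depends on how "arrival" is defined, which is one reason different analyses of the same kind of data can disagree.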

      - Figure 1: what is the difference between a and b? How do you do your cross-section? This probability needs a drawing at least to understand how you define it.

      We expanded the explanation in the third last paragraph of Section I.

      - Figure 2: this manuscript is not a review. It is not clear why part of a figure is copied and pasted from another manuscript. It should be removed. Are the authors using the quantification [peaks of different color]? Where? The title should be given to explain each panel.

      We have chosen to keep the inset, which was not in the main text of the cited paper but its supplementary information, and provides a direct benchmark for our work.

      Why is the mean flow in a stochastic, with large excursions for large values? Could you plot the Fourier transform or a spectrogram so we can understand the frequencies? Are there regular patterns of bursts?

      The mean (i.e. cross-sectionally averaged) flow is stochastic because the pinching events are random (more precisely, they follow a Poisson distribution, as explained in the paper). Large excursions are rare and caused by interactions between pinches. We have prescribed the distributions of pinch durations and frequencies as per experimentally measured distributions and we do not expect to recover from a Fourier analysis more information than we have prescribed.
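      To illustrate what Poisson-distributed pinching means in practice, a minimal sketch draws pinch onset times with exponentially distributed gaps and counts how often a pinch starts before the previous one has ended. The function names, the rate of 1 pinch/s, the fixed pinch duration of 0.1 s, and the single-site timeline are all simplifying assumptions for illustration; they are not the experimentally measured distributions of pinch durations and frequencies used in the paper.

```python
import random


def simulate_pinch_onsets(rate_hz, duration_s, seed=0):
    """Return pinch onset times (s) from a homogeneous Poisson process.

    Inter-arrival times of a Poisson process are exponentially
    distributed with mean 1 / rate_hz.
    """
    rng = random.Random(seed)
    t = 0.0
    onsets = []
    while True:
        t += rng.expovariate(rate_hz)
        if t > duration_s:
            break
        onsets.append(t)
    return onsets


def overlapping_pairs(onsets, pinch_duration_s):
    """Count consecutive pinches that start before the previous one ends."""
    return sum(1 for a, b in zip(onsets, onsets[1:]) if b - a < pinch_duration_s)


onsets = simulate_pinch_onsets(rate_hz=1.0, duration_s=1000.0, seed=42)
print(len(onsets))  # roughly rate_hz * duration_s
print(overlapping_pairs(onsets, pinch_duration_s=0.1))
```

Because the onsets are random and memoryless, the resulting flow record has no built-in periodicity; its spectrum reflects the prescribed rate and duration distributions, which is why a Fourier analysis would return little beyond what was put in.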

      What do we learn from the fit of Fig 2b-c? Is it a constant flow?

      The conclusion of Fig 2b-c is that the in silico simulation model based on the pinching tubule hypothesis produces solute transport, as quantified by the instantaneous particle speeds (Fig 2b) and the average edge traversal speeds (Fig 2c), that is much weaker than experimentally measured. This is one of the main results of our paper and explained in Section IIA, paragraph 3. Fig 2a tells us that the flow is not constant (flows in this system can only be generated transiently, with directionality persistence considered unlikely).

      Figure 14: The estimation of the area is unclear. The legend is largely insufficient. Why did the authors report only nine regions of contractions? Are they so rare? How many samples have they used? Nine among how many?

      Thank you; the details of area estimation are included in the main text, in Section I.4. The nine regions are an arbitrary selection of a sample we deemed representative of this phenomenon.

      - Abstract: this is misleading, it should start by explaining that diffusion is the consensus of trafficking in the ER.

      - "the content motion in actively contracting nanoscopic tubular networks" is misleading. We should recall that this is an assumption that has not been proven.

      The current abstract is a succinct summary of the question in scope and results. The sentence highlighted by the referee specifically refers to the model we study in the paper; we modified it in order to remove any ambiguity and to make clear that we are testing a proposed mechanism. We also point out that although the biological origins of the tubule contractions or their effects on solute transport have not been established, these contractions have been documented.

      Minor comments:

      Introduction: "Thus past measurements indicate that the transport of proteins in ER is not consistent with Brownian motion" is misleading. You should explain that this depends on the time scale. At large timescales, diffusion is a coarse-grained description and is actually accurate according to FRAP and photoactivation data [see J. Lippincott-Schwartz publications over the past 20 years]. Regarding the super-diffusion [9], "A photoactivation chase technique also measured a superdiffusive behaviour of luminal material spread through the ER network [9]": this is not clear and is probably due to an artifact of measurement or interpretation.

      We thank the reviewer for this comment. We expanded in paragraph 2 of the introduction to better reflect the state of knowledge around this point.

      Page 2: "Strocytes" does not exist; you may have meant "Astrocytes".

      Thank you; typo fixed.

      Page 5: The value of the flow seems incompatible with previous literature (~20 µm/s). Again, where is the value 0.6 found? Where in [7]? If there is no diffusion in the tubule, why compare with 0.6 µm^2/s? The global diffusion coefficient is much higher (~5 µm^2/s).

      The value of 0.6 µm^2/s is the intranodal diffusion coefficient reported empirically in Supplementary Figure 3b of Holcman et al., Nat Cell Biol, 2018 (Ref. [7] in the unrevised version of our article), for ER in COS-7 cells. Motion inside the tubule would in general consist of a combination of advection and diffusion; since the same fluid occupies the tubules and the junctions, and the junction sizes are similar to the tubule diameters, it is reasonable to take 0.6 µm^2/s as the diffusion coefficient in tubules as well. The experiments in Holcman et al. (2018) do not mention diffusion inside tubules because (i) the study reports a dominantly advective (or at least active) transport across tubules (the driving mechanism of which remains unknown), but this does not mean diffusion is absent; and (ii) the time resolution in these experiments is too low to capture the fine details of solute motion inside tubules, so the transport is captured only as “jumps” between junctions. We also point out that the higher global diffusion coefficient may pertain to network-scale effective diffusion over long time scales, which is different from the Brownian motion at the scale of tubules/tubular junctions relevant to our _in silico_ model.

      Page 5: "The distributions of the average edge traversal speeds appeared insensitive to ER structure variations for both pinching-induced and exclusively diffusion transport." is rather trivial. Similar to "the presumed pinching parameters would be inadequate to facilitate ER luminal material exchange."

      These sentences, and the surrounding text, report the observed outcome of our numerical simulations: pinch-induced transport statistics show little variation across different ER geometries, and pinching does not facilitate luminal content mixing. This conclusion was not clear to us before running the simulations, and hence we deemed it nontrivial and relevant to comment on.

      Page 7: The authors mention that they could measure "typical edge traversal speed of 45 µm/s".

      I am not aware of such a measurement. Could they explain where this number comes from?

      These measurements were reported in Supplementary Figure 3b (bottom right) of Holcman et al. (2018) and are for the ER in a COS-7 cell. The main figure, Fig. 2g, reports analogous measurements (mean speed 20 µm/s) for a HEK-293 cell. We have worked with the speed measurements for COS-7 cells because the tubule contraction data reported in this work pertain to COS-7 cells.

      A contraction that leads to 3.9 mu m/s over a distance of a few microns would be interesting. Is this a prediction of the present model?

      Yes, as stated in Section II, C paragraph 2. The present model with the experimentally measured averages for the tubule contraction parameters does indeed predict that a particle, in the absence of diffusion, is transported by a single tubule contraction at a maximum speed of 3.9 µm/s over 0.19 µm.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The overall writing is very difficult to follow and the authors need to work on significant re-writing. 

      Thank you for your comment. We have rewritten the text and asked an immunology expert, who is also a native English-speaking editor, to review it.

      (2) The paper in its current form really lacks detail and it is NOT possible for readers to repeat or follow their methods. For example: a) It is not clear whether the authors checked the serum to see if the mice were producing antibodies before they sacrificed them to harvest spleen/blood i.e. using ELISA? b) How long after administration of the second dose were the mice sacrificed? c) What cell types are taken for single B cell sorting? Splenocytes or PBMC?

      Thank you for your comment. We have revised the methodology section thoroughly to ensure that the readers can follow and replicate the method. Our responses to the specific examples raised are as follows:

      a) We did not examine the serum titer after immunization. An increased serum titer, as determined by ELISA, does not always reflect the number of cross-reactive B cells because we expected the serum titer to consist of polyclonal antibodies, which are a mixture of PR8-reactive, H2-reactive, and cross-reactive clones. We thus anticipated that we would not obtain enough cross-reactive B cells after a series of immunizations. After comparing various immunization methods, including different adjuvants and immunization sites, using the readout of the number of cross-reactive B cells, we decided to adopt the immunization protocol presented in this paper.

      b) We sacrificed the mice two weeks after the second immunization (see Supplementary Figure 5).

      c) For this experiment, we used CD43-negative splenic B cells purified by MACS negative selection (see Supplementary Figure 6).

      (3) According to the authors, 77 clones were sorted from the PR8+ and H2+ double positive quadrant. It is surprising that after transfection and re-analyzing of bulk antibody presenting EXPI cells on FACS, only 13 clones (or 8 clones? - unclear) seemed to be truly cross-reactive. If that is the case, the approach is not as efficient as the authors claimed.

      Thank you for your comment. To isolate high affinity antibodies, we gated the high fluorescent intensity population of cross-reactive B cells during Ig-expressing 293 cell sorting, as shown in Fig 2B, while we collected a wide intensity population of cross-reactive cells during splenocyte sorting. The narrow gating reduced the number of clones. We, however, cannot quantify how many clones we lost in the process, but we achieved a cloning efficiency exceeding 75%. To avoid any confusion, we have clarified this point by attaching additional supplementary figures (Supplementary Figures 5 and 6).

      Reviewer #2 (Public Review):

      (4) A His tagged antigen was used for immunization and H1-his was used in all assays. Either the removal of His specific clones needs to be done before selection, or a different tag needs to be used in the subsequent assays.

      Thank you for your comment. As pointed out, the possibility of antibody generation in regions other than HA cannot be ruled out since the immunized antigen and the detection antigen were the same. However, as shown in Table 1, the cross-reactive antibodies obtained in this study exhibited characteristic binding abilities to each of the six types of HA. If these were antibodies recognizing His, they would bind to all six types of HA. This indicates that these cross-reactive antibodies were not His-specific clones.

      We have incorporated information on this potential caveat into the discussion (page 12, lines 4-9).

      (5) This assay doesn't directly test the neutralization of influenza but rather equates viral clearance to competitive inhibition. The results would be strengthened with the demonstration of a functional antibody in vivo with viral clearance.

      Thank you for your constructive comment. While we agree that demonstration of a functional antibody in vivo with viral clearance would strengthen our results, this is clearly out of the scope of our current study and will be the subject of future research.

      (6) Limitations of this new technique are as follows: there is a significant loss of cells during FACs, transfection and cloning efficiency are critical to success, and well-based systems limit the number of possible clones (as the author discussed in the conclusions). Early enrichment of the B cells could improve efficiency, such as selection for memory B cells.

      Thank you for your comment. Our cloning efficiency for sorted B cells exceeded 75%. However, we selected high binders of cross-reactive B cells during Ig-expressing 293 functional screening on purpose, as shown in Figure 2B, while we collected all cross-reactive B cells during B cell sorting (see attached Supplementary Figure 5). This functional selection step reduced the number of clones. We clarified this point by attaching additional supplementary figures (Supplementary Figures 5 and 6).

      Our sorted cross-reactive B cells are most likely CD38+ memory B cells, as shown in Supplementary Figure 6.

      Reviewer #1 (Recommendations For The Authors):

      a) It is advised for the authors to provide a flow chart with time stamps to prove the many statements made in the paper. For example, it is stated that "we demonstrated efficient isolation of influenza cross-reactive antibodies with high affinity from mouse germinal B cells over 4 days". It is not clear how this was calculated.

      Thank you for your comment. We have prepared a time-stamped flow chart (Supplementary Figure 5).

      b) The papers cited by the authors are relatively old if not outdated. There are many papers published focusing on efficient isolation of mAbs for SARS-CoV-2 research. For example, the paper by Lima et al (Nat Comm 2022, 13:7733) used a very similar strategy for rapid isolation of cross-reactive mAbs by FACS sorting followed by cloning of paired heavy and light chains from single B cells. The authors need to incorporate citations from the latest publications in this field.

      Thank you for your comment. The paper by Lima et al. (Nat Comm 2022, 13:7733) has been cited in the Discussion as ref 28.

      c) Figure 2 needs much more detail for readers to follow.

      Thank you for your comment. We have revised the legend of Figure 2 accordingly and added additional supplementary figures (Supplementary Figures 5 and 6) to increase clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work sought to demonstrate that gut microbiota dysbiosis may promote the colonization of mycobacteria and that Nos2 down-regulation is a key mediator of this gut-lung pathogenic transition.

      Strengths:

      They performed a large-scale RNA analysis of lung tissue to profile gene expression in MS-infected mice with gut dysbiosis. This might help provide an overview of the gene pathways and critical genes involved in lung pathology during gut dysbiosis. These data are somewhat useful and important for the TB field.

      Weaknesses:

      (1) They did not use a wild-type Mtb strain (e.g., H37Rv) to develop the mouse TB infection model, which may explain the failure to establish TB granulomas and other hallmarks of TB pathology.

      The colonization of M.tb in the lungs and the extent of that colonization are the first and primary conditions for the occurrence of TB. Our aim in this study was to explore the impact of gut microbiota dysbiosis on the colonization of M.tb in the lungs. However, because our laboratory lacks the necessary biosafety facilities, we are not permitted to culture highly infectious bacteria such as M.tb, and establishing an M.tb infection animal model in our laboratory would not meet biosafety requirements. Hence, we used the M.tb model organism M. smegmatis (MS) and established an infection model to explore the effect of gut microbiota dysbiosis on MS colonization in mouse lungs. The MS infection model may not necessarily produce typical TB granulomas and other signs of TB pathology. We have discussed this limitation of the current study in the Discussion section of the manuscript (lines 21-39 of page 15). In future studies, we plan to adopt the reviewer's suggestion and use a wild-type M.tb strain to establish a TB infection model in a laboratory that meets biosafety standards, to further verify the results of the current study.

      (2) The use of in vitro assays based on A549 cells to examine how Nos2 expression regulates NO and ROS may not be sufficient. A549 cells are not the primary target of Mtb infection in the lungs.

      We thank the reviewer for these comments. Although alveolar epithelial cells (AECs) are not the main target cells of M.tb infection, they are among the cells contacted early in M.tb infection, and early M.tb invasion of AECs is essential for the establishment of infection (PMID 11479618). AECs are usually the initial site of the lung's response against M.tb. Available literature suggests that freshly isolated AECs are more permissive to M.tb growth than macrophages (PMID 33228849). As a cellular reservoir for M.tb, AECs can facilitate rapid bacterial growth while potentially escaping recognition by phagocytes in the alveolus. Immune cells such as macrophages are the primary targets of M.tb infection, where M.tb survives and proliferates, leading to the formation and maintenance of granulomas. However, AECs are exposed to the same density of infection, and the bacteria invade and replicate in these cells and induce apoptosis and necrosis, which is considered a major mechanism of extra-pulmonary dissemination (PMID 12925134, PMID 32849525). Besides their direct barrier role, AECs also respond directly to M.tb infection by producing mediators such as cytokines, chemokines, and antimicrobial agents (PMID 35017314). Therefore, the alveolar epithelial cell line A549 is a reasonable choice for exploring in vitro how the intestinal microbiota affects M.tb colonization.

      (3) They did not examine the lung pathology upon gut dysbiosis to examine the true significance of increased colonization of Mtb.

      We have added lung histopathology results to the revised manuscript. These are shown in lines 11-13 of page 4 and in Figure S2 of the Supplementary Information.

      (4) Most of the studies are based on MS-infected mouse models with a lack of clinical significance.

      The first and primary condition of any pathogen infection is that the bacteria must invade the host through colonization and multiply in the target organ. This study aimed to investigate the effect of intestinal microbial dysbiosis on the colonization of mycobacteria in mouse lungs. Our laboratory does not meet the biosafety standard for culturing highly infectious bacteria such as Mycobacterium tuberculosis, so we used Mycobacterium smegmatis as a model strain for M.tb to establish the infected mouse model in the current research. Although M. smegmatis is generally considered nonpathogenic, it is closely related to M.tb in biochemical characteristics, genetic information, cell structure, and metabolism (PMID 32674978). M. smegmatis is regarded as a valuable model organism in M.tb research and has been widely used to explore biological characteristics of M.tb such as physiological state, stress response, reactivation from the non-culturable state, antimicrobial activity, and biochemical protection (PMID 32674978). It has also been reported that M. smegmatis can be used as a model strain to study the molecular mechanisms of interaction between M.tb and its host (PMID 30546046, PMID 25970481, PMID 29568875). In this preclinical study, rather than focusing on the pathological changes caused by M. smegmatis in the host lungs, we mainly focused on the influence of the intestinal microbiota on mycobacterial colonization of the lungs and its possible mechanism. This provides a reliable model for studying the prevention of early M.tb infection and spread by regulating the intestinal microbiota, which has important clinical significance for developing new measures for tuberculosis prevention and control. If experimental conditions permit, an infection model with wild-type M.tb could be used to verify the findings of the present study and may provide important clinical guidance.

      Reviewer #2 (Public Review):

      The manuscript entitled "Intestinal microbiome dysbiosis increases Mycobacteria pulmonary colonization in mice by regulating the Nos2-associated pathways" by Han et al. reports that intestinal microbiome dysbiosis induced by clindamycin, an antibiotic that selectively disrupts anaerobic Bacteroidetes, increased Mycobacterium smegmatis (MS) colonization in mouse lungs. The authors found that clindamycin induced enterocyte damage and increased gut permeability and also enhanced the fermentation of cecal contents, which ultimately increased MS colonization in the lungs. The study showed that gut microbiota dysbiosis up-regulated Nos2 gene-associated pathways, leading to increased nitric oxide (NO) levels and decreased reactive oxygen species (ROS) and β-defensin 1 (Defb1) levels. These changes in the host's immune response created an antimicrobial and anti-inflammatory environment that favored MS colonization in the lungs. The findings suggest that gut microbiota dysbiosis can modulate the host's immune response and increase susceptibility to pulmonary infections by altering the expression of key genes and pathways involved in innate immunity. The authors provide reasonable experimental data and gene expression profiles to support their conclusion. Although the overall outcomes are convincing, there are several issues that need to be addressed:

      (1) In Figure S1, the reviewer suggests checking the image sizes of the pathological sections of intestinal tissue from the control group and the CL-treatment group. When compared to the same intestinal tissue images in Figure S4, they do not appear to be consistently magnified at 40x. The numerical scale bars should be presented instead of just magnification such as "40x".

      We thank the reviewer for these precise comments. We have carefully checked the pathological sections in Figure S1 and Figure S5 and added numerical scale bars to the figures. The revised figures are included in the Supplementary Materials.

      (2) In Figure 4d, the ratio of Firmicutes in the CL-FMT group decreased compared to the CON-FMT group, whereas the CL-treatment group showed an increase in Firmicutes compared to the Control group in Figure 3b. The author should explain this discrepancy and discuss its potential implications on the study's findings.

      The success of fecal microbiota transplantation (FMT) is influenced by many factors, such as the host intestinal microbiota, immunity, and genetic factors (PMID 37167953). During FMT, not all microbiota in the donor feces have the same ability to colonize the recipients; for example, the colonization success rate of Bacteroidetes has been reported to be higher than that of Firmicutes (PMID 24637796). In this study, the difference between Figure 4D and Figure 3B arose because, after transplantation, the colonization of Firmicutes decreased in the CL-FMT recipients while the colonization of Bacteroidetes increased, resulting in a lower Firmicutes/Bacteroidetes ratio in the CL-FMT group. However, we considered the gut microbiota as a whole in the present study. After FMT, we found that 85.11% of bacterial genera and 52.38% of fungal genera present in the CL inocula were successfully transferred to the CL-recipient mice, and 91.45% of bacterial genera and 56.36% of fungal genera in the CON inocula were successfully transferred to the CON-recipient mice (Figure 4g). Trans-kingdom network analyses between bacteria and fungi showed that the trends of the gut microbiome in recipient mice were consistent with those in the donor mice. Therefore, the FMT model established in this study was successful. For clarity, we have added these explanations to the Discussion section of the manuscript (lines 8-29 of page 12).

      (3) In Figure 6, did the authors have a specific reason for selecting Nos2 but not Tnf for further investigation? The expression level of the Tnf gene appears to be the most significant in both RT-qPCR and RNA-sequencing results in Figure 5f. Tnf is an important cytokine involved in immune responses to bacterial infections, so it is also a factor that can influence NO, ROS, and Defb1 levels.

      We thank the reviewer for this valuable comment. By analyzing the transcriptome data, we found 8 genes strongly associated with TB infection in the KEGG pathway: Nos2, Cd14, Tnf, Cd74, Clec4e, Ctsd, Cd209a, and Il6. We then performed KO pathway analysis and found that Nos2 was strongly associated with multiple pathways, including "cytokine activity", "chemokine activity", and "nitric oxide synthase binding". Moreover, in a clinical study on tuberculosis, the plasma Nos2 expression level in patients with newly diagnosed tuberculosis was significantly higher than that in healthy individuals, indicating that Nos2 is associated with the occurrence of tuberculosis (PMID 34847295). Therefore, we selected Nos2 as the main target gene for the pathway correlation analysis in the current study. As the reviewer notes, Tnf, an important cytokine in the immune response to bacterial infection, may also affect the levels of NO, ROS, and Defb1; this provides a new direction for our future research.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      First, they need to use a true Mtb-infected mouse model to determine the relationship between gut dysbiosis and increased lung infection of Mtb.

      Second, the mechanism of Nos2-mediated NO and ROS production needs to be further analyzed in a real Mtb infection process (either in vivo or in vitro).

      Third, lung pathology should be included when addressing the increased colonization of mycobacteria. Addressing these problems may help improve this work.

      (1) Our laboratory does not meet the biosafety standard for culturing highly infectious bacteria such as Mycobacterium tuberculosis, so we used Mycobacterium smegmatis as a model strain for M.tb to establish the infected mouse model in the current research. Although M. smegmatis is generally considered nonpathogenic, it is closely related to M.tb in biochemical characteristics, genetic information, cell structure, and metabolism (PMID 32674978). M. smegmatis is regarded as a valuable model organism in M.tb research and has been widely used to explore biological characteristics of M.tb such as physiological state, stress response, reactivation from the non-culturable state, antimicrobial activity, and biochemical protection (PMID 32674978). It has also been reported that M. smegmatis was used as a model strain to study the molecular mechanisms of interaction between M.tb and its host (PMID 30546046, PMID 25970481, PMID 29568875). In this preclinical study, we mainly focused on the influence of the intestinal microbiota on mycobacterial colonization of the lungs and its possible mechanism, which provides a reliable model for studying the prevention of early M.tb infection and spread by regulating the intestinal microbiota.

      (2) In the future, we will establish an infection model with wild-type M.tb to verify whether Nos2-mediated NO and ROS production promotes M.tb colonization.

      (3) We have added lung histopathology results to the revised manuscript. These are shown in lines 11-13 of page 4 and in Figure S2 of the Supplementary Information.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The exclusive use of males is a major concern lacking adequate justification and should be disclosed in the title and abstract to ensure readers are aware of this limitation. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings. The occurrence of an estrus cycle in typical female rats is not justification for their exclusion. Note also that male rodents experience great variability in hormonal states as well, distinguishing between individuals and within individuals across time. The study of endocrinological influences on behavior can be separated from the study of said behavior itself, across all sexes. Similarly, concerns about needing to increase the number of animals when including all sexes are usually unwarranted (see Shansky [2019] and Phillips et al. [2023]).

      As suggested by the Reviewer, we have disclosed the use of males in the title and the abstract. Also, we have added the statement that research on female rat subjects is required: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      Regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, my comment on the originally reviewed version stands. It seems that the calls are sorted by an (unbiased) algorithm into categories based on their frequency and duration, and because 44kHz calls differ by definition on frequency and duration the fact that the algorithm sorts them as a distinct category is not evidence that they are "new calls [that] form a separate, distinct group". I appreciate that the authors have softened their language regarding the novelty and distinctness of these calls, but the manuscript contains several instances where claims of novelty and specificity (e.g. the subtitle on line 193) is emphasized beyond what the data justifies.

      We further softened our language regarding the novelty and distinctness of 44-kHz vocalizations, including the aforementioned subtitle. However, we would like to bring to the readers' attention that all major groups of calls, i.e., long 22-kHz calls, short 22-kHz calls, and 50-kHz vocalizations, are also defined in our manuscript and in the literature by their frequency and duration. Yet none of these groups, except the 44-kHz vocalizations, was identified separately by DBSCAN clustering. If the 44-kHz calls were not a distinct group, we would expect them to blend first with 50-kHz vocalizations (because of their similar frequencies) or with 22-kHz calls (because of their similar durations), but they do not in this unbiased examination.

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      This paragraph is exactly the same as in the previous review. There was no comment regarding our previous answer. Here is the previous answer:

      “We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding of up to >19% of long, seemingly aversive 44-kHz calls, at a frequency within the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that variation in aversive calls is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation. 

      Also, in our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. We therefore expected that demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, would be even more difficult (to our knowledge, a comparable experiment comparing short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heart-rate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      I appreciate the amendment in discussing the idea of arousal being the key determinant for the increased emission of 44kHz, and the addition of other factors. Some of the items in this list, such as annoyance/anger and disgust/boredom, don't really seem to fit the data. I'm not sure I find the idea that rats become annoyed or disgusted during fear conditioning to be a particularly compelling argument. As such the list appears to be a collection of emotion-related words, with unclear potential associations with the 44kHz calls.

      We agree that most of the factors listed are not supported by the data. These are hypotheses and speculations only – hence, an assumption / tentative statement, i.e., “It could also be argued that…”. We have changed it into “It could also be speculated that…”.

      Later in the Discussion the authors argue that the 44kHz aversive calls signal an increased intensity of a negative valence emotional state. It is not clear how the presented arguments actually support this. For example, what does the elongation of fear conditioning to 10 trials have to do with increased negative emotionality? Is there data supporting this relationship between duration and emotion, outside anthropomorphism? Each of the 6 arguments presented seems quite distant from being able to support this conclusion.

      We have added a description summarizing the literature that expounds the differences in employing one-two vs. five-ten foot-shocks during fear-conditioning training. It says:

      “Importantly, it has been demonstrated multiple times that training rats with several electric foot-shocks (i.e., 5-10 shocks) produces a qualitatively different kind of fear-memory compared to training with only 1-2 shocks. Training with more numerous shocks has been shown to result in augmented freezing (e.g., Fanselow and Bolles, 1979, Haubrich et al., 2020, Haubrich and Nader, 2023, Poulos et al., 2016, Wang et al., 2009) which reflects a more intense fear-memory that is resistant to extinction (Haubrich et al., 2020, Haubrich and Nader, 2023), resistant to reconsolidation blockade (Haubrich et al., 2020, Wang et al., 2009, Finnie and Nader, 2020), associated with downregulation of NR2B NMDA-receptor subunits as well as elevated amyloid-beta concentrations in the lateral and basal amygdala (Finnie and Nader, 2020, Wang et al., 2009). Additionally, it involves activation of the noradrenaline-locus coeruleus system (Haubrich et al., 2020) and collective changes in connectivity across multiple brain regions within the neural network (Haubrich and Nader, 2023). 

      Notably, it has also been shown that higher freezing as a result of fear-conditioning training correlates with increased blood concentrations of the stress hormone corticosterone (Dos Santos Correa et al., 2019). The rats subjected to 6- and 10-trial fear conditioning, whose results are reported herein (Tab. 1/Exp. 2/#7,8,11,12; n = 73), also demonstrated higher freezing than rats subjected to 1-trial conditioning (Tab. 1/Exp. 2/#6,10; n = 33), which is reported elsewhere (Olszynski et al., 2021, Fig. S1C-E; Olszynski et al., 2022, Fig. S1D-G). Therefore, we postulate that the emission of 44-kHz calls is associated with increased stress and with a training regime that forms robust memories.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      This paragraph is exactly the same as in the previous review. There was no comment regarding our previous answer. Here is the previous answer:

      “The surprising fact that there are presumably aversive calls beyond the commonly applied thresholds, i.e., >32 kHz, that share some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they are ultimately classified as a new type or subtype (i.e., a separate category) or as part of a supergroup of aversive calls together with 22-kHz vocalizations is of secondary importance and remains to be discussed with other researchers in the field.

      However, we would argue – by showing a comparison – that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are usually referred to in the literature as short and long 22-kHz vocalizations, respectively (not introduced with a description that “sometimes 22-kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes, usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or as a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by an 18–32 kHz frequency bandwidth (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).”

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      Abstract: The 19.4% seems to be the percentage of 44-kHz calls observed during the 9th trial of the 10-trial experiment, not the percentage of calls that were 44 kHz during bouts of freezing.

      We clarified the sentence. It now says:

      “We observed 44-kHz calls to be associated with freezing behavior during fear conditioning training, during which they constituted up to 19.4% of all calls and most of them appeared next to each other forming groups of vocalizations (bouts).”

      Abstract: "We hope that future investigations of 44-kHz calls in rat models of human diseases will contribute to expanding our understanding and therapeutic strategies related to human psychiatric conditions." This implication seems far too strong, given that the link between these calls and models of human psychiatric conditions is not clear.

      We agree that the link is not yet clear, which is why we only express a hope that it exists. Other ultrasonic calls are already being investigated in such animal models, and training regimes employing numerous electric shocks are used as models of PTSD, helplessness, etc.

      Line 101: Seems a strong assumption to state the authors of the other publication were inspired by this paper, unless there is personal communication corroborating this.

      The wording of the sentence has been changed.

      It is still not clear why both Friedman and Wilcoxon tests were used, especially in situations where only one result seems to be referenced (for example on line 108-109).

      We added the explanation within Methods: “In particular, the Friedman test was used to assess the presence of change within the sequence of several ITI, while the Wilcoxon test was used for the difference between the first and the last ITI analyzed.”

    1. Author response:

      We sincerely appreciate the positive assessment regarding the significance of our study, as well as the valuable suggestions provided by editor and the reviewers.

      In response to the reviewers’ comments, we will modify the manuscript to include co-staining of CD66b and GSDMD in the whole skin samples of clinical patients, which will further clarify the expression of GSDMD in neutrophils.

      Additionally, we plan to conduct further analyses using publicly available data to elucidate the changes in neutrophil pyroptotic signaling in IMQ-induced psoriatic mice tissue, thereby strengthening our conclusions about the role of neutrophil pyroptosis in the progression of psoriasis.

      Moreover, while our research primarily focuses on the role of neutrophil pyroptosis in psoriasis, this does not conflict with existing reports indicating that keratinocyte (KC) pyroptosis also contributes to disease progression. Both lines of work underscore the significant role of GSDMD-mediated pyroptotic signaling in psoriasis, and the involvement of both KCs and neutrophils further emphasizes the potential therapeutic value of targeting GSDMD signaling in psoriasis treatment. We will expand upon this discussion in the revised manuscript.

      In our model, to accurately assess disease severity in mice, we standardized the drug treatment area on the dorsal side (2 × 3 cm). Therefore, area was not factored into the scoring process, and we will include a detailed description of this in the revised manuscript.

      Regarding the downregulation of CASP in GSDMD KO mouse skin tissue, existing studies indicate that GSDMD generates a feed-forward amplification cascade via the mitochondria-STING-caspase axis (PMID: 36065823, DOI: 10.1161/HYPERTENSIONAHA.122.20004). We hypothesize that the absence of GSDMD attenuates STING-mediated caspase activation.

      Furthermore, in the revised manuscript, we will address the reviewers’ other comments to enhance the quality of the manuscript, for example by clarifying relevant issues in the Discussion, refining the description of key experiments in the Methods, adding details about the antibodies used (including clones and catalog numbers), and including sample sizes (n numbers) in the figure legends.

      We believe that the new data and further discussions and clarifications included in the revised manuscript will adequately address all the concerns raised by the reviewers and better support our conclusions.

      Finally, we would like to express our gratitude once again to the editor and reviewers for their invaluable feedback on this work!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Comment 1. Clinical Data on Patient Brain Samples: The inclusion of specific details such as postmortem intervals and the age at disease onset for patient brain samples would be valuable. These factors could significantly affect the quality of the tissues and their relevance to the study. Moreover, given the large variation in disease duration between PD and PDD, it’s important to consider disease duration as a potential confounding factor, especially when concluding that PDD patients have a more severe form of synucleinopathy compared to PD.

      We thank the reviewer for this valuable comment. We have included the post-mortem interval (PMI) and age of death in Table S1, showing the clinicopathological information. Changes on page 16. As suggested by the reviewer, we included the discussion on the large variation in disease duration between PD and PDD cases. We noted that DLB cases also have shorter disease durations but still demonstrate seeding kinetics similar to PDD. Therefore, we hypothesise that the molecular differences we observed between different diseases were due to the strain properties or higher pathological load (seen in both PDD and DLB) and are unlikely due to the disease duration. Changes on pages 9-11, lines 204-212.

      Comment 2. Inclusion of Healthy Controls in Multiple Tests: Given the importance of healthy controls in scientific studies, especially those involving human brain samples, the authors could consider including healthy controls in more tests to strengthen the robustness of the findings. Expanding the use of healthy controls in biochemical profiling and phosphorylation profiling would provide a better basis for comparison and clarify the significance of the results in a disease context. This would help the authors elaborate on the interpretation of results, for example in Figure 3, where the authors claim that PD brains show mostly monomeric αSyn forms (lines 119-120, and also lines 222-223). Does this imply the absence of αSyn pathology in PD brains? Are there differences from healthy controls? What are the low molecular weight bands (<15 kDa) (lines 125-126), and are they also present in healthy controls? Also, no perfectly pS129-specific (anti-pαSyn) antibody exists; these antibodies are known for non-specific labeling. Investigating phosphorylation levels in healthy controls and comparing them to PD brains, especially given the predominance of monomeric (healthy?) αSyn in PD brains, would help clarify the observed changes.

      We agree with the reviewer’s assessment and consider this an important suggestion. We performed biochemical profiling and immunogold imaging with the three HC cases and presented the results in Figure 4. aSyn in healthy controls was completely digested by PK. The low MW bands were absent in PD and HC, and there was no difference in the PK profiles. However, this may be due to the low pathology load and amount of pathological aSyn in the selected PD brains. Additional comments were added to the results. Changes are on page 4 (lines 136-137) and page 7 (Figure 4).

      Comment 3. Age of Healthy Controls: Providing information about the age at death for healthy controls is crucial, as age can impact the accumulation of aSyn. Also include if the brain samples were age-matched, or analyses were age-adjusted.

      We have described the age of each patient, and the analyses were age-adjusted. Changes on page 16 (Table S1).

      Comment 4. Braak Staging Discrepancy: The study reports the same Braak staging for both PD and PDD, despite the significant difference in disease duration. Maybe other reviewers with clinical experience might have a better take on this. This observation merits discussion in the paper, allowing readers to better understand the implications of this finding.

      Addressed: Our PD and PDD cases are Braak stage 6, indicating that the LB pathology had progressed to the neocortex. It's important to note that Braak stage represents only where the LB pathology has spread and does not indicate anything about the load of LBs. However, our immunohistochemistry results (page 20) show that PDD demonstrates a higher LB load than PD cases in the entorhinal cortex. As the reviewer has suggested, this comment has been amended in the manuscript. Changes on pages 9-11, lines 204-212.

      Comment 5. Citation of Relevant Studies: The paper should consider citing and discussing a recent celebrated study on PD biomarkers that used thousands of cerebrospinal fluid (CSF) samples from different PD patient cohorts to demonstrate the effectiveness of SAA as a biochemical assay for diagnosing PD and its subtypes.

      As suggested by the reviewer, we included this study in the discussion. Changes on page 12, lines 275-278.

      Reviewer 3 (Public Review):

      The experiments are missing two important controls: 1) what do fibrils generated by different in vitro fibril preparations made from recombinant synuclein protein look like; and 2) the use of CSF from the same patients whose brain tissue was used, to assess whether CSF and brain seeds look and behave identically. The latter is perhaps the most important question of all, namely how representative are CSF seeds of what is going on in patients’ brains?

      We thank the reviewers for this valuable comment. Although in vitro preformed fibrils (PFFs) made out of recombinant aSyn are still important sources for cellular and animal studies to generate disease models and investigate mechanisms, many studies have now turned to human brain-amplified fibrils, considering them to more closely represent the human structure. Therefore, our study was designed to specifically address this hypothesis by comparing human-derived and SAA-amplified fibrils. It would be interesting to compare these structures also to PFFs, but this was beyond the scope of our study. Comparing the CSF and brain seeds from the same patients would be very interesting indeed but also difficult, as this would require biosample collection during life followed by brain donation. The SAA cannot be done from post-mortem CSF due to contamination with blood. However, we are in a privileged position to examine such a comparison soon with our longitudinal Discovery cohort, where some participants have donated their brains. These future studies will address the critical question of whether the CSF seeds reflect those in the brain.

      In their discussion the authors do not comment on the obvious differences in the conditions leading to the formation of seeds in the brain and in the artificial conditions of the seeding assay. Why should the two sets of conditions be expected to yield similar morphologies, especially since the extracted fibrils are subjected to harsh conditions for solubilization and re-suspension.

      We agree with the reviewer that the formation of seeds in the brain and the SAA reaction conditions are very different, and one would not expect similar fibrillar morphologies. However, the theory is that pathological seeds amplify through templated seeding, where seeds copy their intrinsic properties to the growing SAA fibrils. Thus, numerous studies use the SAA fibrils as model fibrils to investigate the different aSyn strains. Our study aimed to test whether the SAA fibrils are representative models of the brain fibrils. We included a more explicit comment on this in the discussion. Changes on page 3, lines 78-83.

      Finally, the key experiment was not performed - would the resultant seeds from SAA preparations from the different nosological entities produce different pathologies when injected into animal brains? But perhaps this is the subject of a future manuscript.

      We agree this is an essential experiment to build on our conclusion. Animal studies would be imperative to assess whether the SAA fibrils reflect the brain fibrils’ toxicity. However, these were beyond the scope of the present study but are being performed in collaboration with some expert groups.

      Furthermore, the authors comment on phosphorylation patterns, stating that the resultant seeds are less heavily phosphorylated than the original material. Again, this should not be surprising, since the SAA assay conditions are not known to contain the enzymes necessary to phosphorylate synuclein. The discussion of PTMs is limited to pS129 phosphorylation. What about other PTMs? How does the pattern of PTMs affect the seeding pattern?

      We agree with the reviewer that other PTMs should be explored, but this was beyond the scope of this study. Here, we focused on pS129, for which multiple reliable antibodies exist that also work with immunogold-TEM.

      Lastly, the manuscript contains no data on how the diagnostic categories were assigned at autopsy. This information should be included in the supplementary material.

      Clinical and neuropathological diagnostic criteria are now included in Table S1. Changes on page 16, lines 448-461.

      Reviewer 1 (Recommendations for the authors):

      (1) Remove a duplicate sentence in line 94-96.

      Addressed: Thank you for pointing this out. The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (2) Figure 1 Placement of Healthy Controls: Moving the graph representing healthy controls from the supplementary materials to the main figures could help readers better appreciate the results of diseased states.

      The healthy control SAA curves were moved to the main figure. Changes are on page 5, Figure 2.

      (3) Commenting on Case 2 Healthy Control: In the discussion section, you may comment on the case of the healthy control that showed amplification towards the end. While definitive conclusions may be challenging, acknowledging the possibility of incidental Lewy bodies or the prodromal phase of the disease would add depth to the analysis? But make sure to include the age information for healthy controls.

      We believe this is an important point to discuss in the manuscript. We have referenced other studies with similar observations and stated that it is currently unknown what this phenomenon reflects (page 11, lines 221-226). The age information of the healthy control subjects was added to Table S1.

      (4) Figure S3 Clarity: To enhance the clarity of Figure S3, consider adding a reference marker or arrow in the low-magnification image that points to the region being magnified in the insets. This visual cue will make it easier for readers to connect the detailed insets with the corresponding area in the broader image.

      In Figure S3, we included a reference arrow in the low-magnification images to clarify where the higher-magnification images are taken. Changes are on page 19, Figure S3.

      Reviewer 2 (Recommendations for the authors):

      (1) A major issue confronting the field is the conflation of the PMCA and RT-QuIC assays (the latter of which was used here). The decision to rename and combine the two under the umbrella of SAAs does a major disservice to the field for many reasons. Recognizing that the push for this did not come from the authors, clarifying the differences in their Introduction would be very useful. I suggest this, in large part, because in the prion field, PMCA is known to amplify prion strains with high fidelity whereas the product from RT-QuIC does not. In fact, the RT-QuIC product for PrP is not even infectious, while the synuclein field uses it as a means to generate material for subsequent studies. Highlighting these differences would certainly strengthen the arguments the authors are making about the inadequacy of the synuclein RT-QuIC approach in research.

      We thank the reviewers for these very valuable comments. We have included a further introduction on PMCA and RT-QuIC, explaining the differences and clearly stating our selection of the RT-QuIC method in this paper (page 3, lines 55-68). In addition, we have highlighted that, unlike PMCA, the RT-QuIC end-products are non-infectious and biologically dissimilar to the seed protein. Combined with our results, the findings demonstrate the methodological limitation of RT-QuIC in reproducing the seed fibrils and replicating their intrinsic biophysical information.

      (2) On page 4, sentences starting on lines 94 and 95 are a duplication.

      The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (3) In the Results, noting that the pSyn staining on the RT-QuIC fibrils is coming from the human patient sample used to seed the reaction would be useful. This is mentioned in the Discussion, but the lack of mention in the Results made me pause reading to double check the methods. I think this could also be addressed a bit more clearly in the Abstract.

      We have clarified this in the Results and Abstract. Changes on page 1 (lines 21-22) and page 9 (lines 192-194).

      (4) On page 8, line 188, change "was" to "were" in the sentence "First, faster seeding kinetics was..."

      This grammar error has been corrected. Changes are on page 9, line 200.

      (5) The authors may want to comment on the unexpected finding that despite the RT-QuIC fibrils having a difference in twisted vs straight filaments, all 4 seeded reactions gave identical results in the conformational stability assay.

      Addressed: We want to thank the reviewer for this comment and have highlighted the unexpected finding with a comment on what could be causing the identical results in the conformational stability assay. Changes are on page 12, lines 297-303.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The study "Endogenous oligomer formation underlies DVL2 condensates and promotes Wnt/β-catenin signaling" by Senem Ntourmas et al. contributes to the understanding of phase separation in Dishevelled (DVL) proteins, specifically focusing on DVL2. It builds upon existing research by investigating the endogenous complexes of DVL2 using ultracentrifugation and contrasting them with DVL1 and DVL3 behavior. The study identifies a DVL2-specific region involved in condensate formation and introduces the "two-step" concept of DVL2 condensate formation, enriching the field's knowledge. 

      Strengths: 

      A notable strength of this study is the validation of endogenous DVL2 complexes, providing insights into its behavior compared to DVL1 and DVL3. The functional validation of the DVL C-terminus (here termed conserved domain 2 (CD2)) and the identification of DVL2-specific regions (here termed LCR4) involved in condensate formation are significant contributions that complement the current knowledge on the importance of the DVL DIX domain, DEP domain and intrinsically disordered regions between the DIX and PDZ domains. Additionally, the introduction of the concept where oligomerization (step 1) precedes condensate formation (step 2) is an interesting hypothesis, which can be further experimentally challenged in the future.

      We thank the reviewer for her/his interest in our work and for acknowledging our significant contributions to the understanding of DVL2 phase separation.   

      Weaknesses: 

      However, the applicability of the findings to full-length DVL2 protein, hence the physiological relevance, is limited. This is mostly due to the fact that the authors almost completely depend on the set of DVL2 mutants, which lack the (i) DEP domain and (ii) nuclear export signal (NES). These variants fail to establish DEP domain-mediated interactions, including those with FZD receptors. Of note, the DEP domain itself represents a dimerization/tetramerization interface, which could affect the protein condensate formation of these mutants. Possibly even more importantly, the used mutants localize into the nucleus, which has different biochemical & biophysical properties than a cytoplasm, where DVL typically reside, which in turn affects the condensate formation. On top, in the nucleus, most of the DVL binding partners, including relevant kinases, which were reported to affect protein condensate formation, are missing.

      The most convincing way to address this valid concern and to support a physiological relevant role of our findings is to extend our experiments with full-length DVL2, which we did alongside the suggestion in point two (please see below). In addition, we address the specific issues as follows:

      We completely agree that interaction through the DEP domain contributes to condensate formation, as thoroughly demonstrated in great studies by Melissa Gammons and Mariann Bienz, as well as to complex formation (Fig. 2B, C). We deleted this domain on purpose for our mapping experiments, since we obtained more consistent results without any additional contribution of the DEP domain. Once we mapped CFR and identified crucial amino acids within CFR (VV, FF), we demonstrated that CFR-mediated interaction contributes to complex formation, condensate formation and pathway activation in the context of full-length DVL2 (Fig. 7A-G). 

      We also agree that the nuclear localization may affect condensate formation because of the reasons mentioned by the reviewer or others, such as differences in DVL2 protein concentration. However, later proof-of-concept experiments in full-length DVL2 confirmed that CFR and its identified crucial amino acids (VV, FF), which were mapped in this rather artificial nuclear context, contribute to the typical cytosolic condensate formation of DVL2 (Fig. 7C, D). Moreover, we also observed cells with cytosolic condensates for the NES-lacking DVL2 constructs, although to a lower extent as compared to cells with nuclear condensates. A new analysis of NES-lacking key constructs focusing exclusively on cells with cytosolic condensates revealed similar differences between the DVL2 mutants as were observed before when investigating cells with nuclear (and cytosolic) condensates (new Fig. S3E, F), suggesting that the detected differences are not due to nuclear localization but reflect the overall condensation capacity. 

      In addition, our condensate-challenging experiments (osmotic shock, 1,6-hexanediol) suggested that cytosolic condensates of full-length DVL2 and nuclear CFR-mediated condensates of deletion proteins lacking the DEP domain behave quite similarly (Fig. 6A-C).

      Second, the use of an overexpression system, while suitable for comparing DVL2 protein condensate features, falls short in functional assays. The study could benefit from employing established "rescue systems" using DVL1/2/3 knockout cells and re-expression of DVL variants for more robust functional assessments. 

      We used the suggested established rescue system of DVL1/2/3 knockout cells (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Upon overexpression, our key mutants DVL2 VV-AA FF-AA and ∆CFR showed markedly reduced pathway activation compared to WT DVL2 (new Figs. 7F and S5J), as we observed before. Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Most importantly, while re-expression of WT DVL2 at close to endogenous expression levels fully rescued Wnt3a-induced pathway activation in DVL1/2/3 knockout cells, DVL2 VV-AA FF-AA revealed significantly reduced rescue capacity and was almost as inactive as DVL2 M2 (new Figs. 7G and S5K). 

      Furthermore, the discussion and introduction overlook some essential aspects of DVL biology. One such example is the importance of the open/close conformation of DVL and its effects on DVL phase separation and activity. In the context of this study, it is important to say that this conformational plasticity is mediated by DVL C-terminus (CD2 in this study). The second example is the reported roles of DVL1 and DVL3, which can both mediate the Wnt3a signal. How this can be interpreted when DVL1 and DVL3 lack LCR4 and still form condensates? 

      We included the open/close conformation of DVL in our manuscript (introduction p. 3 and new discussion paragraph p. 10) and discussed it in the context of our findings. It is intriguing to speculate that Wnt-induced opening of DVL2 increases the accessibility of LCR4 and CD2, thereby triggering pre-oligomerization and subsequent phase separation of DVL2 (see discussion).

      We extended the last paragraph of the discussion to interpret the roles of DVL1 and DVL3 lacking LCR4 (see p. 10). In short, the general ability of DVL1 and DVL3 to form condensates and to activate the Wnt pathway can be potentially explained through the other interaction sites (DIX, DEP, intrinsically disordered region). However, previous studies suggest that the DVL paralogs exhibit (quantitative) differences in Wnt pathway activation and that all three paralogs have to interact at a certain ratio for optimal pathway activation. In this context, a physiologic role for DVL2 LCR4 may be to promote the formation of these DVL1/2/3 assemblies and/or to enhance the stability of these assemblies.

      In order to increase the physiological relevance of the study, I would recommend analyzing several key mutants in the context of the full-length DVL2 protein using the rescue/complementation system. Further, a more thorough discussion and connections with the existing literature on DVL protein condensates/puncta/LLPS can improve the impact of the study. 

      We thank the reviewer for her/his suggestions to improve our study, which we addressed as detailed above.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to identify which regions of DVL2 contribute to its endogenous/basal clustering, as well as the relevance of such domains to condensate/phase separation and WNT activation. 

      Strengths: 

      A strength of the study is the focus on endogenous DVL2 to set up the research questions, as well as the incorporation of various techniques to tackle it. I found also quite interesting that DVL2-CFR addition to DVL1 increased its MW in density gradients. 

      We thank the reviewer for her/his interest in our work and the constructive suggestions to improve our study.

      Weaknesses: 

      I think that several of the approaches of the manuscript are subpar to achieve the goals and/or support several of the conclusions. For example: 

      (1) Although endogenous DVL2 indeed seems to form complexes (Figure 1A), neither the number of proteins involved nor whether those are homo-complexes can be determined with a density gradient. Super-resolution imaging or structural analyses are needed to support these claims. 

      We agree that it will be very interesting to study the nature of the detected endogenous complexes in detail and we will consider this for any follow-up study, as structural analyses were out of scope for the revision of the presented manuscript. To address the issue, we mentioned that the calculation of about eight DVL2 molecules per complex is based on the assumption of homotypic complexes (results p. 4) and we discussed, why we think that homotypic complexes are the most likely assumption based on the currently available (limited) data (discussion p. 8).

      (2) Follow-up analyses of the relevance of the DVL2 domains solely rely on overexpressed proteins. However, there were previous questions arising from o/e studies that prompted the focus on endogenous, physiologically relevant DVL interactions, clustering, and condensate formation.

      Although the title, conclusions, and relevance all point to the importance of this study for understanding endogenous complexes, only Figures 1A and B deal with endogenous DVL2. 

      We think that the biochemical detection of endogenous DVL2 complexes itself represents a valuable contribution to the understanding of endogenous DVL clustering, especially (i) since it is still lively discussed in the field whether and to which extent endogenous DVL assemblies exist (see introduction) and (ii) since recent studies addressing this issue rely on fluorescent tagging of the endogenous protein, which, among all benefits, harbors the risk to artificially affect DVL assembly. The follow-up analysis predominantly strengthens this key finding through (i) associating the detected complexes with established (DEP domain) and newly mapped (LCR4) DVL2 interaction sites, which we think is crucial to validate our biochemical approach, and (ii) linking the complexes with condensate formation and pathway activation for functional insights.

      In addition, we performed new experiments with re-expression of DVL2 and our key mutants at close to endogenous expression levels in DVL1/2/3 knockout cells, supporting a physiological relevant role of our findings (new Figs. 7G and S5K, please also see point (5) below).

      (3) Mutants lacking activity/complex formation, e.g. DVL2_1-418, may need further validation. For instance, DVL2_1-506 (the same mutant but with the DEP domain) seems to form condensates and is functional in WNT signalling (King et al., 2023). These differences could be caused by the lack of the DEP domain in this particular construct and/or folding differences. 

      We would definitely expect that DVL2 1-506 exhibits increased condensate formation and pathway activation as compared to DVL2 1-418, since the DEP domain was thoroughly characterized as an interaction domain in the Bienz lab and the Gammons lab (see references), which we confirmed in our assays (Fig. 2B-D). However, as the DEP domain is an established DVL2 interaction site, we were not interested in further characterizing the DEP domain but rather in explaining the marked difference in complex formation between DVL2 ∆DEP and 1-418 (Fig. 2A-C), which could not be associated with any known DVL2 interaction site and which we finally mapped to CFR (Fig. 4A-D). 

      Since fusion of the newly characterized interaction site CFR to DVL2 1-418 (1-418+CFR) rescued complex formation, condensate formation and signaling activity (Fig. 3B-E and Fig. 4C, D), we think that the lacking activity/complex formation of DVL2 1-418 is more likely due to missing interaction sites than due to folding problems. However, as it is hard to exclude folding differences of deletion mutants, we confirmed the CFR activity through loss-of-function experiments in the context of full-length DVL2 with minimal point mutations (Fig. 7A-G, VV,FF). 

      (4) The key mutants, DeltaCFR and VV/FF, only show mild phenotypes. The authors' results suggest that these regions contribute to, but are not necessary for, 1) complex formation (density gradient, Figures 7A and B), 2) condensate formation (Figures 7C and D), and 3) WNT activity (Figure 7E). Of note, Figure 7C shows examples for the mutants with no condensates, while the quantification indicates that 50% of the cells do have condensates. 

      Condensate formation and Wnt pathway activation by DVL VV-AA FF-AA were reduced by more than 50% as compared to WT (Fig. 7D, E). We consider these marked differences, since loss of function always ranges between 0% and 100%. In newly performed experiments in DVL1/2/3 knockout cells, the differences were even more pronounced, see point (5) below.

      Yes, Fig. 7C shows an example to qualitatively visualize the change in condensate formation, while Fig. 7D provides the corresponding quantification allowing quantitative assessment of the differences.

      (5) Most of the o/e analyses (including all reporter assays) should be performed in DVL1-3 KO cells in order to explore specifically the behaviour of the investigated mutants. 

      As suggested, we employed DVL1/2/3 knockout cells for performing reporter assays (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Here, we focused on key mutants in the context of full-length DVL2, as they are closest to the physiologic situation. Upon overexpression, DVL2 VV-AA FF-AA and DVL2 ∆CFR showed markedly reduced pathway activation as compared to WT DVL2 (new Figs. 7F and S5J). Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Moreover, re-expression at close to endogenous expression levels revealed that DVL2 VV-AA FF-AA less efficiently rescued Wnt3a-induced pathway activation as compared to WT (Figs. 7G and S5K).

      (6) How comparable are condensates found in the cytoplasm (usually for wt DVL) with those located in the nucleus (DEP mutants)? 

      In principle, cytosolic condensates could differ from nuclear condensates for various reasons, such as different protein concentrations, different availability of interaction partners or different biochemical/biophysical properties (please also see point 1 of reviewer 1). In our condensate-challenging experiments (osmotic shock, 1,6-hexanediol), cytosolic condensates of full-length DVL2 and nuclear condensates of DVL2 mutants behaved quite similarly (Fig. 6A-C).

      We are confident that the differences between different DEP mutants in our mapping experiments are not due to nuclear localization but reflect the overall condensation capacity, because later proof-of-concept experiments demonstrated that CFR, which was identified in these mapping experiments, contributes to cytosolic condensate formation in the context of full-length DVL2 (Fig. 7C, D). Moreover, a new analysis focusing only on cells with cytosolic condensates, which can also be observed for DEP mutants to a low extent, revealed similar differences between key DEP mutants as observed before (Fig. S3E, F; for details please also see point 1 of reviewer 1).

      Several studies in the last two decades have analysed the relevance of DVL homo- and heteroclustering by relying on overexpressed proteins. Recent studies also explored the possibility of DVL undergoing liquid-liquid phase separation following similar principles. As highlighted by the authors in the introduction, there is a need to understand DVL dynamics under endogenous/physiological conditions. Recent super-resolution studies aimed at that question by characterising endogenously edited DVL2. The authors seemed to aim in the same direction with their initial findings (Figure 1A) but quickly moved to o/e proteins without going back to the initial question. This reviewer thinks that to support their conclusions and advance in this important question, the authors should introduce the relevant mutations in the endogenous locus (e.g. by Cas9 plus a donor template encoding the required 3' exons, as done by others before for WNT components, including DVL2) and determine their impact on the above-indicated processes.

      We agree that genomic editing of the DVL2 locus would be the cleanest system to study the relevance of CFR at endogenous expression levels. As we did not have the resources to generate the suggested cells, we instead transiently re-expressed DVL2 and the respective mutants at low levels, very close to the endogenous expression levels, in DVL1/2/3 triple knockout cells (Fig. S5K). These experiments revealed that DVL2 VV-AA FF-AA less efficiently rescued Wnt3a-induced pathway activation as compared to DVL2 WT (Fig. 7G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This is a detailed description of the role of PKCδ in Drosophila learning and memory. The work is based on a previous study (Placais et al. 2017) that has already shown that for the establishment of long-term memory, the repetitive activity of MP1 dopaminergic neurons via the dopamine receptor DAMB is essential to increase mitochondrial energy flux in the mushroom body. 

      In this paper, the role of PKCδ is now introduced. PKCδ is a molecular link between the dopaminergic system and the mitochondrial pyruvate metabolism of mushroom body Kenyon cells. For this purpose, the authors establish a genetically encoded FRET-based fluorescent reporter of PKCδ-specific activity, δCKAR. 

      Strengths: 

      This is a thorough study of the long-term memory of Drosophila. The work is based on the extensive, high-quality experience of the senior authors. This is particularly evident in the convincing use of behavioral assays and imaging techniques to differentiate and explore various memory phases in Drosophila. The study also establishes a new reporter to measure the activity of PKCδ - the focus of this study - in behaving animals. The authors also elucidate how recurrent spaced training sessions initiate a molecular gating mechanism, linking a dopaminergic punishment signal with the regulation of mitochondrial pyruvate metabolism. This advancement will enable a more precise molecular distinction of various memory phases and a deeper comprehension of their formation in the future. 

      Weaknesses: 

      Apart from a few minor technical issues, such as the not entirely convincing visualisation of the localisation of a PKCδ reporter in the mitochondria, there are no major weaknesses. Likewise, the scientific classification of the results seems appropriate, although a somewhat more extensive discussion in relation to Drosophila would have been desirable.

      We are very grateful for this very positive appreciation of our work. Following this comment, we have revised our manuscript to bring more compelling evidence of the mitochondrial localization of the PKCδ reporter. We also developed the discussion of our results with respect to the Drosophila learning and memory literature.

      Reviewer #2 (Public Review):

      Summary 

      This study deepens the authors' former investigations of the mechanisms involved in gating the long-term consolidation of an associative memory (LTM) in Drosophila melanogaster. After having previously found that LTM consolidation 1. costs energy (Plaçais and Préat, Science 2013) provided through pyruvate metabolism (Plaçais et al., Nature Comm 2017) and 2. is gated by the increased tonic activity in a type of dopaminergic neurons ('MP1 neurons') following only training protocols relevant for LTM, i.e. spaced in time (Plaçais et al., Nature Neuro 2012), they here dig into the intracellular signalling triggered by dopamine input and eventually responsible for the increased mitochondrial activity in Kenyon cells. They identify a particular PKC, PKCδ, as a major molecular interface in this process and describe its translocation to mitochondria to promote pyruvate metabolism, specifically after spaced training. 

      Methodological approach 

      To that end, they use RNA interference against the isozyme PKCδ, in a time-controlled way, in the whole Kenyon cell population or in the subpopulation forming the α/β lobes. This knock-down decreased the total PKCδ mRNA level in the brain by ca. 30%, which is enough to impair the flies' LTM performance. Using Pyronic, a pyruvate sensor for in vivo imaging, and pharmacological disruption of mitochondrial function, the authors then show that PKCδ knockdown prevents a high level of pyruvate from accumulating in the Kenyon cells at the time of LTM consolidation, pointing towards a role of PKCδ in promoting pyruvate metabolism. They further identify the PDH kinase PDK as a likely target of PKCδ, since knocking down both PKCδ and PDK restored normal LTM performance, likely counterbalancing the effect of PKCδ knock-down alone. 

      To understand the timeline of PKCδ activation and to visualise its mitochondrial translocation in a subpart of the mushroom body lobes, they imported into the fruit fly the genetically encoded FRET reporters of PKCδ activity, δCKAR and mitochondria-δCKAR (Kajimoto et al 2010). They show that PKCδ is activated to the sensor's saturation only after spaced training, and not after other types of training that are 'irrelevant' for LTM. Further, by combining FRET imaging with thermogenetic activation of dopaminergic neurons and RNA interference against the Gq-coupled dopamine receptor, they show that a dopamine-triggered cascade is sufficient for the elevated PKCδ activation. 

      Strengths and weaknesses 

      The authors use a combination of new fluorescent sensors and behavioral, imaging, and pharmacological protocols they had already established to successfully identify the molecular players that link the requirement for spaced training and for the oscillatory activity of MP1 dopaminergic neurons to the increased metabolic activity observed during long-term memory consolidation. 

      The study is rich in exciting new findings and each methodological step is carefully designed. Almost all the experiments one could think of to establish this link have been done in this study, with a few exceptions that do not prevent the essential conclusions from being drawn. 

      The discussion is well conducted, with interesting parallels with mammals, in which it is not yet known whether this process also takes place. 

      Impact 

      Their findings should interest a large audience: 

      They discover and investigate a new function for PKCδ in regulating memory processes in neurons in conjunction with other physiological functions, making this molecule a potentially valid target for neuropathological conditions. They also provide new tools in Drosophila to measure PKCδ activation in cells. They identify the major players for lifting the energetic limitations preventing the formation of a long-term memory. 

      We warmly thank Reviewer #2 for the enthusiastic assessment of our work. There were no specific points to address in the Public Review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments that could help improve the paper and help the reader navigate the detailed analysis.

      (1) Perhaps the authors could add a sentence or two in the intro about the different PKC genes in Drosophila and whether they are expressed in the MB.

      We thank Reviewer #1 for this suggestion. We now describe in the introduction the various subfamilies of PKCs downstream of Gq signaling, the Drosophila members of those different PKC subfamilies, and their expression in the brain. 

      (2) Italicise Drosophila throughout the text.

      We have done this correction.

      (3) In Figure 1, you could change the scheme in Figure F-H and have the timeline always start after training. Then you could see that the training varies in time (perhaps provide the exact duration for each training protocol) and the test interval is constant. Why is it actually measured in a time window and not at an exact time?

      This is indeed a good suggestion to clarify the presentation of our results. We changed the timeline schemes in all the figures so that t = 0 corresponds to the end of conditioning. Indeed, each conditioning protocol has a different duration, as represented on these timelines: one-cycle training lasts 5 min, 5x massed training lasts 20 min, and 5x spaced training takes 1 hour and 30 min to complete because of its 15 min intertrial intervals. In vivo imaging experiments are performed during a defined time window after conditioning during which, according to our previous experience, the activity of MP1 dopamine neurons after spaced training remains constant (Plaçais et al., 2012). This offers the practical advantage that we can image several flies after a given training session, instead of having to perform many consecutive conditioning protocols.  

      (4) In Figure 2 you could show the massed training data from the supplement. This is very similar to what is shown in Figure 1. Are there also imaging experiments on massed training?

      The massed training data were initially displayed in the supplementary data because α/β neurons are known to be crucial for LTM formation but are not required for memory formed after massed training, so the absence of effect was somewhat expected. Nonetheless, we performed δCKAR imaging in α/β neurons after 5x massed training and found that, as expected, PKCδ activity was not increased post-conditioning (Figure 2C). This experiment was performed in parallel with additional δCKAR imaging in α/β neurons after 5x spaced conditioning as a positive control (these new data were added to Figure 2B). Following Reviewer #1's suggestion, all data investigating the effect of PKCδ in α/β neurons are now displayed in Figure 2.

      (5) Figure 3: I am not sure if the blue curve in Figure A really represents an upregulated pyruvate flux compared to the control (mentioned in line 210). It may be the case initially, but it is clearly below the control after 40s. Why is that?

      This visual effect arises because PDBu injection by itself increases the pyruvate level in MB neurons (independently of its effect on PKCδ) before sodium azide injection. As a result, the baseline of PDBu-treated flies is above that of DMSO control flies when sodium azide is injected, so the Pyronic sensor saturates more quickly and, once traces are normalized right before sodium azide injection, reaches its plateau before the control. 

      That being said, the slope measurement following sodium azide injection is not affected by these differences, and is always performed between 10 and 70% of the plateau. 

      Given this remark, and another comment from Reviewer #2 about this experiment, we removed panel 3A and present only the complete recording of this experiment, which is now displayed in Figure 3 – figure supplement 1C.

      (6) For me, the localisation of the mitochondrial reporter in the mitochondria is not clear. The image in the supplement is not sufficient to show this clearly. What is missing here is a co-staining in the same brain of UAS-mito-δCKAR and a mitochondrial marker to label the mitochondria and the reporter at the same time in the same animal.

      We agree with Reviewer #1's remark and added new data to make this point more convincing. As suggested, we co-expressed mito-δCKAR with the mitochondrial reporter mito-DsRed in MB neurons (Lutas et al., 2012). Confocal imaging of MB neuron somas showed a clear colocalization of both signals, indicating that mito-δCKAR is indeed targeted to mitochondria (Figure 4 – figure supplement 1B and 2). 

      (7) Are there controls that the MB expression of the reporters in the flies does not influence the learning ability? In order to make statements about the physiology of the cells, it must also be shown that the cells still have normal activity and allow learning behaviour comparable to wild-type flies.

      This is indeed an important control that we added in the revised version. We tested the memory after 5x spaced, 5x massed and 1x training of flies expressing in the MB the various imaging probes used in our study (cyto-δCKAR, mito-δCKAR and Pyronic). Memory performance was similar to controls in all cases (Figure 1 – figure supplement 1E).  

      (8) Perhaps the authors could go into more detail on two points in the discussion and shorten the comprehensive comparison to the vertebrate system somewhat. It would be nice to know how the local transfer from the peduncle to the vertical lobe is supposed to take place. What is the mechanism here? Any suggestions from the literature? It would also be useful to mention the compartmentalisation of the MB and how the information can overcome these boundaries from the peduncle to the vertical lobe.

      We now elaborate on this question in the discussion (lines 368-386). To sum up, given that the compartmentalization of the MBs is anatomically defined by the presence of specific subsets of MBON and DAN cell types (forming different information-processing units), rather than by physical boundaries per se, we can consider two main hypotheses to explain the transfer of PKCδ activation from the peduncle to the lobes: passive diffusion of activated PKCδ, or mitochondrial motility that would displace PKCδ from its site of first activation. We indeed found that mitochondrial motility occurs upon 5x spaced conditioning for LTM formation (Pavlowsky et al. 2024).

      In principle, one could also consider that PKCδ could be activated in the lobes by a relaying neuron. The MVP2 neuron (aka MBON-γ1>pedc) presents dendrites facing MP1 and makes synapses with the α/β neurons at the level of the α and β lobes, which makes it a good candidate. Furthermore, as we show that PKCδ activation in the lobes requires DAMB (Figure 4C, Figure 5A-B, Figure 5 – figure supplement 1), one could imagine the following activation loop: MP1 activates the MB neurons via DAMB, that activate MVP2 at the level of the peduncle, which activates in turn the MB neurons at the level of the lobes. However, we did not retain this hypothesis, because MVP2 is GABAergic, which makes it highly unlikely to be able to activate a kinase like PKCδ.

      Regarding the comparative discussion with mammalian systems, we appreciate Reviewer #1’s remark that it may appear too detailed, but given that Reviewer #2 (public comment) highlighted the ‘interesting parallel with mammals’ in our discussion, we finally chose not to reduce this part in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Fig 1G: is there a decrease in PKCδ activation after massed training as compared to the control, indicating an inhibitory mechanism acting on PKCδ following massed training? Or is this an artifact of the PDBu application procedure in the control group? 

      We thank Reviewer #2 for this careful comment. The dent in the time trace following PDBu application after massed training (Figure 1G) is indeed an artifact due to the manual injection of the drug. But we would like to emphasize that what matters in the determination of PKCδ activity is the level of the baseline before PDBu application after normalization to the final plateau, so variations around the injection time do not impact the result of the analysis. Moreover, in the revised version, we performed a similar series of experiments using an α/β neuron-specific driver (Figure 2C). In this series of experiments, there were limited injection artifacts, and we reached the same conclusion as in Figure 1G: PKCδ activity is left unchanged by 5x massed conditioning. 

      Fig 3A: I suggest moving this panel in the supplement: I found it difficult to process the effect of PDBu that is unspecific to PKCδ and that leads to a different plateau because of a different baseline. It would be better explained in more detail in the supplement, especially given that the 3B panel can lead to a similar conclusion and does not have this specificity problem. Up to the authors.

      We thank Reviewer #2 for this feedback. We followed the suggestion and now only display the full recording of this experiment on Figure 3 – figure supplement 1C.

      Fig 3C: To go further, one wonders if knocking down PDK would act as a switch for gating LTM formation, i.e. if done during a 1x training or a 5x massed training, would it gate long-term consolidation?

      This is indeed an excellent suggestion. We performed this experiment and showed that in flies expressing the PDK RNAi in adult MB neurons, a single cycle of training was sufficient to induce long-term memory formation (Figure 3A), instead of the 5 spaced cycles normally required. This confirms the model we previously established in Plaçais et al. 2017, where long-term memory formation was observed upon PDK knock-down in the MB after 2 cycles of spaced training. This new result further characterizes this facilitation effect, showing that even a single cycle is sufficient. Altogether, these data show that mitochondrial metabolic activation is the critical gating step in long-term memory formation. Spaced training achieves this activation through PDK inhibition, mediated by PKCδ.

      What is the level of mRNA in this construct? I don't see a quantification, can you justify it?

      We thank Reviewer #2 for this remark. This PDK RNAi had been used in previous work in pyruvate imaging experiments, where it successfully boosted mitochondrial pyruvate uptake. But indeed we had not validated it at the mRNA level. In the revised version of the present manuscript, we now confirm by RT-qPCR that the PDK RNAi efficiently downregulates PDK expression in neurons (Figure 3 – figure supplement 1A).

      Fig. 4C: Is the increase in PKCδ activation in the vertical lobe DAMB-dependent? One wonders, because MP1 may somehow activate other neurons that could reach this part of the Kenyon cells. I do not see in the results what could disprove this possibility. The mechanism linking DAMB activation in the peduncle and PKCδ activation in the VL is mysterious, see also Fig. 5.

      This is a very sound remark. In the revised version we have checked whether PKCδ activation in the vertical lobes also depends on DAMB. We performed thermogenetic activation of MP1 neurons and imaged the mito-δCKAR signal in the vertical lobes upon DAMB knock-down in the MB. We found that, as in the peduncle, DAMB was required for PKCδ mitochondrial activation (Figure 4C, right panel). This experiment was performed in parallel with similar measurements in flies that did not express DAMB RNAi, as a positive control (these new control data were added to Figure 4C, left panel).

      This result supports a model where dopamine from MP1 neurons directly acts on Kenyon cells, even for PKCδ activation in the vertical lobes. It thus argues for a spread of DAMB-activated PKCδ from the peduncle to the vertical lobes, either by passive diffusion or by mitochondrial motility, two hypotheses that we have added to the discussion. 

      Fig. 5: If MP1 neurons release dopamine only to the peduncle, how do you expect PKCδ to be translocated to mitochondria all the way to the vertical lobe? Also is it specific to the vertical lobe and not found in the medial lobe?

      Investigating the spatial distribution of PKCδ is, once again, a very sound suggestion. We re-analyzed our dataset of the mito-δCKAR signal after spaced training for peduncle measurements, as the imaging plane also included the β lobe. We found that PKCδ is also activated at that level, and that its activation also depends on DAMB (Figure 5 – figure supplement 1). We also performed additional pyruvate measurements in the medial lobes, and observed that mitochondrial pyruvate uptake shows the same extension in time in the medial lobes as in the vertical lobes when comparing spaced training (Figure 6 E-F and Figure 6 – figure supplement 1E-F) to 1x training (Figure 6A-B and Figure 6 – figure supplement 1C-D). Therefore, the metabolic action of PKCδ does not seem to be restricted to the vertical lobes, but spreads across the whole axonal compartment.

      Altogether, these data suggest that activated PKCδ spreads from its site of activation, the peduncle, where dopamine is released by MP1 and DAMB is activated, to both the vertical and medial lobes, either by passive diffusion or by taking advantage of the mitochondrial movement from the MB neuron somas to the axons that was shown to be triggered by spaced training (Pavlowsky et al. 2024). To further characterize the kinetics of PKCδ activation, we measured its activity using the mito-δCKAR sensor at 3 and 8 hours after spaced training. We found that while PKCδ was still active at 3 hours, it was back to its baseline activity level at 8 hours, both in the peduncle and in the vertical lobes (Figure 5 C-F). However, at 8 hours, pyruvate metabolism is still upregulated in the lobes, which indicates that an additional mechanism relays PKCδ action to maintain the high energy state of the MBs at later time points. As we propose in the revised discussion, the mitochondrial motility hypothesis makes sense here (Pavlowsky et al. 2024), as the progressive increase in the number of mitochondria in the lobes could sustain high mitochondrial metabolism after PKCδ activation has returned to baseline at 8 hours post-conditioning. This new result and its implications open exciting perspectives for future research on the different mitochondrial regulations occurring after spaced training, their organization over time and their interactions.

      Fig.7: PDK written in yellow is almost invisible

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors use the teleost medaka as an animal model to study the effect of seasonal changes in day-length on feeding behaviour and oocyte production. They report a careful analysis of how day-length affects female medakas and a thorough molecular genetic analysis of genes potentially involved in this process. They show a detailed analysis of two genes and include a mutant analysis of one gene to support their conclusions.

      Strengths:

      The authors pick their animal model well and exploit the possibilities to examine in this laboratory model the effect of a key environmental influence, namely the seasonal changes of day-length. The phenotypic changes are carefully analysed and well-controlled. The mutational analysis of the agrp1 by a ko-mutant provides important evidence to support the conclusions. Thus this report exceeds previous findings on the function of agrp1 and npyb as regulators of food-intake and shows how in medaka these genes are involved in regulating the organismal response to an environmental change. It thus furthers our understanding of how animals react to key exogenous stimuli for adaptation.

      Weaknesses:

      The authors are too modest when it comes to underscoring the importance of their findings. Previous animal models used to study the effect of these neuropeptides on feeding behaviour have either lost, or were most likely never, sensitive to seasonal changes of day length. Considering the key importance of this parameter for many aspects of plant and animal life, it could be better emphasised that a suitable animal model is now at hand that permits this. The molecular characterization of the agrp1 ko-mutant that the authors have generated lacks some details that would help to appreciate the validity of the mutant phenotype. Additional data would help in this respect.

      We would like to thank Reviewer #1 for the really constructive advice. In the revised manuscript, we will try to provide more information on the molecular characterization of the agrp1 KO-mutant and to emphasize the importance of our present animal model that permits the analysis of neuropeptide effects on feeding behavior in response to seasonal changes of day length.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the mechanisms behind breeding season-dependent feeding behavior using medaka, a well-known photoperiodic species, as a model. Through a combination of molecular, cellular, and behavioral analyses, including tests with mutants, they concluded that AgRP1 plays a central role in feeding behavior, mediated by ovarian estrogenic signals.

      Strengths:

      This study offers valuable insights into the neuroendocrine mechanisms that govern breeding season-dependent feeding behavior in medaka. The multidisciplinary approach, which includes molecular and physiological analyses, enhances the scientific contribution of the research.

      Weaknesses:

      While medaka is an appropriate model for studying seasonal breeding, the results presented are insufficient to fully support the authors' conclusions.

      Specifically, methods and data analyses are incomplete in justifying the primary claims:

      - the procedure for the food intake assay is unclear;

      - the sample size is very small;

      - the statistical analysis is not always adequate.

      Additionally, the discussion fails to consider the possible role of other hormones that may be involved in the feeding mechanism.

      We would like to thank Reviewer #2 for the helpful comments. As the reviewer suggested, we will revise the paragraph describing the procedure for the food intake assay to make it easier for readers to understand. In Figure 1-Supplementary figure 2, RNAseq was performed to search for candidate neuropeptides, which is why the sample size was the minimum. In the other experiments, each group consists of n ≥ 5 samples, which is generally accepted as an adequate sample size in various studies (cf. Kanda et al., Gen Comp Endocrinol., 2011, Spicer et al., Biol Reprod., 2017). As for the statistical analyses, we will revise our manuscript so that readers may be convinced of their validity.

      Reviewer #3 (Public review):

      Summary:

      Understanding the mechanisms whereby animals restrict the timing of their reproduction according to day length is a critical challenge given that many of the most relevant species for agriculture are strongly photoperiodic. However, the principal animal models capable of detailed genetic analysis do not respond to photoperiod so this has inevitably limited progress in this field. The fish model medaka occupies a uniquely powerful position since its reproduction is strictly restricted to long days and it also offers a wide range of genetic tools for exploring, in depth, various molecular and cellular control mechanisms.

      For these reasons, this manuscript by Tagui and colleagues is particularly valuable. It uses the medaka to explore links bridging photoperiod, feeding behaviour, and reproduction. The authors demonstrate that in female, but not male medaka, photoperiod-induced reproduction is associated with an increase in feeding, presumably explained by the high metabolic cost of producing eggs on a daily basis during the reproductive period. Using RNAseq analysis of the brain, they reveal that the expression of the neuropeptides agrp and npy that have been previously implicated in the regulation of feeding behaviour in mice are upregulated in the medaka brain during exposure to long photoperiod conditions. Unlike the situation in mice, these two neuropeptides are not co-expressed in medaka neurons, and food deprivation in medaka led to increases in agrp but also a decrease in npy expression. Furthermore, the situation in fish may be more complicated than in mice due to the presence of multiple gene paralogs for each neuropeptide. Exposure to long-day conditions increases agrp1 expression in medaka as the result of increases in the number of neurons expressing this neuropeptide, while the increase in npyb levels results from increased levels of expression in the same population of cells. Using ovariectomized medaka and in situ hybridization assays, the authors reveal that the regulation of agrp1 involves estrogen acting via the estrogen receptor esr2a. Finally, a loss of agrp1 function mutant is generated where the female mutants fail to show the characteristic increase in feeding associated with long-day enhanced reproduction as well as yielding reduced numbers of eggs during spawning.

      Strengths:

      This manuscript provides important foundational work for future investigations aiming to elucidate the coordination of photoperiod sensing, feeding activity, and reproductive function. The authors have used a combination of approaches with a genetic model that is particularly well suited to studying photoperiod-dependent physiology and behaviour. The data are clear and the results are convincing and support the main conclusions drawn. The findings are relevant not only for understanding photoperiodic responses but also provide more general insight into links between reproduction and feeding behaviour control.

      Weaknesses:

      Some experimental models used in this study, namely ovariectomized female fish and juvenile fish have not been analysed in terms of their feeding behaviour and so do not give a complete view of the position of this feeding regulatory mechanism in the context of reproduction status. Furthermore, the scope of the discussion section should be expanded to speculate on the functional significance of linking feeding behaviour control with reproductive function.

      We would like to thank Reviewer #3 for the insightful advice. We will try to revise several pertinent sentences describing the ovariectomized female fish and juvenile fish so that our present experimental results will give more complete view of their feeding regulatory mechanism in the context of reproduction status. We will also try to expand and revise the discussion section to incorporate the valuable suggestion of Reviewer #3.

    1. Author response:

      I thank the Senior Editor and the three reviewers for their consideration and careful assessments, which I find fair and justified. I agree the evidence is inadequate that single cell fluctuating CpG DNA methylation allows for human neuron lineage tracing. I agree with Reviewer #1 that fCpGs essentially function as “a cellular division clock that diverges over time”, but that fCpG methylation also records ancestry because cells with more similar patterns should be more related than cells with different patterns. However, as noted, there are alternative explanations that could explain fCpG DNA methylation pattern neuronal differences, or potentially obscure ancestry recorded by replication errors. Lineage tracing with fCpG methylation previously appeared possible in human intestines, endometrium, and blood, and potentially a similar approach could be used to reconstruct human brain cell ancestries.

      I intend to revise the manuscript in a few weeks to address points raised by reviewers. These include a) editing to improve clarity and correct neurodevelopmental concepts, and b) adding a supplement that explains in much more detail how fCpG methylation may record cell divisions and ancestries. As recommended, additional “experiments” will be added including a) an analysis of single cell zygote to inner cell mass data to illustrate how fCpG brain barcode methylation changes between cell divisions very early in development before neurogenesis, and b) an analysis of newly released single cell brain aging data (Chien et al., 2024, Neuron 112, 2524–2539, August 7, 2024) that should help address issues of reproducibility and barcode stability over time. The evidence for lineage tracing will still be incomplete, but the modifications should help support the idea that fCpG methylation can record somatic cell ancestries.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The overall analysis and discovery of the common motif are important and exciting. Very few human/primate ribozymes have been published and this manuscript presents a relatively detailed analysis of two of them. The minimized domains appear to be some of the smallest known self-cleaving ribozymes.

      Strengths:

      The manuscript is rooted in deep mutational analysis of the OR4K15 and LINE1 and subsequently in modeling of a huge active site based on the closely-related core of the TS ribozyme. The experiments support the HTS findings and provide convincing evidence that the ribozymes are structurally related to the core of the TS ribozyme, which has not been found in primates prior to this work.

      Weaknesses:

      (1) Given that these two ribozymes have not been described outside of a single figure in a Science Supplement, it is important to show their locations in the human genome, present their sequence and structure conservation among various species, particularly primates, and test and discuss the activity of variants found in non-human organisms. Furthermore, OR4K15 exists in three copies on three separate chromosomes in the human genome, with slight variations in the ribozyme sequence. All three of these variants should be tested experimentally and their activity should be presented. A similar analysis should be presented for the naturally-occurring variants of the LINE1 ribozyme. These data are a rich source for comparison with the deep mutagenesis presented here. Inserting a figure (1) that would show the genomic locations, directions, and conservation of these ribozymes and discussing them in light of this new presentation would greatly improve the manuscript. As for the biological roles of known self-cleaving ribozymes in humans, there is a bioRxiv manuscript on the role of the CPEB3 ribozyme in mammalian memory formation (doi.org/10.1101/2023.06.07.543953), and an analysis of the CPEB3 functional conservation throughout mammals (Bendixsen et al. MBE 2021). Furthermore, the authors missed two papers that presented the discovery of human hammerhead ribozymes that reside in introns (by de la Peña and Breaker), which should also be cited. On the other hand, the Clec ribozyme was only found in rodents and not primates and is thus not a human ribozyme and should be noted as such.

      We thank this Reviewer for his/her input and acknowledgment of this work. To improve the manuscript, we have included the genomic locations in Figure 1A, Figure 6A and Figure 6C. And we have tested the activity of representative variants found in the human genome and discussed the activity of the variants in other primates. All suggested publications are now properly cited.

      Line 62-66: It has been shown that a single nucleotide polymorphism (SNP) in the CPEB3 ribozyme is associated with enhanced self-cleavage activity and poorer episodic memory (14). Inhibition of the highly conserved CPEB3 ribozyme could strengthen hippocampal-dependent long-term memory (15, 16). However, little is known about the other human self-cleaving ribozymes.

      Line 474-501: Homology search of two TS-like ribozymes. To locate close homologs of the two TS-like ribozymes, we performed cmsearch based on a covariance model (38) built on the sequence and secondary structural profiles. In the human genome, we got 1154 and 4 homolog sequences for LINE-1-rbz and OR4K15-rbz, respectively. For OR4K15-rbz, there was an exact match located at the reverse strand of the exon of OR4K15 gene (Figure 6A). The other 3 homologs of OR4K15-rbz belongs to the same olfactory receptor family 4 subfamily K (Figure 6C). However, there was no exact match for LINE-1-rbz (Figure 6A). Interestingly, a total of 1154 LINE-1-rbz homologs were mapped to the LINE-1 retrotransposon according to the RepeatMasker (http://www.repeatmasker.org) annotation. Figure 6B showed the distribution of LINE-1-rbz homologs in different LINE-1 subfamilies in the human genome. Only three subfamilies L1PA7, L1PA8 and L1P3 (L1PA7-9) can be considered as abundant with LINE-1-rbz homologs (>100 homologs per family). The consensus sequences of all homologs obtained are shown in Figure 6D. In order to investigate the self-cleavage activity of these homologs, we mainly focused on the mismatches in the more conserved internal loops. The major differences between the 5 consensus sequences are the mismatches in the first internal loop. The widespread A12C substitution can be found in majority of LINE-1-rbz homologs, this substitution leads to a one-base pair extension of the second stem (P2) but almost no activity (RA’: 0.03) based on our deep mutational scanning result. Then we selected 3 homologs without A12C substitution for LINE-1-rbz for in vitro cleavage assay (Figure 6E). But we didn’t observe significant cleavage activity, this might be caused by GU substitutions in the stem region. For 3 homologs of OR4K15-rbz, we only found one homolog of OR4K15 with pronounced self-cleavage activity (Figure 6F). 
In addition, we performed a similar bioinformatic search for the TS-like ribozymes in other primate genomes. Similarly, the majority (15 out of 18) of primate genomes contain a large number (>500) of LINE-1-rbz homologs, whereas the remaining three contain essentially none. However, none of these homologs is an exact match. Only one homolog, in the gibbon genome assembly, carries a single mutation (U38C) (Figure S15); most of the others have 3 or more mismatches (Figure S15). For OR4K15-rbz, all representative primate genomes contain at least one exact match of the OR4K15-rbz sequence.

      Line 598-602: According to the bioinformatic analysis, some TS-like ribozymes in primate genomes (one LINE-1-rbz homolog in the gibbon genome and some OR4K15-rbz homologs) have in vitro cleavage activity. Unlike the more conserved CPEB3 ribozyme, which has a clear function, the function of the TS-like ribozymes remains unclear, as they are not conserved, belong to pseudogenes, or are located on the reverse strand.

      (2) The authors present the story as a discovery of a new RNA catalytic motif. This is unfounded. As the authors point out, the catalytic domain is very similar to the Twister Sister (or "TS") ribozyme. In fact, there is no appreciable difference between these and TS ribozymes, except for the missing peripheral domains. For example, the env33 sequence in the Weinberg et al. 2015 NCB paper shows the same sequences in the catalytic core as the LINE1 ribozyme, making the LINE1 ribozyme a TS-like ribozyme in every way, except for the missing peripheral domains. Thus these are not new ribozymes and should not have a new name. A more appropriate name should be TS-like or TS-min ribozymes. Renaming the ribozymes to lanterns is misleading.

      Although we observed some differences in mutational effects, we agree with the reviewer that it is more appropriate to call them TS-like ribozymes. We have replaced all instances of “lantern ribozyme” with “TS-like ribozyme” as suggested.

      (3) In light of 2) the story should be refocused on the fact the authors discovered that the OR4K15 and LINE1 are both TS-like ribozymes. That is very exciting and is the real contribution of this work to the field.

      We thank this Reviewer for their acknowledgement of this work. To improve the manuscript, we have re-named the ribozymes as suggested.

      (4) Given the slow self-scission of the OR4K15 and LINE1 ribozymes, the discussion of the minimal domains should be focused on the role of peripheral domains in full-length TS ribozymes. Peripheral domains have been shown to greatly speed up hammerhead, HDV, and hairpin ribozymes. This is an opportunity to show that the TS ribozymes can do the same and the authors should discuss the contribution of peripheral domains to the ribozyme structure and activity. There is extensive literature on the contribution of a tertiary contact on the speed of self-scission in hammerhead ribozymes, in hairpin ribozyme it's centered on the 4-way junction vs 2-way junction structure, and in HDVs the contribution is through the stability of the J1/2 region, where the stability of the peripheral domain can be directly translated to the catalytic enhancement of the ribozymes.

      We appreciate your question and the valuable suggestions provided. We have included the citations and discussion about the peripheral domains in other ribozymes.

      Line 570-576: Thus, a more sophisticated structure, together with long-range interactions involving the SL4 region, likely helps to stabilize the catalytic region of the twister sister ribozyme and improve its catalytic activity. Similarly, previous studies have demonstrated that the peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes can greatly increase their self-cleavage activity. Given the importance of peripheral regions, the TS-like ribozymes, which lack this tertiary interaction, may not be fully stabilized in the conformation generated by homology modelling.

      (5) The argument that these are the smallest self-cleaving ribozymes is debatable. Lünse et al. (NAR 2017) found some very small hammerhead ribozymes that are smaller than those presented here, but the authors suggest that they function only as dimers. The human ribozymes described here should be analyzed for dimerization as well (e.g., by native gel analysis), particularly because the authors suggest that there are no peripheral domains that stabilize the fold. Furthermore, Riccitelli et al. (Biochemistry) minimized the HDV-like ribozymes and found some in metagenomic sequences that are about the same size as the ones presented here. Both of these papers should be cited and discussed.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt regions contain variable hairpin loops that are not important for catalytic activity. When only the conserved segments are compared, the TS-like ribozymes discussed in this paper are the shortest and have the simplest secondary structure. We have replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal twister sister (TS)-like self-cleaving ribozymes in the human genome revealed by deep mutational scanning”. All the publications mentioned have been cited and discussed. Regarding possible dimerization, we did not find any evidence for it, but a definitive answer will require future detailed structural analysis.

      Line 605-608: Previous studies have also revealed some minimized forms of self-cleaving ribozymes, including hammerhead (19, 53) and HDV-like (54) ribozymes. However, when the conserved segments are compared, they (≥36 nt) are not as short as the TS-like ribozymes (31 nt) found here.

      (6) The authors present homology modeling of the OR4K15 and LINE1 ribozymes based on the crystal structures of the TS ribozymes. This is another point that supports the fact that these are not new ribozyme motifs. Furthermore, the homology model should be carefully discussed as a model and not a structure. In many places in the text and the supplement, the models are presented as real structures. The wording should be changed to carefully state that these are models based on sequence similarity to TS ribozymes. Fig 3 would benefit from showing the corresponding structures of the TS ribozymes.

      We thank the reviewer for pointing these out, and we have fixed them. We have replaced all instances of “lantern ribozyme” with “TS-like ribozyme” as suggested. The term “modelled structures” is now used to denote the homology models, and we have included the TS ribozyme structure in Fig 3.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript applies a mutational scanning analysis to identify the secondary structure of two previously suggested self-cleaving ribozyme candidates in the human genome. Through this analysis, minimal structured and conserved regions of key importance for the ribozymes' activity are suggested, and further biochemical evidence for cleavage activity is presented. Additionally, the study reveals a close resemblance of these human ribozyme candidates to the known self-cleaving ribozyme class of twister sister RNAs. Despite the high conservation of the catalytic core between these RNAs, it is suggested that the human ribozyme examples constitute a new ribozyme class. Evidence for this, however, is not conclusive.

      Strengths:

      The deep mutational scanning performed in this study allowed the elucidation of important regions within the proposed LINE-1 and OR4K15 ribozyme sequences. Part of the ribozyme sequences could be assigned a secondary structure supported by covariation and highly conserved nucleotides were uncovered. This enabled the identification of LINE-1 and OR4K15 core regions that are in essence identical to previously described twister sister self-cleaving RNAs.

      Weaknesses:

      I am skeptical of the claim that the described catalytic RNAs are indeed a new ribozyme class. The studied LINE-1 and OR4K15 ribozymes share striking features with the known twister sister ribozyme class (e.g. Figure 3A) and where there are differences they could be explained by having tested only a partial sequence of the full RNA motif. It appears plausible, that not the entire "functional region" was captured and experimentally assessed by the authors.

      We thank this Reviewer for his/her input and acknowledgment of this work. Because a similar question was raised by Reviewer 1, we decided to name the ribozymes TS-like ribozymes. Regarding the entire regions, we performed mutational scanning of the full original sequences at the beginning of this study. The relative activity distributions (Figure 1B, 1C) show that only parts of the sequences contribute to self-cleavage activity, which is why we subsequently focused on those parts.

      They identify three twister sister ribozymes by pattern-based similarity searches using RNA-Bob. Also comparing the consensus sequence of the relevant region in twister sister and the two ribozymes in this paper underlines the striking similarity between these RNAs. Given that the authors only assessed partial sequences of LINE-1 and OR4K15, I find it highly plausible that further accessory sequences have been missed that would clearly reveal that "lantern ribozymes" actually belong to the twister sister ribozyme class. This is also the reason I do not find the modeled structural data and biochemical data results convincing, as the differences observed could always be due to some accessory sequences and parts of the ribozyme structure that are missing.

      We thank the reviewer for raising this question. As explained in our previous response, we now call the ribozymes TS-like ribozymes. We also emphasize that the relative activity data of the original sequences indicate that the other parts do not contribute to the activity of the ribozymes. The original sequences provided in the Science paper (Salehi-Ashtiani et al. Science 2006) were obtained by biochemical selection from a human genomic library, and that study did not investigate the contribution of each position to self-cleavage activity.

      Highly conserved nucleotides in the catalytic core, the need for direct contacts to divalent metal ions for catalysis, the preference for Mn2+ over Mg2+ for cleavage, and the plateau in observed rate constants at ~100 mM Mg2+ are all characteristics that are identical between the proposed lantern ribozymes and the known twister sister class.

      The difference in cleavage speed between twister sister (~5 min-1) and proposed lantern ribozymes could be due to experimental set-up (true single-turnover kinetics?) or could be explained by testing LINE-1 or OR4K15 ribozymes without needed accessory sequences. In the case of the minimal hammerhead ribozyme, it has been previously observed that missing important tertiary contacts can lead to drastically reduced cleavage speeds.

      We thank the reviewer for this question. We now call the ribozymes TS-like ribozymes. As explained in our previous response, the relative activity data of the original sequences show that the other parts do not contribute to the activity of the ribozymes. Moreover, we have tested different enzyme-to-substrate ratios to confirm single-turnover conditions (Figure S13). The difference in cleavage speed is likely related to the absence of peripheral regions, which do not exist in the original sequences of the LINE-1 and OR4K15 ribozymes. We have included the relevant publications and a discussion of peripheral domains in other ribozymes.

      Line 458-463: The kobs of LINE-1-core was ~0.05 min-1 when measured in 10 mM MgCl2 and 100 mM KCl at pH 7.5 (Figure S13). Furthermore, the single-stranded ribozymes exhibited a lower kobs (~0.03 min-1 for LINE-1-rbz) (Figure S14) than the bimolecular constructs. This confirms that the stem loop region SL2 does not contribute much to the cleavage activity of the TS-like ribozymes.

      Line 570-576: Thus, a more sophisticated structure, together with long-range interactions involving the SL4 region, likely helps to stabilize the catalytic region of the twister sister ribozyme and improve its catalytic activity. Similarly, previous studies have demonstrated that the peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes can greatly increase their self-cleavage activity. Given the importance of peripheral regions, the TS-like ribozymes, which lack this tertiary interaction, may not be fully stabilized in the conformation generated by homology modelling.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      It would have made it easier to connect the comments to text passages if the submitted manuscript had page numbers or even line numbers.

      We thank the reviewer for pointing this out and we have already fixed it.

      In the introduction: "...using the same technique, we located the functional and base-pairing regions of..." The use of the adjective functional is imprecise. Base-paired regions are also important for the function, so what type of region is meant here? Conserved nucleotides?

      We thank the reviewer for pointing this out. We were describing the regions that are essential for ribozyme activity, and we have now defined “functional region” in the Introduction.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      In their discussion, the authors mention the possible flaws in their 3D-modelling in the absence of Mg2+. Is it possible to include this divalent metal ion in the calculations?

      We thank the reviewer for this question. Currently, BriQ (Xiong et al. Nature Communications 2021), which we used for modeling, does not include divalent metal ions in the modeling.

      Xiong, Peng, Ruibo Wu, Jian Zhan, and Yaoqi Zhou. 2021. “Pairing a High-Resolution Statistical Potential with a Nucleobase-Centric Sampling Algorithm for Improving RNA Model Refinement.” Nature Communications 12: 2777. doi:10.1038/s41467-021-23100-4.

      Abstract:

      It is claimed that ribozyme regions of 46 and 47 nt described in the manuscript resemble the shortest known self-cleaving ribozymes. This is not correct. In 1988, hammerhead ribozymes in newts were first discovered that are only 40 nt long.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt regions contain variable hairpin loops that are not important for catalytic activity. When only the conserved segments are compared, the TS-like ribozymes discussed in this paper are the shortest and have the simplest secondary structure. We have replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal TS-like self-cleaving ribozyme revealed by deep mutational scanning”.

      The term "functional region" is, to my knowledge, not a set term when discussing ribozymes. Does it refer to the catalytic core, the cleavage site, the acid and base involved in cleavage, or all, or something else? Therefore, the term should be 1) defined upon its first use in the manuscript and 2) probably not be used in the abstract to avoid confusion to the reader.

      We apologize for any confusion caused by our previous statement. To clarify, we have replaced the term “functional region” in the Abstract and defined it in the Introduction.

      Line 34-37: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only. This discovery makes them the simplest known self-cleaving ribozymes. Moreover, the essential regions are circularly permuted relative to each other, with two nearly identical catalytic internal loops supported by two stems of different lengths.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      The choice of the term "non-functional loop" in the abstract is a bit unfortunate. The loop might not be important for promoting ribozyme catalysis by directly providing, e.g. the acid or base, but it has important structural functions in the natural RNA as part of a hairpin structure.

      We thank the reviewer for pointing this out and we have re-phrased the sentences.

      Line 33-34: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only.

      Line 283: Removing the peripheral loop regions (Figures 1B and 1C) allows us to recognize that the secondary structure of OR4K15-rbz is a circularly permuted version of LINE-1-rbz.

      Results:

      Please briefly explain CODA and MC analysis when first mentioned in the results (Figure 1). The more detailed explanation of these terms for Figure 2 could be moved to this part of the results section (including explanations in the figure legend).

      We thank the reviewer for pointing this out, and we have included a brief explanation.

      Line 150-154: CODA employed Support Vector Regression (SVR) to establish an independent-mutation model and a naive Bayes classifier to separate paired from unpaired bases (26). Moreover, incorporating Monte Carlo simulated annealing with an energy model and a CODA scoring term (CODA+MC) could further improve the coverage of regions under-sampled by deep mutational scanning.

      Please indicate the source of the human genomic DNA. Is it a patient sample, what type of tissue, or is it an immortalized cell line? It is not stated in the methods I believe.

      We thank the reviewer for pointing this out. According to the original Science paper (Salehi-Ashtiani et al. Science 2006), the human genomic DNA (isolated from whole blood) was purchased from Clontech (Cat. 6550-1). In our study, we directly employed the sequences provided in Figure S2 of the Science paper for gene synthesis. Thus, we think it is unnecessary to mention the source of genomic DNA in the methods section of our paper.  

      Please also refer to the methods section when the calculation of RA and RA' values is explained in the main text to avoid confusion.

      We thank the reviewer for pointing this out and we have fixed it.

      Line 207-208: Figure 2A shows the distribution of relative activity (RA’, measured in the second round of mutational scanning) (See Methods) of all single mutations

      For OR4K15 it is stated that the deep mutational scanning only revealed two short regions as important. However, there is another region between approx. 124-131 nt and possibly even at positions 47 and 52 (to ~55), that could contribute to effective RNA cleavage, especially given the library design flaws (see below) and the lower mutational coverage for OR4K15. A possible correlation of the mutations in these regions is even visible in the CODA+MC analysis shown in Figure 1D on the left. Why are these regions ignored in ongoing experiments?

      We thank the reviewer for this question. As shown in Table S1, although the double-mutation coverage of OR4K15-ori was low (16.2%), we obtained 97.6% coverage of single mutations. The relative activities of these single mutations were sufficient to identify the conserved regions in this ribozyme. Mutations at the positions mentioned by the reviewer did not lead to large reductions in relative activity. Since the relative activity of the original sequence is 1, we presumed that only positions with an average relative activity much lower than 1 might contribute to effective cleavage.
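As a rough illustration of the position-level criterion described above, the sketch below assumes that a variant's relative activity (RA) is its cleaved-read fraction divided by that of the original sequence (so the original sequence has RA = 1), averages RA over all single mutants at each position, and flags positions whose mean falls well below 1. The read counts, the helper names and the 0.5 cut-off are hypothetical illustrations, not the exact formula or threshold used in the manuscript (see its Methods for those).

```python
from collections import defaultdict


def cleaved_fraction(cleaved, uncleaved):
    """Fraction of a variant's reads that come from the cleaved form."""
    total = cleaved + uncleaved
    return cleaved / total if total else 0.0


def relative_activity(variant_counts, wt_counts):
    """RA of one variant, normalized so the original sequence has RA = 1.

    Assumption: RA = cleaved fraction of variant / cleaved fraction of original.
    """
    return cleaved_fraction(*variant_counts) / cleaved_fraction(*wt_counts)


def mean_ra_per_position(single_mutants, wt_counts):
    """Average RA over all single mutants at each position.

    single_mutants maps (position, base) -> (cleaved_reads, uncleaved_reads).
    """
    by_pos = defaultdict(list)
    for (pos, _base), counts in single_mutants.items():
        by_pos[pos].append(relative_activity(counts, wt_counts))
    return {pos: sum(vals) / len(vals) for pos, vals in sorted(by_pos.items())}


def essential_positions(mean_ra, cutoff=0.5):
    """Positions whose mean RA is well below 1 (cutoff is a hypothetical choice)."""
    return [pos for pos, ra in mean_ra.items() if ra < cutoff]


# Hypothetical toy read counts: the original sequence is 60% cleaved.
wt = (600, 400)
singles = {
    (10, "A"): (590, 410), (10, "C"): (610, 390),  # tolerant position
    (12, "C"): (20, 980), (12, "G"): (40, 960),    # disruptive position
}
mean_ra = mean_ra_per_position(singles, wt)
print(essential_positions(mean_ra))  # [12]
```

Averaging over substitutions at a position smooths out single noisy variants; with low double-mutant coverage, as for OR4K15-ori, single-mutant averages like these are the better-sampled signal.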

      We regard the corresponding correlated mutations in the CODA+MC analysis as false positives generated by Monte Carlo simulated annealing (MC), because they lack support from the relative activity results.

      Have the authors performed experiments with their "functional regions" in comparison to the full-length RNA or partial truncations of the full-length RNA that included, in the case of OR4K15, nt 47-131? Also for LINE-1 another stem region was mentioned (positions 14-18 with 30-34) and two additional base pairs. Were they included in experiments not shown as part of this manuscript?

      We thank the reviewer for raising this question. We compared the full-length sequence and partial truncations only for the LINE-1 ribozyme. Since the secondary structure predicted from the OR4K15-ori data was almost the same as that of LINE-1, we did not perform deep mutagenesis on a partial truncation of OR4K15. However, the secondary structure of OR4K15 was confirmed by further biochemical experiments.

      Regarding the second question, the additional base pairs were generated by Monte Carlo simulated annealing (MC). We consider them false positives because of their low probabilities and lack of support from the deep mutational scanning results. These false positives likely arise from imperfections in the experiment-based energy function employed in the current MC simulated annealing.

      Are there other examples in the literature, where error-prone PCR generates biases towards A/T nucleotides as observed here? Please cite!

      We thank the reviewer for pointing this out and we have included the corresponding citation.

      Line 161-162: The low mutation coverage for OR4K15-ori was due to the mutational bias (27, 28) of error-prone PCR (Supplementary Figures S1, S2, S3 and S4).

      Line 170-171: whose covariations are difficult to capture by error-prone PCR because of mutational biases (27, 28).

      The authors mention that their CODA analysis was based on the relative activities of 45,925 and 72,875 mutation variants. I cannot find these numbers in the supplementary tables. They are far fewer than the read numbers mentioned in Supplementary Table 2. How do these numbers (45,925 and 72,875) arise? Could the authors please briefly explain their selection process?

      We apologize for any confusion caused by our previous statement. Our CODA analysis only utilized variants with no more than 3 mutations. The number listed in the supplementary tables is the total number of the variants. To clarify, we have included a brief explanation for these numbers.

      Line 203-204: We performed the CODA analysis (26) based on the relative activities of 45,925 and 72,875 mutation variants (no more than 3 mutations) obtained for the original sequence and functional region of the LINE-1 ribozyme, respectively.
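The selection step described above can be sketched as a Hamming-distance filter: count the substitutions in each sequenced variant relative to the original sequence and keep only variants carrying no more than three. This is a minimal sketch that assumes equal-length, substitution-only variants (as produced by error-prone PCR); the helper names and toy sequences are hypothetical and not taken from the authors' pipeline.

```python
def count_mutations(variant, reference):
    """Number of substitutions relative to the reference (Hamming distance)."""
    if len(variant) != len(reference):
        raise ValueError("only equal-length substitution variants are handled")
    return sum(1 for a, b in zip(variant, reference) if a != b)


def select_for_coda(variants, reference, max_mutations=3):
    """Keep variants with no more than max_mutations substitutions."""
    return [v for v in variants if count_mutations(v, reference) <= max_mutations]


# Hypothetical 10-nt reference and variants with 0, 1, 3 and 5 substitutions.
reference = "GGCAUCUAGC"
variants = ["GGCAUCUAGC", "GGAAUCUAGC", "AGAAUGUAGC", "AAAAUGUAGA"]
print(select_for_coda(variants, reference))
# ['GGCAUCUAGC', 'GGAAUCUAGC', 'AGAAUGUAGC']
```

Capping the mutation count keeps the data in the regime where single and pairwise effects dominate, which is presumably why the CODA input is smaller than the total variant counts in the supplementary tables.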

      What are the reasons the authors assume their findings from LINE-1 can be used to directly infer the structure for OR4K15? (Third section in results, last paragraph)

      We apologize for any confusion caused by our previous statement. We meant to say that the consistency between LINE-1-rbz and LINE-1-ori results suggested that our method for inferring ribozyme structure was reliable. Thus, we employed the same method to infer the structure of the functional region of OR4K15. To clarify, we have re-phrased the sentence.   

      Line 259-261: The consistent results between LINE-1-rbz and LINE-1-ori suggested that reliable ribozyme structures could be inferred by deep mutational scanning. This allowed us to use the OR4K15-ori data to directly infer the secondary structure of the functional region of OR4K15.

      There are several occasions where the authors use the differences between the proposed lantern ribozymes and twister sister data as reasons to declare LINE-1 and OR4K15 a new ribozyme class. As mentioned previously, I am not convinced these differences in structure and biochemical results could not simply result from testing incomplete LINE-1 and OR4K15 sequences.

      We apologize for any confusion caused by our previous statement. Although we observed some differences in mutational effects, we agree with the reviewer that it is not convincing to claim them as a new ribozyme class. We have replaced all instances of “lantern ribozyme” with “TS-like ribozyme”, as Reviewer 1 suggested.

      The authors state, that "the result confirmed that the stem loop SL2 region in LINE-1 and OR4K15 did not participate in the catalytic activity". To draw such a conclusion a kinetic comparison between a construct that contains SL2 and does not contain SL2 would be necessary. The given data does not suffice to come to this conclusion.

      We appreciate the reviewer for raising this question. To address this, we performed gel-based kinetic analysis of these two ribozymes (Figure S14).

      Line 458-462: The kobs of LINE-1-core under single-turnover conditions was ~0.05 min-1 when measured in 10 mM MgCl2 and 100 mM KCl at pH 7.5 (Figure S13). Only a slightly lower kobs (~0.03 min-1) was observed for LINE-1-rbz (Figure S14). This confirms that the stem loop region SL2 does not contribute to the cleavage activity of the TS-like ribozymes.

      Construct/Library design:

      The last 31 bp in the OR4K15 ribozyme template sequence are duplicated (Supplementary Table 4). Therefore, there are 2 M13 fwd binding sites and several possible primer annealing sites present in this template. This could explain the lower yield for the mutational analysis experiments. Did the authors observe double bands in their PCR and subsequent analysis? The experiments should probably be repeated with a template that does not contain this duplication. Alternatively, the authors should explain, why this template design was chosen for OR4K15.

      We apologize for this error in Supplementary Table 4. Our construct design for OR4K15 contains only one M13F binding site. We thank the reviewer for pointing this out, and we have fixed the error.

      Figure 5B: Where are the bands for the OR4K15 dC-substrate? They are not visible on the gel, so one has to assume there was no substrate added, although the legend indicates otherwise.

      Also this figure, please indicate here or in the methods section what kind of marker was used. In panels A and B, please label the marker lanes.

      We apologize for this mistake and we have repeated the experiment. The marker lane was removed to avoid confusion caused by the inappropriate DNA marker. 

      The authors investigated ribozyme cleavage speeds by measuring the observed rate constants under single-turnover conditions. To achieve single-turnover conditions enzyme has to be used in excess over substrate. Usually, the ratios reported in the literature range between 20:1 (from the authors citation list e.g.: for twister sister (Roth et al 2014) and hatchet (Li et al. 2015)) or even ~100:1 (for pistol: Harris et al 2015, or others https://www.sciencedirect.com/science/article/pii/S0014579305002061). Can the authors please share their experimental evidence that only 5:1 excess of enzyme over the substrate as used in their experiments truly creates single-turnover conditions?

      We greatly appreciate the Reviewer raising this question. To address this, we performed kinetic analysis using different enzyme-to-substrate ratios (Figure S13). The kobs values did not differ much, although kobs reached its highest value of 0.048 min-1 with a 100:1 excess of enzyme over substrate.

      Line 458-460: The kobs of LINE-1-core under single-turnover conditions was ~0.05 min-1 when measured in 10 mM MgCl2 and 100 mM KCl at pH 7.5 (Figure S13).
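For readers less familiar with how a single-turnover kobs is extracted, the sketch below assumes simple first-order kinetics, fraction cleaved f(t) = fmax * (1 - exp(-kobs * t)), with a known plateau fmax. It linearizes the curve as -ln(1 - f/fmax) = kobs * t and fits the slope through the origin by least squares. The time points and plateau are made-up illustrations; a nonlinear fit that also estimates fmax is the more common practice, and this is not necessarily how the values in Figure S13 were derived.

```python
import math


def fit_kobs(times_min, frac_cleaved, f_max=1.0):
    """Estimate kobs (min^-1) for f(t) = f_max * (1 - exp(-kobs * t)).

    Linearizes to y = -ln(1 - f / f_max) = kobs * t and fits a line through
    the origin by least squares: kobs = sum(t * y) / sum(t ** 2).
    """
    num = den = 0.0
    for t, f in zip(times_min, frac_cleaved):
        if not 0.0 <= f < f_max:
            raise ValueError("fraction cleaved must lie in [0, f_max)")
        y = -math.log(1.0 - f / f_max)
        num += t * y
        den += t * t
    return num / den


def half_life(kobs):
    """Half-time of a first-order reaction, in the time unit of 1/kobs."""
    return math.log(2) / kobs


# Synthetic, noise-free time course generated with kobs = 0.05 min^-1, f_max = 0.8.
times = [0, 5, 10, 20, 40, 60]
fracs = [0.8 * (1 - math.exp(-0.05 * t)) for t in times]
k = fit_kobs(times, fracs, f_max=0.8)
print(round(k, 3), round(half_life(k), 1))  # 0.05 13.9
```

At kobs ≈ 0.05 min-1 the half-time is ln 2 / 0.05 ≈ 14 min, which puts the difference from twister sister (~5 min-1, a half-time of roughly 8 s) into perspective.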

      Citations:

      In the introduction citation number 12 (Roth et al 2014) is mentioned with the CPEB3 ribozyme introduction. This is the wrong citation. Please also insert citations for OR4K15 and IGF1R and LINE-1 ribozyme in this sentence.

      We thank the reviewer for pointing this out and we now have fixed it.

      Also in the introduction, a hammerhead ribozyme in the 3' UTR of Clec2 genes is mentioned and reference 16 (Cervera et al 2014) is given, I think it should be reference 9 (Martick et al 2008)

      We thank the reviewer for pointing this out and we now have fixed it.

      In the results section it is stated that, "original sequences were generated from a randomly fragmented human genomic DNA selection based biochemical experiment" citing reference 12. This is the wrong reference, as I could not find that Roth et al 2014 describe the use of such a technique. The same sentence occurs in the introduction almost verbatim (see also minor points).

      We thank the reviewer for pointing this out and we now have fixed it.

      Minor points

      Headline:

      Either use caps for all nouns in the headline or write "self-cleaving ribozyme" uncapitalized

      We thank the reviewer for pointing this out and we now have fixed it.

      Abstract:

      1st sentence: in "the" human genome

      "Moreover, the above functional regions are..." - the word "above" could be deleted here

      "named as lantern for their shape"- it should be "its shape"

      "in term of sequence and secondary structure"- "in terms"

      "the nucleotides at the cleavage sites" - use singular, each ribozyme of this class has only one cleavage site

      We thank the reviewer for pointing these out and we now have fixed them.

      Introduction:

      Change to "to have dominated early life forms"

      Change to "found in the human genome"

      Please write species names in italics (D. melanogaster, B. mori)

      Please delete "hosting" from "...are in noncoding regions of the hosting genome"

      Please delete the sentence fragment or turn it into a meaningful sentence: "Selection-based biochemical experiments (12)."

      Change to "in terms of sequence and secondary structure, suggesting a more"

      Please reword the last sentence in the introduction to make clear what is referred to by "its", e.g. probably the homology model of lantern ribozyme generated from twister sister ribozymes?

      Please refer to the appropriate methods section when explaining the calculation of RA and RA'.

      We thank the reviewer for pointing these out and we now have fixed them.

      The last sentence of the second paragraph in the second section of the results states that the authors confirmed functional regions for LINE-1 and OR4K15, however, until that point the section only presents data on LINE-1. Therefore, OR4K15 should not be mentioned at the end of this paragraph.

      In response to the reviewer's suggestions, we have removed OR4K15 from this paragraph.

      Line 225-228: The consistency between base pairs inferred from deep mutational scanning of the original sequences and that of the identified functional regions confirmed the correct identification of functional regions for LINE-1 ribozyme.

      Change to "Both ribozymes have two stems (P1, P2), two internal loops ..."

      We thank the reviewer for pointing this out and we now have fixed it.

      The section naming the "functional regions" of LINE-1 and OR4K15 lantern ribozymes should be moved after the section in which the circular permutation is shown and explained. Therefore, the headline of section three should read "Consensus sequence of LINE-1 and OR4K15 ribozymes" or something along these lines.

      We thank the reviewer for pointing this out and we now have fixed it.

      Line 308-309: Given the identical lantern-shaped regions of the LINE-1-rbz and OR4K15-rbz ribozymes, we named them twister sister-like (TS-like) ribozymes.

      The statement on the difference between C8 in OR4K15 and U38 in LINE-1 should be clarified further, as U38 is only 95% conserved. Is it a C in those other instances, or do all other nucleotide possibilities occur? Is the high conservation in OR4K15 an "artifact" of the low mutation rate for this RNA in the deep mutational scanning?

      We thank the reviewer for this question. Yes, the high conservation in OR4K15 is an "artifact" of the low mutation rate for this RNA in the deep mutational scanning. That is why the RA’ value is more appropriate for describing the conservation level of each position. We also mention this in the manuscript:

      Lines 287-288: The only mismatch, U38C in L1, has an RA’ of 0.6, suggesting that the mismatch is not disruptive to the functional structure of the ribozyme.

      Section five, first paragraph: instead of "two-stranded LINE-1 core" use the term "bimolecular", as it is more commonly used.

      We thank the reviewer for pointing this out; we have now changed it.

      The headline of the Figure 3 caption states "Homology modelled 3D structure..." but the figure also shows the secondary structures of the LINE-1, OR4K15 and twister sister examples.

      We thank the reviewer for pointing this out; we have now removed “3D”.

      In Figure 3C, we see a nucleobase labeled G37, however in the secondary structure and sequence and 3D structural model there is a C37 at this position. Please correct the labeling.

      We thank the reviewer for pointing this out; we have now fixed it.

      Section 7 "To address the above question..." please just repeat the question you want to address to avoid any confusion to the reader.

      We thank the reviewer for pointing this out; we have rephrased this sentence.

      Line 364: Considering the high similarity of the internal loops, we further investigated the mutational effects on the internal loop L1s.

      Please rephrase the sentence "By comparison, mutations of C62 (...) at the cleavage site did not make a major change on the cleavage activity...", e.g. "did not lead to a major change" etc.

      Section 8, first paragraph: in "This result further confirms that the RNA cleavage in lantern...", please delete "further".

      Change to "analogous RNAs that lacked the 2' oxygen atom in the -1 nucleotide"

      Methods

      Change to "We counted the number of reads of the cleaved and uncleaved..."

      Change to "...to produce enough DNA template for in vitro transcription."

      Change to "The DNA template used for transcription was used..." (delete while)

      We thank the reviewer for pointing these out; we have now fixed them.

      Supplement

      All supplementary figures could use more detailed Figure legends. They should be self-explanatory.

      Fig S1/S2: how is "mutation rate" defined/calculated?

      We thank the reviewer for pointing this out; we have now added a short explanation. The mutation rate was calculated as the proportion of mutations observed at each position in the DNA-seq library.
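      As an illustration only, the per-position mutation rate described above can be sketched in Python. The sketch assumes reads that are already aligned to the reference without insertions or deletions; the function name and toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
def per_position_mutation_rate(reference, reads):
    """Return, for each position, the fraction of reads that differ from the reference.

    Hypothetical helper: assumes each read is aligned to the reference
    without indels, so read[i] corresponds to reference[i].
    """
    if not reads:
        raise ValueError("need at least one read")
    mismatch_counts = [0] * len(reference)
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("read length must match reference length")
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base != ref_base:
                mismatch_counts[i] += 1
    # Proportion of mutated reads at each position
    return [count / len(reads) for count in mismatch_counts]


# Toy library of four reads: one mutated at position 0, one at position 3
rates = per_position_mutation_rate("ACGU", ["ACGU", "ACGA", "CCGU", "ACGU"])
# rates == [0.25, 0.0, 0.0, 0.25]
```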

      Fig S3/S4: axis label "fraction", fraction of what? How calculated?

      We thank the reviewer for pointing this out; we have now added a short explanation. The Y axis “fraction” represents the proportion of each mutation type among all variants observed.
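      Similarly, a minimal sketch of the "fraction" plotted on the Y axis. It assumes that a mutation type is a substitution class such as A>G and that every mismatched position in every variant is counted once; the function and data are hypothetical illustrations.

```python
from collections import Counter


def mutation_type_fractions(reference, reads):
    """Fraction of each substitution type (e.g. 'A>G') among all observed mutations.

    Hypothetical helper: assumes reads aligned to the reference without indels.
    """
    type_counts = Counter()
    for read in reads:
        for ref_base, read_base in zip(reference, read):
            if read_base != ref_base:
                type_counts[f"{ref_base}>{read_base}"] += 1
    total = sum(type_counts.values())
    if total == 0:
        return {}
    return {mut_type: count / total for mut_type, count in type_counts.items()}


# Three mutations in total: two A>G and one U>A
fractions = mutation_type_fractions("ACGU", ["GCGU", "GCGA", "ACGU"])
# fractions["A>G"] is 2/3 and fractions["U>A"] is 1/3
```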

      Fig S5: RA and RA' are mentioned in the main text and methods, but should be briefly explained again here, or it should be clearly referred to the methods. Also, the axis label could be read as average RA' divided by average RA. I assume that is not the case. I assume I am looking at RA' values for LINE-1 rbz and RA values for LINE-1-ori? Also, mention that only part of the full LINE-1-ori sequence is shown...

      We thank the reviewer for pointing this out; we have now added a short explanation. The Y axis represents RA’ for LINE-1-rbz and RA for LINE-1-ori. The part shown is the overlap region between LINE-1-rbz and LINE-1-ori. We apologize for any confusion caused by our previous statement.

      Fig S9: the magenta used to color the scissile phosphate is hard to make out.

      We thank the reviewer for pointing this out; we have now added a label to the scissile phosphate.

      Fig S10: Why do the authors only show one product band here? Instead of both cleavage fragments as in Figure 5?

      We thank the reviewer for this question. We purposely used two fluorophores (5’ 6-FAM, 3’ TAMRA) to show the two product bands in Figure 5. In Fig S10, a long incubation was used to distinguish catalysis-based self-cleavage from RNA degradation. This figure was prepared before we purchased the substrate used in Figure 5. The substrate strand used in Fig S10 carries only one fluorophore (5’ 6-FAM), and the other product was too short to be visualized by SYBR Gold staining.

      Fig S13: please indicate meaning of colors in the legend (what is pink, blue, grey etc.)

      Please change to "RtcB ligase was used to capture the 3' fragment after cleavage...."

      We thank the reviewer for pointing this out; we have now fixed it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Materials and Methods section:

      Cell gating and FACS sorting strategies need to be explained. There is no figure legend for supplementary figure 4, which is supposed to explain the gating strategy. Please detail the strategy for each cell type.

      Thank you for your suggestion. We have added a detailed description of the gating and FACS sorting strategies for the different liver cell types in supplementary figure 1. In addition, flow cytometry plots of CD45+Ly6C-CD64+F4/80+ KCs from Bmp9fl/flBmp10fl/flLrat Cre mice are also presented in supplementary figure 1.

      The genetic background of the different mouse strains and the age of the mice should be noted on each figure.

      All the mice used in our study are on a C57BL/6 background (see Methods). The age of the mice is now given in each figure.

      The Mann-Whitney test, instead of the two-tailed Student's t-test, should be used for the different statistical analyses. Why are the expression counts statistically analyzed by a two-tailed Student's t-test when they were already identified as DE in the RNAseq statistical analysis?

      Thank you for your suggestion. The statistical methods have been corrected in the revised manuscript.

      What is the age of the mice and how many are used for each bulk RNAseq?

      This information has been added on the corresponding figure legends.

      Figure 1:

      Figure 1a and c: The qPCR data would be much more interesting if presented as DDct and not as relative value as we do not see the mRNA levels of BMP9 and BMP10 in each Bmp9fl/flBmp10fl/flCre mouse. This would allow to compare the mRNA level of BMP9 versus BMP10. This should be changed in all figures.

      The presentation of the qPCR data in Figure 1a has been changed to allow comparison of the abundance of BMP9 versus BMP10 mRNA. Figure 1c only shows the expression of BMP10, so it is unnecessary to present those qPCR data as DDct. In our bulk RNA sequencing data of liver tissues, we found that BMP9 expression counts are higher than those of BMP10, in line with the data from BioGPS.

      Figure 1e (IF) and f (FACS): the quantification of these data should be added, as shown in Fig2d. What is the difference between Fig1e and Fig2d, as they both seem to show the quantification of F4/80 in CTL versus Bmp9fl/flBmp10fl/flLratCre mice? Are the cells in Fig1f, 1e and suppl Fig1b sorted? If yes, please specify the strategy. If they are not gated, how can the authors obtain 93% of KC? The reference Tillet et al., JBC 2018 should be added in the discussion of figure 1, as it is the first description of BMP10 in HSC.

      The quantitative data for Figure 1e and 1f have been added to our revised manuscript. Among tissue-resident macrophages, CLEC4F is a KC-specific marker expressed exclusively on KCs. In our previous report (PMID: 34874921), we demonstrated that BMP9/10-ALK1 signaling induced the expression of CLEC4F. The data shown in Figure 1e reproduce this phenotype: upon loss of BMP9/10-ALK1 signaling, liver macrophages did not express CLEC4F. F4/80 in Figure 1e was used as an internal positive control. Fig2d shows the quantification of F4/80 and CD64, two pan-macrophage markers, which measures the number of liver macrophages more accurately, especially given that F4/80 mean fluorescence intensity was reduced in liver macrophages of Bmp9fl/flBmp10fl/flLrat Cre mice. Cells in Fig1f, 1e and suppl Fig1b were not sorted; the flow cytometry plots of these cells were pre-gated on live CD45+Ly6C-CD64+F4/80+ liver macrophages. The reference Tillet et al., JBC 2018 has been added to our revised manuscript.

      Supplementary figure 4 should have a detailed figure legend and should appear before the gating experiments. Which cell subtype is used for each cell-type gating? Please add the exact references of all the antibodies used and indicate whether they are fluorescently labeled. Why is the number of lymphocytes noted, and how is it calculated? The gating strategy for the Bmp9fl/flBmp10fl/flLratCre mice should also be shown, as the numbers of F4/80+ and Tim4+ cells are decreased.

      A detailed figure legend has been added to the original supplementary figure 4, which has been moved to supplementary figure 1 in our revised manuscript. The antibodies used in our study were also used in our previous report (PMID: 34874921) and by others (PMID: 31561945; PMID: 26813785). The "Lymphocytes" label and its number appear automatically on the flow cytometry plots when we analyze flow cytometry data; they do not mean that the selected cells are lymphocytes. To avoid misunderstanding, these words have been deleted. The gating strategy of CD45+Ly6C-CD64+F4/80+ liver macrophages for the Bmp9fl/flBmp10fl/flLrat Cre mice is shown in our revised manuscript (Supplementary Figure 1).

      Figure 2:

      Figure 2a: How many mice were used for bulk RNAseq, and at what age? Please describe the gating strategy for sorting liver macrophages. The PCA should be shown. The genes represented in Fig2c and cited in the text (Timd4, Cdh5, Cd5l) should be shown on the volcano plot and the heatmap. A reference for these KC and monocytic markers should be added in the text.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. This information has been added to the Figure 2a legend. The PCA, the Timd4 gene and references for these KC and monocytic markers have been added to our revised manuscript, as suggested.

      Figure 2b: How were the genes represented in the heatmap selected? The top ones? If it is a KC signature, the authors should give a reference for this signature.

      These genes were KC signature genes. The reference (PMID: 30076102) has been given in our revised manuscript.

      Fig2e: Please explain what the Vav1 promoter is and in which cells it deletes Alk1 and Smad4. The authors also need to show that Alk1 and Smad4 are indeed deleted in these mice and in which cell subtype (EC and KC?). This is an important point, as the authors conclude that molecular mechanisms other than Smad4 signaling may affect the phenotypes of liver macrophages in Bmp9fl/flBmp10fl/flLratCre mice.

      Cre recombinase of Vav1Cre mice is expressed at high levels in hematopoietic stem cells (PMID: 27185381). This strain is widely used to target all hematopoietic cells with high efficiency (PMID: 24857755). In our previous report (PMID: 34874921), we demonstrated that Alk1 (Supplemental Figure 6A) and Smad4 (Supplemental Figure 6G) were efficiently deleted in KCs from Alk1fl/flVav1Cre and Smad4fl/flVav1Cre mice, respectively. This sentence and reference have been added to our revised manuscript. Homozygous loss of ALK-1 causes embryonic lethality due to aberrant angiogenesis (PMID: 28213819). EC-specific ALK1 knockout in the mouse, through deletion of the ALK1 gene from an Acvrl12loxP allele with the EC-specific L1-Cre line, results in postnatal lethality at P5, with hemorrhage in the brain, lung, and gastrointestinal tract (PMID: 19805914). In contrast, the Alk1fl/flVav1Cre mice generated in our lab showed neither this phenotype nor body weight loss, and were still alive at 16 weeks of age. Thus, we do not think that ECs are targeted by the Vav1Cre strain, at least in our experimental system.

      Supl Figure 3 (revised Supl Figure 4): The authors need to explain what cell types are affected by Csf1r-Cre and Clec4fDTR. Have the authors tried to perform a similar experiment in Bmp9fl/flBmp10fl/flLratCre? The legend of the Y axis is not clear, why is CD45+ used in the first bar graph while the other two graphs use F4/80+?

      We (PMID: 34874921) and others (PMID: 31587991; PMID: 31561945; PMID: 26813785) have demonstrated that Clec4f is specifically expressed on KCs; thus, only KCs are depleted in Clec4fDTR mice after DT injection. CSF1R, also known as macrophage colony-stimulating factor receptor (M-CSFR), is the receptor for CSF1, the major monocyte/macrophage lineage differentiation factor. Thus, the Csf1r-Cre strain can target monocytes, monocyte-derived macrophages and tissue-resident macrophages, including those of the liver, spleen, intestine, heart, kidney, and muscle, with high efficiency (PMID: 29761406). We did not perform a similar experiment in Bmp9fl/flBmp10fl/flLrat Cre mice, as we have demonstrated that the differentiation of liver macrophages in these mice is inhibited. The other two graphs in Supl Figure 4C were derived from Supl Figure 4B. The flow cytometry plots in Supl Figure 4B are pre-gated on CD45+Ly6C-CD64+F4/80+ liver macrophages, so it is appropriate to use F4/80+ as an internal control.

      Figure 3: Same remarks as for Figure 2. How many mice were used for bulk RNAseq, and at what age? The PCA should be shown. How were the genes represented in the heatmap selected? The top ones? A reference should be given for the sinusoidal EC, continuous EC and large artery signatures. Maf and Gata4 should be shown on the volcano plot. A quantification of the CD34 IF (Fig3e) and of the FACS data (Fig 3f) should be added.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. According to your suggestion, other revisions have been made.

      Figure 4: A quantification and statistical analysis of the Prussian blue staining area and the GS IF should be added, not just the number of mice that were affected.

      A quantification and statistical analysis of Prussian staining area and GS IF has been added.

      Minor points:

      A few spelling mistakes should be checked.

      Figure 5a, some bar graphs are missing.

      Spelling mistakes and missing bar graphs in Figure 5a have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The authors should provide some additional information:

      - Did the single HSC-KO mice for either BMP9 or BMP10 already show partial phenotypes?

      We think that, under steady-state conditions, the KC and EC phenotypes described in our manuscript were not altered in the livers of single HSC-KO mice for either BMP9 or BMP10. However, we do not know whether BMP9 and BMP10 remain redundant in liver disease or inflammation, which merits further study.

      - The authors should also stain Endomucin, Lyve1, CD32b on liver tissue to assess endothelial zonation/differentiation in addition to FACS analysis.

      In our revised manuscript, we performed immunostaining for Endomucin and Lyve1 and found increased expression of Endomucin and decreased expression of Lyve1 (Figure 3g), suggesting that endothelial zonation/differentiation was disrupted in the livers of Bmp9fl/flBmp10fl/flLrat Cre mice compared with their littermates. We did not stain for CD32b in liver sections, as there is no good antibody against mouse CD32b for frozen sections.

      - Did the authors assess BMP9/BMP10 effects individually and combined in vitro on KC and EC? Are these likely only direct effects or may they also involve each other (i.e. also cross talk between KC and EC in response to BMP9/10?). This could be assessed in co-culture models.

      Using ALK1 reporter mice, we demonstrated that KCs and liver ECs express ALK1. We and others have shown that in vitro stimulation with BMP9/BMP10 can induce the expression of ID1/ID3 and GATA4/Maf in KCs and ECs, respectively (PMID: 34874921; PMID: 35364013; PMID: 30964206). These results suggest that BMP9/BMP10 can act directly on KCs and ECs. Indeed, we are also interested in the crosstalk between KCs and ECs. However, an in vitro co-culture system cannot mimic the interaction between KCs and ECs in the liver, as these cells lose their identity upon isolation from the liver environment. Nevertheless, Bonnardel et al. applied NicheNet bioinformatic analysis to predict that liver ECs provide an anchoring site as well as Notch and CSF1 signals for KCs (PMID: 31561945). Of course, this prediction still needs experimental validation.

      - The abstract should be rephrased with a more specific focus on BMP-related intercellular crosstalk in the liver and its implications for liver health and disease. At the end of the abstract, the authors should also emphasize for which specific fields/topics/diseases these findings are important.

      Thank you for your suggestion. The abstract has been rephrased, and we hope that the revised version addresses your concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Response to Reviewer #1 (Public Review):

      The reviewer is correct that the previous explanation of the fitness calculation could be considered insufficient as it was only briefly described in Results. In the revised manuscript, in the "Supplementary Materials" section and then in "Supplementary Text 1", we provide a full definition of the fitness of strains carrying single or multiple mutations and thus show how epistasis was calculated.

      Response to Reviewer #2 (Public Review):

      In our opinion, the reviewer's comments relate to three issues. First, our finding that transcription of the monosomic chromosomes is not upregulated was not compared with the results of other studies, including those in other organisms. Indeed, we did not mention that the gene dosage distortions introduced by aneuploidy are frequently and profoundly compensated in multicellular organisms. We now cite the suggested broad and recent review paper in the revised manuscript (line 247). We also removed the somewhat provocative sentence: “The relationship between transcriptome and proteome is generally fixed in yeast”. For this organism, data and opinions indeed remain conflicting across studies of many different yeast strains. However, the standard laboratory strains stand out as those in which dosage compensation is absent or weak. A paper published a year ago states flatly: "... at least in the strain background used here (authors: BY, the same we use), aneuploidies are transmitted to transcriptome and proteome with a minimum of gene-dosage buffering, rendering aneuploidies discoverable by proteomics" (Messner et al. 2023). A more recent paper reports: "In lab-generated aneuploids, some proteins - especially subunits of protein complexes - show reduced expression, but the overall protein levels correspond to the aneuploid gene dosage" (Muenzner et al. 2024). This "reduced expression" was seen in disomics and was achieved by upregulated proteolysis, whereas our strains are monosomic and show downregulated proteolysis. In summary, we cannot back away from our claim that the biases introduced by monosomy were not compensated. (This claim is not critical to our paper: we could withdraw it and still leave our main claim about extraordinarily high positive epistasis intact.) Muenzner and colleagues do report compensation, but in "wild" strains. Our explanation is that the existing yeast aneuploids are not a random sample of aneuploid mutations. In particular, they could be strains, perhaps relatively rare, whose genetic background was permissive for aneuploidy from the start or allowed rapid evolution toward tolerance of aneuploidy. Strains with rigid gene-mRNA-protein relationships suffer so much that they perish unless they are shielded from selection, as is possible in the laboratory. The reviewer will know better whether this might also apply to multicellular organisms.

      The second concern is that we did not sufficiently report "... the trends of up- and downregulation across the genome and whether there are any genes on the varied chromosome that are dosage compensated". We believe we have indeed done this, albeit mostly in a simple graphical fashion. For the whole genomes, Datasheet 2 reports the extent of down- or up-regulation for each gene in each strain and highlights those that are statistically significant. We do not analyze the distributions of these deviations because they are relative. They represent individual gene down- and up-regulations within a monosomic transcriptome compared to the corresponding genes in the diploid transcriptome, with the total size of the transcriptomes set equal. Thus, the downs and ups cancel each other out, the left and right sides of the distribution would be equal in their totals, and we have no meaningful expectations about the possible variation in the shapes of the overall distributions or their opposite sides. As for the "varied chromosome", we show that there were extensive down- and up-regulations on the monosomic chromosomes, even though the mean expression for them was half that of the diploid chromosomes. This can be seen in Figure 3B as blue and red colored bars that are present on each monosomic chromosome and intermingled along its length. The purpose of these graphs is to show that even the genes in which the halving of the dose was most damaging to fitness (most negative values of rDR-1) did not tend to be upregulated on average (both blue and red colors are found among them). We consider this an important and original part of our data.

      Finally, the reviewer is concerned that we are only dealing with the relative abundance of mRNA species. He/she suggests that "... an experiment that would clarify the results would be to perform estimates of the total transcriptome size. If the general transcriptome size is indeed increased, the claims of reduced proteosome expression may need to be revised". We followed this advice and extracted transcriptomes from known amounts of yeast cells with known amounts of standard mRNA or "spike" added. We thus seriously considered the reviewer's suggestion, even though it was contrary to our intuition and, we believe, was not confirmed in the additional experiment. The results are reported in the last paragraph of Results and shown in Supplementary Figure S3. Our arguments are listed in that paragraph, so we will not repeat them here.

      Response to Reviewer #3 (Public Review):

      (1) Figure 3b: both its legend and the reference to it in the main text have been corrected in line with the suggestions made by Reviewers #1 and #3.

      (2) We had to restrict our mRNA analysis to about half of the strains, which we selected purely at random. This selection left out M4 but nevertheless included M2, M10 and M16, the strains with especially high levels of epistasis. See manuscript lines 161-162.

      (3) We agree, and say so in the article, that both the loss and gain of a copy of a chromosome most likely result in errors in mitosis. By "endoreduplication" we mean any event resulting in two chromosomes instead of one, not necessarily additional DNA replication as we now clarify. We also suggest that both loss and endoreduplication occurred in all strains, but in M7 and M13 they happened so close together that we could not isolate the rare monosomic cells from the rapidly spreading revertants (lines 86-91).

      Recommendations for the authors:

      Reply to Reviewer #1 (Recommendations for The Authors):

      The legend to Fig. 3b is hopefully clearer now.

      Reply to Reviewer #2 (Recommendations for The Authors):

      These points were also raised in the public review, so our answer there also applies to the recommendations for the authors.

      Reply to Reviewer #3 (Recommendations for The Authors):

      (1) The first sentence of this comment may be based on a misinterpretation of our main argument. We believe that the upregulation of ribosomal protein (RP) coding genes was not helpful but harmful. It was costly because RPs are a large part of the proteome, yet it did not help translation because it did not restore RP stoichiometry. This unproductive investment reduced the rate of the remaining metabolism, so that other impairments introduced by halving the doses of other genes were no longer critical; this made them unobservable at the level of phenotype, i.e., it produced epistasis. However, both this Reviewer and Reviewer 2 seem to suggest that the entire translational apparatus may have been expanded, compensating for its reduced efficiency (per transcript). Reviewer 2 suggested an mRNA spike as a standard, and we followed this approach as the one most accessible to us. (We reiterate our claim of good agreement between mRNAs and proteins in the BY strain, supported by two important new papers; lines 256-257.) The results are reported in the last paragraph of Results. We believe that they indicate a reduction, not an increase, in the translational apparatus (including its parts encoded on the monosomic chromosomes), so that our explanation of positive epistasis remains unchallenged.

      (2) We re-examined the sequences and found heterozygous SNPs in the same gene, RSP5, in several strains. One was a loss of the START codon (M3, M4, M6, M8, M9, M10, M14, M16), always the same. The other was a substitution, also always the same, in M5, M11 and M15. There were no mutations in this gene in M1 and M2. We tested our stock haploid strains BY4741 and BY4742 and found that they were not mutated. However, we also recovered the specific haploid strains used in the final crosses to construct the diploid strains from which the monosomics were obtained. Some carried one of the two mutations; others were clean. All grew normally, and the mutants were similar to the wild types, indicating that the fitness effect of the mutations, even in haploids, was at most partial, since the expected severe effects of RSP5 inactivation were not visible.

      Where do the mutations come from? In previous experiments, we subjected some BY strains to severe selection regimes. As we can now surmise, mutations in RSP5 helped them resist some of these regimes, especially those involving overexpression of selected genes. (We do not summarize here our lengthy review of our notes and the literature that led us to consider this explanation the most plausible.) Unfortunately, strains that had gone through that harsh selection were used in the crosses from which another collection of strains, the one used here, was derived.

      How critical is it? First, the mutations were heterozygous, which further reduced their apparently weak effects. Second, M1 and M2 were free of them. Third, we tried to obtain clean monosomics, i.e., homozygous wild type for RSP5. We obtained such strains, with monosomy as the only change, for M9, M10 and M16. The other three attempts yielded complex aneuploids rather than correct M3, M5 and M6. This is normal, as we explain (complain) in Results. To show that all exact monosomies can be derived in the absence of mutations in RSP5, we would have to isolate a large number of potential monosomics and then sequence them. We believe that, with an effort comparable to that required to obtain the first set of monosomics, we could complete this work. For financial and organizational reasons, this is not possible at this time, and we do not consider it necessary for completing the revision. Note that of the five mutation-free straight monosomics, M2, M10 and M16 are among the most affected and thus have the highest positive epistasis. Admittedly, a role of point mutations cannot be excluded for the other monosomics, although we strongly believe it is unlikely. Nevertheless, we have removed all our previous claims that our monosomies are certainly not supported by other genetic changes. Most importantly, our main claim of positive epistasis, in its purely descriptive genetic sense, remains unaffected. The main functional argument also holds: the indiscriminate overproduction of unbalanced RP proteins was so costly that inefficiencies introduced in functional modules other than biosynthesis became much less relevant. Thus, the main message of our work does not depend on the conceivable, though in our view unlikely, role of mutations in RSP5.

      We provided this lengthy explanation to show that we cared about the reviewer's comment and tried to deal with it in an honest way. It was a lot of pain and no gain for us, but we are still grateful for the opportunity to re-examine our main claims.  

      (3) The 16 (non-essential) plus 16 (essential) strains were each replicated three times. In preliminary experiments, we tested them with one-way ANOVA and found that they did not differ statistically. We therefore considered these 32 strains to have the same genetic background and treated the 96 estimates as homogeneous, i.e., differing only through environmental variation or random error.

      (4) We changed the description of Figure 3b to explain that a particular color shows a range (not its boundary) of log2 fold change (FC) relative to the control.
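      To illustrate what "a range (not its boundary)" means, here is a minimal Python sketch that assigns a gene's log2 fold change to a color range. The breakpoints and color names are hypothetical placeholders (blue for down-, red for upregulation), not the actual scale used in Figure 3b.

```python
import math
from bisect import bisect_right

# Hypothetical breakpoints on the log2 FC axis; NOT the scale used in Figure 3b.
BREAKS = [-1.0, -0.5, 0.0, 0.5, 1.0]
# One color per range between breakpoints, so len(COLORS) == len(BREAKS) + 1.
COLORS = ["dark blue", "blue", "light blue", "light red", "red", "dark red"]


def log2_fold_change(expression, control_expression):
    """log2 of expression relative to the control (both must be positive)."""
    return math.log2(expression / control_expression)


def color_for(log2fc):
    """Return the color of the range containing log2fc.

    bisect_right counts the breakpoints <= log2fc, so a value equal to a
    breakpoint falls into the range above it, e.g. [-1.0, -0.5) -> "blue".
    """
    return COLORS[bisect_right(BREAKS, log2fc)]


# Half the control expression: log2 FC = -1.0, which lies in the range [-1.0, -0.5)
print(color_for(log2_fold_change(50, 100)))  # prints "blue"
```

      The key point is that each color stands for every value in an interval, so two genes with the same color may still differ in fold change.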

      (5) Corrected.

      (6) Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this important study, Huffer et al. posit that non-cold-sensing members of the TRPM subfamily of ion channels (e.g., TRPM2, TRPM4, TRPM5) contain a binding pocket for icilin that overlaps with the one found in the cold-activated TRPM8 channel.

      The authors identify the residues involved in icilin binding by analyzing the existing TRPM8-icilin complex structures and then use their previously published approach of structure-based sequence comparison to compare the icilin binding residues in TRPM8 to other TRPM channels. This approach uncovered that the residues are conserved in a number of TRPM members: TRPM2, TRPM4, and TRPM5. The authors focus on TRPM4, with the rationale that it has the simplest activation properties (a single Ca2+-binding site). Electrophysiological studies show that icilin by itself does not activate TRPM4, but it strongly potentiates the Ca2+ activation of TRPM4, and introducing the A867G mutation (the mutation that renders avian TRPM8 sensitive to icilin) further increases the potentiating effects of the compound. Conversely, the mutation of a residue that likely directly interacts with icilin in the binding pocket, R901H, results in channels whose Ca2+ sensitivity is not potentiated by icilin.

      The data indicate that, just like in TRPV channels, the binding pockets and allosteric networks might be conserved in the TRPM subfamily.

      The data are convincing, and the authors employ good experimental controls.

      We appreciate the supportive feedback of this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to study whether the cooling agent binding site in TRPM8, which is located between the S1-S4 and the TRP domain, is conserved within the TRPM family of ion channels. As their model system, they chose the TRPM4 channel, which is directly activated by intracellular Ca2+. Using electrophysiology, the authors characterized and compared the Ca2+ sensitivity and the voltage dependence of TRPM4 channels in the absence and presence of the synthetic cooling agonist icilin. They also analyzed the effects of mutating residues (A867G and R901H; equivalent mutations in TRPM8 were shown to be involved in icilin sensitivity) on the Ca2+ sensitivity and voltage dependence of TRPM4 in the absence and presence of Ca2+. Based on these results as well as structure/sequence alignment, the authors concluded that icilin likely binds to the same pocket in TRPM4 and suggested that this cooling agonist binding pocket is conserved in TRPM channels.

      Strengths:

      The authors gave a very thorough introduction to the TRPM channels. They have nicely characterized the Ca2+ sensitivity and the voltage dependence of TRPM4 channels and demonstrated that icilin potentiates the Ca2+ sensitivity and diminishes the outward rectification of TRPM4. These results indicate that icilin modulates TRPM4 activation by Ca2+.

      We appreciate the supportive feedback of this reviewer.

      Weaknesses:

      The reviewer has a few concerns. First, icilin alone (at 25 µM) and in the absence of Ca2+ does not activate the TRPM4 channel. Have the authors titrated a wide range of icilin concentrations (without Ca2+ present) for TRPM4 activation? This raises the question of whether icilin is indeed an agonist for the TRPM4 channel; because this has not been tested, it remains unclear. One may argue that icilin needs Ca2+ as a cofactor for channel activation, just as in the TRPM8 channel. This leads to the second concern, which is a complication in the experimental design and data interpretation. TRPM4 itself requires Ca2+ for activation to begin with, so it is hard to dissect whether the current observed here for TRPM4 is activated by Ca2+ or by icilin plus its cofactor Ca2+. This is the key difference between TRPM8 and TRPM4: TRPM8 itself is not activated by Ca2+, so TRPM8 activation is through icilin, with Ca2+ acting as a prerequisite for icilin activation.

      We agree that the comparison between TRPM8 and TRPM4 is not perfect because TRPM4 requires Ca2+ for activation, but it is clear that the current activated by Ca2+ in the presence of icilin also involves icilin because it activates at lower Ca2+ concentrations and lower voltages. We have tested icilin at concentrations between 12.5 and 25 µM and at these concentrations icilin does not activate TRPM4 when applied alone, so we have no evidence that it is an agonist. Both of these concentrations are higher than those reported by Chuang et al. to be saturating for TRPM8 in the presence of Ca2+. We haven’t tested icilin at higher concentrations because we wanted to keep the final concentration of DMSO low enough to avoid any effects of the vehicle. We now emphasize this even more clearly in the revised manuscript.

      The results presented in this study are only sufficient to show that icilin modulates the Ca2+-dependent activation of TRPM4 and that, at best, icilin may act as an allosteric modulator of TRPM4 function. One cannot conclude from the current work that icilin is an agonist or even specifically a cooling agonist for TRPM4. Icilin is a cooling agonist for TRPM8, but this does not mean that, if icilin modulates TRPM4 activity, it serves as a cooling agonist for TRPM4.

      We agree with these comments, and we believe that the intent of our statements in the manuscript is completely in line with this perspective. We never refer to icilin as a cooling agent for TRPM4 but rather refer to the cooling agent binding pocket in TRPM8 and how that appears to be conserved and functions in TRPM4 to modulate opening of the channel. We have carefully gone through the manuscript to refer directly to icilin by name (rather than as a cooling agent) when referring to its actions on TRPM4 to make sure there is no confusion.

      For the mutation data on A867G (Figure 4A-B, left panels), it looks like A867G has stronger Ca2+ sensitivity than the WT in the absence of icilin and that the onset of current activation is faster than in the WT, or this may simply be because the scales of the data figures differ between A867G and the WT. Overall, the mutagenesis data are weak in supporting the conclusion that icilin binds to the S1-S4 pocket. The authors need to mutate more residues that are involved in direct interaction with icilin based on the available structural information, including but not limited to residues equivalent to Y745 and H845 in human TRPM8.

      The A867G mutant does seem to promote opening by Ca2+ in the absence of icilin, and we now comment on this in the manuscript. Having said that, we have not carefully studied the concentration-dependence for activation by Ca2+ because at higher concentrations we see evidence of desensitization. We think Ca2+, icilin and depolarized voltages promote an open state of TRPM4 and the A867G does so as well.

      We respectfully disagree about the strength of the mutagenesis results presented in our manuscript. We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition, and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown with repeated stimulation by Ca2+, making it challenging to obtain reliable results on how they affect modulation by icilin.

      The authors set out to study the conservation of the cooling agonist binding site in the TRPM family, but only tested a synthetic cooling agonist, icilin, on TRPM4. To draw a broad conclusion as the title and the discussion claim, the authors need to test more cooling compounds, including the best-known natural cooling agonist menthol and other cooling agonists such as WS-12 and/or C3, on several TRPM channels, not just TRPM4. With the current data, the authors need to significantly tone down the claim of a conserved cooling agonist binding pocket in the TRPM family.

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and take away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      On page 11, the authors suggest, based on the current data, that TRPM2 and TRPM5 may also be sensitive to cooling agonists because the key residues are conserved. TRPM2 is the closest homolog to TRPM8 but is menthol-insensitive. Some studies have attempted to confer menthol sensitivity on TRPM2; for example, Bandell et al. (2006) introduced the S2 and TRP domains from TRPM8 into TRPM2 but failed to make TRPM2 a menthol-sensitive channel. Sequence conservation or structural similarity is not sufficient for the authors to suggest a shared cooling agonist sensitivity or even a common binding site in the TRPM2 and TRPM5 channels. Again, as pointed out above, the authors need to establish the actual activation of other TRPM channels by these agonists first, before proceeding to functionally probe whether other TRPM channels adopt a conserved agonist binding site.

      We are somewhat confused by these comments because we do not comment about whether cooling agents can activate TRPM2 or TRPM5. We simply analyzed the structures to make the point that the key residues in the cooling agent binding pocket of TRPM8 are conserved in these other TRPM channels. The Bandell paper is relevant, but it is also possible that they failed to uncover a relationship because they only used an agonist that has relatively low affinity for TRPM8. It would have been interesting to see what they might have found if they had used a high-affinity ligand like icilin instead of a low affinity ligand like menthol.

      Taken together, this current work presents data to show the modulatory effects of icilin on the Ca2+ dependent activation and voltage dependence of the TRPM4 channel.

      We agree.

      Reviewer #3 (Public Review):

      Summary:

      The family of transient receptor potential (TRP) channels are tetrameric cation-selective channels that are modulated by a variety of stimuli, most notably temperature. In particular, Transient receptor potential Melastatin subfamily member 8 (TRPM8) is activated by noxious cold and by cooling agents such as menthol and icilin, and participates in cold somatosensation in humans. The abundance of TRP channel structural data published in the past decade demonstrates clear architectural conservation within the ion channel family. This suggests the potential for unifying mechanisms of gating despite their varied modes of regulation, which are not yet understood. To address this question, the authors examine the 264 structures of TRP channels determined to date and observe a potential binding pocket for icilin in multiple members of the Melastatin subfamily: TRPM2, TRPM4, and TRPM5. Interestingly, none of the other Melastatin subfamily members had been shown to be sensitive to icilin apart from TRPM8. Each of these channels is activated by intracellular calcium (Ca2+), and a Ca2+ binding site neighbors the predicted icilin binding pocket in all cryo-EM structures. The authors examined whether icilin could modulate the activation of TRPM4 in the presence of intracellular Ca2+. The addition of icilin enhances Ca2+-dependent activation of TRPM4, promotes channel opening at negative membrane potentials, and improves the kinetics of opening. Furthermore, mutations of TRPM4 residues within the putative icilin binding pocket that were predicted to enhance or diminish TRPM4 activity elicit these behaviors. Overall, this study furthers our understanding of gating in the Melastatin subfamily of TRP channels and demonstrates that a conserved binding pocket observed in TRPM4 and TRPM8 channel structures can function similarly to regulate channel gating.

      Strengths:

      This is a simple and elegant study capitalizing on a vast amount of high-resolution structural information from the TRP family of ion channels to identify a conserved binding pocket that was previously unknown in the Melastatin subfamily, which the authors interrogate through careful electrophysiology and mutagenesis studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      We appreciate the supportive comments of the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I don't have any major asks, but a few questions did arise while reading your work.

      (1) You refer multiple times to the VSLD pocket as being "open to the cytoplasm". It is not clear if you are implying that compounds such as icilin access the pocket via the cytoplasm (e.g., permeate the membrane to the cytosol, and then enter the binding site?) Is there data to support this? Some clarification here would be helpful, and perhaps explain if there is any distinction between how calcium might enter the VSLD binding site vs hydrophobic compounds like icilin.

      This is an excellent point. Our reference to “open to the cytoplasm” was for Ca2+ ions and we have no evidence for how icilin enters the cooling agent binding pocket. We had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      (2) Icilin is referred to as a "cooling compound", but its cooling effects are dependent on its interactions with TRPM8. This might be something to clarify, as it might otherwise be understood that other TRPM channels that interact with icilin also mediate the sensing of cool temperatures.

      This is another excellent point and we have no reason to believe that icilin interacting with any TRPM channel other than TRPM8 mediates cooling sensations. We have added a statement to this effect in the discussion when considering actions of icilin that might be mediated by TRPM4 channels.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title and statements in the results/discussion refer to icilin as a cooling agonist of TRPM4 that binds to a conserved "cooling agonist binding pocket", and the authors suggest a similar role and binding site for icilin in the TRPM2 and TRPM5 channels. This is too broad a conclusion and is not fully supported by the current experimental data, which only show that icilin works as a modulator, not an agonist, of the TRPM4 channel. The authors should change the usage of "cooling agonist" or "conserved cooling agonist binding pocket" and significantly tone down the conclusion of a conserved cooling agonist binding pocket, which is potentially misleading. Alternatively, if the authors insist on using "cooling agonist" in this context, they should establish the activation of TRPM4, TRPM2, and TRPM5 by icilin as the first step, because the current data only support icilin as a TRPM4 modulator but not an agonist.

      We respectfully don’t agree with this opinion. We show broad conservation of the cooling agent binding pocket in structures of many TRPM channels, and we chose one of them to test for a functional relationship. We think that the title accurately reflects the topic of the paper and does not specify the extent to which functional conservation has been demonstrated and we would like to keep it. The distinction between agonist and modulator is not even germane because icilin is not an agonist of TRPM8 either.

      (2) The manuscript will be strengthened if the authors test additional cooling compounds of TRPM8, including menthol, the menthol analog WS-12, and C3. More importantly, distinct from icilin, these three compounds do not depend on Ca2+ to activate the TRPM8 channel. Thus when testing these compounds on TRPM4, it may reduce the complication of the role of Ca2+, as TRPM4 channel itself requires Ca2+ for activation.

      We restate our response to this point in the public review…

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and taking away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      (3) The manuscript will be strengthened if the authors test additional residues in the S1-S4 pocket that form direct interactions or are within interacting distances with icilin based on the cryo-EM structures.

      We restate our response to this point in the public review…

      We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown, making it challenging to obtain reliable results on how they affect modulation by icilin.

      Furthermore, the ambiguity in the icilin binding pose based on available TRPM8 structures complicates structure-based identification of the most important interacting residues in TRPM8, and we would have needed to functionally validate the effects of any novel mutations we identified in TRPM8 prior to testing them in TRPM4. Instead, we have based our mutagenesis on constructs that have been previously characterized to affect the sensitivity of TRPM8 to cooling agents. A systematic mutagenesis scan of TRPM8 residues predicted to interact differentially with icilin in the two different available binding poses would likely help clarify the true binding pose of icilin and would be an interesting future study.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed reading this manuscript. It was well-executed and written. It will be interesting to corroborate these findings with a cryo-EM structure of TRPM2, TRPM4, or TRPM5 in the presence of icilin.

      We agree and may pursue these in future studies. This would be particularly interesting given ambiguities in how icilin docks into TRPM8 in previously published structures.

      Minor comments/questions:

      Have the authors considered icilin accessibility to its binding pocket? In other words, could the presence of intracellular Ca2+ inhibit the accessibility of icilin to its binding pocket in TRPM4? It should be a straightforward experiment, I think it would be informative, and could further support the authors' conclusion of the location of the TRPM4 icilin binding pocket.

      We completely agree and we had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      Figures 7 and 8 are missing the 0 µM Ca2+ control trace in the presence of 25 µM icilin.

      All sample traces from Figures 7 and 8 are shown from a single cell for the sake of comparison (Likewise, the sample traces from Figures 3 and 4 come from a single cell, and the sample traces from Figures 5 and 6 come from a single cell). Unfortunately, we were unable to obtain data from an R901H mutant cell that contained all six conditions we wished to show, and there is no representative trace for 0 µM Ca2+ in the presence of 25 µM icilin for that cell.

      This is up to the discretion of the authors, but perhaps a better way to arrange the paper Figures would be to combine Figures 5-6 and Figures 7-8 and rearrange the data to place some in a supplementary figure (e.g. Figure 5-6 = Figure 5 and Figure 5 - Figure Supplement 1, Figure 7-8 = Figure 6 and Figure 6 - Figure Supplement 1).

      We carefully considered these suggestions and we appreciate the reviewer’s flexibility, but would prefer to retain the original arrangement of data in the figures.

      Are there any mutations in the icilin binding pocket in TRPM4, and presumably TRPM2 and TRPM5, that are associated with human disease? This is a question that came to my mind and not one that needs to be addressed in the manuscript.

      This is an interesting point. There are quite a few disease-associated mutants within TRPM4 at positions corresponding to the cooling agent binding pocket in TRPM8. We could not see an appropriate place in the discussion where we could concisely bring this information in so we decided against commenting.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall.

      Strengths:

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm.

      Weaknesses:

      The paper is descriptive and the data are correlations.

      The authors cannot directly test their proposed function of the sperm hook in sliding and preventing backward slipping.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors clearly state and explain in the manuscript that this study is limited with respect to the ability to "directly test the role of the sperm hook in facilitating movement along the uterine wall". I think that if they make this statement in the manuscript, perhaps at the end of the abstract, then the strength of evidence for their claims could be deemed as solid after re-review.

      We thank the reviewer again for the review process. We believe that our manuscript has improved considerably during the review process. Regarding the limitations and future work, we have added the following to the discussion section.

      “Further investigation of sperm behaviour inside the female reproductive tract or in tissue-mimicking microfluidic devices, with real-time deep tissue imaging as in the current study, will provide valuable opportunities for a more comprehensive examination of both sperm-sperm and sperm-epithelium interactions in the female reproductive tract. While we have focused on observing sperm interactions only in natural healthy mice in this study, future work employing specifically targeted genetically modified knockout animal models will further elucidate and confirm the exact genetic and functional mechanisms that guide these interactions.”

      The revised manuscript is an improvement over the initial submission. I suggest that the authors mark the oviduct explicitly in Fig. 1A.

      The oviduct includes the ampulla, isthmus, and UTJ. We have additionally marked the oviduct in Fig. 1A with corresponding arrows and a box.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a well-written and detailed manuscript showing important results on the molecular profile of 4 different cohorts of female patients with lung cancer.

      The authors conducted comprehensive multi-omic profiling of air-pollution-associated LUAD to study the roles of the air pollutant BaP. Utilizing multi-omic clustering and mutation-informed interface analysis, potential novel therapeutic strategies were identified.

      Strengths:

      The authors used several different methods to identify potential novel targets for therapeutic interventions.

      Weaknesses:

      Statistical test results need to be provided in comparisons between cohorts.

      We appreciate your recognition and valuable suggestions. We have revised the statistical test results in Fig. 3b, e and g.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. performed a proteogenomic analysis of lung adenocarcinoma (LUAD) in 169 female never-smokers from the Xuanwei area (XWLC) in China. These analyses reveal that XWLC is a distinct subtype of LUAD and that BaP is a major risk factor associated with EGFR G719X mutations found in the XWLC cohort. Four subtypes of XWLC were classified with unique features based on multi-omics data clustering.

      Strengths:

      The authors made great efforts in performing several large-scale proteogenomic analyses and characterizing molecular features of XWLCs. Datasets from this study will be a valuable resource to further explore the etiology and therapeutic strategies of air-pollution-associated lung cancers, particularly for XWLC.

      Weaknesses:

      (1) However, regarding the analysis and interpretation of the datasets, this reviewer thinks that the authors should provide in the main text more detailed descriptions of (i) data processing, (ii) the justification for choosing the methods of the various analyses, and (iii) the justification for focusing on a few target genes/proteins in the datasets for further validation.

      We appreciate your valuable feedback. In response to the suggestions for enhancing the manuscript's clarity, we have provided more detailed procedures in the main text and methods sections.

      (2) Importantly, while the authors provide large datasets, validation of key findings is minimal, and surprisingly there is no interrogation of XWLC drug response/efficacy based on their findings, which makes this manuscript descriptive and incomplete rather than conclusive. For example, testing the response of XWLC to afatinib combined with other drugs targeting activated kinases in EGFR G719X-mutated XWLC tumors would be one way to validate their datasets and new therapeutic options.

      We appreciate your suggestion. In reference to testing the efficacy of XWLC response to afatinib combined with drugs targeting kinases, we have planned to establish PDX and organoid models to validate the effectiveness of our therapeutic approach. Due to the extended timeframe required, we intend to present these results in a subsequent study.

      (3) The authors found that MAD1 and TPRN are novel therapeutic targets in XWLC. Are these two genes more frequently mutated in one subtype than in the other three XWLC subtypes? How could these mutations be targeted in patients?

      Thank you for your question. We have investigated the TPRN and MAD1 mutations in our dataset, identifying five TPRN mutations and eight MAD1 mutations. Among the TPRN mutations, XWLC_0046 and XWLC_0017 belong to the MCII subtype, XWLC_0012 belongs to the MCI subtype, and the subtype of the other three samples is undetermined, resulting in mutation frequencies of 1/16, 2/24, 0/15, and 0/13 across subtypes, respectively. Similarly, for the MAD1 mutations, XWLC_0115, XWLC_0021, and XWLC_0047 belong to the MCII subtype, XWLC_0055, which contains two mutations, belongs to the MCI subtype, and the subtype of the other three samples is undetermined, resulting in mutation frequencies of 1/16, 3/24, 0/15, and 0/13 across subtypes, respectively. Fisher’s test did not reveal significant differences between the subtypes.

      For targeting novel therapeutic targets such as MAD1 and TPRN, we propose a multi-step approach. Firstly, we advocate for conducting functional in vivo and in vitro experiments to verify their roles during cancer progression. Secondly, we suggest conducting small molecule drug screening based on the pharmacophore of these proteins, which may lead to the identification of potential therapeutic drugs. Lastly, we recommend testing the efficacy of these drugs to further validate their potential as effective treatments.

      (4) In Figures 2a and b: while Figure 2a shows distinct genomic mutations among each LC cohort, Figure 2b shows similarity in affected oncogenic pathways (cell cycle, Hippo, NOTCH, PI3K, RTK-RAS, and WNT) between XWLC and TNLC/CNLC. Considering that different genomic mutations could converge into common pathways and biological processes, wouldn't these results indicate commonalities among XWLC, TNLC, and CNLC? How about other oncogenic pathways not shown in Figure 2b?

      Thank you for your question. Based on the data presented in Fig. 2a, which encompasses all genomic mutations, it appears that the mutation landscape of XWLC bears the closest resemblance to TSLC (Fig. 2a). However, when considering oncogenic pathways (Fig. 2b) and genes (Fig. 2c), there is a notable disparity between the two cohorts. These findings suggest that while XWLC and TSLC exhibit similarities in terms of genomic mutations, they possess distinct characteristics in terms of oncogenic pathways and genes.

      Regarding the oncogenic signaling pathways, we referred to ten well-established pathways identified from TCGA cohorts. Members of these oncogenic pathways are likely to serve as cancer drivers (functional contributors) or therapeutic targets, as highlighted by Sanchez-Vega et al. (2018).

      (5) In Figure 2c, how and why were the four genes (EGFR, TP53, RBM10, KRAS) selected? What about other genes? In this regard, given tumor genome sequencing was done, it would be more informative to provide the oncoprints of XWLC, TSLC, TNLC, and CNLC for complete genomic alteration comparison.

      Thank you for your question and good suggestion. Building upon our previous study (Zhang et al., 2021), we found that EGFR, TP53, RBM10, and KRAS were the top mutated genes in Xuanwei lung cancer cohorts. Furthermore, we have included the mutation frequency of cancer driver genes (Bailey et al., 2018) across XWLC, TSLC, TNLC, and CNLC in Supplementary Table 2b.

      (6) Supplementary Table 11 shows the number of mutations at the interface and the length of the interface for a given protein-protein interaction pair. As such, it does not show which mutation(s) in a given PPI interface are found in each LC cohort. For example, it does not indicate whether the MAD1 R558H and TPRN H550Q mutations are found at significant frequency in each LC cohort.

      We appreciate your careful review. In Supplementary Table 11, we have provided significant onco_PPI data for each LC cohort, focusing on enriched mutations at the interface of two proteins. Our emphasis lies on onco_PPI rather than individual mutations, as any mutation occurring at the interface could potentially influence the function of the protein complex. Thus, our Supplementary Table 11 exclusively displays the onco_PPI rather than mutations. MAD1 R558H and TPRN H550Q were identified through onco_PPI analysis, and subsequent extensive literature research led us to focus specifically on these mutations.

      (7) Figures 7c and d show simulation data, not data from an actual binding assay. The authors should perform a biochemical binding assay with proteins, or otherwise show that the mutation significantly alters the interaction, to support the conclusion.

      We appreciate your suggestion. The relevant experiments are currently in progress, and we anticipate presenting the corresponding data in a subsequent study.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript from Zhang et al. uses a multi-omics approach to analyze lung adenocarcinoma cases in female never-smokers from the Xuanwei area (XWLC cohort), compared with cases associated with smoking or other endogenous factors, to identify mutational signatures and proteome changes in lung cancers associated with air pollution. Mutational signature analysis revealed a mutation hotspot, EGFR-G719X, potentially associated with BaP exposure, in 20% of the XWLC cohort. This correlated with predicted MAPK pathway activation and worse outcomes relative to other EGFR mutations. Multi-omics clustering, including RNA-seq, proteomics, and phosphoproteomics, identified four clusters within the XWLC cohort, with additional analysis of pathway activation, genetic differences, and radiomic features to investigate the clinical diagnostic and therapeutic potential of each subgroup. The study, which nicely combines multi-modal omics, presents potentially important findings that could provide clinicians with enhanced diagnostic and therapeutic strategies for more personalized or targeted treatment of lung adenocarcinoma associated with air pollution. The authors successfully identify four distinct clusters within the XWLC cohort, with distinct diagnostic characteristics and potential targets. However, many validation experiments remain to be performed, and the data linking BaP exposure to XWLC subtypes are suggestive but insufficient to conclusively support this claim. Thus, while the manuscript presents important findings with the potential for significant clinical impact, the data presented are incomplete in supporting some of the claims and would benefit from validation experiments.

      Strengths:

      Integration of omics data from multiple modalities is a tremendous strength of the manuscript, allowing for cross-modal comparison/validation of results, functional pathway analysis, and a wealth of data to identify clinically relevant case clusters at the transcriptomic, translational, and post-translational levels. The inclusion of phosphoproteomics is an additional strength, as many functionally and biologically relevant pathway actions center on the activation of proteins and effectors via kinase and phosphatase activity, without necessarily altering the expression of the genes or proteins.

      Clustering analysis provides clinically relevant information with strong therapeutic potential both from a diagnostic and treatment perspective. This is bolstered by the individual microbiota, radiographic, wound healing, outcomes, and other functional analyses to further characterize these distinct subtypes.

      Visually the figures are well-designed and presented and for the most part easy to follow. Summary figures/histograms of proteogenomic data, and specifically highlighted genes/proteins are well presented.

      Molecular dynamics simulations and 3D binding analysis are nice additions.

      While I don't necessarily agree with the authors' interpretation of the microbiota data, the experiment and results are very interesting, and clustering information can be gleaned from this data.

      Weaknesses:

      (1) Statistical methods for assessing significance may not always be appropriate.

      We appreciate your suggestion. We have revised the statistical test results in Fig. 3b, 3e and 3g.

      (2) Necessary validating experiments are lacking for some of the major conclusions of the paper.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (3) Many of the conclusions are based on correlative or suggestive results, and the data is not always substantive to support them.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (4) Experimental design is not always appropriate, sometimes lacking necessary controls or large disparity in sample sizes.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      (5) Conclusions are sometimes overstated without validating measures, such as in BaP exposure association with the identified hotspot, kinase activation analysis, or the EMT function.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      Reviewer #1 (Recommendations For The Authors):

      (1) Please provide a justification for why only females were included in the study. I am concerned that the results obtained in this study can not be generalized as only females were included.

      We appreciate your suggestion. Lung cancer in never smokers (LCINS) accounts for approximately 25% of lung cancer cases (15% of lung cancer in men and 53% in women) (Parkin et al., 2005). Currently, the etiology and mechanisms of LCINS are not clear. Globally, LCINS shows remarkable gender and geographic variations, occurring more frequently among Asian women (Bray et al., 2018). Indoor coal burning for heating and cooking has been implicated as a risk factor for Chinese women, as they spend more time indoors (Mumford et al., 1987). Among men, the proportion of never smokers is lower, with less regional variation, and lung cancer in males is frequently caused by smoking. Thus, to better reveal the etiology and molecular mechanisms of LCINS, we collected data exclusively from female LCINS patients in the Xuanwei area, excluding potential confounding factors such as hormonal or smoking status. Our study specifically aims to uncover the etiology and mechanisms of LCINS in female patients, with future research planned to verify whether our conclusions can be generalized to LCINS in male patients.

      (2) "Therefore, the XWLC and TSLC cohorts are more explicitly influenced by environmental carcinogens, while the TNLC and CNLC cohorts may be more affected by age or endogenous risk factors." This statement in the results (starting line 142) does not have adequate support from the results. First, the average age in the 4 cohorts does not seem to be very different to me based on Figure 1b. if they are different, please provide statistical test results. Please make sure this statement is supported by other results, otherwise, I would recommend excluding it from the manuscript.

      We appreciate your suggestion. To gain biological insights, we frequently associate mutational signatures with factors such as age, defective DNA mismatch repair, or environmental exposures. These remain associations rather than causation. Thus, we agree with the suggestion to weaken the conclusion as follows:

      “Generally, exposure to tobacco smoking carcinogens (COSMIC signature 4) and chemicals such as BaP (Kucab signatures 49 and 20) were identified as the most significant contributing factors in both the XWLC and TSLC cohorts (Fig. 1f and 1g). In contrast, defective DNA mismatch repair (COSMIC signature ID: SBS6) was identified as the major contributor in both the TNLC and CNLC cohorts (Fig. 1h and 1i), with no potential chemicals identified based on signature similarities. Therefore, the XWLC and TSLC cohorts appear to be more explicitly associated with environmental carcinogens, while the TNLC and CNLC cohorts may be more associated with defective DNA mismatch repair processes.”

      (3) Please provide statistical test results in this subsection "The EGFR-G719X mutation, which is a hotspot associated with BaP exposure, possesses distinctive biological features " (Line 203) showing that the number of G719X is significantly different in XWLC.

      We appreciate your suggestion. A two-sided Fisher’s exact test was used to calculate p-values, which are labeled in Figure 3b.
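
      For illustration only, the sketch below shows a two-sided Fisher’s exact test on a 2×2 table of cohort × G719X status in plain Python. The function name and the example counts are hypothetical and are not the study’s data. In practice a library routine such as scipy.stats.fisher_exact would normally be used. The sketch simply sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cohorts and columns G719X-positive vs G719X-negative
    tumors; this layout is illustrative only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of a table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Two-sided: sum every table no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts, not data from the study.
print(fisher_exact_two_sided(8, 2, 1, 5))
```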

      (4) "Analysis of overall survival and progression-free interval (PFI) revealed that patients with the G719X mutation had worse outcomes compared to other EGFR mutation subtypes " This statement (starting Line 232) should be supported by literature data.

      We appreciate your suggestion.

      In the Watanabe et al. post-hoc analysis, patients with the G719 mutation had significantly shorter OS with gefitinib compared to patients with the common mutations (Watanabe et al., 2014). We revised the sentences as following:

      “Analysis of overall survival and progression-free interval (PFI) revealed that patients with the G719X mutation had worse outcomes compared to other EGFR mutation subtypes (Fig. 3j and 3k), which was consistent with a previous study (Watanabe et al., 2014).”

      (5) I would suggest changing this statement to a "suggestion" as there is no experimental support for this, and mentioning that this requires further experimental validation with the suggested drugs "Therefore, a promising approach to overcome resistance in tumors with this mutation could involve combining afatinib, which targets activated EGFR, with FDA-approved drugs that specifically target the activated kinases associated with G719X. " (Line 260).

      We appreciate your suggestion. We changed the sentences as follows:

      "Therefore, we propose a potential approach to overcoming resistance in tumors with this mutation, which could involve combining afatinib, targeting activated EGFR, with FDA-approved drugs that specifically target the activated kinases associated with G719X. "

      (6) It is not clear to me how PPIs were integrated with missense. Please clarify the method.

      We appreciate your suggestion. To identify interactions enriched with missense mutations, we constructed mutation-associated protein–protein interactomes (PPIs). Initially, we downloaded protein-protein interactomes from Interactome INSIDER (v.2018.2) (Meyer et al., 2018). Subsequently, we identified interfaces carrying missense mutations by mapping mutation sites to PPI interface genomic coordinates using bedtools (v2.25.0)(Quinlan and Hall, 2010). Finally, we defined oncoPPI as those PPIs significantly enriched in interface mutations in either of the two protein-binding partners across individuals. For more details, please refer to the methods sections “Building mutation-associated protein–protein interactomes” and “Significance test of PPI interface mutations.”
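
      The interface-mapping step described above, which the study performed with bedtools, is essentially an interval-overlap query. The brute-force Python sketch below only illustrates that idea under stated assumptions:
      - the record types, identifiers and per-sample counting are hypothetical;
      - interface coordinates are BED-style (0-based, half-open) and mutation positions are 1-based;
      - the subsequent enrichment test used to define oncoPPIs is not reproduced.
      For real data, bedtools intersect performs the same overlap far more efficiently.

```python
from collections import Counter
from typing import NamedTuple


class Interface(NamedTuple):
    ppi: str    # hypothetical PPI identifier, e.g. "PROTA|PROTB"
    chrom: str
    start: int  # BED-style: 0-based, inclusive
    end: int    # BED-style: 0-based, exclusive


class Mutation(NamedTuple):
    sample: str
    chrom: str
    pos: int    # 1-based position, as in VCF/MAF files


def count_interface_mutations(interfaces, mutations):
    """Count, for each PPI, the samples with a missense mutation inside one of its interfaces."""
    samples_per_ppi = {}
    for m in mutations:
        for iface in interfaces:
            # 1-based position p lies in the 0-based half-open [start, end) iff start < p <= end.
            if m.chrom == iface.chrom and iface.start < m.pos <= iface.end:
                samples_per_ppi.setdefault(iface.ppi, set()).add(m.sample)
    return Counter({ppi: len(s) for ppi, s in samples_per_ppi.items()})
```

      Counting distinct samples rather than mutations is one plausible choice, since the enrichment described above is assessed across individuals.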

      Reviewer #2 (Recommendations For The Authors):

      Regarding the tumor microbiota composition, it is not clear what the significance of these results would be. Are the specific microbiota associated with MC-IV more pathogenic than other species found in other subtypes? What are the unique features of these MC-IV microbiota? If these are difficult to address, this section could be removed from the manuscript.

      We appreciate your suggestion. This section has been removed from the manuscript.

      Regarding the radiomic data section (Figure 6d and Extended Figure 6d), more description about the eight and five features (that are different between MC-II and others) would be helpful to better understand the importance and significance of these data.

      We appreciate your suggestion. We have added the following description: “Features such as median and mean reflect average gray-level intensity, and Idmn and Gray Level Non-Uniformity measure the variability of gray-level intensity values in the image, with a higher value indicating greater heterogeneity in intensity values. These results suggest a denser and more heterogeneous image in the MC-II subtype.”

      Other minor comments:

      (1) If EGFR G719X is a known hotspot mutation associated with BaP, please cite previous literature.

      We appreciate your suggestion. Upon careful retrieval using "G719X" and "BaP" as keywords, we did not find previous literature discussing G719X as a known hotspot mutation associated with BaP.

      (2) In Figure 1d, it should be clearly written in the legend that tumor (T) and normal (N) tissue were analyzed.

      We appreciate your suggestion. We have clarified the figure legend of Figure 1d.

      (3) In Figure 1m, it is not obvious that EGFR pY1173 and pY1068 are more abundant in the Bap+S9 sample. Total EGFR bands are very faint. These western blots should be repeated and quantified.

      We appreciate your suggestion. We have removed Fig. 1m. After identifying the antibody with satisfactory performance, we will provide the revised results.

      (4) In Figure 2d, aren't the EGFR E746__A750del mutations more frequently found in CNLC, TSLC, and TNLC? (which is opposite to what the authors wrote in the text).

      We appreciate your suggestion. This mistake has been corrected.

      (5) In Figure 7f-i and Ext Figure 8, does "CK" mean empty vector control? If so, it should be changed to "EV".

      We appreciate your suggestion. This mistake has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      Methods:

      While previous work was referenced, a description of proteomics methods should still include: instrumentation, acquisition method, all software packages used, method for protein identification, method for protein quantification, how FDR was maintained for identification/quantification, definition of differentially expressed proteins, whether multiple testing correction was performed and if so what method.

      We appreciate your suggestion. We revised the description of label-free mass spectrometry methods accordingly.

      The paper would greatly benefit from brief methodological explanations throughout, as all methods are currently exclusively found in the supplementary information. This severely hampers the readability of the manuscript.

      Thank you for raising this point. However, we respectfully choose not to comment on this matter at present.

      Suggestions Throughout

      The paper would greatly benefit from proofreading/editing.

      Line 157-158/Figure 1J for CYP1A1 displays protein concentrations while Figure 1K for AhR shows mRNA. Why this discrepancy? It would be preferable to show both mRNA and protein levels for both CYP1A1 and AhR. Also, there is a large discrepancy in the "n" between the normal and tumor groups, which makes the statistical comparison challenging. The AhR data is therefore unconvincing, and additional protein data is suggested. Thus the claim of significantly elevated AhR and CYP1A1 levels in tumors is not sufficiently supported and requires further investigation, both mRNA and protein, and with similarly sized sample groups.

      We appreciate your suggestion. We have thoroughly edited the revised manuscript, with all changes marked accordingly. Compared to mRNA level assessment, protein abundance is a better indicator of gene expression. Therefore, we reanalyzed the protein level of AhR for comparison and found no significant differences (Figure 1k). Additionally, the samples sequenced by mRNA-seq were not entirely consistent with those sequenced by label-free proteomics. The samples analyzed by different methods are shown in Figure 1d.

      Line 159/Figure 1I: There is no control for the serum data presented here. What are the serum levels for individuals not residing in Xuanwei? Without appropriate controls, it is unclear whether this represents elevated BPDE serum levels. Thus nothing insightful can be derived from this data.

      We appreciate your suggestion. We have deleted the results concerning BPDE serum detection in the revised manuscript.

      Line 164 The statement "sites such as Y1173 and Y1068 of EGFR were more phosphorylated in BaP treated cells" is not sufficiently supported by the presented data and cannot be made. Figure 1M has no quantitation, no indication of "n" or whether this represents a single experiment or one validated with repeating. The western blot is also cropped with no indication of molecular weight or antibody specificity. This data is NOT convincing. The antibody signal is very weak, and not convincing with cropped blots. An updated figure, with an uncropped blot, and quantitation with multiple n's and statistical comparison is required. I am not sure the Wilcoxon rank sum test is appropriate to test significance in j-l. The null hypothesis should not be equal medians but equal means based on the experimental design.

      We appreciate your suggestion. We have removed Fig. 1m. After identifying the antibody with satisfactory performance, we will provide the revised results.

      Line 181 phrase "significant differences" should not be used unless making a claim about statistical significance.

      We appreciate your suggestion. We changed “significant differences” to “noticeable differences”.

      Line 197: "The blood serum assay provided support..." As noted above this claim is not sufficiently supported by the presented data and requires more complete investigation.

      We appreciate your suggestion. This conclusion has been deleted in the revised manuscript.

      Line 219: Requires proofreading/editing.

      We appreciate your suggestion. We have thoroughly edited the revised manuscript, with all changes marked accordingly.

      Line 220: appears to have a typo and should read GGGC>GTGC

      We appreciate your suggestion. This mistake has been corrected in the revised manuscript.

      Line 223/224 Figure 3e-h. Again there is a large disparity between the n's of each group. Despite the WT having the highest frequency in the XWLC study population, it has only n=5 when comparing the protein and phosphosite for MAPKs. There is also no explanation for what the graph symbols indicate, what statistical test was performed to determine the statistical significance of the presented differences, and between which specific groups that significance exists. Thus, it is challenging to ascertain whether there are relevant differences in the MAPK signaling components.

      We appreciate your suggestion. We added the description “N, number of tumor samples containing corresponding EGFR mutation” to the figure legend. p-values were calculated with a two-tailed Wilcoxon rank sum test, and p-values < 0.05 are labeled in Figures 3e-i.
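
      As a rough illustration of the two-tailed Wilcoxon rank sum test (equivalently, the Mann-Whitney U test) mentioned above, here is a minimal pure-Python sketch. It uses the large-sample normal approximation with a tie correction and no continuity correction. The function name is hypothetical. With groups as small as n = 5, exact p-values, which routines such as scipy.stats.mannwhitneyu can compute, may differ noticeably from this approximation.

```python
import math
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-tailed Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all values identical: no evidence of a shift
    z = (u1 - mu) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)
```

      Because the statistic depends only on ranks, it tests whether values in one group tend to be larger than in the other, not whether the means differ.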

      Figure 3I: Good figure. However, it would be beneficial to provide validation by Western blotting for a few of these substrates using phospho-specific antibodies. It is suggested that this experiment be added.

      We appreciate your suggestion. Figure 3I shows the comparison of patients’ ages among subtypes; we believe you are referring to Figures 3g and 3h. The relevant experiments are currently underway, and we will provide the corresponding data in the next revised version.

      Figure 4b. Very compelling figure.

      We appreciate your suggestion.

      Line 276: The AhR and CYP1A1 data presented earlier was not convincing, and CYP1A1 and AhR cannot be responsibly used as indicators of BaP activity based on potential. This is not an appropriate application.

      We appreciate your suggestion. CYP1A1 and AhR are two key regulators involved in BaP metabolism and signal transduction, respectively. However, when we examined AhR protein expression in tumor versus normal tissues, we found no significant difference (Fig. 1k), whereas CYP1A1 was highly expressed in tumor samples (Fig. 1j). Thus, we mainly examined the expression of CYP1A1 among the four subgroups. We changed our description as follows:

      “As CYP1A1 is a key regulator involved in BaP metabolism and has been proven to be highly expressed in tumor samples (Fig. 1j), we next examined the expression of CYP1A1 among the four subgroups to evaluate their associations with air pollution.”

      Figure 4d. Here it is AhR protein used rather than mRNA measured earlier. What is the explanation for this change?

      We appreciate your suggestion. As there was no significant difference in AhR protein expression between tumor and normal tissues (Fig. 1k), we removed the comparison of AhR expression among subtypes.

      Line 281 "Moderately elevated expression level of AhR" is not supported by the presented data and should be removed.

      We appreciate your suggestion. We have deleted the comparison of AhR among subtypes.

      Figure 4: There is no indication or explanation of how the protein abundance is being measured. Is this from the proteomics (MS) approaches, by ELISA or by Western? If it is simply by MS then validation by another method is preferable. The data presented in Figure 4 do not adequately support the claim that MC-II subtype is more strongly associated with BaP exposure. What statistical test is used in 4F? Why is the n in the MC-II group, which is the highlighted group of interest nearly double the other groups?

      We appreciate your suggestion. Fig. 4e is derived from the proteomics data. The two-tailed Wilcoxon rank sum test was used to calculate p-values in panels c and e.

      Figure 4g: At least one or two of these should be validated by Western Blot or targeted MS.

      We appreciate your suggestion. The relevant experiments are currently underway, and we will provide the corresponding data in the next revised version.

      Figure 5a: Assuming these were also measured via proteomic analysis, how do their expression patterns compare across the different omics modes?

      Thank you for your suggestion. Figure 5 integrates transcriptomics (19182 genes), proteomics (9152 genes), and phosphoproteomics (5733 genes) data. In general, we utilized transcriptomics data to identify unique or distinct pathways among subgroups. Furthermore, proteomics and phosphoproteomics data were employed to validate key gene expressions, as they encompass fewer genes compared to transcriptomics data.

      For instance, in Fig. 5a-d, we observed higher expression levels of mesenchymal markers such as VIM, FN1, TWIST2, SNAI2, ZEB1, ZEB2, and others in the MC-IV subtype using transcriptomics data (Fig. 5a). Additionally, we calculated epithelial-mesenchymal transition (EMT) scores using the ssGSEA enrichment method based on protein levels and conducted GSEA analysis using transcriptomics data (Fig. 5b). Furthermore, using proteomics data, we evaluated Fibronectin (FN1), an EMT marker that promotes the dissociation, migration, and invasion of epithelial cells, at the protein level (Fig. 5c), and β-Catenin, a key regulator in initiating EMT, also at the protein level (Fig. 5d). Overall, our findings indicate that the MC-IV subtype exhibits an enhanced EMT capability, which may contribute to the high malignancy observed in this subtype.

      Line 314: Not compared with MCI, which appeared to be much lower at the mRNA level. Is there an explanation for this difference?

      We appreciate your suggestion. FN1 expression is lowest in MC-I at the protein level (Fig. 5c). However, at the transcriptome level, FN1 expression is lowest in the MC-III subtype (Fig. 5a). These results may appear inconsistent, but discrepancies between mRNA and protein expression levels are common: a previous study showed that about 20% of genes had a statistically significant correlation between protein and mRNA expression in lung adenocarcinomas (Chen et al., 2002). Post-transcriptional mechanisms, including protein translation, post-translational modification, and degradation, may influence the level of a protein present in a given cell or tissue. Given this, we focused on identifying distinct biological pathways in each subgroup, supported by multi-omics data.
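
      One way to quantify the mRNA–protein discordance discussed above is a per-gene correlation restricted to samples profiled on both platforms. This matters here because the RNA-seq and proteomics sample sets were not identical. The sketch below is hypothetical: the nested-dictionary layout and the three-sample minimum are assumptions, and significance testing with multiple-testing correction is omitted.

```python
import math


def per_gene_mrna_protein_r(mrna, protein):
    """Pearson r between mRNA and protein abundance for each gene.

    mrna, protein: dict gene -> dict sample -> abundance (hypothetical layout).
    Only samples measured on both platforms are used for each gene.
    """
    result = {}
    for gene in mrna.keys() & protein.keys():
        shared = sorted(mrna[gene].keys() & protein[gene].keys())
        if len(shared) < 3:
            continue  # too few matched samples for a meaningful correlation
        xs = [mrna[gene][s] for s in shared]
        ys = [protein[gene][s] for s in shared]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        if sxx == 0 or syy == 0:
            continue  # constant values: correlation undefined
        result[gene] = sxy / math.sqrt(sxx * syy)
    return result
```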

      Line 321: MC-IV *potentially* possesses an enhanced EMT capability. This statement cannot be conclusively made.

      We appreciate your suggestion. We changed our description as: “Collectively, our findings demonstrate that the MC-IV subtype is associated with enhanced EMT capability, which may contribute to the high malignancy observed in this subtype.”

      Lines 325 and 327 present dysregulation of cell cycle processes and activation of the CDK1 and CDK2 pathways (based on KSEA analysis, and closely linked to cell cycle regulation) as two separate pieces of evidence. However, both are drawn from the phosphoproteomics and likely reflect conclusions drawn from the same phosphosite data. Said another way, if phosphosite data indicate differences in kinases linked to cell cycle regulation, then you would also expect phosphosite data to indicate dysregulation of the cell cycle.

      We appreciate your suggestion. You mentioned that Fig. 4f and Fig. 5e redundantly prove that the CDK1 and CDK2 pathways are dysregulated. However, KSEA analysis in Fig. 4f estimates changes in kinase activity based on the collective phosphorylation changes of its identified substrates (Wiredja et al., 2017). In contrast, Fig. 5e directly evaluates the abundance of protein and phosphosite levels of CDK1 and CDK2 across subtypes. These analyses mutually confirm each other rather than being redundant.
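
      To make concrete how KSEA infers kinase activity from the collective phosphorylation changes of a kinase’s substrates, the sketch below implements the commonly used KSEA z-score. The formula is z = (mean log2FC of the kinase’s quantified substrates − mean log2FC of all sites) × √m / SD of all sites, where m is the number of quantified substrates. This is a minimal sketch assuming log2 fold changes are already computed. The function and argument names are hypothetical, and the substrate-count cutoff and p-value step of tools such as the KSEA App are omitted.

```python
import math
from statistics import mean, stdev


def ksea_z_scores(site_log2fc, kinase_substrates):
    """KSEA-style kinase activity z-scores.

    site_log2fc: dict phosphosite -> log2 fold change between two groups.
    kinase_substrates: dict kinase -> list of its annotated substrate sites.
    """
    all_fc = list(site_log2fc.values())
    mu_all, sd_all = mean(all_fc), stdev(all_fc)
    scores = {}
    for kinase, substrates in kinase_substrates.items():
        fc = [site_log2fc[s] for s in substrates if s in site_log2fc]
        if not fc:
            continue  # no quantified substrates, so activity cannot be inferred
        # Shift of the substrate mean relative to all sites, scaled by sqrt(m).
        scores[kinase] = (mean(fc) - mu_all) * math.sqrt(len(fc)) / sd_all
    return scores
```

      Such a score is derived from substrate sites, not from the kinase’s own protein or phosphosite abundance. The CDK1/CDK2 abundance comparison in Fig. 5e therefore draws on different measurements.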

      Line 413/Figure 6b: While there may be a trend displayed by the figure, it is not convincing enough to state that MC-IV shows a conclusively distinguishable bacterial composition. Too much variability exists within groups MC-II and MC-III. However, it does show that MC-IV and MC-II have consistent composition within their groups, and that is interesting.

      We appreciate your suggestion. We have deleted the analysis of bacterial composition across subtypes.

      Figure 6: Overall very nice figure, with intriguing diagnostic potential. See the above note on 6a-b interpretation.

      We appreciate your suggestion. We have deleted the analysis of bacterial composition across subtypes, including Fig. 6a-6c.

      Figure 7c-f better labeling of the panels will aid reader comprehension.

      We appreciate your suggestion. Necessary labeling has been added to Fig. 7c-f to enhance comprehension.

      Figure 7 panel order is confusing, switching from right to left to vertical. Rearranging to either left to right or vertical would help orient readers.

      We appreciate your suggestion. We have adjusted the panel order of Fig. 7 and Extended Fig. 8.

      Figure 7 legend i: should read Cell colony* assay

      We appreciate your suggestion. We have corrected this mistake in the revised manuscript.

      The Discussion is very brief. While it includes a discussion of the potential impact of the study, it does not include an analysis of the caveats/drawbacks of the study. A more thorough discussion of other studies focusing on the impacts of BaP exposure is also suggested as this was a highlighted point by the authors.

      We appreciate your suggestion. We have added a discussion of the associations between BaP exposure and lung cancer, and have also addressed the shortcomings of our study, as follows:

      “Mechanistically, Qing Wang and colleagues showed that BaP induces lung carcinogenesis, characterized by increased inflammatory cytokines and cell proliferative markers, and decreased antioxidant levels and apoptotic protein expression (Wang et al., 2020). In our study, we used clinical samples and linked the mutational signatures of XWLC to the chemical compound BaP, which advanced understanding of the etiology and mechanism of air-pollution-induced lung cancer. Several limitations of our study must be acknowledged. Firstly, although our multi-omics approach provided a comprehensive analysis of the subtypes and their unique biological pathways, the sample size for each subtype was relatively small. This limitation may affect the robustness of the clustering results and the identified subtype-specific pathways. Larger cohort studies are necessary to confirm these findings and refine the subtype classifications. Secondly, although our study advanced the understanding of air-pollution-induced lung cancer by using clinical samples, the reliance on epidemiological data in previous studies introduces potential confounding factors. Our findings should be interpreted with caution, and further mechanistic studies are warranted to establish causal relationships more definitively. Thirdly, our in silico analysis suggested a potential approach to overcoming drug resistance in tumors with G719X mutations. However, these predictions need to be validated through extensive in vitro and in vivo experiments. The reliance on computational models without experimental confirmation may limit the clinical applicability of these findings.”

      References:

      Bailey, M. H., Tokheim, C., Porta-Pardo, E., Sengupta, S., Bertrand, D., Weerasinghe, A., Colaprico, A., Wendl, M. C., Kim, J., Reardon, B., et al. (2018). Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371-385 e318.

      Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., and Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68, 394-424.

      Chen, G., Gharib, T. G., Huang, C. C., Taylor, J. M., Misek, D. E., Kardia, S. L., Giordano, T. J., Iannettoni, M. D., Orringer, M. B., Hanash, S. M., and Beer, D. G. (2002). Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 1, 304-313.

      Meyer, M. J., Beltran, J. F., Liang, S., Fragoza, R., Rumack, A., Liang, J., Wei, X., and Yu, H. (2018). Interactome INSIDER: a structural interactome browser for genomic studies. Nat Methods 15, 107-114.

      Mumford, J. L., He, X. Z., Chapman, R. S., Cao, S. R., Harris, D. B., Li, X. M., Xian, Y. L., Jiang, W. Z., Xu, C. W., Chuang, J. C., et al. (1987). Lung cancer and indoor air pollution in Xuan Wei, China. Science 235, 217-220.

      Parkin, D. M., Bray, F., Ferlay, J., and Pisani, P. (2005). Global cancer statistics, 2002. CA Cancer J Clin 55, 74-108.

      Quinlan, A. R., and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.

      Sanchez-Vega, F., Mina, M., Armenia, J., Chatila, W. K., Luna, A., La, K. C., Dimitriadoy, S., Liu, D. L., Kantheti, H. S., Saghafinia, S., et al. (2018). Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 173, 321-337 e310.

      Wang, Q., Zhang, L., Huang, M., Zheng, Y., and Zheng, K. (2020). Immunomodulatory Effect of Eriocitrin in Experimental Animals with Benzo(a)Pyrene-induced Lung Carcinogenesis. J Environ Pathol Toxicol Oncol 39, 137-147.

      Watanabe, S., Minegishi, Y., Yoshizawa, H., Maemondo, M., Inoue, A., Sugawara, S., Isobe, H., Harada, M., Ishii, Y., Gemma, A., et al. (2014). Effectiveness of gefitinib against non-small-cell lung cancer with the uncommon EGFR mutations G719X and L861Q. J Thorac Oncol 9, 189-194.

      Wiredja, D. D., Koyuturk, M., and Chance, M. R. (2017). The KSEA App: a web-based tool for kinase activity inference from quantitative phosphoproteomics. Bioinformatics 33, 3489-3491.

      Zhang, H., Liu, C., Li, L., Feng, X., Wang, Q., Li, J., Xu, S., Wang, S., Yang, Q., Shen, Z., et al. (2021). Genomic evidence of lung carcinogenesis associated with coal smoke in Xuanwei area, China. Natl Sci Rev 8, nwab152.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reports:

      In the public reports there is only one point we would like to discuss. It concerns our use of a computational model to analyse spatial tumour growth. Citing from the eLife assessment, which reflects several comments of the referees:

      The paper uses published data and a proposed cell-based model to understand how growth and death mechanisms lead to the observed data. This work provides an important insight into the early stages of tumour development. From the work provided here, the results are solid, showing a thorough analysis. However, the work has not fully specified the model, which can lead to some questions around the model’s suitability.

      The observables we use to determine (i) the growth mode and (ii) the dispersion of cells are model-independent. The method to determine (iii) the rate of cell death does not use a spatial model. Throughout, our computational model of spatial growth is not used to analyze data. Instead, it is used to check that the observables we use can actually discriminate between different growth modes given the limitations of the data. We have expanded the description of the computational model in the revised version and have released our code on GitHub. However, the conclusions we reach do not rely on a computational model. Instead, where we estimate parameters, we use population dynamics as described in section S5. The other observables are parameter-free and model-independent. We view this as a strength of our approach.

      Recommendations for the authors:

      Reviewer #1:

      (1.1) In Figure 1, the data presented by Ling et al. demonstrate a distinctive “comb” pattern. While this pattern diverges from the conventional observations associated with simulated surface growth, it also differs from the simulated volume growth pattern. Is this discrepancy attributable to insufficient data? Alternatively, could the emergence of such a comb-like structure be feasible in scenarios featuring multiple growth centers, wherein clones congregate into spatial clusters?

      We are unsure what you are referring to. One possibility is that you are referring to the honeycomb structure formed by the samples of the Ling et al. data shown in Fig. 1A of the main text. This is an artefact arising from the cutting of the histological section into four quadrants; see Fig. S1 in the SI of Ling et al. The apparent horizontal and vertical “white lines” in our Fig. 1A stem from the lack of samples near the edges of these quadrants. We have added this information to the figure caption.

      An alternative is that you are referring to the peaks in Fig. 2A of the main text. Three of these peaks indeed stem from individual clones. We have placed additional figures in the SI (S2 B and S2 C) to disentangle the contributions from different clones. The peaks have a simple explanation: each clone contributes the same weight to the histogram. If a clone has only few offspring, this statistical weight is concentrated on only a few angles, see SI Figure S2 B.
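      The equal-weight rule can be illustrated with a short sketch. It assumes the angles of each clone are already available in degrees; the function name and the example clones are hypothetical and not taken from our code. The weights of each clone sum to one, so a clone with two samples puts weight 0.5 on each of its angles, while a clone with 51 samples spreads the same total over 51 angles.

```python
def per_clone_weights(angles_by_clone):
    """Return (angle, weight) pairs in which the weights of each clone sum to 1."""
    pairs = []
    for angles in angles_by_clone.values():
        if not angles:
            continue  # a clone without angles contributes nothing
        w = 1.0 / len(angles)  # equal total weight per clone, shared among its angles
        pairs.extend((a, w) for a in angles)
    return pairs


# Hypothetical example: a small clone (2 angles) and a large clone (51 angles).
clones = {
    "small": [10.0, 12.0],
    "large": [float(a) for a in range(-175, 180, 7)],
}
pairs = per_clone_weights(clones)
```

      Passing these weights to any weighted histogram routine reproduces the effect described above: the two angles of the small clone form a narrow peak carrying as much weight as the entire large clone.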

      (1.2) I am not sure why there are two sections about “Methods” in the main text: Line 50 as well as Line 293. Furthermore, the methods outlined in the main paper lack the essential details necessary for readers to navigate through the critical aspects of their analysis. While these details are provided in the Supplementary Information, they are not adequately referenced within the methods section of the main text. I would recommend that the authors revise the method sections of the main text to include pertinent descriptions of key concepts or innovations, while also directing readers to the corresponding supplementary method section for further elucidation.

      We have merged the Section “Materials and Methods” at the end of the main text with the SI description of the data in SI 4.2 and placed a reference to this material in the main body.

      (1.3) The impact of the particular push method (proposed in the model) on the resultant spatial arrangement of clones remains unclear. For instance, it’s conceivable that employing a different pushing method (for example, with more strict constraints on direction) could yield a varied pattern of spatial diversity. Furthermore, there is ambiguity regarding the criteria for determining the sequence of the queue housing overlapping cells.

      Regarding the off-lattice dynamics we use, there are indeed many variants one could use. In non-exhaustive trials, we found that the details of the off-lattice dynamics did not affect the results. The reason may be that at each computational step, each cell only moves a very small amount, and differences in the dynamics tend to average out over time.

      We deliberately do not give constraints on the direction. Such constraints emerge in lattice-based models (when preferred directions arise from the lattice symmetry), but these are artifacts of the lattice.

      At cell division, the offspring is placed in a random direction next to the parent, regardless of whether this introduces an overlap. Cells then push each other along the axis connecting their two centers of mass. Unlike in lattice-based models, a sequence of pushes does not propagate through the tumor straight away but sets off a cascade of pushes. Equal pushing of two cells (i.e. displacing both cells initially, as opposed to pushing only one of the two) results in the same patterns of directed, low-dispersion surface growth and undirected, high-dispersion volume growth, but is computationally much more expensive, as it reintroduces overlaps that were resolved in the previous step.
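      The one-sided push described above can be sketched as follows, assuming two-dimensional, equal-sized hard disks. The function name and the choice to move the pushed cell by exactly the overlap, so that the two cells just touch afterwards, are illustrative assumptions rather than details taken from our simulation code.

```python
import math


def push_apart(fixed, moved, diameter):
    """Move `moved` along the axis joining the two centres until the two
    disks of equal `diameter` just touch; `fixed` does not move.
    Points are (x, y) tuples; the new position of `moved` is returned."""
    dx, dy = moved[0] - fixed[0], moved[1] - fixed[1]
    dist = math.hypot(dx, dy)
    if dist >= diameter:
        return moved  # no overlap, nothing to resolve
    if dist == 0.0:
        # coincident centres: the axis is undefined, so pick +x (an assumption)
        return (fixed[0] + diameter, fixed[1])
    overlap = diameter - dist
    ux, uy = dx / dist, dy / dist  # unit vector from fixed to moved
    return (moved[0] + overlap * ux, moved[1] + overlap * uy)


# Hypothetical example: a daughter placed at distance 1 from its parent (diameter 2).
new_pos = push_apart((0.0, 0.0), (0.6, 0.8), 2.0)
```

      The pushed cell keeps its direction relative to the pusher and only its distance changes, which is why no preferred lattice direction enters the dynamics.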

      We have rewritten the description of the pushing queue in SI Section 1. The choice of the pushing sequence is somewhat arbitrary, but we found that it also has no noticeable effect on the growth mode. Contrasting it with a depth-first approach may help to illustrate this. We tried two queueing schemes for iterating through overlapping cells, width-first (i.e. breadth-first) and depth-first. In both cases, we begin by scanning a given cell’s (the root’s) neighborhood for overlaps and shuffle the list of overlapping neighbours. In the width-first approach, we then append this list to the queue. Subsequent iterations append their lists of overlapping cells to the queue, such that we always resolve overlaps within the neighborhood of the root first. The depth-first approach follows a sequence of pushes by immediately checking a pushed cell’s neighborhood for new overlaps and adding these to the front of the queue (which then works more like a stack). This can be implemented efficiently by recursion, but has no noticeable performance advantage and results in the same patterns of directed, low-dispersion surface growth and undirected, high-dispersion volume growth. In our opinion, the width-first approach of first resolving overlaps in the immediate neighborhood is more intuitive, which is why we adopted it for our simulation model.
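      The width-first scheme can be sketched with a double-ended queue. This is a minimal illustration under stated assumptions (two-dimensional equal-sized disks, a one-sided push that restores exact contact, a small tolerance `eps` and a step cap `max_steps` to guarantee termination); the names are hypothetical and the sketch is not our simulation code.

```python
import math
import random
from collections import deque


def resolve_overlaps(cells, root, diameter, rng, eps=1e-9, max_steps=100_000):
    """Width-first (breadth-first) overlap resolution starting at cells[root].

    `cells` is a list of [x, y] centres of equal-sized disks, modified in place.
    Each processed cell pushes every overlapping neighbour along the
    centre-to-centre axis until the two just touch; pushed cells are appended
    to the back of the queue, so the root's neighbourhood is resolved first.
    """
    queue = deque([root])
    steps = 0
    while queue and steps < max_steps:
        steps += 1
        i = queue.popleft()
        xi, yi = cells[i]
        overlapping = [
            j for j in range(len(cells))
            if j != i and math.hypot(cells[j][0] - xi, cells[j][1] - yi) < diameter - eps
        ]
        rng.shuffle(overlapping)  # random order among this cell's overlapping neighbours
        for j in overlapping:
            dx, dy = cells[j][0] - xi, cells[j][1] - yi
            d = math.hypot(dx, dy)
            if d == 0.0:
                dx, dy, d = 1.0, 0.0, 1.0  # coincident centres: arbitrary axis (assumption)
            scale = diameter / d
            cells[j] = [xi + dx * scale, yi + dy * scale]  # j now just touches i
            queue.append(j)  # j's own neighbourhood is checked later
    return cells


# Hypothetical example: a newly placed cell 0 overlaps a short chain of cells.
cells = [[0.0, 0.0], [1.5, 0.0], [3.0, 0.0]]
resolve_overlaps(cells, 0, 2.0, random.Random(0))
```

      Replacing `queue.popleft()` with `queue.pop()` processes the most recently pushed cell first, which turns the queue into a stack and gives the depth-first variant.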

      (1.4) For the example presented in S5.1, how can the author identify from genomic data that mutation 3 does not replace its ancestral clade mutation 2? In other words, if mutation 2, 3 and 4 are linked meaning clone 4 survives but 2 and 3 dies, how does one know if clone 3 dies before clone 2? I understand that this is a conceptual example, but if one cannot identify this situation from the real data, how can the clade turnover be computed?

      Thank you for this comment, which points to an error of ours in the turnover example of the SI: clade 3 does in fact replace clade 2 and contributes to the turnover! (The algorithm correctly annotates clade 3 as orphaned and computes a turnover of 3/15 for this example.) We have corrected this.

      In this example, it does not matter for the clade turnover whether clone 3 dies before clone 2. As long as its ancestor (clone 2) becomes extinct, it adds to the clade turnover. The term “replaces” applies to the clade of 3, which has a surviving subclone and thereby eventually replaces clade 2. The clade turnover is solely based on the presence of the mutations (which define their clades) and not on the individual clones.

      (1.5) After reviewing reference 24 (Li et al.), I noticed that the assertions made therein contradict the findings presented in S3 (Mutation Density on Rings). Specifically, Li et al. state that “peripheral regions not only accumulated more mutations, but also contained more changes in genes related to cell proliferation and cell cycle function” (Page 6) and “Phylogenetic trees show that branch lengths vary greatly with the long-branched subclones tending to occur in peripheral regions” (Page 4). However, upon re-analysis of their data, the authors demonstrated a decrease in mutation density near the surface. It is crucial to comprehend the underlying cause of such a disparity.

      The reason for this disparity is the way Li et al. labelled samples as belonging to peripheral or central regions of the tumour. We have added a new figure in the SI to show this: Fig. S14 shows the number of mutations found in samples of Li et al. against their distances from the centre, along with the classification of samples as center/periphery given in Li et al. In the case of tumor T1, the classification of samples in Li et al. does not agree with their distance from the center: samples classified as core are often more distant from the center than those classified as peripheral. Furthermore, Lewinsohn et al. (see below) show in their Fig. 5 that samples classified as ‘center’ by Li et al. all fall into a single clade, and we believe this affects all results derived from this classification. For this reason, we do not consider the classification in reference 24 (Li et al.) further. We now briefly discuss this in Section S3.3.

      (1.6) The authors consider coinciding mutations to occur when offspring clades align with an ancestral clade. Nevertheless, since multiple mutations can arise simultaneously in a single generation (such as kataegis), it becomes essential to discern its impact on clade turnover and, consequently, the estimation of d/b.

      The mutational signatures found here show no sign of kataegis. Also, the number of polymorphic sites in the whole-exome data is small and the mutations are spread uniformly across the exome. The point is well taken; however, the method requires single mutations per generation. In practice, this can be achieved by subsampling a random part of the genome or exome (see [45]). We tested this point by processing the data from only a fraction of the exome; this did not change the results. In particular, Figure S30 shows the turnover-based inference for different subsampling rates L of the Ling et al. data. Subsampling of sites reduces the exome-wide mutation rate, so the inferred rate scales linearly with L, as expected.
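      Why subsampling helps can be illustrated with a small calculation. Assuming, purely for illustration, that the number of new retained mutations per division is Poisson-distributed with mean μL (μ the exome-wide rate per division, L the retained fraction of sites), the fraction of mutation-carrying divisions that carry two or more mutations falls roughly as μL/2 for small μL. The value μ = 1 below is hypothetical and not estimated from the data.

```python
import math


def multiple_hit_fraction(mu, L):
    """Fraction of divisions with >= 2 retained new mutations among those with >= 1.

    Assumes (for illustration only) Poisson-distributed counts with mean mu * L,
    where mu is the per-division rate before subsampling and L the retained fraction.
    """
    m = mu * L                    # effective per-division rate after subsampling
    p0 = math.exp(-m)             # P(no retained mutation)
    p1 = m * math.exp(-m)         # P(exactly one retained mutation)
    return (1.0 - p0 - p1) / (1.0 - p0)


MU = 1.0  # hypothetical rate, not estimated from the data
for L in (1.0, 0.5, 0.1):
    print(f"L={L}: effective rate={MU * L:.2f}, "
          f"multiple-hit fraction={multiple_hit_fraction(MU, L):.3f}")
```

      Under these assumptions the multiple-hit fraction drops from about 0.42 at L = 1 to about 0.05 at L = 0.1, while the effective rate μL, and with it the inferred rate, scales linearly with L.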

      (1.7) I could not understand Step 2 in Section S2.1, an illustration may be helpful.

      We have added figure S2 explaining the directional angle algorithm to Section S2.1 in the supplementary information.

      (1.8) Figure S2: does a large ρ_c lead to volume growth rather than surface growth, and not the other way around?

      Thank you for catching this mix-up!

      Reviewer #2

      I do have a few minor comments/questions, but I am confident the authors will be able to address them appropriately.

      (2.1) Line 56: I am not sure what the units of “average read depth 74X” are in terms of SI units.

      This number gives the number of sequence reads covering a particular nucleotide and is dimensionless. We have added this information.
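      The mean depth can be computed in two equivalent ways, sketched below under simplifying assumptions (no duplicate, unmapped or off-target reads, every read lying fully within the target); the read numbers and target size are hypothetical. Both give a number of reads per base, i.e. a dimensionless count.

```python
def mean_depth_from_coverage(per_base_coverage):
    """Mean of per-position read counts (e.g. from a pileup); dimensionless."""
    return sum(per_base_coverage) / len(per_base_coverage)


def mean_depth_from_reads(n_reads, read_length, target_size):
    """Expected mean depth (Lander-Waterman): total sequenced bases / target size."""
    return n_reads * read_length / target_size


# Hypothetical numbers: 37 million 100-bp reads on a 50-Mb target give 74X.
depth_a = mean_depth_from_coverage([70, 74, 78])
depth_b = mean_depth_from_reads(37_000_000, 100, 50_000_000)
```

      The “X” is therefore read as “times”: on average, each targeted nucleotide is covered by 74 reads.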

      (2.2) Lines 63 - 68: I am unsure what is meant by the terms “T1 of ca.” and “T2 of ca.”. Can these also be explained/defined please?

      These refer to the approximate (circa) diameters of tumor 1 and tumor 2 in the data by Li et al. We have expanded the abbreviations.

      (2.3) Line 69: I would like to see a more extensive description of the cell-based model here in the main text, such as how do the cells move. Moreover, do cells have a finite reach in space, do they have a volume/area?

      We have expanded the model description in the main body of the paper and placed information there that previously was only in the SI.

      (2.4) Line 76: You have said cells can “push” one another in your model. Do they also “pull” one another? Cell adhesion is known to contribute to tumour integrity, so this seems important for a model of this nature.

      We have not implemented adhesive forces between pairs of cells so far. This would cause a higher pressure under cell growth (which can have important physiological consequences). However, the hard potential enforcing a distance between adjacent cells would still lead to cells pushing each other apart under population growth, so we expect to see the dispersion effect we discuss even when there is adhesion.

      (2.5) Lines 80-81: “due to lack of nutrient”. Are nutrients included in this work? It is my understanding that they are not. That is no problem; it is just that this line makes it seem as if they are included and important. If they are not, the authors should mention this in the same sentence.

      Thank you for pointing out this source of misunderstanding, your understanding is correct and we have modified the text to remove the ambiguity.

      (2.6) Line 94-95: Since you are interested in tissue growth, recent work has indicated how the cell boundary (and therefore tissue boundary) description influences growth. Please also be sure to indicate this when you describe the model.

      We presume you refer to the recent paper by Lewinsohn et al. (Nature Ecology and Evolution, 2023), which reports a phylogenetic analysis based on the Li et al. data. Lewinsohn et al. find that cells near the tumour boundary grow significantly faster than those in the tumour’s core. This is at variance with what we find; we were not aware of this paper at the time of submission. We now refer to this paper in the main text, and also have included a new section S3.4 in the SI accounting for this discrepancy. If you refer to a different paper, please let us know.

      Briefly, we repeat the analysis of Lewinsohn et al., using their algorithm on artificial data generated by our model under volume growth. Samples were placed exactly as they were placed in the tumor analyzed by Li et al. We find that, even though the data were generated by volume growth, the algorithm of Lewinsohn et al. finds a signal of surface growth, in many cases even stronger than the signal that Lewinsohn et al. find in the empirical data. We have added subsection S3.4 with new Figure S15 to the Supplementary Information.

      (2.7) Line 107: “thus no evidence for enhanced cell growth near the edge of the tumour”. It is unclear to me what this tells us about the tumour edge. Could this be an artifact of there being fewer cells to compare with at the edge of the tumour? Could you please expand on this a bit?

      The direction angles tell us whether new mutations arise predominantly radially outwards. With this observable, surface growth would lead to a non-uniform distribution of these angles even if we restrict the analysis to samples from the interior of the tumor (which, under surface growth, was once near the surface). So the effect is not linked to fewer cells for comparison. Also, we have checked the direction angles in simulations under different growth modes with the samples placed in the same way as in the data (see the right panels of Figs. S3 and S4). We have expanded the Results section of the main text accordingly.
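      One plausible way to compute such a direction angle is sketched below. It assumes, hypothetically, that the angle is measured at the centroid of the ancestral clade’s samples, between the outward radial direction and the displacement towards the centroid of the descendant clade; this is an illustration and not necessarily the definition used in SI Section S2.1. Under this definition an angle of 0° means “straight outwards”, so mutations arising predominantly radially outwards would concentrate the angles near 0°.

```python
import math


def direction_angle(tumour_centre, parent_centroid, child_centroid):
    """Signed angle in degrees, in (-180, 180], between the outward radial
    direction at the parent clade's centroid and the displacement from the
    parent's centroid to the child's centroid (hypothetical definition).
    Undefined if the parent centroid coincides with the tumour centre."""
    rx = parent_centroid[0] - tumour_centre[0]   # outward radial vector
    ry = parent_centroid[1] - tumour_centre[1]
    vx = child_centroid[0] - parent_centroid[0]  # parent -> child displacement
    vy = child_centroid[1] - parent_centroid[1]
    # atan2(cross, dot) gives the signed angle from the radial vector to the displacement
    return math.degrees(math.atan2(rx * vy - ry * vx, rx * vx + ry * vy))


outward = direction_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
sideways = direction_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
inward = direction_angle((0.0, 0.0), (1.0, 0.0), (0.0, 0.0))
```

      Because the angle is measured relative to the local radial direction, a sample deep in the interior contributes in the same way as one near the edge.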

      (2.8) I really enjoyed the clear explanation between lines 119 and 122 regarding cell dispersion!

      Thank you!

      (2.9) Figure 2B: Since you are looking at a periodic feature in theta, I would have expected the distribution to be periodic too, and therefore equal at theta = -180 and theta = 180. Can you explain why it is different, please? Interestingly, your simulated data do seem to obey this!

      The distribution of theta is periodic, but the bin boundaries and bin midpoints were poorly chosen. We have replotted the diagram with bin boundaries that handle the edge points -180/180 correctly. Thank you for pointing this out.
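      A binning that handles the edge points can be sketched as follows, assuming equal-width bins covering [-180°, 180°) and angles in degrees; the function names are hypothetical. Wrapping every angle into that half-open interval first maps +180° onto -180°, so both edge points land in the same bin.

```python
def periodic_bin_counts(angles_deg, n_bins):
    """Count angles in n_bins equal bins covering [-180, 180).
    Angles are wrapped into that interval first, so +180 and -180 share a bin."""
    width = 360.0 / n_bins
    counts = [0] * n_bins
    for a in angles_deg:
        wrapped = (a + 180.0) % 360.0 - 180.0             # maps 180 -> -180
        k = min(int((wrapped + 180.0) // width), n_bins - 1)  # guard against rounding
        counts[k] += 1
    return counts


def bin_centres(n_bins):
    """Midpoints of the bins used by periodic_bin_counts."""
    width = 360.0 / n_bins
    return [-180.0 + (k + 0.5) * width for k in range(n_bins)]


counts = periodic_bin_counts([-180.0, 180.0, 179.0, -179.0, 0.0], 12)
centres = bin_centres(12)
```

      With this convention the first and last bins are neighbours on the circle, so a plotted distribution can be checked for continuity across ±180°.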

      (2.10) Figure 3B: This plot does not have a title. Also, what do the red vertical lines in plots 3B, 3C and 3D indicate?

      We have added the title. The red lines indicate the expectation values of the distributions.

      (2.11) Figure 4: I am unsure how to read the plot in 4B. Also, what does the y-axis represent in 4C and 4D?

      We have added explanations for 4B and have placed the labels for 4C and 4D in the correct position on the y-axes.

      (2.12) Lines 194-199: You discuss your inferred parameters here, but you do not indicate how you inferred them. Could you please briefly mention how you inferred these?

      These were inferred using the turnover method explained in the paragraph above; we have expanded this information. A full account is given in SI Section S5.

      (2.13) Line 258-260: “... mutagen (aristolochic acid) found in herbal traditional Chinese medicine and thought to cause liver cancer.” I do not see what this sentence adds to the work. Could you please be clearer with the claim you are making here?

      Mutational signatures allow one to infer the underlying mutational processes. The strongest signature found in the data is associated with a mutagen that has in the past been used in traditional Chinese medicines. The patients from whom the tumours were biopsied were from China, so past exposure to this potent mutagen is possible. We are not making a big claim here; the mutational signature of aristolochic acid and its carcinogenic nature have been well studied and are referenced here. The result is interesting in our context because, in one of the datasets (Li et al.), the signature is present in early (clonal) mutations but absent in later ones, allowing inferences about the past from present data. We have added the information that the patients were from China.

      (2.14) In your Supplementary Information, S1, I believe your summation should not be over i, as you state in the following it is over cells within 7 cell radii. Please fix this by possibly defining a set which are those within 7 cell radii.

      We have done this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study provides an incremental advance to the scavenger receptor field by reporting the crystal structures of the domains of SCARF1 that bind modified LDL such as oxidized LDL and acetylated LDL. The crystal packing reveals a new interface for the homodimerization of SCARF1. The authors characterize SCARF1 binding to modified LDL using flow cytometry, ELISA, and fluorescent microscopy. They identify a positively charged surface on the structure that they predict will bind the LDLs, and they support this hypothesis with a number of mutant constructs in binding experiments.

      Strengths:

      The authors have crystallized domains of an understudied scavenger receptor and used the structure to identify a putative binding site for modified LDL particles. An especially interesting set of experiments is the SCARF1 and SCARF2 chimeras, where they confer binding of modified LDLs to SCARF2, a related protein that does not bind modified LDLs, and show that the key residues in SCARF1 are not conserved in SCARF2.

      Weaknesses:

      While the data largely support the conclusions, the figures describing the structure are cursory and do not provide enough detail to interpret the model or quality of the experimental X-ray structure data. Additionally, many of the flow cytometry experiments lack negative controls for non-specific LDL staining and controls for cell surface expression of the SCARF constructs. In several cases, the authors interpret single data points as increased or decreased affinity, but these statements need dose-response analysis to support them. These deficiencies should be readily addressable by the authors in the revision.

      The paper is a straightforward set of experiments that identify the likely binding site of modified LDL on SCARF1 but adds little in the way of explaining or predicting other binding interactions. That a positively charged surface on the protein could mediate binding to LDL particles is not particularly surprising. This paper would be of greater importance if the authors could explain the specificity of the binding of SCARF1 to the various lipoparticles that it does or does not bind. Incorporating these mutants into an assay for the biological role of SCARF1 would be powerful.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wang and colleagues provided mechanistic insights into SCARF1 and its interactions with the lipoprotein ligands. The authors reported two crystal structures of the N-terminal fragments of SCARF1 ectodomain (ECD). On the basis of the structural analysis, the authors further investigated the interactions between SCARF1 and modified LDLs using cell-based assays and biochemical experiments. Together with the two structures and supporting data, this work provided new insights into the diverse mechanisms of scavenger receptors and especially the crucial role of SCARF1 in lipid metabolism.

      Strengths:

      The authors started by determining the crystal structures of two fragments of SCARF1 ECD. The superposition of the two high-resolution structures, together with the predicted model by AlphaFold, revealed that the ECD of SCARF1 adopts a long-curved conformation with multiple EGF-like domains arranged in tandem. Non-crystallographic and crystallographic two-fold symmetries were observed in crystals of f1 and f2 respectively, indicating the formation of SCARF1 homodimers. Structural analysis identified critical residues involved in dimerization, which were validated through mutational experiments. In addition, the authors conducted flow cytometry and confocal experiments to characterize cellular interactions of SCARF1 with lipoproteins. The results revealed the vital role of the 133-221aa region in the binding between SCARF1 and modified LDLs. Moreover, four arginine residues were identified as crucial for modified LDL recognition, highlighting the contribution of charge interactions in SCARF1-lipoprotein binding. The lipoprotein binding region is further validated by designing SCARF1/SCARF2 chimeric molecules. Interestingly, the interaction between SCARF1 and modified LDLs could be inhibited by teichoic acid, indicating potential overlap in or sharing of binding sites on SCARF1 ECD.

      The authors employed a nice collection of techniques, namely crystallography, SEC, DLS, flow cytometry, ELISA, and confocal imaging. The experiments are technically sound and the results are clearly written, with a few concerns as outlined below. Overall, this research represents an advancement in the mechanistic investigation of SCARF1 and its interaction with ligands. The role of scavenger receptors is critical in lipid homeostasis, making this work of interest to the eLife readership.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. described the crystal structures of the N-terminal fragments of Scavenger receptor class F member 1 (SCARF1) ectodomains. SCARF1 recognizes modified LDLs, including acetylated LDL and oxidized LDL, and it plays an important role in both innate and adaptive immune responses. They characterized the dimerization of SCARF1 and the interaction of SCARF1 with modified lipoproteins by mutational and biochemical studies. The authors identified the critical residues for dimerization and demonstrated that SCARF1 may function as homodimers. They further characterized the interaction between SCARF1 and LDLs and identified the lipoprotein ligand recognition sites, the highly positively charged areas. Their data suggested that the teichoic acid inhibitors may interact with SCARF1 in the same areas as LDLs.

      Strengths:

      The crystal structures of SCARF1 were high quality. The authors performed extensive site-specific mutagenesis studies using soluble proteins for ELISA assays and surface-expressed proteins for flow cytometry.

      Weaknesses:

      (1) The schematic drawing of human SCARF1 and SCARF2 in Fig 1A did not show the differences between them. It would be useful to have a sequence alignment showing the polymorphic regions.

      The schematic drawing in Fig. 1A is intended to give a brief idea of the two molecules; a sequence alignment would take too much space in the figure. A careful alignment between SCARF1 and SCARF2 can be found in Ref. 24 (Ishii, et al., J Biol Chem, 2002. 277, 39696-702) and is also mentioned on p.4.

      (2) The description of structure determination was confusing. The f1 crystal structure was determined by SAD with Pt derivatives. Why did they need molecular replacement with a native data set? The f2 crystal structure was solved by molecular replacement using the structure of the f1 fragment. Why did they need to use EGF-like fragments predicted by AlphaFold as search models?

      The crystal structure of f1 was first determined by SAD using Pt derivatives, but Pt soaking reduced the resolution of the crystals; we therefore used this structure as a search model for a native data set with higher resolution for further refinement. For the structure determination of f2, molecular replacement using the f1 structure did not reveal initial density for the extra region in f2 (residues 133-209), which is missing in f1. Therefore, the EGF-like domains of SCARF1 modeled by AlphaFold were used as search models for this region (p.18).

      (3) It's interesting to observe that SCARA1 binds modified LDLs in a Ca2+-independent manner. The authors performed the binding assays between SCARF1 and modified LDLs in the presence of Ca2+ or EDTA on Page 9. However, EDTA is not an efficient Ca2+ chelator. The authors should have performed the binding assays in the presence of EGTA instead.

      The binding assays in the presence of EGTA are included in the revised manuscript (Fig. S7) (p.9), which also suggest that SCARA1 binds OxLDL in a Ca2+-independent manner.

      (4) The authors claimed that SCARF1Δ353-415, the deletion of a C-terminal region of the ectodomain, might change the conformation of the molecule and generate hindrance for the C-terminal regions. Why didn't SCARF1Δ222-353 have a similar effect? Could the deletion change the interaction between SCARF1 and the membrane? Is the SCARF1Δ353-415 region hydrophobic?

      The truncation mutants were constructed to roughly locate the binding region of lipoproteins on SCARF1, and the overall results showed that the sites might locate at the region of 133-221. Mutant Δ222-353 may also affect the conformation, but it still had binding with OxLDL like wild type, suggesting the binding sites were retained in this mutant. Mutant Δ353-415 showed a reduction of binding, implying that the binding sites might be retained but binding was affected, we think it might be due to the conformational change that could reduce the binding or accessibility of lipoproteins. Since this region locates closer to the membrane, it’s possible that it may change the interaction with the membrane. In the AF model, Δ353-415 region does not seem to be more hydrophobic than other regions (Fig. S2C).

      (5) What was the point of having Figure 8? Was it to propose that SCARF1 homodimers could form two types of dimers on the membrane surface? The authors didn't have any data to support that.

      Fig. 8 shows a potential model of the SCARF1 dimers on the cell surface by combining the structural information from crystals and AF predictions. The two dimers in the figure are identical but with different viewing angles. The lipoprotein binding sites are also indicated (Fig. 8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors need to show examples of the electron density for both structures.

      Electron density examples of the two structures are shown in Fig. S2A.

      Figure 1)

      The figure does not show enough details of the structure. The text mentions hydrogen-bond and disulfide bonds that stabilize the loops, these should be shown.

      Disulfide bonds of the two structures are shown in Fig. 1.

      Figure 2)

      D) The full gel should be shown.

      E) Rather than just relying on changes in gel filtration elution volumes, the authors do the appropriate experiment and measure the hydrodynamic radius of the WT and mutant ectodomains by DLS. However, they need to show plots of the size distribution, not just mean radial values, in order to show if the sample is monodisperse.

      The full gel and plots of DLS are shown in Fig. S3A-B.

      Figure 3)

      I have concerns about the rigor of the experiments in panels A-D. The authors include a non-transfected control but do not appear to have treated non-transfected cells with the lipoproteins to evaluate the specificity of binding. Every cell binding assay (flow or confocal) must show the data from non-transfected cells treated with each lipoprotein, as each lipoprotein species could have a unique non-specific binding pattern. The authors show these controls in Figure 6, but these controls are necessary in every experiment.

      In Fig. 3A, since several lipoproteins were included in the figure, we used non-transfected cells without lipoprotein treatment as a negative control. OxLDL- or AcLDL-treated non-transfected cells were also used as negative controls and are shown in Fig. 3B-C. LDL, HDL or OxHDL may have their own non-specific binding patterns; however, treatment of the transfected cells with LDL, HDL or OxHDL gave negative results in all cases (Fig. 3A and D).

      Cell-surface expression of the SCARF1 variants is a major concern. The constructs the authors use are tagged with a GFP on the cytosolic side. However, the Methods do not indicate whether they gate on GFP+ transfected cells for analytical flow. Such gating may have been used because the staining experiments in Figures 3 and 4 show uniform cell populations, whereas the staining done with an anti-SCARF1 Ab in S4 shows most of the cells not expressing the protein on the surface. Please clarify.

      Data for the anti-SCARF1 Ab assay are gated for GFP in the revised Fig. S4, and the non-transfected cells are included as a control.

      The authors must demonstrate cell-surface staining with an epitope tag on the extracellular side and clarify if the analyzed cells are gated for surface expression. The anti-SCARF antibody used in S4 may not recognize the truncated or mutant SCARFs equally. Cell-surface expression in the flow experiments cannot be inferred from confocal experiments because the flow experiments have a larger quantitative range.

      The anti-SCARF1 antibody assay provides an estimate of the surface expression of the proteins. If the epitope of the antibody were mutated or removed in the mutants, the antibody would most likely lose binding. Including an epitope tag on the ectodomain could be an option, but if truncation or mutation changes the conformation of the ectodomain, the accessibility of the epitope may also be affected; moreover, adding an extra sequence or domain, such as an epitope tag, can sometimes affect the surface expression of proteins.

      In several places, the authors infer increased or decreased affinity from mean fluorescent intensity values of a single concentration point without doing appropriate dose-curves. These experiments need to be done or else the mentions of changes in apparent affinities should be removed.

      We have added a concentration series for the WT interaction with OxLDL (Fig. S6, p.9), and the manuscript has been modified accordingly.
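      A dose-response analysis of this kind can be sketched as a fit of a one-site saturation binding model, signal = Bmax·c/(Kd + c), to background-subtracted mean fluorescence intensities (with the signal of lipoprotein-treated non-transfected cells removed). The model choice, the function name and all concentrations and signals below are illustrative assumptions, not the analysis used for Fig. S6. For a fixed Kd the best Bmax has a closed form, so the sketch scans Kd on a logarithmic grid to stay dependency-free; in practice a nonlinear least-squares routine would normally be used.

```python
def fit_one_site(conc, signal, kd_min=1e-3, kd_max=1e4, n_grid=2000):
    """Least-squares fit of signal = Bmax * c / (Kd + c).

    For each Kd on a log-spaced grid, the optimal Bmax is
    sum(y * f) / sum(f * f) with f = c / (Kd + c); the (Bmax, Kd) pair
    with the smallest residual sum of squares is returned."""
    best = None
    for k in range(n_grid + 1):
        kd = kd_min * (kd_max / kd_min) ** (k / n_grid)
        f = [c / (kd + c) for c in conc]
        bmax = sum(y * fi for y, fi in zip(signal, f)) / sum(fi * fi for fi in f)
        rss = sum((y - bmax * fi) ** 2 for y, fi in zip(signal, f))
        if best is None or rss < best[0]:
            best = (rss, bmax, kd)
    return best[1], best[2]


# Hypothetical, noise-free data: Bmax = 1000 (MFI units), Kd = 8 (concentration units).
conc = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
signal = [1000.0 * c / (8.0 + c) for c in conc]
bmax_fit, kd_fit = fit_one_site(conc, signal)
```

      Comparing apparent Kd values obtained this way, rather than single-concentration intensities, is what allows statements about increased or decreased affinity between constructs.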

      Figure 7

      The concentration of teichoic acid used to inhibit modified LDL binding should be indicated and a dose-curve analysis should be done comparing teichoic acid to some non-inhibitory bacterial polymer.

      The concentration of teichoic acids used in the inhibition assays is 100 mg/ml (p.21). Unfortunately, we do not have other bacterial polymers in the lab and are not sure about their potential inhibitory effects.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The SCARF1 ECD contains three N-linked glycosylation sites (N289, N382, N393). It remains unclear whether these modifications are involved in SCARF1 binding to modified LDLs. Is it possible to design some experiments to investigate the effect of N-glycans on the recognition of modified LDLs? In particular, N382 and N393 are included in 353-415aa and the truncation mutant of SCARF1Δ353-415aa resulted in reduced binding with OxLDL in Fig.3G. Or whether the reduced binding is only due to the potential conformational changes caused by the deletion of the C-terminal region of the ECD?

      A previous study of the N-glycans (N289, N382, N393) of SCARF1 (ref.17) has shown that they may affect the proteolytic resistance, ligand-binding affinity and subcellular localization of SCARF1. This is not surprising: lipoproteins are large particles, and the N-glycans on the surface of SCARF1 could affect their accessibility or affinity. However, the exact role of each glycan could be difficult to clarify, as the glycans might also be involved in protein folding and trafficking.

      The reduction of the binding of OxLDL for the mutant SCARF1 Δ353-415aa may be due to the conformational change or the loss of the glycans or both.

      (2) The authors speculated that the dimeric form of SCARF1 may be more efficient in recognizing lipoproteins on the cell surface. Please highlight the critical region/sites for ligand binding in Figure 8 and discuss the structural basis of dimerization improving the binding.

      The binding sites for lipoproteins on SCARF1 are indicated in Fig. 8. According to our data, it is possible that the conformation of the dimeric form of SCARF1 makes it more accessible to the ligands on the cell surface, as implied by flow cytometry (p.14-15), but this needs further evidence.

      (3) Could the two salt bridges (D61-K71, R76-D98) observed in f1 crystals be found in f2 crystals? They seemed to be a little far from the defined dimeric interface (F82, S88, Y94) and how important are these to SCARF1 dimerization?

      The two salt bridges observed in the f1 crystal are not found in the f2 crystal (the distances are larger than 5.0 Å), suggesting that they are not required for dimerization (p. 7-8), although they may contribute in some cases.

      (4) The monomeric mutants (S88A/Y94A, F82A/S88A/Y94A) exhibited opposite affinity trends for OxLDL in ELISA and flow cytometry. The authors proposed steric hindrance of the dimers coated onto the plates as a potential explanation for this observation. However, the ELISA methods state that OxLDLs, instead of SCARF1 ECD, were coated onto the plates. What, then, is the underlying reason for the inconsistency between the assays?

      Thanks. As described in the Methods, the ELISA was done by coating OxLDLs on the plates. Nevertheless, a dimeric form of SCARF1 may bind only one OxLDL coated on the plates due to steric hindrance. We have corrected this on p.12.

      Minor points:

      (1) Figure 2D and Figure S3 - please label the molecular weight marker on the SEC traces to indicate the native size of various purified proteins.

      The elution volume in SEC reflects not only the molecular weight but also the conformation or shape of the protein. The ectodomain of SCARF1 has a long, curved conformation, so the elution volumes of its monomeric and dimeric forms do not align well with the standard molecular weight markers, and both elute much earlier in SEC. We have included the standard molecular weight markers in Fig. S3C-D.

      (2) Could the authors provide SEC profiles of f1 and f2 that were used in crystallographic study?

      The SEC profiles of f1 and f2 for crystallization are shown in Fig. S5 (p.6).

      (3) The legend of Figure 3A states that the NC in flow cytometry assay represents the non-transfected cells, but please confirm whether the NC in Fig. 3A-C corresponds to non-transfected cells or no lipoprotein.

      NC in Fig. 3A represents non-transfected cells; no lipoproteins were added in this case because several lipoproteins are included in Fig. 3A. Lipoprotein-treated (OxLDL or AcLDL) non-transfected cells (NC) are shown in Fig. 3B-C as negative controls.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into molecular simulations of the Na+ binding pathway and the ionic interactions at the two known sodium binding sites, site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly attract Na+ ions prior to entry into the vestibule. These results highlight the importance of studying ion-entry pathways through computational approaches, and the authors also validate some of their findings through experimental work. They observe that sodium binding at site 1 is stabilized by the presence of the substrate in the S1 site. This is particularly vital because, unlike in monoamine transporters, the GABA carboxylate is involved in coordinating the Na+ ion. They also observe that binding of sodium to the Na2 site stabilizes the conformation of GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available experimental information on SLC6 transporters, particularly GAT1, and puts forth the importance of this additional patch of residues in the extracellular vestibule, which could be important for ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors comment on and address the following queries.

      We thank the reviewer for the overall positive assessment of our work.

      Comments on revised version:

      The authors have satisfactorily addressed my comments and this has significantly improved the clarity of the manuscript.

      The only point that I would like to inquire about is the role of EL4 in modulating Na+ entry.

      Do the authors see no role for EL4 in controlling Na+ entry in the simulations? This is particularly intriguing, as some recent studies showed charged mutations in EL4 of dDAT, SERT and GAT1 to be detrimental to substrate entry/uptake. It would therefore be nice to add a short discussion of any role for EL4 in Na+ entry.

      In this study we focused on sodium binding to the sodium binding sites NA1 and NA2 and discovered that negatively charged residues at the beginning of TM6 contribute to sodium binding. Our data show below-average interaction of sodium ions with EL4. In particular, we do not observe any prominent role for D355, which is the only negatively charged residue in EL4a. We attribute this to the presence of four positively charged residues (R69, Y76, K350, R351) surrounding D355 and to electrostatic repulsion by the local positive field, which is also visible in Figure 1k. Following the suggestion of the reviewer, we added a short statement to the last paragraph of the discussion.

      Reviewer #2 (Public Review):

      Summary

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be co-transported with the substrate, GABA (and a cotransported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281AE283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths

      - MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      - A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      - The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the overall positive assessment of our work.

      Weaknesses

      - Assessing the effects of Na+ binding on the large-scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space, and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear whether movements of the inner gate are due to an AF2 model that is not well packed or are really a feature of the outward-open conformation.

      We do not think that the results of the manuscript, and in particular the large-scale motions, are speculative or depend too much on the limitations of PCA. We only use PCA for Figure 6a-d,6g,h. Motions of SLC6 transporters (and of any other transporter) are much more complex than a single 2D PCA plot could ever capture. We therefore used PCA here only to identify the two motions with the largest amplitude, shown in Figure 6a-d, 6g,h.

      Given that all ~13000 degrees of freedom of GAT1 contribute to conformational differences, a dimensionality reduction method like PCA can be very helpful for extracting dominant motions. Structure comparison showed that the motions observed in PC1 captured a large portion of the motions of occlusion (Figure 6c,d) when compared to the full transition observed in the unfiltered trajectories (see Figure 6e,f). PCA therefore helps to extract these main motions.
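      As an illustration of how PCA isolates the largest-amplitude collective motions from thousands of correlated Cartesian degrees of freedom, a minimal sketch follows. It assumes a trajectory already superposed onto a common reference and stored as a NumPy array of shape (frames, atoms, 3); the function name pca_dominant_motions is hypothetical and this is not the analysis code used in the manuscript.

```python
import numpy as np


def pca_dominant_motions(coords, n_components=2):
    """Return the largest-amplitude collective motions of an MD trajectory.

    coords: array of shape (n_frames, n_atoms, 3). The frames are assumed to
    be already superposed onto a common reference, so overall rotation and
    translation are removed before PCA (an assumption of this sketch).
    """
    n_frames = coords.shape[0]
    # Flatten each frame into one vector of 3 * n_atoms Cartesian degrees of freedom.
    X = coords.reshape(n_frames, -1).astype(float)
    # Remove the mean structure; PCA describes fluctuations around it.
    X = X - X.mean(axis=0)
    # SVD of the centered data yields the covariance eigenvectors (rows of Vt)
    # in descending order of variance, without forming the 3N x 3N matrix.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    variances = s**2 / (n_frames - 1)
    explained = variances / variances.sum()
    modes = Vt[:n_components]          # each row is one collective motion
    projections = X @ modes.T          # per-frame coordinates along PC1, PC2, ...
    return projections, modes, explained[:n_components]
```

      Projecting frames onto the first two modes gives the kind of 2D map discussed above, while the motion along each mode can be visualized by adding scaled multiples of the mode vector to the mean structure.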

      For completeness, we show a series of structures from the unfiltered trajectories in Figure 6e,f. In the overlay, the motion of occlusion is more difficult to observe because it is convoluted with all other degrees of freedom. In Figure 6e,f, the structures are aligned with the maximum-likelihood method Theseus, while the coloring is based on the amplitudes measured by PCA to visualize the regions moving relative to each other with the largest amplitude. All other structural measures, including the opening of the inner gate (Figure 6i-k), are direct measures from the raw trajectories.

      With respect to the question of the instability of the inner gate, we made similar observations for hSERT (please see DOI: 10.1038/s41467-023-44637-6) using the experimentally determined structure as the starting point. We find a weakening of the inner gate for sodium-free SERT and at intermediate or full occlusion of sodium- and serotonin-bound SERT. These previous data on SERT corroborate our findings and indicate that the effect could be a general feature of the SLC6 transporter family.

      Unfortunately, no outward-open structure of GAT1 was available for this study. AlphaFold2 models have limitations, and we are well aware of them, but AlphaFold2 can also produce high-quality models, including small adjustments of backbone positions, if the sequence identity is high, as in the current project (43% sequence identity for the transmembrane region). For GAT1 (as described in the manuscript), we initially tested an hSERT-based model created with MODELLER. MODELLER is premised on the assumption that the protein backbone does not change, or changes only very little, between the template protein and the target protein. These MODELLER-created models did not perform well because of a slight shift in the position of the backbone, which is a consequence of the consistently smaller side chains in the bundle domain-scaffold domain interface of GAT1 compared with SERT.

      In the simulations described in the manuscript (using the AlphaFold-created model), we observed that the overall structural and dynamic parameters, and in particular the observations at the inner gate, are very similar to the results described in our papers on sodium binding to SERT that used experimental SERT structures. The differences in Na1 binding are explained in the manuscript and are contingent on the residue difference between D98 in SERT and the corresponding residue G65 in GAT1. This makes us confident about the quality of the obtained data. Please see DOI: 10.3390/cells11020255; DOI: 10.3389/fncel.2021.673782.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      The tICA analysis is part of a Markov state model (MSM) approach, which relies on the convergence of transitions between a large number of microstates. Many trajectories showing full sodium unbinding are not obligatory for a converged dataset, but the transitions between the microstates must be converged. Within the S1 we observe many transitions and very good convergence of the transition probabilities, and we limit our interpretation of the free energy data and our discussion to this part of the free energy surface. The supporting information (Figure S5) reports on the quality of the tICA analysis. Flat lines at time lags larger than 40 ns are consistent with a converged model based on the data of the trajectories used for the analysis, and, consistently, the Chapman-Kolmogorov tests also show minimal differences between estimates and predictions.

      We see about 40 binding events from the extracellular side to the S1, which seems insufficient for a converged quantification of sodium transit from the extracellular side to the S1. We state this limitation of the dataset in the results section of the manuscript.
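      To make the two convergence criteria above concrete, here is a minimal sketch, assuming a single discretized microstate trajectory dtraj (one integer label per frame, lag measured in frames) and a plain row-normalized count estimator rather than the reversible maximum-likelihood estimator that MSM software typically uses. The function names are hypothetical and this is not the authors' pipeline. It computes implied timescales, which should stay flat once the lag time is long enough, and the Chapman-Kolmogorov comparison of T(τ)^k against T(kτ) estimated directly.

```python
import numpy as np


def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition count matrix at a given lag (in frames)."""
    counts = np.zeros((n_states, n_states))
    # Count every pair (state at t, state at t + lag) along the trajectory.
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # rows of unvisited states stay all-zero
    return counts / totals


def implied_timescales(T, lag, k=2):
    """Implied timescales t_i = -lag / ln|lambda_i| of the k slowest processes."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    # evals[0] == 1 belongs to the stationary distribution and has no timescale.
    return -lag / np.log(evals[1:k + 1])


def ck_max_error(dtraj, n_states, lag, k):
    """Chapman-Kolmogorov test: largest deviation between T(lag)^k and T(k * lag)."""
    predicted = np.linalg.matrix_power(transition_matrix(dtraj, n_states, lag), k)
    estimated = transition_matrix(dtraj, n_states, k * lag)
    return np.max(np.abs(predicted - estimated))
```

      For a Markovian process the implied timescales do not depend on the chosen lag and the Chapman-Kolmogorov error stays within statistical noise; in practice both are evaluated over a range of lags, and the model lag is chosen where the timescales have levelled off.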

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid produces a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3), but it does not act on mAC5. The Gαs-stimulated activities of isoforms 2, 7, and 9 were enhanced, mAC1 was slightly attenuated, and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs lacking the TM domains. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Arachidonic acid significantly attenuated Gαs-stimulated mAC1 and mAC4 activity in the 20-40 micromolar range, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While mAC5 activity was not affected by unsaturated FAs, neutral anandamide attenuated Gαs stimulation of mAC5 and mAC6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors showed that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, they showed that in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold, with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this manuscript, we will address it in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this manuscript, we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands, which act via AC membrane linkers to modify and control Gs-stimulated AC activity, either enhancing or inhibiting it in an AC family- and ligand-specific manner. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way, from the hypothesis that ACs could be receptors, through a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, and the paper has a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the Results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. Is it because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCRs, which also have transmembrane domains. One or two additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, the authors used only one dose of isoproterenol (2.5 uM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2-adrenergic receptors expressed in HEK cells. As far as I remember from the older pharmacological literature, this concentration is indeed submaximal in receptor binding assays, but for AC activity and cAMP generation (which occur after signal amplification via a so-called receptor reserve), lower Iso concentrations would be submaximal. When we measure cAMP, these are rather 10 to 100 nM, and no more than 1 uM, the concentration at which concentration-response relationships usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, their results do nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

    1. Author response:

      eLife Assessment

      This important study reports the transcriptomic and proteomic landscape of the oviducts at four different preimplantation periods during natural fertilization, pseudopregnancy, and superovulation. The data presented convincingly supported the conclusion in general, although more analyses would strengthen the conclusions drawn. This work will interest reproductive biologists and clinicians practicing reproductive medicine. 

      We appreciate the concise summary and agree that additional experiments can reinforce the fidelity of the predictions made by our robust bioinformatic characterization of the oviduct. Our bioinformatic model appears reproducible, as similar pathway trends were obtained in all three datasets, which should help future researchers establish testable hypotheses more effectively.

      Reviewer #1 (Public review):

      The paper demonstrated through a comprehensive multi-omics study of the oviduct that the transcriptomic and proteomic landscape of the oviduct at 4 different preimplantation periods was dynamic during natural fertilization, pseudopregnancy, and superovulation using three independent cell/tissue isolation and analytical techniques. This work is very important for understanding oviductal biology and physiology. In addition, the authors have made all the results available in a web search format, which will maximize the public's access and foster and accelerate research in the field.

      Strengths:

      (1) The manuscript addresses an important and interesting question in the field of reproduction: how does the oviduct at different regions adapt to the sperm and embryos for facilitating fertilization and preimplantation embryo development and transport?

      (2) Authors used cutting-edge techniques: Integrated multi-modal datasets followed by in vivo confirmation and machine learning prediction.

      (3) RNA-seq, scRNA-seq, and proteomic results are immediately available to the scientific community in a web search format.

      (4) Substantiated results indicate the source of inflammatory responses was the secretory cell population in the IU region when compared to other cell types; sperm modulate inflammatory responses in the oviduct; the oviduct displays immuno-dynamism.

      We sincerely thank you for your thorough and insightful review of our manuscript. Your comprehensive summary accurately captures the essence of our multi-omics study on oviductal biology, highlighting its importance in understanding reproductive physiology. We are particularly grateful for your recognition of our study's strengths. In the revised manuscript, we plan to add another searchable scRNA-seq dataset to our public website: https://genesearch.org/winuthayanon/Oviduct_pregnancy/. We will also address the weaknesses listed below in our revised manuscript.

      Weaknesses:  

      (1) The rationale for using the superovulation model is not clear. The oviductal response to sperm and embryos can be studied by comparing mating with normal and vasectomized mice and comparing pregnancy vs pseudopregnancy (induced by mating with vasectomized males). Superovulation causes supraphysiological hormone levels and other confounding conditions.

      We agree with this assessment that superovulation changes the hormonal levels and could have a confounding impact on the oviduct function. As such, for all experiments involving pseudopregnant datasets, pseudopregnancy was induced by mating females with vasectomized males without superovulation. In our oviductal luminal protein content analysis, oviductal fluid was collected from pregnant females with and without superovulation. This allowed us to directly compare the impact of superovulation on protein abundance and profile. In the revised manuscript, we will provide clarifying statements on using superovulation in our experimental design. 

      The one exception, where superovulation was used without a “natural mating” comparison group, is the scRNA-seq dataset. As single-cell libraries should be prepared in a single run to avoid batch effects, we needed to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/timepoint). Therefore, superovulation was used to synchronize the females and ensure that they were receptive to mating. At the time of our sample collection, single-nuclei isolation methods (freeze tissue now, isolate nuclei later) were not reliable or standardized. We tried to synchronize females using male bedding without superovulation. However, we would still have needed to set up at least 12-15 females per pregnancy timepoint to mate with male mice, which totals ~48-60 mice each night. Due to budget and vivarium space limitations, we were not able to do so. We will include a similar statement to explain and clarify these limitations in the revised manuscript.

      (2) This study involves a very complex dataset with three different models at four time points. If possible, it would be very informative to generate a graphical abstract/summary of the major findings on oviductal responses in the different models and time points.

      Thank you for this suggestion. We will include the graphical abstract to accompany our final version of the manuscript.

      (3) The resolution of Figures 3A-3C in the submitted file was not high enough to assess the authors' conclusion.

      We plan to provide higher-magnification images for Figures 3A-C in the revised version.

      (4) The authors need to double-check influential transcription factors identified by machine learning. Apparently, some of them (such as Anxa2, Ift88, Ccdc40) are not transcription factors at all.

      We appreciate the recognition of this oversight. We will clearly state the distinction between ‘influential TFs’ and ‘significant proteins’ in the revised manuscript. We will ensure that all TFs are stated correctly. 

      Reviewer #2 (Public review):

      The manuscript investigates oviductal responses to the presence of gametes and embryos using a multi-omics and machine learning-based approach. By applying RNA sequencing (RNAseq), single-cell RNA sequencing (sc-RNA-seq), and proteomics, the authors identified distinct molecular signatures in different regions of the oviduct, proximal versus distal. The study revealed that sperm presence triggers an inflammatory response in the proximal oviduct, while embryo presence activates metabolic genes essential for providing nutrients to the developing embryos. Overall, this study offers valuable insights and is likely to be of great interest to reproductive biologists and researchers in the field of oviduct biology. However, further investigation into the impact of sperm on the immune cell population in the oviduct is necessary to strengthen the overall findings.

      We appreciate the concise summary and the strengths and weaknesses highlighted. We plan to address the reviewer's comments concerning superovulation, figure recommendations, and additional analysis in our revised manuscript. We plan to include a comparison of our data with the scRNA-seq analysis of fallopian tube tissues collected from hydrosalpinx patients by Ulrich et al. (PMID: 35320732). Evaluating these data from Ulrich et al. will help distinguish the inflammatory pathways stimulated by sperm from general inflammation. We will follow up with a detailed description of the immune cell types present at 0.5 dpc using FACS analysis in future studies; this deferral is mainly due to a lack of expertise and technical limitations in our lab regarding immune cell investigation. Nevertheless, we have made collaborative efforts and recruited two immunologists to facilitate our future immune cell studies. We will also provide a clear justification for using superovulation, especially in the scRNA-seq analysis, in the revised manuscript (please see response to Reviewer 1 above).