  1. Jun 2024
    1. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Dajka and co-workers reports the application of a biophysical approach to analyse the dynamics of the LptB2FG-C ABC transporter, involved in LPS transport across the cell envelope in Escherichia coli. LptB2FG-C belongs to a new class of ABC transporters (type VI) and is essential and conserved in several Gram-negative pathogens. Since LPS is the major component of the outer membrane of the Gram-negative cell and is responsible for the low permeability of this membrane to several antibiotics, a deep understanding of the mechanism and function of the LptB2FG-C transporter is crucial for the development of new drugs targeting Gram-negative pathogens.

      Several structural studies have been published so far on the LptB2FG-C transporter, disclosing important aspects of the transport mechanism; nevertheless, lack of resolution of some regions of the individual proteins as well as the dynamic nature of the transport mechanism per se (e.g. the insertion and removal of the TM helix of LptC from the TMDs of the transporter during the LPS transport cycle) has greatly limited the understanding of the mechanism that couples ATP binding and hydrolysis with LPS transport. This knowledge gap could be filled by applying an approach that allows the analysis of dynamic processes. The DEER/PELDOR technique applied in this work fits well with this requirement.

      Strengths:

      In this study, the authors provide some new pieces of information on the LptB2FG-C function and the role of LptC in the transporter. Notably, they show that:

      - there is high heterogeneity in the conformational states of the entry gate of LPS in the transporter (gate-2), which is reduced by the insertion of LptC; the observed heterogeneity is not altered by ATP binding or hydrolysis (as expected, since LPS entry is ATP-independent).

      - ATP binding induces an allosteric opening of the LptF β-jellyroll domain that allows for LPS passage to the β-jellyroll of LptC, which is stably associated with the β-jellyroll of LptF throughout the cycle.

      - the β-jellyroll of LptG is highly flexible, indicating an involvement in the LPS transport cycle.

      The manuscript is timely and overall clear.

      Weaknesses:

      I list my concerns below and provide suggestions that, in my opinion, should be addressed to reinforce the findings of this study.

      (1) Protein complex controls: the authors assess the ATPase activity of the spin-labelled variants of their protein complexes to rule out the possibility that engineering the proteins to enable spin labelling could affect their functionality (Figure S4). It has been reported that the association of LptC with the LptB2FG complex inhibits its ATPase activity. However, in the ATPase assay data shown in Figure S4, the inhibitory effect of the LptC TM is not visible (please compare LptB2FG F-A45C G-I335C and F-L325C G-A52C with and without LptC). This raises the suspicion that the regulatory function of LptC is missing in the LptC-containing complexes used in this work. I suggest the authors include wt LptB2FGC in the assay to compare the ATPase activity of this complex with wt LptB2FG. The published inhibitory effect of TM LptC has been observed in proteoliposomes. Since it is not clear from the paper whether the ATPase assay in Figure S4 was conducted in DDM or proteoliposomes, the lack of inhibitory effect could be due to the assay conditions. A comparative test could answer this question.

      (2) Figure 2: NBD closure upon ATP binding to LptB2FG is convincingly demonstrated both in DDM micelles and proteoliposomes, validating the experimental system. However, since under physiological conditions, ATP binding should take place before the displacement of the TM of LptC (Wilson and Ruiz, Mol microbiol 2022), I suggest the authors carry out the experiments with LptC-containing complexes to investigate conformational changes (if any) that are triggered when ATP binding occurs before the TM displacement.

      (3) Proteoliposomes: in the experiments shown in Figures 3 and 4, unlike those in Figure 2, measurements in proteoliposomes give different results from the experiments in DDM, showing higher heterogeneity. Could this be related to the presence (or absence) of LPS in liposomes? It is not mentioned in the materials and methods section whether LPS is present. Could the authors please discuss this?

      (4) The authors show large conformational heterogeneity in gate-2 (using the spin-labelled pair F-L325R1-G-A52R1) and suggest that deviation from the corresponding simulations could be due to the need for enhanced dynamics to allow for gate interaction with LPS or LptC. The effect of LptC is probed in the experiments shown in Figure 6, but I suggest the authors add LPS to the complexes to evaluate the possible stabilizing effect of LPS on the conformations shown in Figure 4.

      (5) Figure 6: the measurement of lateral gate 1 and 2 dynamics in the LptC-containing complexes clearly supports the hypothesis, proposed based on the available structures, that TM LptC dissociates from LptB2FG upon ATP binding. However, direct evidence of this movement is still missing. Would it be possible to monitor the dynamics of the TM LptC by directly labelling this protein domain? This would give a conclusive demonstration of the displacement during the ATPase cycle.

      (6) LPS release assay: Figure 6 panels H-I-J show the MS spectra relative to LPS-bound and free proteins obtained from wt LptB2FG upon ATP binding and ATP hydrolysis conditions. From these spectra the authors conclude that LPS is completely released only upon ATP hydrolysis. However, the current model predicts that LPS release into the Lpt bridge made by LptC-A-D is triggered by ATP binding. For this reason, I suggest the authors assess LPS release also from the LptB2FGC complex where, in the absence of LptA, LPS would be expected to be mostly retained by the complex under the same conditions.

    1. Reviewer #1 (Public Review):

      Summary:

      Schafer et al. tested whether the hippocampus tracks social interactions as sequences of neural states within an abstract social space defined by dimensions of affiliation and power, using a task in which participants engaged in narrative-based social interactions. The findings of this study revealed that individual social relationships are represented by unique sequences of hippocampal activity patterns. These neural trajectories corresponded to the history of trial-to-trial affiliation and power dynamics between participants and each character, suggesting an extended role of the hippocampus in encoding sequences of events beyond spatial relationships.

      The current version provides limited detail on the decoding and clustering analyses, which could be improved in a future revision.

      Strengths:

      (1) Robust Analysis: The research combined representational similarity analysis with manifold analyses, enhancing the robustness of the findings and the interpretation of the hippocampus's role in social cognition.

      (2) Replicability: The study included two independent samples, which strengthens the generalizability and reliability of the results.

      Weaknesses:

      I appreciate the authors for utilizing contemporary machine-learning techniques to analyze neuroimaging data and examine the intricacies of human cognition. However, the manuscript would benefit from a more detailed explanation of the rationale behind the selection of each method and a thorough description of the validation procedures. Such clarifications are essential to understand the true impact of the research. Moreover, refining these areas will broaden the manuscript's accessibility to a diverse audience.

    2. Reviewer #2 (Public Review):

      Summary:

      Using an innovative task design and analysis approach, the authors set out to show that the activity patterns in the hippocampus are related to the development of social relationships with multiple partners in a virtual game. While I found the paper highly interesting (and would be thrilled if the claims made in the paper turned out to be true), I found many of the analyses presented either unconvincing or slightly unconnected to the claims that they were supposed to support. I very much hope the authors can alleviate these concerns in a revision of the paper.

      Strengths & Weaknesses:

      (1) The innovative task design and analyses, and the two independent samples of participants are clear strengths of the paper.

      (2) The RSA analysis is not what I expected after I read the abstract and the title of the Results section "The hippocampus represents abstract dimensions of affiliation and power". To me, the title suggests that the hippocampus has voxel patterns, which could be read out by a downstream area to infer the affiliation and power value, independent of the exact identity of the character in the current trial. The presented RSA analysis, however, shows something entirely different, namely that the affiliation trials and power trials elicit different activity patterns in the area indicated in Figure 3. What is the meaning of this analysis? It is not clear to me what is being "decoded" here, and alternative explanations have not been considered. How do affiliation and power trials differ in terms of the length of sentences, complexity of the statements, and reaction time? Can the subsequent decision be decoded from these areas? I hope in the revision the authors can test these ideas - and also explain how the current RSA analysis relates to a representation of the "dimensions of affiliation and power".

      (3) Overall, I found that the paper was missing some more fundamental and simpler RSA analyses that would provide a necessary backdrop for the more complicated analyses that followed. Can you decode character identity from the regions in question? If you trained a simple decoder for power and affiliation values (using the LLE, but without consideration of the sequential position as used in the spline analysis), could you predict left-out trials? Are affiliation and power represented in a way that is consistent across participants - i.e. could you train a model that predicts affiliation and power from N-1 subjects and then predict the Nth subject? Even if the answer to these questions is "no", I believe that they are important to report for the reader to get a full understanding of the nature of the neural representations in these areas. If the claim is that the hippocampus represents an "abstract" relationship space, then I think it is important to show that these representations hold across relationships. Otherwise, the claim needs to be adjusted to say that it is a representation of a relationship-specific trajectory, but not an abstract social space.

      (4) To determine that the location of a specific character can be decoded from the hippocampal activity patterns, the authors use a sequential analysis in a low-dimensional space (using local linear embedding). In essence, each trial is decoded by finding the pair of two temporally sequential trials that is closest to this pattern, and then interpolating the power/affiliation values linearly between these two points. The obvious problem with this analysis is that the fMRI patterns will have temporal autocorrelation, and so will the power and affiliation values. Successful decoding could just reflect this smoothness in both time series. The authors present a series of control analyses, but I found most of them to not be incisive or convincing and I believe that they (and their explanation of their rationale) need to be improved. For example, the circular shifting of the patterns preserves some of the autocorrelation of the time series - but not entirely. In the shifted patterns, the first and last items are considered to be neighboring and used in the evaluation, which alone could explain the poor performance. The simplest way that I can see is to also connect the first and last item in a circular fashion, even when evaluating the veridical ordering. The only really convincing control condition I found was the generation of new sequences for every character by shuffling the sequence of choices and re-creating new artificial trajectories with the same start and endpoint. This analysis performs much better than chance (circular shuffling), suggesting to me that a lot of the observed decoding accuracy is indeed simply caused by the temporal smoothness of both time series.

      (5) Overall, I found the analysis of the brain-behavior correlation presented in Figure 5 unconvincing. First, the correlation is mostly driven by one individual with a large network size and a 6.5 cluster. I suspect that the exclusion of this individual would lead to the correlation losing significance. Secondly, the neural measure used for this analysis (determining the number of optimal clusters that maximize the overlap between neural clustering and behavioral clustering) is new, non-validated, and disconnected from all the analyses that had been reported previously. The authors need to forgive me for saying so, but at this point of the paper, would it not be much more obvious to use the decoding accuracy for power and affiliation from the main model used in the paper thus far? Does this correlate? Another obvious candidate would be the decoding accuracy for character identity or the size of the region that encodes affiliation and power. Given the plethora of candidate neural measures, I would appreciate if the authors reported the other neural measures that were tried (and that did not correlate). One way to address this would have been to select the method on the initial sample and then test it on the validation sample - unfortunately, the measure was not pre-registered before the validation sample was collected. It seems that the correlation was only found and reported on the validation sample?
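
      The outlier concern in point (5) can be checked with a leave-one-out sensitivity analysis. The following is a minimal sketch, not the authors' analysis: the function names and the eight data points are hypothetical and were chosen only to show how a single extreme participant can drive a correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out_r(x, y):
    """Correlation recomputed with each participant removed in turn."""
    return [
        pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Hypothetical data (NOT from the manuscript): the last participant has
# both an extreme cluster estimate and an extreme network size.
cluster_counts = [2.0, 3.0, 2.5, 3.0, 2.0, 3.5, 2.5, 6.5]
network_sizes = [12, 9, 15, 11, 14, 10, 13, 40]

full_r = pearson_r(cluster_counts, network_sizes)
loo_r = leave_one_out_r(cluster_counts, network_sizes)

print(f"all participants: r = {full_r:.2f}")
for i, r in enumerate(loo_r):
    print(f"without participant {i}: r = {r:.2f}")
```

      If the sign or size of the correlation changes sharply when one participant is dropped, as it does for the last entry of this toy data set, the effect depends on that participant.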

    3. Author response:

      a) that the investigation is very interesting and inventive, and has the potential to reveal some novel insights.

      We thank the reviewers and are excited to improve upon the manuscript through their suggestions.

      b) that the problem of temporal autocorrelation in the fMRI and behavioral data has not been dealt with clearly and convincingly

      We agree that convincingly accounting for fMRI temporal autocorrelation is important to our claims. To reduce its effects, we used field-standard methods: prewhitening and autocorrelation modeling with SPM’s FAST algorithm (shown by Olszowy et al. 2019 to be superior to SPM’s default setting), as well as a high-pass filter with a 128 s cutoff. There is still some first-order autocorrelation structure present across voxels in the left hippocampal beta series: across participants there is slightly positive autocorrelation between the betas of decision trials on successive trials, which decays to ~0 at subsequent lags. We note that our task is a narrative, and some patterns over time are expected; instead of attempting to fully eliminate all temporal structure in the data, we aim to show that the temporal distance between trials is unlikely to explain our effects.
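
      The autocorrelation profile described above (slightly positive at lag 1, near zero at longer lags) can be summarized with a sample autocorrelation function. The sketch below is illustrative only: it uses a simulated AR(1) series as a stand-in for one voxel's beta series, and `simulate_ar1`, the coefficient 0.3, and the series length are assumptions, not values from the study.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


def simulate_ar1(n, phi, seed=0):
    """Hypothetical stand-in for a beta series: x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


betas = simulate_ar1(n=2000, phi=0.3)
profile = {lag: autocorrelation(betas, lag) for lag in range(6)}

for lag, value in profile.items():
    print(f"lag {lag}: {value:+.3f}")
```

      Reporting such a profile per participant, and at the lags that separate successive trials for the same character, would show how much autocorrelation remains at the temporal distances that matter for the trajectory analysis.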

      In the within versus between social dimension representational similarity analysis, the average temporal distance between trials is the same within and between dimensions. The clustering analysis is a between-subject analysis about individual differences, and the same overall temporal structure is experienced by all participants.

      The trajectory analysis does not focus on consecutive trials across characters, but rather on consecutive trials within characters, where the time gap between successive trials is relatively large and highly variable. An average of over a minute of time elapses between successive decision trials for a given character (versus ~20 seconds across characters), which is on average almost 11 narrative slides and 3 decision trials. Across characters, the temporal gap between decision trials ranges from 12 seconds to more than 10 minutes, reducing the likelihood that temporal autocorrelation drives character-related estimates. We also highlight the shuffled choices control model, which shares the same temporal autocorrelation structure as the model of interest but had significantly poorer social location decoding, a strong indication that temporal autocorrelation alone can’t explain these results. For each participant, we shuffled their choices and re-computed trajectories that preserved the origin and end locations but produced different locations along the way. Our model decoded location significantly better than this null model, and this difference in performance can't be explained by differences in temporal autocorrelation in the neural or behavioral data.
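
      The shuffled-choices null model can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each decision moves a character by one step along either the affiliation or the power axis, and the `steps` list and function names are hypothetical. Because a trajectory is a cumulative sum of steps, any permutation of the steps reaches the same end location while visiting different locations along the way.

```python
import random


def trajectory(steps, origin=(0, 0)):
    """Cumulative (affiliation, power) locations after each choice."""
    path = [origin]
    for d_affiliation, d_power in steps:
        affiliation, power = path[-1]
        path.append((affiliation + d_affiliation, power + d_power))
    return path


def shuffled_trajectory(steps, rng):
    """Null trajectory: same choices in a random order."""
    permuted = list(steps)
    rng.shuffle(permuted)
    return trajectory(permuted)


# Hypothetical choice sequence for one character (NOT from the study).
steps = [(1, 0), (0, 1), (1, 0), (0, -1), (-1, 0), (1, 0), (0, 1), (0, 1)]

observed = trajectory(steps)
rng = random.Random(42)
null_paths = [shuffled_trajectory(steps, rng) for _ in range(100)]

print("observed end point:", observed[-1])
print("null end points match:", all(p[-1] == observed[-1] for p in null_paths))
```

      Decoding performance for the observed trajectory can then be compared against the distribution obtained from the null trajectories.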

      In the revision, we will further address this concern. For example, we will report more details on the task structure to aid in interpretation and will more precisely characterize the temporal autocorrelation profile. Where appropriate, we will also improve on and/or add more control analyses that preserve the autocorrelation structure.

      c) that a number of important interesting questions have not been addressed: Are the differences between social partners encoded in the hippocampus? Are the social dimensions encoded in a consistent manner across social partners?

      We believe that we should be able to decode other interesting task- and relationship-related features from the hippocampal patterns, as suggested by the reviewers. In the revision, we will attempt several such analyses, while taking care to control for temporal autocorrelation.

      d) that the cluster analysis in the brain-behavior correlation analysis is not well motivated or validated and should be clarified.

      We agree with the reviewers that this clustering analysis should be better described and validated. We aimed to ask whether less diverse and distinctive cognitive representations of the relationship trajectories relate to smaller real-world social networks. This question of impoverished cognitive maps was first raised by Edward Tolman; we think it is relevant here, as well. In the revision, we will clarify its motivations and implications, and better evaluate it for its robustness. Here, we address a few comments made by the reviewers.

      Reviewer 2 noted that other analyses could be used to ask whether social cognitive map complexity relates to real-world social network complexity. While the proposed alternatives are interesting (e.g., correlating decoding accuracy with social network size), we believe these analyses ask different questions. The current co-clustering analysis was intended to estimate map complexity jointly from the behavioral and neural signatures of the social map across characters. In contrast, the spline location decoding is within character; the accuracy of this decoding does not say much about representations across characters. And although we think character decoding is an interesting possible addition to this manuscript, its accuracy may reflect other aspects of the relationships, beyond just spatial representation. Thus, we will provide a clearer and better validated version of the current analysis to address this question.

      We would also like to clarify that we did not collect the Social Network Index questionnaire in the Initial sample; as such these results are more tentative than the other analyses, due to the inability to confirm them in a separate sample. Reviewer 2 also suggests that a single outlier could drive this effect; but estimating the effect with robust regression also returns a right-tailed p < 0.05, showing that the relationship is robust to outliers.

      References

      Olszowy, W., Aston, J., Rua, C. & Williams, G.B. Accurate autocorrelation modeling substantially improves fMRI reliability. Nature Communications (2019).

    1. Reviewer #1 (Public Review):

      Summary:

      An online database called MRAD has been developed to identify the risk or protective factors for AD.

      Strengths:

      This is a very intriguing study of great clinical and scientific significance that provides a thorough and comprehensive evaluation of risk and protective factors for AD. It also provides physicians and scientists with a convenient, free, and user-friendly tool for further scientific investigation.

      Weaknesses:

      (1) The paper mentions that the MRAD database currently contains data only from European populations, with no mention of data from other populations or ethnicities. Given potential differences in Alzheimer's Disease (AD) across different populations, the limitations of the data should be emphasized in the discussion, along with plans to expand the database to include data from more ethnic groups and geographic regions.

      (2) Sufficient information should be provided to clarify the data sources, sample selection, and quality control methods used in the MRAD database. Readers may expect more detailed information about the data to ensure data reliability, representativeness, and research applicability.

      (3) While the authors mention that the MRAD database offers interactive visualization interfaces, the paper lacks detailed information on how to interpret and understand these visual results. Guidelines on effectively using these visualization tools to help researchers better comprehend the data are essential.

      (4) In the conclusion section of the paper, it is advisable to explicitly emphasize the practical applications and potential clinical significance of the MRAD database. The paper should articulate how MRAD can contribute to the early identification, diagnosis, prevention, and treatment of AD and its potential societal and clinical value more clearly.

      (5) Grammar and Spelling Errors: There are several spelling and grammar errors in the paper. Referring to a scientific editing service is recommended.

    2. eLife assessment

      This study introduces the MRAD database, which provides a useful tool for evaluating risk and protective factors for Alzheimer's disease through Mendelian randomization analysis. While the findings are supported by solid evidence, the study's value could be enhanced by addressing methodological concerns and ensuring rigorous validation of significant associations. The MRAD database has the potential to aid researchers and clinicians, but the current analysis appears incomplete without these refinements.

    3. Reviewer #2 (Public Review):

      Summary:

      This MR study by Zhao et al. provides a comprehensive hypothesis-free approach to identifying risk and protective factors causal to Alzheimer's Disease (AD).

      Strengths:

      The study employs a comprehensive, hypothesis-free approach, which is a novel alternative to traditional hypothesis-driven studies. Also, causal associations between risk/protective factors and AD were addressed using genetic instruments and analysis.

      Major comments:

      (1) The authors used the inverse-variance weighted (IVW) model as the primary method and other MR methods (MR-Egger, weighted mean, etc.) for sensitivity analysis. However, each method has its own assumption, and IVW is only robust when pleiotropy and heterogeneity are not severe. Rather than using IVW imprudently across all associations, it would be more appropriate to choose the best MR method for each association based on heterogeneity/Egger intercept tests. This customized approach, based on tests of MR assumption violations, yields more stable and reliable results. For reference, please follow up on work by Milad et al. (EHJ - "Plasma lipids and risk of aortic valve stenosis: a Mendelian randomization study"). This study selected the best MR model for each association based on pleiotropy and heterogeneity tests. Given the large number of tests in this work, I suggest initially screening significant signals using IVW, as done, and then validating the results using multiple MR methods for those signals. It is common for MR estimates from different methods to vary significantly (with some being statistically significant and others not), and in such cases, the MR estimates from the best-fitted model should be trusted and highlighted.

      (2) Lines 157-160 mentioned "But to date, AD has been reported as hypothesis-driven MR study based on a single factor, ignoring the potential role of a huge number of other risk factors. Also, due to the high degree of heterogeneity present in AD subtypes, which have different biological and genetic characteristics. Thus, the previous studies cannot offer a systematic and complete viewpoint.". This statement overlooks a similar study published in Molecular Psychiatry ("A Phenome-wide Association and Mendelian Randomization Study for Alzheimer's Disease: A Prospective Cohort Study of 502,493"), which rigorously assessed the effects of 4171 factors spanning 10 different categories on AD using observational analysis and MR. The authors should revise their statement on the novelty of their study type throughout the manuscript and discuss how their work differs from and potentially strengthens previous studies.

      (3) Given the large number of tests, the multiple testing issue is concerning. To mitigate potential false positives, I recommend employing the Bonferroni threshold or FDR. The authors should only interpret exposures that are significant at the Bonferroni threshold.

      (4) In the discussion, the authors should interpret or highlight exposures that remain significant after multiple testing corrections.
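
      The screen-then-validate workflow suggested in comments (1) and (3) relies on three standard quantities: the fixed-effect IVW estimate, Cochran's Q as a heterogeneity statistic, and a Bonferroni-adjusted significance threshold. The sketch below is a minimal illustration and not the MRAD pipeline; the SNP-level summary statistics and the number of tests are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate: weighted mean of per-SNP Wald ratios.

    Each ratio beta_y / beta_x is weighted by beta_x**2 / se_y**2
    (first-order weights).
    """
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


def cochran_q(beta_x, beta_y, se_y):
    """Cochran's Q: weighted squared deviation of the ratios from IVW.

    Under homogeneity, Q follows a chi-square with (number of SNPs - 1) df.
    """
    estimate, _ = ivw_estimate(beta_x, beta_y, se_y)
    return sum(
        (bx ** 2 / s ** 2) * (by / bx - estimate) ** 2
        for bx, by, s in zip(beta_x, beta_y, se_y)
    )


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that controls the family-wise error."""
    return alpha / n_tests


# Hypothetical summary statistics for four instruments (NOT from MRAD).
beta_exposure = [0.10, 0.20, 0.15, 0.05]
beta_outcome = [0.05, 0.11, 0.07, 0.03]
se_outcome = [0.01, 0.02, 0.015, 0.01]

estimate, se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
q_stat = cochran_q(beta_exposure, beta_outcome, se_outcome)
threshold = bonferroni_threshold(alpha=0.05, n_tests=1000)

print(f"IVW estimate = {estimate:.3f} (SE {se:.3f}), Q = {q_stat:.3f}")
print(f"Bonferroni threshold for 1000 tests = {threshold:.1e}")
```

      A large Q relative to its degrees of freedom indicates heterogeneity, in which case the IVW estimate should be cross-checked against methods with weaker assumptions before the association is interpreted.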

    1. eLife assessment

      The study presents a valuable finding in advancing our understanding of the cellular and molecular mechanisms that regulate the switching of the migration mode from parallel to radial in cerebellar granule cell development. The evidence supporting the claims of the authors is solid and supports the main conclusion; the highlight was the imaging system's visualization of the cell-recognition event associated with neuronal migration, which established a new standard for the field. This study would be of interest to cell biologists and neurodevelopmental biologists working on cell-cell interaction and neuronal migration.

    2. Reviewer #1 (Public Review):

      This study by Hallada et al. reported the detailed characterization of cis- and trans-binding of JAM-C in mediating the developmental migration of CGNs, combining ex vivo cultures, time-lapse imaging, and mathematical analyses. Overall, the study was comprehensively carried out, and the conclusion is important for our understanding of the signaling mechanism of cerebellar development.

      Weaknesses:

      Several technical concerns need to be clarified.

      (1) The efficiency of shRNA knockdown of endogenous JAM-C. The entire study was based on the assumption that the endogenous wild-type JAM-C was depleted to the extent that it would not influence the observed phenotypes. However, this point requires verification, particularly in the ex vivo cultures.

      (2) The expression levels of mutant JAM-C proteins. It is unclear whether the exogenous expression of mutant JAM-C proteins would be comparable to that of the endogenous JAM-C. In addition, the levels of exogenously expressed JAM-C are likely to change over the time course of experiments, e.g., in some experiments over 48 hours.

      (3) The resolution of imaging methods. Different imaging methods were utilized in the study, and it is essential to clearly state the resolution of each imaging dataset (e.g., 0.2 × 0.2 µm per pixel). This information is crucial to assess the reliability of observed phenotypes, which in some cases were relatively unimpressive.

    3. Reviewer #2 (Public Review):

      Summary:

      Lamination is a layered neuronal arrangement that provides a basic frame to establish functional connectivity in the brain. The formation of a layered structure requires a highly coordinated interaction between migrating neurons and the developing microenvironment. Earlier studies revealed that to reach specific locations, migrating neurons typically follow various morphogen gradients. Here, Hallada et al. showed that cerebellar granule neurons (CGNs) could navigate via adhesive interaction with Junctional Adhesion Molecule C (JAM-C) followed by recruitment and distribution of intracellular partners (Pard3 and drebrin) at the contact sites. These results show that neuronal migration could be structured by specific interactions with adhesion molecules and spatial re-arrangements of downstream effectors.

      Strengths:

      The authors concluded that cis/trans binding sites of JAM-C on CGNs are crucial for contact formation with cerebellar glial cells (Bergmann glial cells, BGs) and recruitment of Pard3 and drebrin to contact sites. This conclusion was based on the data obtained utilizing several advanced tools and technical approaches, such as cutting-edge microscopy, detailed visualization of cell-cell recognition, and a new correlation analysis.

      Weaknesses:

      (1) Despite multiple advanced methodologies, the study has weaknesses related primarily to the lack of specific evidence in support of findings and data interpretation issues. For example, it is unclear how JAM-C-mediated adhesion facilitates the entry of CGNs into the cerebellar molecular layer (ML). The authors described that CGN-CGN JAM recognition recruits more Pard3 and drebrin compared to CGN-BG recognition, which could increase the dwelling time of CGNs before moving to ML. However, such a mechanism does not explain what would initiate the entry of CGNs into ML. Perhaps the authors could provide a detailed explanation of this phenomenon in the Discussion (but certainly not in the Abstract). Also, the authors could consider revising the content of the Abstract, emphasizing their findings, and leaving out the speculations.

      (2) To allow for comparison, it would be very helpful to indicate specific numerical values for each data point throughout the manuscript. For example, the authors stated that a change in instantaneous migration angle due to JAM-C silencing negatively affects CGN movement to the ML (Figure 2) and that spatial distribution of negative JAM-Drebrin correlation is altered at CGN-CGN contacts (Figure 7). However, without specific values, it remains unclear what the magnitude of the discussed changes is or whether they were actually significant. It was certainly not straightforward to draw specific conclusions based on graphical presentation alone.

    4. Reviewer #3 (Public Review):

      Summary:

      This study elucidated the mechanism controlling the switch from parallel migration to radial migration during the development of cerebellar granule cells by analyzing the behavior of cell-type-specific JAM-mediated adhesion and the downstream factors that promote migration. The research represents a detailed analysis, employing probes to capture cell recognition events between different cell types, a co-culture system (monolayer culture and slice imaging), and imaging techniques, building upon the authors' prior research on JAM-Pard3 interactions. As a result, the authors found that:

      (1) JAM-C-mediated interactions between granule cells (GCNs) are formed earlier and are more robust than JAM-C/JAM-B interactions between GCNs and glia;

      (2) Recruitment of migration-promoting factors Pard3/Drebrin by JAM interactions is more efficient in GCN-GCN (JAM-C/JAM-C) interactions; and

      (3) The distribution pattern of Pard3/Drebrin differs between GCN-GCN and GCN-Glia interactions, as revealed by detailed imaging analysis.

      Consequently, the authors discovered that these differences contribute to a time lag between parallel and radial migration, which serves as a temporal checkpoint sorting mature cerebellar granule cells.

      Strengths:

      Cell migration is a commonly observed phenomenon in neural development. It is crucial for sorting specific cell populations and positioning them appropriately to develop proper neural circuits. While the regulation of these migrations is known to be mediated by secreted guidance factors, this study demonstrates that combinations of cell adhesion molecules (JAM) mediate cell type-specific interactions that contribute to the timing control of cell migration. This finding significantly advances our understanding of the mechanisms governing cell migration in neural development.

      Weaknesses:

      The authors' hypothesis has been validated using in vitro systems. While in vitro systems allow for a more detailed design of experimental parameters, validation in vivo would still be necessary to demonstrate whether the temporal checkpoint of migration mediated by cell-cell interactions works. For example, knockout of JAM-C in cerebellar granule cells could be considered for such validation. Furthermore, the behavioral analysis of these mutant mice would be interesting.

      Additionally, the authors' observation that recruitment patterns of Pard3 and Drebrin at adhesive sites vary between interacting cell pairs is intriguing and suggests exciting implications. It would be highly informative if the relationship between these differences and ML entry timing could be demonstrated.

    1. eLife assessment

      Wittkamp et al. investigated the spatiotemporal dynamics of expectation of pain using an original fMRI-EEG approach. The methods are solid and the evidence for a substantially different neural representation between the anticipatory and the actual pain period is convincing. These important findings would benefit from a general framework to encompass their research questions, hypotheses, and interpretation of results. Furthermore, a more in-depth discussion about the choice of conditions would be desirable, specifically whether the definitions of nocebo and placebo in the study are comparable with traditional paradigms, and whether the control condition can be considered as a situation with no expectation or no prediction.

    2. Reviewer #1 (Public Review):

      Summary:

      In this important paper, the authors investigate the temporal dynamics of expectation of pain using a combined fMRI-EEG approach. More specifically, by modifying the expectations of higher or lower pain on a trial-to-trial basis, they report that expectations largely share the same set of activations before the administration of the painful stimulus, and that the coding of the valence of the stimulus is observed only after the nociceptive input has been presented. fMRI-informed EEG analysis suggested that the temporal sequence of information processing involved the dorsolateral prefrontal cortex (DLPFC), the anterior insula, and the anterior cingulate cortex. The strength of evidence is convincing, and the methods are solid, but a few alternative interpretations of the findings related to the control group, as well as a more in-depth discussion of the correlations between the BOLD and EEG signals, would strengthen the manuscript.

      Strengths:

      In line with open science principles, the article presents the data and the results in a complete and transparent fashion.

      From a theoretical standpoint, the authors make a step forward in our understanding of how expectations modulate pain by introducing a combination of spatial and temporal investigation. It is becoming increasingly clear that our appraisal of the world is dynamic, guided by previous experiences, and mapped on a combination of what we expect and what we get. New research methods, questions, and analyses are needed to capture these evolving processes.

      Weaknesses:

      The control condition is not so straightforward. Across the manuscript it is defined as "no expectation", and in the legend of Figure 1 it is mentioned that the third state would be "no prediction". However, it is difficult to conceive that participants would not have any expectations or predictions. Indeed, in the description of the task it is mentioned that participants were instructed that they would receive stimuli during "intermediate sensitive states". The results of the pain scores and expectations might support the idea that the control condition is situated in between the placebo and nocebo conditions. However, since this control condition was not part of the initial conditioning, and participants had no reference to previous stimuli, one might expect that some ratings might have simply "regressed to the mean" for a lack of previous experience.

      General considerations and reflections:

      Inducing expectations in the desired direction is not a straightforward task, and results might depend on the exact experimental conditions and the comparison group. In this sense, the authors' choice of having 3 groups of positive, negative, and "neutral" expectations is to be praised. On the other hand, control groups also form their own expectations, and this can constitute a confounder in every experiment using expectation manipulation if not appropriately investigated.

      In addition, although fMRI is still (probably) the best available tool we have to understand the spatial representation of cortical processing, limitations about not only the temporal but even the spatial resolution should be acknowledged. Given the anatomical and physiological complexity of the cortical connections, as we know from the animal world, it is still well possible that subcircuits are activated also for positive and negative expectations, but cannot be observed due to the limitation of our techniques. Indeed, on an empirical/evolutionary basis it would remain unclear why we should have a system that waits for the valence of a stimulus to show differential responses.

      Also, moving in a dimension of network and graph theory, one would not expect single areas to be responsible for distinct processes, but rather that they would integrate information in a shared way, potentially with different feedback and feedforward communications. As such, it becomes more difficult to assume the insula is a center for coding potential pain, perhaps more of a node in a system that signals potential dangers for the integrity of the body.

      The authors analyze the EEG signal between 0.5 to 128 Hz, finding significant results in the correlation between single-trial BOLD and EEG activity in the higher gamma range (see Figure 6 panel C). It would be interesting to understand the rationale for including such high frequencies in the signal, and the interpretation of the significant correlation in the high gamma range.

    3. Reviewer #2 (Public Review):

      I think this is a very promising paper. The combination of EEG and fMRI is unique and original. However, I also have some suggestions that I think could help improve the manuscript.

      This manuscript reports the findings of an EEG-fMRI study (n = 50) on the effects of expectations on pain. The combination of EEG with fMRI is extremely original and well-suited to study the transition from expectation to perception. However, I think that the current treatment of the data, as well as the way that the manuscript is currently written, does not fully capitalize on the potential of this unique dataset. Several findings are presented but there is currently no clear message coming out of this manuscript.

      First, one positive point is that the experimental manipulation clearly worked. However, it should be noted that the instructions used are not typical of studies on placebo/nocebo. Participants were not told that the stimulations would be of higher/lower intensity. Rather, they were told that objective intensities were held constant, but that EEG recordings could be used to predict whether they would perceive the stimulus as more or less intense. I think that this is an interesting way to manipulate expectations, but there could have been more justification in the introduction for why the authors have chosen this unusual procedure.

      Also, the introduction mentions that little is known about potential cerebral differences between expectations of high vs. low pain. I think the fear conditioning literature could be cited here. Activations in ACC, SMA, Ins, parahippocampal gyrus, PAG, etc. are often associated with upcoming threat, whereas activations in the vmPFC/default mode network are associated with safety.

      The fact that the authors didn't observe a clearer distinction between high and low expectations here could be related to their specific instructions that imply that the stimulus is the same and that it is the subjective perception that is expected to change. In any case, this is a relatively minor issue that is easy to address.

      Towards the end of the introduction, the authors present the aims of the study in mainly exploratory terms:
      (1) What are the differences between anticipation and perception?
      (2) What regions display a difference between high and low expectations (high > low or low < high) vs. an effect of expectation regardless of the direction (high and low different than neutral)?
      I think these are good questions, but the authors should provide more justification, or framework, for these questions. More specifically, what will they be able to conclude based on their observations?

      For instance (note that this is just an example to illustrate my point; I encourage the authors to come up with their own framework/predictions):

      (1) Possibility #1: A certain region encodes expectations in a directed fashion (high > low) and that same region also responds to perception in the same direction (high > low). This region would therefore modulate pain by assimilating perception towards expectations.
      (2) Possibility #2: Different regions are involved in expectation and perception. Perhaps this could mean that certain regions influence pain processing through descending facilitation for instance...

      Regarding analyses, I think that examining the transition from expectations to perception is a strong angle of the manuscript given the EEG-fMRI nature of the study. However, I feel that more could have been done here. One problem is that the sequence of analyses starts by identifying an fMRI signal of interest and then attempts to find its EEG correlates. The problem is that the low temporal resolution of fMRI makes it difficult to differentiate expectation from perception, which doesn't make this analysis a good starting point in my opinion. Why not start by identifying an EEG signal that differentiates perception vs expectation, and then look for its fMRI correlates?

      Finally, I found the hypotheses on "valenced" vs. "absolute" effects a little bit more difficult to follow. This is because "neutral" is not really neutral: it falls in between low and high. If I follow correctly, participants know that the temperature is always the same. Therefore, if they are told that the machine cannot predict whether their perception is going to be low or high, then it must be because it is likely to be in between. Ratings of expectation and pain ratings confirm that. The neutral condition is not "devoid" of expectations as the authors suggest. Therefore, it would make sense to look at regions with the following pattern low > neutral > high, or vice-versa, low < neutral < high. Low & high being different than neutral is more difficult to interpret. I don't think that you can say that it reflects "absolute" expectations because neutral is also the expectation of a medium temperature. Perhaps it reflects "certainty/uncertainty" or something like that, but it is not clear that it reflects "expectations".

    1. eLife assessment

      This interesting study explores whether tumor cells can manipulate their Hydra hosts and has useful findings on the consequences for the fitness of the host Hydra. However, the evidence supporting these findings is incomplete and would benefit from the addition of several control experiments. The work will be of broad interest to many fields, including developmental biology, evolutionary biology, and tumor biology.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, BOUTRY et al examined a cnidarian Hydra model system where spontaneous tumors manifest in laboratory settings, and lineages featuring vertically transmitted neoplastic cells (via host budding) have been sustained for over 15 years. They observed that hydras harboring long-term transmissible tumors exhibit an unexpected augmentation in tentacle count. In addition, the presence of extra tentacles, enhancing the host's foraging efficiency, correlated with an elevated budding rate, thereby promoting tumor transmission vertically. This study provided evidence that tumors, akin to parasitic entities, can also exert control over their hosts.

      Strengths:

      The manuscript is well-written, and the phenotype is intriguing.

      Weaknesses:

      The quality of this manuscript could be improved if more evidence were to be provided regarding the beneficial versus detrimental effects of the tumors.

    3. Reviewer #2 (Public Review):

      Background and Summary:

      This study addresses the intriguing question of whether and how tumors can develop in the freshwater polyp hydra and how they influence the fitness of the animals. Hydra is notable for its significant morphogenetic plasticity and nearly unlimited capacity for regeneration. While its growth through asexual reproduction (budding) and the associated processes of pattern formation have been extensively studied at the cellular level, the occurrence of tumors was only recently described in two strains of Hydra oligactis (Domazet-Lošo et al, 2014). In that research, an arrest in the differentiation of female germ cells led to an accumulation of germline cells that failed to develop into eggs. In hydra, fertile egg cells typically incorporate nurse cells, which originate from large interstitial stem cells (ISCs) restricted to the germline, through apoptosis. However, this increase in apoptosis activity is absent in "germline tumors," and germline ISCs instead form slowly growing patches that do not compromise tissue integrity. Despite the upregulation of certain genes associated with mammalian neoplasms (such as tpt1 and p23) in this tissue, determining whether this differentiation arrest and the resulting egg patches truly constitute neoplasms remains a challenge.

      The authors have recently published two papers on the ecological and evolutionary aspects of hydra tumor formation (Boutry et al 2022, 2023), which is also the focus of this manuscript. They transplanted tissues derived from animals with germline tumors to wildtype animals and analyzed their growth patterns, specifically the number of tentacles in the host tissue. They observed that such tissues induced the growth of additional tentacles compared to tissues without germline tumors. The authors conclude that this growth pattern (increased number of tentacles) is correlated with "reducing the burden on the host by (over-)compensating for the reproductive costs of tumors" and claim that "transmissible tumors in hydra have evolved strategies to manipulate the phenotype of their host". While it might be stimulating to add a fresh view from other disciplines (here, ecological and evolutionary aspects), the authors completely ignore the current knowledge of the underlying cell biology of the processes they analyze.

      Strengths:

      The study focuses on intriguing questions: whether and how tumors can develop in the freshwater polyp hydra, and how they influence the fitness of the animals.

      Weaknesses:

      Concept of germline tumors.
      The conceptual foundation of their experiments on germline tumors was the study of Domazet-Lošo et al (2014) introducing the concept of germline tumors in hydra (see above). While this is an intriguing hypothesis, there has been little advancement in comprehending the molecular mechanisms underlying tumor formation in hydra beyond this initial investigation. Germline tumors in hydra do not fully meet the typical criteria for neoplasms observed in mammalian tissues. More importantly, a similar phenotype was already reported by the work of Paul Brien and described as "crise gametique" (Brien, 1966, Biologie de la reproduction animale - Blastogenèse, Gamétogenèse, Sexualisation, ed. Masson & Cie, Paris). This phenomenon of gametic crisis is unique to Hydra oligactis, a stenotherm, cold-adapted cosmopolitan species. In this species, gametogenesis severely impacts the vitality of the polyps, often leading to complete exhaustion and death (Tardent, 1974). Animals can only be rescued during the initial phase of the cold-induced sexual period (see also the research of Littlefield, 1984, 1985, 1986, 1991). The observed differentiation arrest in germline tumors might represent an epigenetically established consequence of surviving gametogenesis. Regrettably, this important work was not mentioned by the authors or by Domazet-Lošo et al. (2014), highlighting a notable gap in the recognition of basic research in this area that might challenge the hydra tumor hypothesis.

      "Supernumerary" tentacles in graft experiments.
      The authors describe that after grafting tissue from animals with germline tumors to wild-type animals, the number of tentacles in the host tissue increased when the donor tissue had germline tumors. A maximum effect of four additional tentacles was found with donor strain H. oligactis robusta and three additional tentacles with donor strain H. oligactis St Petersburg. In general, H. oligactis wild-type host strains had fewer tentacles than H. oligactis St Petersburg strains. This is consistent with the results of Domazet-Lošo et al (2014), who showed that the number of tentacles increased in the strains with germline tumors. What conclusions can be drawn from these experiments? The authors might want to conclude that transmissible tumors in Hydra have developed strategies to manipulate the phenotype of their host. But there is no evidence for this, as essential controls are missing. It is known that the size of hydra polyps is proportion-regulated, i.e. the number of tentacles varies with the size and number of (epithelial) cells. Such controls are missing in the experiments. There is also a lack of controls from wild-type animals in gametogenesis: it is very likely that grafts with wild-type animals with egg spots of comparable size to the germline tumors (see above) will result in similar numbers of tentacles in host tissue.

    1. eLife assessment

      This is an important and timely study that advances our understanding of the role of lateral hypothalamic orexin/hypocretin neurons in appetitive approach and consummatory behaviors. Specifically, using fiber photometry, the authors provide solid and convincing evidence that orexin neurons are primarily active during approach and not consummatory behavior, in a manner that is dependent on metabolic state. Further, using optogenetics and cell type-specific electrophysiology, they show that inputs from the ventral pallidum and lateral nucleus accumbens shell to orexin/hypocretin neurons in the lateral hypothalamus are predominantly inhibitory.

    2. Reviewer #1 (Public Review):

      Summary:

      Using fiber photometry, Mitchell et al. report that the calcium activity of lateral hypothalamic orexin neurons increases during the approach to a food pellet in a manner that depends on the metabolic state and begins to return to baseline prior to and during food consumption. This activity is also enhanced during the approach to palatable food relative to a standard chow pellet. They also present ex vivo electrophysiological evidence that GABAergic neurons in the ventral pallidum and lateral nucleus accumbens shell, but not medial nucleus accumbens shell, provide predominantly inhibitory, monosynaptic input onto lateral hypothalamic neurons. Overall, most claims are well supported by the data, though the electrophysiology analysis is somewhat limited and some information that could inform interpretation of the data is lacking.

      Strengths:

      (1) The fiber photometry recordings make use of an isosbestic control, and the signals were aligned using linear regression after baseline correction and calculation of robust z-scores.

      (2) The fiber photometry analyses are based on animal averages rather than trial-based averages; the latter can result in Type 1 errors without appropriate measures to account for the influence of the subject.

      (3) Monosynaptic currents from GABAergic inputs from the ventral pallidal and lateral shell are identified by the remaining current in the presence of tetrodotoxin (TTX) and 4-aminopyridine (4-AP).
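
      The preprocessing summarized in strength (1) can be sketched roughly as follows. This is a minimal illustration of one common approach (a least-squares fit of the isosbestic channel to the signal channel, followed by a median/MAD z-score), not the authors' actual pipeline; the function names are hypothetical, and baseline correction is omitted for brevity.

```python
import numpy as np

def fit_isosbestic(signal, isosbestic):
    """Least-squares linear fit of the isosbestic channel to the signal channel."""
    slope, intercept = np.polyfit(isosbestic, signal, deg=1)
    return slope * isosbestic + intercept

def robust_zscore(x):
    """Z-score based on the median and the median absolute deviation (MAD).

    The factor 1.4826 scales the MAD so that it matches the standard
    deviation for normally distributed data. Assumes the MAD is non-zero.
    """
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    return (x - median) / (1.4826 * mad)

def motion_corrected_z(signal, isosbestic):
    """Subtract the fitted isosbestic trace, then express the result as robust z-scores."""
    fitted = fit_isosbestic(signal, isosbestic)
    return robust_zscore(signal - fitted)
```

      Using the median and MAD rather than the mean and standard deviation keeps large calcium transients from inflating the scale estimate.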

      Weaknesses:

      (1) The data are not discussed in the context of the prior literature on ventral pallidal GABAergic inputs to the lateral hypothalamus (such as Prasad et al. 2020, JNeurosci) and it is not clear whether these patterns of monosynaptic inhibitory inputs are specific to orexin neurons.

      (2) The paper does not address whether there are synaptic inputs from non-GABAergic ventral pallidum neurons, though very recent work suggests that ventral pallidal projections to the lateral hypothalamus may be enriched with glutamatergic RNA markers relative to other projections (Bernet et al. 2024, JNeurosci). Some statements in the manuscript refer to ventral pallidal inputs in general, despite the use of cell-type specific expression in VGAT-cre mice.

      (3) The statistical analysis of the electrophysiology data is limited and does not appear to account for the lack of independence for cells recorded from the same mouse.
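
      One simple way to address the non-independence raised in point (3) is to collapse cells to one value per mouse before any group comparison. The sketch below is only an illustration under that assumption; the function and variable names are hypothetical, and a mixed-effects model with mouse as a random effect would be an alternative that retains cell-level information.

```python
from collections import defaultdict
from statistics import mean

def per_animal_means(amplitudes, mouse_ids):
    """Average cell-level measurements within each mouse.

    amplitudes: one measurement per recorded cell (e.g. oIPSC amplitude).
    mouse_ids:  the mouse each cell came from, in the same order.
    Returns a dict mapping mouse ID to the mean across that mouse's cells,
    so that the unit of analysis becomes the animal rather than the cell.
    """
    grouped = defaultdict(list)
    for amplitude, mouse in zip(amplitudes, mouse_ids):
        grouped[mouse].append(amplitude)
    return {mouse: mean(values) for mouse, values in grouped.items()}
```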

    3. Reviewer #2 (Public Review):

      Summary:

      Mitchell & Mohammadkhani et al. used an Orexin-Cre mouse line with a Cre-dependent GCaMP virus to perform lateral hypothalamic (LH) Ca2+ fiber photometry recordings in mice during the approach to food under various metabolic and saliency conditions. They also used a Vgat-Cre mouse line with Cre-dependent ChR2 in various regions of the ventral striatopallidal (VSP) complex in combination with an Orexin promoter-driven reporter virus labeling Orx-LH neurons to assess electrophysiological connectivity of inhibitory/excitatory inputs from VSP to Orx-LH. Overall, authors note that Orx-LH Ca2+ activation occurs during approach to food (but not consumption of food), and that VSP->Orx-LH connectivity is primarily monosynaptic and inhibitory, although this varies across subregions, with some monosynaptic excitatory input as well. While their methods and analyses are technically sound and the manuscript is clearly written and presented, the further knowledge gained over previous work is rather incremental and does not produce a substantial shift in the current existing framework.

      Strengths:

      Cell type specificity of OX/HT recordings is confirmed by post-hoc immunostaining, both for fiber photometry and electrophysiological connectivity. This is an important strength given the contentious history of cell specificity in various transgenic OX/HT mouse lines.

      Clearly implicating metabolic state and food saliency as factors impacting OX/HT activity dynamics is a strength, and linking the influence of ghrelin receptor signaling is relatively novel.

      Weaknesses:

      In fiber photometry traces, OX/HT activity begins increasing 2-3 seconds prior to the food approach (Figures 1F and 1G), requiring an explanation. One possibility is that mice may be detecting odorant cues indicative of food prior to the physical approach.

      Figure 1F - the authors' interpretation that OX/HT activity doesn't actually decrease during consumption, but simply "trends toward baseline" is complicated by the fact that the authors shaded 20s-30s intervals labeled "eating". Mice do not typically consume food for 20-30s nonstop. Mice typically consume for ~1-5 seconds, then they take a break, then they resume.

      The authors state in the Discussion "... the reduction in OX/HT cell activity was more closely correlated with the termination of approach behavior" (rather than with eating per se). However, in many cases, mice begin consuming food immediately after approaching it, so it is puzzling that there is an activity reduction following the approach, but not an activity reduction upon consumption. In other words, the cessation of approach and the beginning of consumption are often tightly linked together in rapid sequence.

      Figure 2E - the single polysynaptic oIPSC appears to have the same/similar latency as many of the Monosynaptic oIPSCs. Close proximity of consecutive oIPSCs may affect the analysis of amplitude and latency. For example, in representative traces of Figure 2C, it is unlikely to get an accurate measure of the second oIPSC.

      The comparison of apparent connectivity differences between VP vs. mNAcSh vs. lNAcSh is limited by the lack of appropriate anatomical quantification and demonstration. When using a Vgat-Cre mouse line and targeting the VSP, there is the potential for massive viral spread across the entire Nucleus accumbens/VP/SI/BNST area.

      How do the electrophysiological properties of OX/HT neurons (and VSP inputs) change across metabolic/saliency states? For example, under High Fat Diet, chronic Food Restriction, and chronic Ghrelin. This seems to be the fundamental question that the authors are working toward, but it is not resolved with the current data set.

      Potential Ephys Pitfall: a high-chloride internal solution means that oEPSCs might actually be GABAergic after all. With a low-chloride solution, the Cl reversal potential is closer to the resting membrane potential (RMP), whereas putting more chloride in the pipette gives a reversal potential more depolarized than rest, which reverses the current mediated by chloride ions. Here, the internal solution used for oEPSCs was calculated to have a Cl reversal potential at ~ -20 mV; thus, the Cl-mediated PSCs would be depolarizing when cells were held at -65 mV. Did the authors apply any blockers in the bath to confirm that recorded oEPSCs were glutamatergic?
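
      The dependence of the chloride reversal potential on the pipette solution follows from the Nernst equation, E_Cl = (RT/zF) ln([Cl]out/[Cl]in) with z = -1. A minimal sketch is given below; the function name and the default temperature are illustrative assumptions, and no concentrations from the manuscript are used.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def chloride_reversal_mv(cl_in_mm, cl_out_mm, temp_c=32.0):
    """Nernst potential for chloride in millivolts.

    cl_in_mm:  intracellular (pipette) chloride concentration, mM.
    cl_out_mm: extracellular (bath) chloride concentration, mM.
    temp_c:    recording temperature in degrees Celsius (assumed default).
    Because chloride has valence z = -1, the log ratio is inside/outside.
    """
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k / F) * math.log(cl_in_mm / cl_out_mm)
```

      Raising the pipette chloride concentration moves the returned value toward 0 mV, so at a holding potential of -65 mV a chloride-mediated current becomes increasingly inward and can resemble an EPSC.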

    4. Reviewer #3 (Public Review):

      Summary:

      Orexin/hypocretin (OX/HT) neurons are implicated in food intake, and there is evidence supporting a role for OX/HT neurons in reward consumption, potentially influenced by the animal's metabolic state. Here, Mitchell, Mohammadkhani, et al. use fiber photometry to dissociate OX/HT neurons' role in reward-seeking from their role in reward consumption. Mice were given normal chow or palatable food in a fed or fasted state. The authors recorded GCaMP signals from OX/HT neurons during food approach and consumption. They observed heightened OX/HT GCaMP signals during the food approach; in contrast, they saw the signals decline during arrival at the food source and during food consumption. In a second set of experiments, the authors investigate upstream circuits that could potentially gate OX/HT neurons. They use optogenetics to directly stimulate inhibitory inputs arriving from either the ventral pallidum, the medial, or the lateral nucleus accumbens shell to OX/HT neurons. They investigated if these circuits impinge monosynaptically or polysynaptically onto OX/HT neurons to assess their functional role in inhibiting these neurons. The authors found that the ventral pallidum and the lateral but not medial nucleus accumbens shell exert inhibitory control over OX/HT neurons.

      Strengths:

      The manuscript is well-written, employs suitable statistical analyses, and the conclusions are generally supported by the results.

      Weaknesses:

      Larger group sizes in some instances and causal manipulation of the inhibitory circuits during reward approach vs consumption would enable the authors to make stronger assertions about these circuits' role in gating OX/HT neurons in these behaviors.

    1. Reviewer #1 (Public Review):

      Summary:

      The authors propose that the energy landscape of animals can be thought of in the same way as the fundamental versus realized niche concept in ecology. Namely, animals will use a subset of the fundamental energy landscape due to a variety of factors. The authors then show that the realized energy landscape of eagles increases with age as the animals are better able to use the energy landscape.

      Strengths:

      This is a very interesting idea that adds significantly to the energy landscape framework. They provide convincing evidence that the available regions used by birds increase with size.

      Weaknesses:

      Some of the measures used in the manuscript are difficult to follow and there is no mention of the morphometrics of birds or how these change with age (other than that they don't change which seems odd as surely they grow). Also, there may need to be more discussion of other ontogenetic changes such as foraging strategies, home range size etc.

    2. Reviewer #2 (Public Review):

      Summary:

      With this work, the authors tried to expand and integrate the concept of realized niche in the context of movement ecology by using fine-scale GPS data of 55 juvenile Golden eagles in the Alps. Authors found that ontogenic changes influence the percentage of area flyable to the eagles as individuals exploit better geographic uplifts that allow them to reduce the cost of transport.

      Strengths:

      The authors have done insightful work linking changes in ontogeny and energy landscapes in large soaring birds. It may not only advance the understanding of how changes in the life cycle affect the exploitability of aerial space but also offer valuable tools for the management and conservation of large soaring species in a changing world.

      Weaknesses:

      Future research may test the applicability of the present work by including more individuals and/or other species from other study areas.

    1. eLife assessment

      This important work significantly advances the field of computational modelling of genome organisation through the development of OpenNucleome. The evidence supporting the tool's effectiveness is compelling, as the authors compare their predictions with experimental data. It is anticipated that OpenNucleome will attract significant interest from the biophysics and genomics communities.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors develop a comprehensive program to investigate the organization of chromosome structures at 100 kb resolution. It is extremely well executed. The authors have thought through all aspects of the problem. The resulting software will be most useful to the community. Interestingly, they capture many experimental observations accurately. I have very few complaints.

      Strengths:

      A lot of details are provided. The success of the method is well illustrated. Software is easily available.

      Weaknesses:

      The number of parameters in the energy function is very large. Is there any justification for this? Could they simplify the functions?

      What would the modification be if the resolution is increased?

      They should state that the extracted physical values are scale dependent, for example, viscosity.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Lao et al. develop an open-source software (OpenNucleome) for GPU-accelerated molecular dynamics simulation of the human nucleus accounting for chromatin, nucleoli, nuclear speckles, etc. Using this, the authors investigate the steady-state organization and dynamics of many of the nuclear components.

      Strengths:

      This is a comprehensive open-source tool to study several aspects of the nucleus, including chromatin organization, interactions with lamins and organization, and interactions with nuclear speckles and nucleoli. The model is built carefully, accounting for several important factors and optimizing the parameters iteratively to achieve experimentally known results. Authors have simulated the entire genome at 100kb resolution (which is a very good resolution to simulate and study the entire diploid genome) and predict several static quantities such as the radius of gyration and radial positions of all chromosomes, and time-dependent quantities like the mean-square displacement of important genomic regions.
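
      The static and time-dependent quantities mentioned above have standard definitions that can be sketched in a few lines. The code below is a generic illustration assuming equal-mass beads and coordinates stored as NumPy arrays; it does not use or reproduce the OpenNucleome API, and the function names are hypothetical.

```python
import numpy as np

def radius_of_gyration(coords):
    """Radius of gyration of one chromosome.

    coords: array of shape (n_beads, 3), one row per 100 kb bead.
    Equal bead masses are assumed.
    """
    centered = coords - coords.mean(axis=0)
    return np.sqrt((centered ** 2).sum(axis=1).mean())

def mean_square_displacement(traj, lag):
    """Mean-square displacement at a given lag (in frames, lag >= 1).

    traj: array of shape (n_frames, n_beads, 3).
    The average runs over all beads and all time origins.
    """
    displacement = traj[lag:] - traj[:-lag]
    return (displacement ** 2).sum(axis=-1).mean()
```

      Evaluating `mean_square_displacement` over a range of lags gives the MSD curve whose slope on a log-log plot indicates whether the motion is diffusive or subdiffusive.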

      Weaknesses:

      One weakness of the model is that it has several parameters. Some of them are constrained by the experiments. However, the role of every parameter is not clear in the manuscript.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors present OpenNucleome, a computational tool for simulating the structure and dynamics of the human nucleus. The software models nuclear components, including chromosomes and nuclear bodies, and incorporates GPU acceleration for potential performance gains. The authors aim to advance the understanding of nuclear organization by providing a tool that aligns with experimental data and is accessible to the genome architecture research community.

      Strengths:

      OpenNucleome provides a model of the nucleus, contributing to the advancement of computational biology. Utilizing GPU acceleration with OpenMM may offer potential performance improvements.

      Weaknesses:

      It could still benefit from clearer explanations regarding the generation and usage of input and output files and compatibility with other tools.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 0: In this paper, the authors develop a comprehensive program to investigate the organization of chromosome structures at 100 kb resolution. It is extremely well executed. The authors have thought through all aspects of the problem. The resulting software will be most useful to the community. Interestingly they capture many experimental observations accurately.

      I have very few complaints.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: The number of parameters in the energy function is very large. Is there any justification for this? Could they simplify the functions?

      We extend our gratitude to the reviewer for their insightful remarks. The parameters within our model can be categorized into two groups: those governing chromosome-chromosome interactions and those governing chromosome-nuclear landmark interactions.

      In terms of chromosome-chromosome interactions, the parameter count is relatively modest compared to the vast amount of Hi-C data available. For instance, while the whole-genome Hi-C matrix at 100 kb resolution encompasses approximately $30321^2$ contacts, our model comprises merely six parameters for interactions among different compartments, along with 1000 parameters for the ideal potential. As outlined in the supporting information, the ideal potential depends on sequence separation, with 1000 chosen to encompass bead separations of up to 100 Mb. While it is theoretically plausible to reduce the number of parameters by assuming interactions cease beyond a certain sequence separation, determining this scale a priori presents a challenge.

      During the parameterization process, we observed that interchromosomal contacts predicted solely based on compartmental interactions inadequately mirrored Hi-C data. Consequently, we introduced 231 additional parameters to more accurately capture interactions between distinct pairs of autosomes. These interactions may stem from factors such as non-coding RNA or proteins not explicable by simple, non-specific compartmental interactions.

      Regarding parameters concerning chromosome-nuclear landmark interactions, we have 30321 parameters for speckles and 30321 for the nuclear lamina. To streamline the model, we opted to assign a unique parameter to each chromatin bead. However, it is conceivable that many chromatin beads share a similar mechanism for interacting with nuclear lamina or speckles, potentially allowing for a common parameter assignment. Nonetheless, implementing such simplification necessitates a deeper mechanistic understanding of chromosome-nuclear landmark interactions, an aspect currently lacking.

      As our comprehension of nuclear organization progresses, the interpretability of parameter counts may improve, facilitating their reduction.
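
      As a quick sanity check on the counts quoted in this response, the tally can be reproduced with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative, and reading the 231 pair-specific parameters as the number of distinct pairs among the 22 autosomes is an interpretation that happens to match the quoted figure.

```python
from math import comb

# Counts quoted in the response above (variable names are illustrative).
n_beads = 30321                  # 100 kb beads indexing the Hi-C matrix
n_compartment_params = 6         # interactions among compartment types
n_ideal_params = 1000            # one per sequence separation, up to 100 Mb
n_autosome_pairs = comb(22, 2)   # assumption: distinct pairs of the 22 autosomes
n_speckle_params = 30321         # one per chromatin bead
n_lamina_params = 30321          # one per chromatin bead

hic_entries = n_beads ** 2
chr_chr_params = n_compartment_params + n_ideal_params + n_autosome_pairs
landmark_params = n_speckle_params + n_lamina_params

print(f"Hi-C matrix entries:            {hic_entries}")
print(f"chromosome-chromosome params:   {chr_chr_params}")
print(f"chromosome-landmark params:     {landmark_params}")
```

      Even with one parameter per bead for each landmark, the total stays several orders of magnitude below the number of entries in the contact matrix.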

      Comment 2: What would the modification be if the resolution is increased?

      To increase the resolution of chromatin, we can in principle keep the same energy function as defined in Eq. S6. In this case, we only need to carry out further parameter optimization.

      However, transitioning to higher resolutions may unveil additional features not readily apparent at 100kb. Notably, chromatin loops with an average size of 200kb or smaller have been identified in high-resolution Hi-C data [1]. To effectively capture these loops, new terms in the energy function must be incorporated. For instance, Qi and Zhang [2] employed additional contact potentials between CTCF sites to account for loop formation. Alternatively, an explicit loop-extrusion process could be introduced to model loop formation more accurately.

      Comment 3: They should state that the extracted physical values are scale-dependent. For example, viscosity.

      We thank the reviewer for the comment and would like to clarify that our model does not predict the viscosity. The nucleoplasmic viscosity was set to 1 Pa·s to produce a diffusion coefficient that reproduces the experimental value. The exact value of the nucleoplasmic viscosity is still rather controversial, and our selected value falls within the range of reported experimental values, from $10^{-1}$ Pa·s to $10^{2}$ Pa·s.

      We have modified the main text to clarify the calculation of the diffusion coefficient.

      “The exponent and the diffusion coefficient $D_\alpha = (27 \pm 11) \times 10^{-4}\,\mu\mathrm{m}^2 \cdot \mathrm{s}^{-\alpha}$ both match well with the experimental values [cite], upon setting the nucleoplasmic viscosity as 1 Pa·s (see Supporting Information Section: Mapping the reduced time unit to real time for more details).”

      Reviewer 2:

      Comment 0: In this work, Lao et al. develop an open-source software (OpenNucleome) for GPU-accelerated molecular dynamics simulation of the human nucleus accounting for chromatin, nucleoli, nuclear speckles, etc. Using this, the authors investigate the steady-state organization and dynamics of many of the nuclear components.

      We thank the reviewer for the summary of our work.

      Comment 1: The authors could introduce a table having every parameter and the optimal parameter value used. This would greatly help the reader.

      We would like to point out that model parameters are indeed provided in Table S1, S2, S3, S4, and Fig. S7. In these tables, we further provided details on how the parameters were determined.

      Given the large number of parameters for the ideal potential (1000), we opted to plot it rather than listing out all the numbers. We added three new figures to plot the interaction parameters between chromosomes, between chromosomes and speckles, and between chromosomes and the nuclear lamina. Numerical values can be found online in the GitHub repository (parameters).

      Comment 2: How many total beads are simulated? Do all beads have the same size?

      The total number of coarse-grained beads is 70542, including 60642 chromatin beads, 300 nucleolus beads, 1600 speckle beads, and 8000 nuclear lamina beads. The radius of the chromatin, nucleolus, and speckle beads is 0.25, while that of the lamina beads is 0.5. More information on the size and number of the beads is provided in the Section: Components of the whole nucleus model.

      Comment 3: In Equation S17, what do the 3rd and 4th powers mean? What necessitates them?

      The potential defined in Equation S17 follows the definition of the class2 bond in the LAMMPS package (LAMMPS docs). Compared to a typical harmonic potential, the presence of higher-order terms produces a sharper increase in the energy at large distances (Author response image 1). This essentially reduces the fluctuation of bond lengths in simulations.

      Author response image 1.

      Comparison between the Class2 potential (defined in Eq. S17) and the harmonic potential ($K(r - r_0)^2$, with $K = 20$ and $r_0 = 0.5$).
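
      Since the image itself is not reproduced here, the comparison can be sketched numerically. The LAMMPS class2 bond style has the form $E = K_2 (r - r_0)^2 + K_3 (r - r_0)^3 + K_4 (r - r_0)^4$. The harmonic constants below come from the caption; the values of `k3` and `k4` are placeholders chosen for illustration only, since the coefficients of Eq. S17 are not given in this text.

```python
import numpy as np

def harmonic(r, k=20.0, r0=0.5):
    """Harmonic bond K (r - r0)^2 with the K and r0 quoted in the caption."""
    return k * (r - r0) ** 2

def class2(r, k2=20.0, k3=20.0, k4=20.0, r0=0.5):
    """LAMMPS class2 bond form; k3 and k4 are hypothetical placeholder values."""
    dr = r - r0
    return k2 * dr**2 + k3 * dr**3 + k4 * dr**4

# Tabulate both potentials over a range of bond lengths (reduced units).
r = np.linspace(0.0, 1.5, 7)
for ri, e_h, e_c in zip(r, harmonic(r), class2(r)):
    print(f"r = {ri:4.2f}   harmonic = {e_h:7.3f}   class2 = {e_c:7.3f}")
```

      With these placeholder coefficients the cubic and quartic terms make the energy rise faster than the harmonic form once the bond stretches beyond $r_0$, which is the behavior the response describes.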

      Comment 4: What do the X-axis and Y-axis numbers in Figure 5A and 5B mean? What are their units?

      We apologize for the lack of clarity in our original figure. In Fig. 5A, the X and Y axes depict the simulated and experimental radius of gyration (Rg) for individual chromosomes, as indicated in the title of the figure. Similarly, in Fig. 5B, the X and Y axes depict the simulated and experimental radial positions of individual chromosomes.

      We have converted the chromosome Rg values into reduced units and labeled the corresponding axes in the updated figure (Fig. 5). The normalized radial position is unitless and its detailed definition is included in the supporting information Section: Computing simulated normalized chromosome radial positions. We updated the figure caption to provide an explicit reference to the SI text.

      Reviewer 3:

      Comment 0: In this work, the authors present the development of OpenNucleome, a software for simulating the structure and dynamics of the human nucleus. It provides a detailed model of nuclear components such as chromosomes and nuclear bodies, and uses GPU acceleration for better performance based on the OpenMM package. The work also shows the model’s accuracy in comparisons with experimental data and highlights the utility in the understanding of nuclear organization. While I consider this work a good tool for the genome architecture scientific community, I have some comments and questions that could further clarify the usage of this tool and help potential users. I also have a few questions that would help to clarify the technique and results and some suggestions for references.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: Could the authors elaborate on what they consider to be ’well-established and easily adoptable modeling tools’?

      By well established, we meant models that have been extensively validated and verified and are highly regarded by the community.

      By easily adoptable, we meant tools that are well documented and can be learned relatively easily by new groups without help from the developers.

      We have revised the text to clarify our meaning.

      “Despite the progress made in computational modeling, the absence of well-documented software with easy-to-follow tutorials poses a challenge.”

      Comment 2: Recognizing the value of a diverse range of tools in the community, the Open-MiChroM tool is also an open-source platform built on top of OpenMM. The documentation shows various modeling approaches and many tutorials that contain different approaches besides the MiChroM energy function. How does OpenNucleome compare in terms of facilitating cross-validation and user accessibility? The two tools seem to be complementary, which is a gain to the field. I recommend adding one or two sentences on the matter. Also, while navigating the OpenNucleome GitHub, I have not found the tutorials mentioned in the text. I also consider the process of generating the necessary input files to be a barrier. I would suggest expanding the tutorials and documentation to help potential users.

      We thank the reviewer for the excellent comments. We agree that while many of the tutorials were included in the original package, they were not clearly documented. We have revised them extensively and now present:

      • A tutorial for optimizing chromosome chromosome interactions.

      • A tutorial for optimizing chromosome nuclear landmark interactions.

      • A tutorial for building initial configurations.

      • A tutorial for relaxing the initial configurations.

      • A tutorial for selecting the initial configurations.

      • A tutorial for setting up and performing Langevin dynamics simulations.

      • A tutorial for setting up and performing Brownian dynamics simulations.

      • A tutorial for setting up and performing simulations with a deformed nucleus.

      • A tutorial for analyzing simulation trajectories.

      • A tutorial for introducing new features to the model.

      These tutorials and our well-documented, open-source code (https://zhanggroup-mitchemistry.github.io/OpenNucleome) should significantly promote user accessibility. Our inclusion of Python scripts for analyzing simulation trajectories should allow users to compute various quantities for evaluating and comparing model quality.

      We added a new paragraph in the Section: Conclusions and Discussion of the main text to compare OpenNucleome with existing software for genome modeling.

      “Our software enhances the capabilities of existing genome simulation tools [cite]. Specifically, OpenNucleome aligns with the design principles of Open-MiChroM [cite], prioritizing open-source accessibility while expanding simulation capabilities to the entire nucleus. Similar to software from the Alber lab [cite], OpenNucleome offers high-resolution genome organization that faithfully reproduces a diverse range of experimental data. Furthermore, beyond static structures, OpenNucleome facilitates dynamic simulations with explicit representations of various nuclear condensates, akin to the model developed by [citet].”

      Comment 3: Lastly, I would appreciate it if the authors could expand their definition of ’standardized practices’.

      We apologize for any confusion caused. By “standardized practices,” we refer to the fact that different groups often employ unique procedures for structural modeling. These procedures differ in the representation of chromosomes, the nucleus environment, and the algorithms for parameter optimization. This absence of a consensus on the optimal practices for genome modeling can be daunting for newcomers to the field.

      We have revised the text to the following to avoid confusion:

      “Many research groups develop their own independent software, which complicates cross-validation and hinders the establishment of best practices for genome modeling [3–5].”

      Comment 4: On page 7, the authors refer to the SI Section: Components of the whole nucleus model for further details. Could the authors provide more information on the simulated density of nuclear bodies? Is there experimental data available that details the ratio of chromatin to other nuclear components, which was used as a reference in the simulation?

      We thank the reviewer for the comment. Imaging studies have provided quantitative measures of the size and number of various nuclear bodies. For example, there are 2 to 5 nucleoli per nucleus, with a typical size of $R_{\mathrm{No}} \approx 0.5\,\mu\mathrm{m}$ [6–10]. In the review by Spector and Lamond [11], the authors showed that there are 20 to 50 speckles, with a typical size of $R_{\mathrm{Sp}} \approx 0.3\,\mu\mathrm{m}$. We used these numbers to guide our simulation of nuclear bodies. This information is mentioned in the Section: Chromosomes as beads on the string polymers of the supporting information.

      The chromatin density is fixed by the average size of chromatin bead and the nucleus size. We chose the size of chromatin based on imaging studies as detailed in the Subsection: Mapping chromatin bead size to real unit of the supporting information. Upon fixing the bead size, the chromatin volume is determined.

      Comment 5: In the statement, ’the ideal potential is only applied for beads from the same chromosome to approximate the effect of loop extrusion by Cohesin molecules for chromosome compaction and territory formation,’ it would be helpful if the authors could clarify the scope of this potential. Specifically, the code indicates that the variable ’dend ideal’ is set at 1000, suggesting an interaction along a 100Mb polymer chain at a resolution of 100Kb per bead. Could the authors elaborate on their motivation for the Cohesin complex’s activity having a significant effect over such long distances within the polymer chain?

      We thank the reviewer for the insightful comment. They are correct that the ideal potential was introduced to capture chromosome folding beyond the interactions between compartments, including loop extrusion. Practically, we parameterized the ideal potential such that the simulated average contact probabilities as a function of sequence separation match the experimental values. The reviewer is correct that beyond a specific value of sequence separation, one would expect the impact of loop extrusion on chromosome folding to be negligible, due to Cohesin dissociation. Correspondingly, the interaction potential should be zero at large sequence separations.

      However, it is important to note that the precise separation scale cannot be known a priori. We chose 100 Mb as a conservative estimate. As we can see from Fig. S7, our parameterization scheme indeed produced interaction parameters that are mainly zero at large sequence separations. Interestingly, the scale at which the potential approaches 0 (∼ 500 kb) indeed agrees with the estimated length traveled by Cohesin molecules before dissociation [12].

      Comment 6: On pages 8 and 9, the authors discuss the optimization process. However, in reviewing the code and documentation available on the GitHub page, I could not find specific sections related to the optimization procedure described in the paper. In this context, I have a few questions: Could the authors provide more details or direct me to the parts of the documentation and the text/SI that address the optimization procedure used in their study? Additional clarification on the cost/objective function employed during the optimization process would be highly beneficial, as this was not readily apparent in the text.

      We thank the reviewer for the comment. We revised the SI to include the definition of the cost function for the Adam optimizer.

      “During the optimization process, our aim was to minimize the disparity between experimental findings and simulated data. To achieve this, we defined the cost function as follows:

      where the index i iterates over all the constraints defined in Eq. S28.”

      The detailed optimization procedure was included in the SI as quoted below

      “The details of the algorithm for parameter optimization are as follows

      (1) Starting with an initial set of parameter values, we performed 50 independent 3-million-step-long MD simulations to obtain an ensemble of nuclear configurations. The first 500K steps of each trajectory are discarded as equilibration. We collected the configurations at every 2000 simulation steps from the rest of the simulation trajectories to compute the ensemble averages defined on the left-hand side of Eq. S13.

      (2) Check the convergence of the optimization by calculating the percentage of error

      defined as . The summation over i includes all the average contact probabilities defined in Eq. S28.

      (3) If the error is less than a tolerance value etol, the optimization has converged, and we stop the simulations. Otherwise, we update the parameters, α, using the Adam optimizer [13]. With the new parameter values, we return to step one and restart the iteration.”
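
      The three steps above follow a simulate, check, update loop that can be sketched on a toy problem. The sketch below is not the OpenNucleome optimization code. The MD simulation is replaced by a hypothetical function `noisy_ensemble_average` that returns a noisy estimate of the constrained averages, the cost is assumed to be a sum of squared differences, and the relative-error definition and all numerical values are assumptions made for illustration. Only the Adam update itself is standard.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.30, 0.10, 0.05])   # stand-in for experimental averages

def noisy_ensemble_average(alpha, n_samples=50):
    """Toy stand-in for the MD step: the exact average equals alpha, but only
    a finite-sample (hence noisy) estimate is returned."""
    samples = alpha + 0.02 * rng.standard_normal((n_samples, alpha.size))
    return samples.mean(axis=0)

def adam_fit(alpha, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
             e_tol=0.05, max_iter=5000):
    m = np.zeros_like(alpha)   # first-moment estimate
    v = np.zeros_like(alpha)   # second-moment estimate
    for step in range(1, max_iter + 1):
        sim = noisy_ensemble_average(alpha)                          # step (1)
        error = np.abs(sim - target).sum() / np.abs(target).sum()    # step (2)
        if error < e_tol:                                            # step (3)
            break
        # Gradient of the assumed cost sum_i (sim_i - target_i)^2 for this toy model.
        grad = 2.0 * (sim - target)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**step)    # bias-corrected moments
        v_hat = v / (1 - beta2**step)
        alpha = alpha - lr * m_hat / (np.sqrt(v_hat) + eps)
    return alpha, error, step

alpha_fit, final_error, n_steps = adam_fit(np.zeros(3))
print(alpha_fit, final_error, n_steps)
```

      Because each gradient is computed from a finite ensemble, the update direction is noisy, which is the setting Adam was designed for. In the actual model the gradient depends on ensemble correlations rather than the simple difference used here.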

      Previously, the optimization code was included as part of the analysis folder. To avoid confusion and improve readability, a separate folder named optimization has been created. This folder provides the Adam optimization of chromosome-chromosome interactions (chr-chr optimization) and chromosome-nuclear landmarks interactions (chr-NL optimization).

      Comment 7: What was the motivation for choosing the Adam algorithm for optimization? Adam is designed for training on stochastic objective functions. Could the authors elaborate on the ’stochastic’ aspect of the function to be optimized? Why was the Adam algorithm considered the most appropriate choice for this application?

      We thank the reviewer for the comment. As defined in Eq. R1, the cost function measures the difference between the simulated constraints and the corresponding experimental values. The estimation of simulation values, by averaging over an ensemble of chromosome configurations, is inherently noisy and stochastic. Exact ensemble averages can only be achieved with unlimited samples obtained from infinitely long simulations.

      In the past, we have used Newton’s method for parameterization, and the detailed algorithm can be found in the SI of Ref. 14. However, we found that Adam is more efficient because it is a first-order method. Newton’s method, on the other hand, is a second-order method and requires estimation of the Hessian matrix. When the number of constraints is large, as in our case, the computational cost of estimating the Hessian matrix can be significant. Another advantage of the Adam algorithm lies in its adjustment of the learning rate during the optimization to further speed up convergence.

      Comment 8: The authors mention that examples of setting up simulations, parameter optimization, and introducing new features are provided in the GitHub repository. However, I was unable to locate these examples. Could the authors guide me to these specific resources or consider adding them if they are not currently available?

      We thank the reviewer for the comment. We have improved the GitHub repository and all the tutorials can be found using the links provided in Response to Comment 2.

      Comment 9: Furthermore, the paper states that ’a configuration file that provides the position of individual particles in the PDB file format is needed to initialize the simulations.’ It would be beneficial for new users if the authors could elaborate on how this file, and all other input files in general, are generated. Detailing the procedures for a new user to run their system using OpenNucleome would be helpful.

      We thank the reviewer for the comment. The procedure for generating initial configurations was explained in the SI Section: Initial configurations for simulations and quoted below.

      “We first created a total of 1000 configurations for the genome by sequentially generating the conformation of each one of the 46 chromosomes as follows. For a given chromosome, we start by placing the first bead at the center (origin) of the nucleus. The positions of the following beads, i, were determined from the (i − 1)-th bead as $\mathbf{r}_i = \mathbf{r}_{i-1} + 0.5\,\mathbf{v}$, where $\mathbf{v}$ is a normalized random vector and 0.5 was selected as the bond length between neighboring beads. To produce globular chromosome conformations, we rejected vectors, $\mathbf{v}$, that led to bead positions with a distance from the center larger than 4σ. Upon creating the conformation of a chromosome i, we shift its center of mass to a value $\mathbf{r}^{i}_{\mathrm{com}}$ determined as follows. We first compute a mean radial distance, with the following equation

      where $D_i$ is the average value of the Lamin B DamID profile for chromosome i. $D_{hi}$ and $D_{lo}$ represent the highest and lowest average DamID values of all chromosomes, and 6σ and 2σ represent the upper and lower bounds in radial positions for chromosomes. As shown in Fig. S6, the average Lamin B DamID profiles are highly correlated with normalized chromosome radial positions as reported by DNA MERFISH [cite], supporting their use as a proxy for estimating normalized chromosome radial positions. We then select as a uniformly distributed random variable within the range . Without loss of generality, we randomly chose the directions for shifting all 46 chromosomes.

      We further relaxed the 1000 configurations to build more realistic genome structures. Following an energy minimization process, one-million-step molecular dynamics (MD) simulations were performed starting from each configuration. Simulations were performed with the following energy function

      $$U = U_{\mathrm{Genome}} + U_{\mathrm{G\text{-}La}}$$

      where $U_{\mathrm{Genome}}$ is defined as in Eq. S7. $U_{\mathrm{G\text{-}La}}$ is the excluded volume potential between chromosomes and lamina, i.e., only the second term in Eq. S24. Parameters in $U_{\mathrm{Genome}}$ were from a preliminary optimization. The end configurations of the MD simulations were collected to build the final configuration ensemble (FCE).”
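
      The chain-growth step of the quoted procedure (random unit steps of length 0.5, rejecting positions farther than 4σ from the origin, then shifting the center of mass) can be sketched as follows. This is a minimal illustration rather than the package's own routine: the function names are hypothetical, σ is assumed to equal one reduced length unit, and the target radial distance passed to `shift_center_of_mass` is an arbitrary example value rather than one derived from DamID data.

```python
import numpy as np

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    v = rng.standard_normal(3)
    return v / np.linalg.norm(v)

def build_chromosome(n_beads, bond=0.5, r_max=4.0, rng=None):
    """Grow a chain bead by bead, rejecting steps that leave a sphere of radius r_max."""
    rng = np.random.default_rng() if rng is None else rng
    coords = np.zeros((n_beads, 3))          # first bead sits at the origin
    for i in range(1, n_beads):
        while True:
            trial = coords[i - 1] + bond * random_unit_vector(rng)
            if np.linalg.norm(trial) <= r_max:
                coords[i] = trial
                break
    return coords

def shift_center_of_mass(coords, radial_distance, rng):
    """Move the chain so its center of mass lies at the given distance
    from the origin, along a randomly chosen direction."""
    direction = random_unit_vector(rng)
    return coords - coords.mean(axis=0) + radial_distance * direction

rng = np.random.default_rng(1)
chain = build_chromosome(200, rng=rng)
placed = shift_center_of_mass(chain, radial_distance=3.0, rng=rng)
print(np.linalg.norm(placed.mean(axis=0)))
```

      Configurations built this way overlap and ignore all interactions, which is why the quoted procedure follows them with energy minimization and a relaxation run.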

      The tutorial for preparing initial configurations can be found at this link.

      Comment 10: In the section discussing the correlation between simulated and experimental contact maps, as referenced in Figure 4A and Figure S2, the authors mention a high degree of correlation. Could the authors specify the exact value of this correlation and explain the method used for its computation? Considering that comparing two Hi-C matrices involves a large number of data points, it would be helpful to know if all data points were included in this analysis.

      We have updated Fig 4A and S2 to include Pearson correlation coefficients next to the contact maps. The reviewer is correct in that all the non-redundant data points of the contact maps are included in computing the correlation coefficients.

      For improved clarity, we added a new section in the supporting information to detail the calculations. The section is titled Computing Pearson correlation coefficients between experimental and simulated contact maps, and the relevant text is quoted below.

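      “We computed the Pearson correlation coefficients (PCC) between experimental and simulated contact maps in Fig. 4A and Fig. S2 as

      $$\mathrm{PCC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

      $x_i$ and $y_i$ represent the experimental and simulated contact probabilities, $\bar{x}$ and $\bar{y}$ are their means, and n is the total number of data points. Only non-redundant data points, i.e., half of the pairwise contacts, are used in the PCC calculation.”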
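
      A minimal sketch of this calculation for a pair of symmetric contact maps is given below. It uses the standard Pearson formula on the upper triangle of each matrix. Excluding the diagonal (`k=1`) is an assumption, since the text only states that half of the pairwise contacts are used, and the function name and toy matrices are illustrative.

```python
import numpy as np

def contact_map_pcc(exp_map, sim_map):
    """Pearson correlation over the non-redundant (upper-triangle) entries."""
    exp_map = np.asarray(exp_map, dtype=float)
    sim_map = np.asarray(sim_map, dtype=float)
    iu = np.triu_indices_from(exp_map, k=1)   # assumption: diagonal excluded
    x, y = exp_map[iu], sim_map[iu]
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Toy symmetric maps standing in for experimental and simulated data.
rng = np.random.default_rng(0)
a = rng.random((6, 6))
exp_map = (a + a.T) / 2
b = exp_map + 0.05 * rng.standard_normal((6, 6))
sim_map = (b + b.T) / 2

pcc = contact_map_pcc(exp_map, sim_map)
print(pcc)
```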

      Comment 11: In addition, the author said: ”Moreover, the simulated and experimental average contact probabilities between pairs of chromosomes agree well, and the Pearson correlation coefficient between the two datasets reaches 0.89.” How does this correlation behave when not accounting for polymer compaction or scaling? An analysis presenting the correlation as a function of genomic distance would be interesting.

      Author response image 2.

      Pearson correlation coefficient between experimental and simulated contact probabilities as a function of the sequence separation within specific chromosomes. For each chromosome, we first gathered a set of experimental contacts alongside a matching set of simulated ones for genomic pairs within a particular separation range. The Pearson correlation coefficient at the corresponding sequence separation was then determined using Equation R4. We limited the calculations to half of the chromosome length to ensure the availability of sufficient data.

      We thank the reviewer for the comment. The analysis presenting the correlation as a function of genomic distance (sequence separation) for each chromosome is shown in Figure S12 and also included in the SI. While the correlation coefficients decrease at larger separations, values around 0.5 are quite reasonable and comparable to results obtained using Open-MiChroM.
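
      One way to carry out such a distance-stratified comparison for a single chromosome is to correlate matching off-diagonals of the two contact maps. The sketch below is an assumption about the procedure rather than the authors' script: it treats each sequence separation individually instead of grouping separations into ranges, and the function name is hypothetical. The cutoff at half the chromosome length follows the figure caption.

```python
import numpy as np

def pcc_by_separation(exp_map, sim_map, max_sep=None):
    """Pearson correlation between matching off-diagonals of two contact maps."""
    n = exp_map.shape[0]
    max_sep = n // 2 if max_sep is None else max_sep   # half the chromosome length
    result = {}
    for s in range(1, max_sep + 1):
        x = np.diagonal(exp_map, offset=s)   # all pairs separated by s beads
        y = np.diagonal(sim_map, offset=s)
        if x.std() > 0 and y.std() > 0:      # correlation is undefined otherwise
            result[s] = np.corrcoef(x, y)[0, 1]
    return result

# Toy example: a map compared with itself correlates perfectly at every separation.
rng = np.random.default_rng(0)
a = rng.random((20, 20))
toy_map = (a + a.T) / 2
profile = pcc_by_separation(toy_map, toy_map)
print(len(profile))
```

      Stratifying by separation removes the dominant decay of contact probability with genomic distance, so the resulting values are a stricter test than a single whole-map coefficient.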

      We also computed the correlation of whole genome contact maps after excluding intra-chromosomal contacts. The PCC decreased from 0.89 to 0.4. Again, the correlation coefficient is quite reasonable considering that these contacts are purely predicted by the compartmental interactions and were not directly optimized.

      Comment 12: I recommend using the web-server that is familiar to the authors to benchmark the OpenNucleome tool/model: ”3DGenBench: A Web-Server to Benchmark Computational Models for 3D Genomics.” Nucleic Acids Research, vol. 50, no. W1, July 2022, pp. W4-12.

      We appreciate the reviewer’s suggestion. Unfortunately, the website was no longer active at the time of the revision. However, as detailed in the Response to Comment 11, we used one of the popular metrics to exclude polymer compaction effects and evaluate the agreement between simulation and experiments.

      Comment 13: Regarding the comparison of simulation results with microscopy data from reference 34: given their different resolutions and data point/space groupings, how do the authors align these datasets? Could the authors describe how they performed this comparison? How were the radial positions calculated in both the simulations and experiments? Since the data from reference 34 indicate a non-globular shape of the nucleus, how did this factor into the calculation of radial distributions?

      We thank the reviewer for the comment and apologize for the confusion. First, the average properties we examined, including radial positions and interchromosomal contacts, were averaged over all genomic loci. Therefore, they are independent of data resolution.

      Secondly, instead of calculating the absolute radial positions, which are subject to variations in nucleus shape and size, we defined the normalized radial positions. They measure the ratio between the distance from the nucleus center to the chromosome center and the distance from the nucleus center to the lamina. This definition was frequently used in prior imaging studies to measure chromosome radial positions.

      The calculation of the simulated normalized radial positions and the experimental normalized radial positions are discussed in the Section: Computing simulated normalized chromosome radial positions

      “For a given chromosome i, we first determined its center of mass position, denoted as $C_i$. Starting from the center of the nucleus, $O$, we extend the vector $\mathbf{v}_{OC}$ to identify the intersection point with the nuclear lamina as $P_i$. The normalized chromosome radial position $r_i$ is then defined as $r_i = \lVert \mathbf{v}_{OC} \rVert / \lVert \mathbf{v}_{OP} \rVert$, where $\lVert \cdot \rVert$ represents the L2 norm.”

      and Section: Computing experimental normalized chromosome radial positions.

      “We followed the same procedure outlined in Section: Computing simulated normalized chromosome radial positions to compute the experimental values. To determine the center of the nucleus using DNA MERFISH data, we used the minimum volume enclosing ellipsoid (MVEE) algorithm [15] to fit an ellipsoid for each genome structure. The optimal ellipsoid defined as is obtained by optimizing subject to the constraint that . $x_i$ corresponds to the list of chromatin positions determined experimentally.”
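
      Once an enclosing ellipsoid is available, the normalized radial position reduces to a ratio of two lengths along the same ray. The sketch below assumes the ellipsoid is already fitted and written in the common form $\{x : (x - c)^{T} A (x - c) \le 1\}$. This parameterization, the function name, and the example shapes are assumptions for illustration, and the MVEE fit itself is not shown.

```python
import numpy as np

def normalized_radial_position(com, center, shape_matrix):
    """Ratio |OC| / |OP|, where P is the point at which the ray from the
    ellipsoid center O through the chromosome center of mass C crosses
    the surface (x - c)^T A (x - c) = 1."""
    center = np.asarray(center, dtype=float)
    v = np.asarray(com, dtype=float) - center
    q = v @ shape_matrix @ v
    if q == 0.0:
        return 0.0                       # C coincides with the nucleus center
    t = 1.0 / np.sqrt(q)                 # P = O + t * v lies on the surface
    p = center + t * v
    return np.linalg.norm(v) / np.linalg.norm(p - center)

# Spherical nucleus of radius 5: the ratio is simply distance / radius.
sphere = np.eye(3) / 25.0
r_sphere = normalized_radial_position([3.0, 0.0, 0.0], [0.0, 0.0, 0.0], sphere)

# Ellipsoidal nucleus with semi-axes (10, 5, 5).
ellipsoid = np.diag([1 / 100.0, 1 / 25.0, 1 / 25.0])
r_ellipsoid = normalized_radial_position([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], ellipsoid)

print(r_sphere, r_ellipsoid)
```

      Because the measure is a ratio taken along a single direction, it stays between 0 and 1 for points inside the nucleus regardless of nuclear size or elongation.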

      Comment 14: In the sentence: ”It is evident that telomeres exhibit anomalous subdiffusive motion.” I recommend mentioning the work ”Di Pierro, Michele, et al., ”Anomalous Diffusion, Spatial Coherence, and Viscoelasticity from the Energy Landscape of Human Chromosomes.” Proceedings of the National Academy of Sciences, vol. 115, no. 30, July 2018, pp. 7753-58.”.

      We have revised the sentence to include the citation as follows.

      “In line with previous research [cite], telomeres display anomalous subdiffusive motion. When fitted with the equation $\langle r^2(t) \rangle = D_\alpha t^\alpha$, these trajectories yield a spectrum of α values, with a peak around 0.59.”
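
      A common way to extract an anomalous exponent from a mean-squared displacement curve is a straight-line fit in log-log space. The sketch below takes the fitted form to be a power law, $\langle r^2(t) \rangle = D_\alpha t^\alpha$, and recovers both parameters from synthetic data. The function name and the synthetic curve are illustrative, and the fitting method is an assumption, since the text does not state how the fits were done.

```python
import numpy as np

def fit_anomalous_diffusion(lag_times, msd):
    """Fit log(MSD) = log(D_alpha) + alpha * log(t) by least squares."""
    slope, intercept = np.polyfit(np.log(lag_times), np.log(msd), 1)
    return np.exp(intercept), slope      # (D_alpha, alpha)

# Synthetic, noise-free MSD curve built from the values quoted in the text.
t = np.linspace(1.0, 100.0, 50)          # lag times in seconds
msd = 27e-4 * t**0.59                    # in um^2
d_alpha, alpha = fit_anomalous_diffusion(t, msd)
print(d_alpha, alpha)
```

      An exponent below 1 indicates subdiffusion. Fitting each trajectory separately produces the spectrum of α values mentioned in the quoted sentence.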

      Comment 15: Regarding the observation that ’chromosomes appear arrested and no significant changes in their radial positions are observed over timescales comparable to the cell cycle,’ could the authors provide more details on the calculations or analyses that led to this conclusion? Specifically, information on the equilibration/relaxation time of chromosome territories relative to rearrangements within a cell cycle would be interesting.

      Our conclusion here was mostly based on the time trace of normalized radial positions shown in Figure 6A of the main text. Over the timescale of an entire cell cycle (24 hours), the little to no change in the radial positions supports glassy dynamics of chromosomes. We further determined the mean squared displacement (MSD) for chromosome centers of mass. As shown in the left panel of Fig. S12, the MSDs are much smaller than the average size of chromosomes (see Rg values in Fig. 5A), supporting arrested dynamics.

      We further computed the auto-correlation function of the normalized chromosome radial position as

      $$C(\tau) = \frac{\langle (r(t) - \bar{r})\,(r(t+\tau) - \bar{r}) \rangle_t}{\langle (r(t) - \bar{r})^2 \rangle_t},$$

      where t indexes over the trajectory frames and $\bar{r}$ is the mean position. As shown in Fig. S12, the positions are not completely decorrelated over 10 hours, again supporting slow dynamics. It would be interesting to examine the relaxation timescale more closely in future studies.
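
      An auto-correlation of this type can be computed from a single time trace as sketched below. The normalization by the variance (so that the value at zero lag is 1) is an assumption, as are the function name and the sinusoidal test signal; the authors' exact definition may differ.

```python
import numpy as np

def autocorrelation(r, max_lag):
    """Normalized auto-correlation of a 1D time trace for lags 0..max_lag."""
    r = np.asarray(r, dtype=float)
    dr = r - r.mean()                     # fluctuation around the mean position
    variance = np.mean(dr ** 2)
    n = len(r)
    return np.array([np.mean(dr[:n - lag] * dr[lag:]) / variance
                     for lag in range(max_lag + 1)])

# Hypothetical trace: a sinusoid with a period of 20 frames.
frames = np.arange(200)
trace = np.sin(2.0 * np.pi * frames / 20.0)
corr = autocorrelation(trace, max_lag=20)
```

      A trace that relaxes quickly gives a correlation that falls to zero within a few frames, whereas a slowly relaxing (arrested) coordinate keeps a large correlation at long lags.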

      Comment 16: The authors also comment on the SI ”Section: Initial configurations for simulations provides more details on preparing the 1000 initial configurations.” and related to reference 34 mentioning that ”the average Lamin B DamID profiles are highly correlated with chromosome radial positions as reported by DNA MERFISH”. How do the authors account for situations where homologous chromosomes are neighbors or have an interacting interface? Ref. 34 indicates that distinguishing between these scenarios can be challenging, potentially leading to ’invalid distributions’ that are filtered out. Clarification on how such cases were handled in the simulations would be helpful.

      We would like to first clarify that when comparing with experimental data, we averaged over the homologous chromosomes to obtain haploid data. We added the following text in the manuscript to emphasize this point

      “Given that the majority of experimental data were analyzed for the haploid genome, we adopted a similar approach by averaging over paternal and maternal chromosomes to facilitate direct comparison. More details on data analysis can be found in the Supporting Information Section: Details of simulation data analysis.”

      Furthermore, we used the processed DNA MERFISH data from the Zhuang lab, which unambiguously assigns a chromosome ID to each data point. Therefore, the issue mentioned by the reviewer is not present in the processed data. In our simulations, since we keep track of the explicit connection between genomic segments, the trace of individual chromosomes can be determined for any configuration. Therefore, there is no ambiguity in the simulation data.

      Comment 17: When discussing the interaction with nuclear lamina and nuclear envelope deformation, I suggest mentioning the following studies: The already cited ref 52 and ”Contessoto, Vinícius G., et al. ”Interphase Chromosomes of the Aedes Aegypti Mosquito Are Liquid Crystalline and Can Sense Mechanical Cues.” Nature Communications, vol. 14, no. 1, Jan. 2023, p. 326.”

      We updated the text to include the suggested reference.

      “Numerous studies have highlighted the remarkable influence of nuclear shape on the positioning of chromosomes and the regulation of gene expression [16, 17].”

      Comment 18: The authors state that ’Tutorials in the format of Python Scripts with extensive documentation are provided to facilitate the adoption of the model by the community.’ However, as I mentioned, the documentation appears to be limited, and the available tutorials could benefit from further expansion. I suggest that the authors consider enhancing these resources to better assist users in adopting and understanding the model.

      As detailed in the Response to Comment 2, we have updated the GitHub repository to better document the included Jupyter notebooks and tutorials.

      Comment 19: In the Methods section, the authors discuss using Langevin dynamics for certain simulations and Brownian dynamics for others. Could the authors provide more detailed reasoning behind the choice of these different dynamics for different aspects of the simulation? Furthermore, it would be insightful to know how the results might vary if only one of these dynamics was utilized throughout the study. Such clarification would help in understanding the implications of these methodological choices on the outcomes of the simulations.

      We thank the reviewer for the comment. As detailed in the supporting information Section: Mapping the Reduced Time Unit to Real Time, the Brownian dynamics simulations provide a rigorous mapping to the biological timescale. By choosing a specific value for the nucleoplasmic viscosity, we determined the time unit in simulations as τ = 0.65s. With this time conversion, the simulated diffusion coefficients of telomeres match well with experimental values. Therefore, Brownian dynamics simulations are recommended for computing time-dependent quantities, and the large damping coefficient mimics the complex nuclear environment well.

      On the other hand, the large damping coefficient slows down the configuration relaxation of the system significantly. For computing equilibrium statistical properties, it is useful to use a small coefficient and the Langevin integrator with large time steps to facilitate conformational relaxation.
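
      The role of the time unit can be illustrated with a toy overdamped (Brownian) simulation. The sketch below propagates free, non-interacting particles, which is far simpler than the chromatin model discussed here; only τ = 0.65 s is taken from the response, while the diffusion coefficient, time step, particle count, and function name are arbitrary assumptions in reduced units.

```python
import numpy as np

TAU_SECONDS = 0.65  # reduced time unit reported in the response

def brownian_trajectory(n_particles, n_steps, dt, diffusion, seed=0):
    """Free overdamped diffusion in 3D: each step is Gaussian with variance 2*D*dt."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(scale=np.sqrt(2.0 * diffusion * dt),
                       size=(n_steps, n_particles, 3))
    return np.cumsum(steps, axis=0)       # positions relative to the starting point

n_steps, dt, diffusion = 500, 0.01, 1.0   # illustrative values in reduced units
traj = brownian_trajectory(2000, n_steps, dt, diffusion)

# Ensemble MSD at the final frame; free diffusion predicts 6 * D * t in 3D.
msd_final = np.mean(np.sum(traj[-1] ** 2, axis=1))
expected_msd = 6.0 * diffusion * n_steps * dt

# Convert the simulated duration from reduced units to seconds.
elapsed_seconds = n_steps * dt * TAU_SECONDS
```

      In the overdamped limit the elapsed simulation time maps directly onto physical time through τ, which is why time-dependent observables are taken from this integrator, while a weakly damped Langevin integrator samples the same equilibrium distribution faster but loses that mapping.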

      References

      [1] Rao, S. S.; Huntley, M. H.; Durand, N. C.; Stamenova, E. K.; Bochkov, I. D.; Robinson, J. T.; Sanborn, A. L.; Machol, I.; Omer, A. D.; Lander, E. S.; et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014, 159, 1665–1680.

      [2] Qi, Y.; Zhang, B. Predicting three-dimensional genome organization with chromatin states. PLoS computational biology 2019, 15, e1007024.

      [3] Yildirim, A.; Hua, N.; Boninsegna, L.; Zhan, Y.; Polles, G.; Gong, K.; Hao, S.; Li, W.; Zhou, X. J.; Alber, F. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nature Structural & Molecular Biology 2023, 1–14.

      [4] Junior, A. B. O.; Contessoto, V. G.; Mello, M. F.; Onuchic, J. N. A scalable computational approach for simulating complexes of multiple chromosomes. Journal of molecular biology 2021, 433, 166700.

      [5] Fujishiro, S.; Sasai, M. Generation of dynamic three-dimensional genome structure through phase separation of chromatin. Proceedings of the National Academy of Sciences 2022, 119, e2109838119.

      [6] Caragine, C. M.; Haley, S. C.; Zidovska, A. Nucleolar dynamics and interactions with nucleoplasm in living cells. Elife 2019, 8, e47533.

      [7] Brangwynne, C. P.; Mitchison, T. J.; Hyman, A. A. Active liquid-like behavior of nucleoli determines their size and shape in Xenopus laevis oocytes. Proceedings of the National Academy of Sciences 2011, 108, 4334–4339.

      [8] Farley, K. I.; Surovtseva, Y.; Merkel, J.; Baserga, S. J. Determinants of mammalian nucleolar architecture. Chromosoma 2015, 124, 323–331.

      [9] Qi, Y.; Zhang, B. Chromatin network retards nucleoli coalescence. Nature Communications 2021, 12, 6824.

      [10] Caragine, C. M.; Haley, S. C.; Zidovska, A. Surface fluctuations and coalescence of nucleolar droplets in the human cell nucleus. Physical review letters 2018, 121, 148101.

      [11] Spector, D. L.; Lamond, A. I. Nuclear speckles. Cold Spring Harbor perspectives in biology 2011, 3, a000646.

      [12] Banigan, E. J.; Mirny, L. A. Loop extrusion: theory meets single-molecule experiments. Current opinion in cell biology 2020, 64, 124–138.

      [13] Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

      [14] Zhang, B.; Wolynes, P. G. Topology, structures, and energy landscapes of human chromosomes. Proceedings of the National Academy of Sciences 2015, 112, 6062–6067.

      [15] Moshtagh, N.; et al. Minimum volume enclosing ellipsoid. Convex optimization 2005, 111, 1–9.

      [16] Brahmachari, S.; Contessoto, V. G.; Di Pierro, M.; Onuchic, J. N. Shaping the genome via lengthwise compaction, phase separation, and lamina adhesion. Nucleic Acids Res. 2022, 50, 1–14.

      [17] Contessoto, V. G.; Dudchenko, O.; Aiden, E. L.; Wolynes, P. G.; Onuchic, J. N.; Di Pierro, M. Interphase chromosomes of the Aedes aegypti mosquito are liquid crystalline and can sense mechanical cues. Nature Communications 2023, 14, 326.

    1. eLife assessment

      This important work provides another layer of regulatory mechanism for TGF-beta signaling activity. The evidence supports the involvement of microtubules as a reservoir of Smad2/3, however, additional evidence to convincingly demonstrate the functional involvement of Rudhira in this process is highly appreciated. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of growth factor signaling.

    2. Reviewer #1 (Public Review):

      Summary

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical modulator of TGFβ signaling that releases Smad2/3 from cytoskeletal microtubules, and that Rudhira is a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths

      The study used different methods and techniques to achieve aims and support conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study provides an unappreciated additional layer of TGFβ signaling activity regulation after ligand-receptor interaction.

      Weaknesses

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      (2) Why do they use HEK cells instead of SVEC cells in Figure 2 and 4 experiments?

      (3) A model shown in Figure 5E needs improvement to grasp their findings easily.

    3. Reviewer #2 (Public Review):

      Summary:

      It was first reported in 2000 that Smad2/3/4 are sequestered to microtubules in resting cells and TGF-β stimulation releases Smad2/3/4 from microtubules, allowing activation of the Smad signaling pathway. Although the finding was subsequently confirmed in a few papers, the underlying mechanism has not been explored. In the present study, the authors found that Rudhira/breast carcinoma amplified sequence 3 is involved in the release of Smad2/3 from microtubules in response to TGF-β stimulation. Rudhira is also induced by TGF-β and is probably involved in the stabilization of microtubules in the delayed phase after TGF-β stimulation. Therefore, Rudhira has two important functions downstream of TGF-β in the early as well as delayed phase.

      Strengths:

      This work aimed to address an unsolved question on one of the earliest events after TGF-β stimulation. Based on loss-of-function experiments, the authors identified a novel and potentially important player, Rudhira, in the signal transmission of TGF-β.

      Weaknesses:

      The authors have identified a key player that triggers Smad2/3 release from microtubules after TGF-β stimulation, probably via its association with microtubules. This is an important first step for understanding the regulation of Smad signaling, but underlying mechanisms as well as upstream and downstream events largely remain to be elucidated.

      (1) The process of how Rudhira causes the release of Smad proteins from microtubules remains unclear. The statement that "Rudhira-MT association is essential for the activation and release of Smad2/3 from MTs" (lines 33-34) is not directly supported by experimental data.

      (2) The process of how Rudhira is mobilized to microtubules in response to TGF-β remains unclear.

      (3) After Rudhira releases Smad proteins from microtubules, Rudhira stabilizes microtubules. The process of how cells return to a resting state and recover their responsiveness to TGF-β remains unclear.

      This reviewer is also afraid that some of the biochemical data lack appropriate controls and are not convincing enough.

    4. Author response:

      eLife assessment:

      This important work provides another layer of regulatory mechanism for TGF-beta signaling activity. The evidence supports the involvement of microtubules as a reservoir of Smad2/3, however, additional evidence to convincingly demonstrate the functional involvement of Rudhira in this process is highly appreciated. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of growth factor signaling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical modulator of TGFβ signaling that releases Smad2/3 from cytoskeletal microtubules, and that Rudhira is a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths

      The study used different methods and techniques to achieve aims and support conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study provides an unappreciated additional layer of TGFβ signaling activity regulation after ligand-receptor interaction.

      We thank the reviewer for acknowledging the importance of our study and providing a clear summary of our findings.

      Weaknesses

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      Our previous study demonstrated that Rudhira KO mice have a predominantly developmental cardiovascular phenotype that phenocopies TGFβ loss of function (Shetty, Joshi et al., 2018). Additionally, we found that at the molecular level, Rudhira regulates cytoskeletal organization (Jain et al., 2012; Joshi and Inamdar, 2019). Our current study builds upon these previous findings, showing an essential role of Rudhira in maintaining TGFβ signaling and controlling the microtubule cytoskeleton during vascular development. On one hand Rudhira regulates TGFβ signaling by promoting the release of Smads from microtubules, while on the other, Rudhira is a TGFβ target essential for stabilizing microtubules. Thus, our current study provides a molecular basis for Rudhira function in cardiovascular development.

      (2) Why do they use HEK cells instead of SVEC cells in Figure 2 and 4 experiments?

      Our earlier studies have characterized the role of Rudhira in detail using both loss and gain of function methods in multiple cell types (Jain et al., 2012; Shetty, Joshi et al., 2018; Joshi and Inamdar, 2019). As endothelial cells are particularly difficult to transfect, and because the function of Rudhira in promoting cell migration is conserved in HEK cells, it was practical and relevant to perform these experiments in HEK cells (Figures 2 and 4E).

      (3) A model shown in Figure 5E needs improvement to grasp their findings easily.

      We have modified Figure 5E for clarity.

      Reviewer #2 (Public Review):

      Summary

      It was first reported in 2000 that Smad2/3/4 are sequestered to microtubules in resting cells and TGF-β stimulation releases Smad2/3/4 from microtubules, allowing activation of the Smad signaling pathway. Although the finding was subsequently confirmed in a few papers, the underlying mechanism has not been explored. In the present study, the authors found that Rudhira/breast carcinoma amplified sequence 3 is involved in the release of Smad2/3 from microtubules in response to TGF-β stimulation. Rudhira is also induced by TGF-β and is probably involved in the stabilization of microtubules in the delayed phase after TGF-β stimulation. Therefore, Rudhira has two important functions downstream of TGF-β in the early as well as delayed phase.

      Strengths:

      This work aimed to address an unsolved question on one of the earliest events after TGF-β stimulation. Based on loss-of-function experiments, the authors identified a novel and potentially important player, Rudhira, in the signal transmission of TGF-β.

      We thank the reviewer for the critical evaluation and appreciation of our findings.

      Weaknesses:

      The authors have identified a key player that triggers Smad2/3 release from microtubules after TGF-β stimulation, probably via its association with microtubules. This is an important first step for understanding the regulation of Smad signaling, but underlying mechanisms as well as upstream and downstream events largely remain to be elucidated.

      We acknowledge that the mechanisms regulating cytoskeletal control of Smad signaling are far from clear, but these are beyond the scope of this manuscript. Instead, this manuscript focuses on Rudhira/Bcas3 as a pivot for understanding the connections between vascular TGFβ signaling and microtubules.

      (1) The process of how Rudhira causes the release of Smad proteins from microtubules remains unclear. The statement that "Rudhira-MT association is essential for the activation and release of Smad2/3 from MTs" (lines 33-34) is not directly supported by experimental data.

      We agree with the reviewer’s comment. Although we provide evidence that the loss of Rudhira (and thereby the inferred loss of Rudhira-MT association) prevents release of Smad2/3 from MTs (Fig 3C), this does not confirm that Rudhira-MT association is required for the release. In light of this, we have modified the statement to “Rudhira associates with MTs and is essential for the activation and release of Smad2/3 from MTs”.

      (2) The process of how Rudhira is mobilized to microtubules in response to TGF-β remains unclear.

      Our previous study showed that Rudhira associates with microtubules, and preferentially binds to stable microtubules (Jain et al., 2012; Joshi and Inamdar, 2019). Since TGFβ stimulation is known to stabilize microtubules, we hypothesize that TGFβ stimulation increases Rudhira binding to stable microtubules. We have mentioned this in our revised manuscript.

      (3) After Rudhira releases Smad proteins from microtubules, Rudhira stabilizes microtubules. The process of how cells return to a resting state and recover their responsiveness to TGF-β remains unclear.

      We show that dissociation of Smads from microtubules is an early response and that stabilization of microtubules is a late TGFβ response. However, we agree that the sequence of these molecular events has not been characterized in depth in this or any other study, making it difficult to assign causal roles (e.g., whether release of Smads from MTs is a prerequisite for MT stabilization by Rudhira) or reversibility. Nevertheless, the TGFβ pathway is autoregulatory, leading to increased turnover of receptors and Smads and increased expression of inhibitory Smads, which may restore responsiveness to TGFβ. Additionally, the relatively short turnover time of even stable microtubules (several minutes to hours) may also promote a quick return to the resting state.

      We have discussed this in our revised manuscript.

    1. Author response:

      eLife assessment

      This important study provides new insight into the dynamics that underlie the development of therapy resistance in prostate cancer by revealing that divergent tumor evolutionary paths occur in response to different treatment timing and that these converge on common resistance mechanisms. The use of barcoded lineage tracing and characterization of isolated tumor clonal populations provides compelling evidence supporting the importance of clonal dynamics in a tumor ecosystem for treatment resistance. Several open questions remain, however, raising the possibility of alternative interpretations of the data set in its current form. Overall, the findings deepen our understanding of prostate cancer evolution and hold promising implications for how drug resistance can be addressed or prevented.

      We are pleased the reviewers found our work reporting distinct evolutionary paths to resistance based on timing of treatment to be important and supported by compelling evidence.  We also acknowledge the need for additional work to clarify some details, particularly regarding the mechanism of clonal cooperativity as a catalyst of resistance.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Lee, Eugine et al. use in vivo barcoded lineage tracing to investigate the evolutionary paths to androgen receptor signaling inhibition (ARSI) resistance in two different prostate cancer clinical scenario models: measurable disease and minimal residual disease. Using two prostate cancer cell lines, LNCaP/AR and CWR22Pc, the authors find that in their minimal residual disease models, the outgrowth of pre-existing resistant clones gives rise to ARSI-resistant tumors. Interestingly, in their measurable disease model, or post-engraftment ARSI setting, these pre-existing resistant clones are depleted; instead, a subset of the clones that give rise to treatment-naïve tumors adapts to ARSI treatment and is enriched in resistant tumors. For the LNCaP/AR cell line, characterization of pre-existing resistant clones in treatment-naïve and ARSI treatment settings reveals increased baseline androgen receptor transcriptional output as well as baseline upregulation of glucocorticoid receptor (GR) as the primary driver of pre-existing resistance. Similarly, the authors found that high GR expression was induced in ARSI-sensitive clones over long-term ARSI treatment, underlying adaptive resistance to ARSI. For CWR22Pc cells, HER3/NRG1 signaling was the primary driver of ARSI resistance in both measurable disease and minimal residual disease models. These findings are not only consistent with the authors' previous reports of GR and NRG1/Her3 as the molecular drivers of ARSI resistance in LNCaP/AR and CWR22Pc, respectively, but also demonstrate conserved resistance mechanisms despite pre-existing or adaptive evolutionary paths to resistance. Lastly, the authors show that adaptive ARSI resistance is dependent on interclonal cooperation, where the presence of pre-existing resistant clones or "helper" clones is required to promote adaptive resistance in ARSI-sensitive clones.

      Strengths:

      The authors employ DNA barcoding, a powerful tool already demonstrated by others to track the clonal evolution of tumor populations during resistance development, to study the effects of the timing of therapy as a variable on resistance evolution. The authors use barcoding in two cell line models of prostate cancer in two clinical disease scenarios to demonstrate divergent evolutionary paths converging on common resistance mechanisms. By painstakingly isolating clones with barcodes of interest to generate clonal cell lines from treatment-naïve cell populations, the authors are able to not only characterize pre-existing resistance but also show cooperativity between resistant and drug-sensitive populations for adaptive resistance.

      Weaknesses:

      While the finding that different evolutionary paths result in common molecular drivers of ARSI resistance is novel and unexpected, this work primarily confirms the authors' previous published work identifying the resistance mechanisms in these cell lines. The impact of the work would be greater with additional studies understanding the specific molecular/genetic mechanisms by which cells become resistant or cooperate within a population to give rise to resistant population subclones.

      We agree that additional insights into the mechanism of adaptive resistance and the role of cell-cell cooperativity are clear next steps for this work. We propose to obtain them through single-cell characterization (RNA-seq, ATAC-seq) of tumor evolution in a time course experiment where we can track each clone using expressed barcodes. This will allow us to explore the dynamics of interaction between the "adaptable" and "helper" clones. Unfortunately, the barcode methodology used in this initial report is DNA-based; therefore, a follow-up study using a transcribable barcode library is needed to address these fascinating questions.

      This study would also benefit from additional explanation or exploration of why the two resistance driver pathways described (GR and NRG1/Her3) are cell line specific and if there are genetic or molecular backgrounds in which specific resistance signaling is more likely to be the predominant driver of resistance.

      In the case of NRG1/HER3 pathway mediated resistance, we know that this mechanism requires that the PTEN/PIK3CA pathway be wildtype.  This is the case for the CWR22Pc model described in the manuscript. Furthermore, we have data showing that PTEN deletion in these cells rescues the phenotype, meaning that CWR22Pc cells with PTEN deletion are no longer dependent on NRG1/HER3 signaling for ARSI resistance.

      In contrast, LNCaP/AR cells are PTEN null at baseline and therefore must evolve alternative mechanisms of ARSI resistance. Since our initial identification of the GR mechanism, we and others have extended the finding to additional models (VCaP, LAPC4) (PMID: 24315100; PMID: 28191869). Another recent insight is the importance of RB1 and TP53 status in maintenance of luminal lineage identity during ARSI therapy, and the recognition of lineage plasticity as a resistance mechanism in cell lines/tumor models that lack these two tumor suppressors. In summary, baseline genetics clearly plays a role in which ARSI resistance pathway is  likely to emerge. We will clarify this point in the revision with additional discussion.

      Reviewer #2 (Public Review):

      Summary

      The authors aimed to characterise the evolutionary dynamics that occur during the resistance to androgen receptor signalling inhibition, and how this differs in established tumours vs. residual disease, in prostate cancer. By using a barcoding method, they aimed to both characterise the distribution of clones that support therapy resistance in these settings, while also then being able to isolate said clones from the pre-graft population via single-cell cloning to characterise the mechanisms of resistance and dependency on cooperativity.

      While, interestingly, the timing of combination therapies has been shown to be critical to avoid cross-resistance, the timing of therapy has not been specifically considered as a factor dictating resistance pathways. Additionally, the role of residual disease and dormant populations in driving relapse is of increasing interest, yet a lot remains to be understood of these populations. The question of whether different clinical manifestations of therapy resistance follow similar evolutionary pathways to resistance is therefore interesting and relevant for the field.

      The methods applied are elegant and the body of work is substantial. The proposed divergent evolutionary pathways pose interesting questions, and the findings on cooperativity provide insight. However, whether the model truly reflects minimal residual disease to the extent that the authors suggest may limit the relevance of the findings at this stage. Certain patterns in the DNA barcoding results also call into question whether the results fully support the strong claims of the authors, or whether alternative explanations could exist. While the potential to isolate individual clones in the pre-graft setting is a great strength of the method applied and the isolation of these clones is a huge body of work in itself, the limited number of clones that could be isolated also somewhat limits the validation of the findings.

      Strengths

      Very relevant and interesting question, clear clinical relevance, applying elegant methods that hold the potential to provide a novel understanding of multiple aspects of therapy resistance, through from evolutionary patterns to intracellular and cooperative mechanisms of resistance.

      The text is clearly written, logical, and the structure is easy to follow.

      Weaknesses

      (1) The extent to which the model used truly mimics residual disease

      The main conclusions of the paper are built upon results using a model for minimal residual disease. However, the extent to which this truly recapitulates minimal residual disease, particularly with regard to their focus on the timings of therapy, could be discussed further. If in the clinical setting residual disease occurs following the existence of a tumour and its microenvironment, there might be many aspects of the process that are missed when coinciding treatment with engraftment of a xenograft tumour with pre-castration. If any characterisation of the minimal residual disease was possible (such as histologically or through RNA sequencing), this may help demonstrate in what ways this model recapitulates minimal residual disease.

      We appreciate the reviewer's feedback on this point and acknowledge that the pre-ARSI setting used in our studies is not identical to minimal residual disease (MRD) seen clinically, where a patient typically undergoes primary treatment (radical prostatectomy surgery or local radiotherapy) then relapses with distant disease from micrometastases that were not initially detectable. Having uncovered a key difference in the path to resistance using our pre-ARSI model, we believe our data provide a strong rationale to invest additional effort in designing newer MRD models that more closely mimic the clinical scenario, perhaps through surgical resection of a primary tumor that could “seed” micrometastases prior to therapy. We will highlight this aspect in our revised manuscript and provide clarity on the limitations and scope of our study.

      (2) Whether the observed enrichment of pre-resistant clones is truly that

      The authors strongly make the case that their barcoding experiments provide evidence for pre-existing resistance in the context of minimal residual disease. However, it seems that the clones enriched in the ARSIR tumours are consistently the most enriched clones in the pregraft. Is it possible that the high selective pressure in the pre-engraftment ARSI condition simply leads to an enrichment of the most populous clones from the pregraft? Whereas in the control setting, the reduced selective pressure at the point of engraftment allows for a wider variety of clones to establish in the tumour?

      The reviewer raises an important point about enrichment of ARSI-resistant clones in the pregraft, but we do not believe that this explains the subsequent in vivo data, for the following reasons:

      (1) The two most enriched clones in the Pre-ARSIR tumors are the second and third most enriched clones in the pregraft, not the first (Supplementary figure 1E). If the clones were enriched in resistant tumors based on their abundance in the starting population, we would expect to find the most enriched pregraft clone in the tumor.

      (2) By varying the androgen concentration in the pregraft culture media, we could selectively deplete or enrich the same clones enriched in the Pre-ARSIR tumors in vivo, indicating the enrichment of these clones in the resistant tumors is unlikely to be solely based on their relative frequency in the pregraft (Supplementary figure 2).

      We will clarify these points in the revised manuscript.

      Additionally, is there the possibility that the clones highly enriched in the pregraft are in fact a heterogeneous group of cells bearing the same barcode due to stochastic events in the process of viral transduction? Addressing these questions would greatly improve the study.

      The barcode library was deep sequenced to confirm even distribution of the barcodes before it was transferred from Novartis (PMID: 258491301) and we intentionally used a low multiplicity of infection (MOI) to generate barcode lines to ensure single copy insertion. That said, we cannot entirely rule out the possibility that the second and third most enriched clones in the pregraft originated from the same ancestral clone and subsequently acquired two different barcodes.  We will clarify this point in the revised manuscript.

      (3) The robustness of the subsequent work based on 1-2 pre-resistant clones

      While appreciating the volume of work involved in isolating and culturing individual pre-resistant clones, given the previous point, the conclusions would benefit from very robust validations with these single-cell clones. There are only two clones, and the results seem to focus more on one than the other, for which the data is less convincing. For instance, the Enz IC50 data, which in the case for pre-ARSI R2 is restricted to the supplementary, compares the clones A-D. In Figure S8 B, pre-ARSI R2 is compared to clone B, which is, of the four clones shown in the main figure when compared to R1, the one with the lowest Enz IC50. Therefore, while the resistant clones seem to have a significantly higher Enz IC50, comparing both clones to clones A-D may not have achieved this significance. It would also be useful to know how abundant the resistant clones were in the original barcode experiments.

      We acknowledge that studies relying on 1-2 biological samples indeed have limitations. Given our extensive prior work into the role of GR in the development of ARSI resistance (and that of other labs), we focused on demonstrating that both pre-ARSIR1 and pre-ARSIR2 clones exhibit pre-existing GR expression and are primed to further upregulate GR levels under ARSI conditions, thereby relying on GR function to sustain resistance. Given the redundancy of resistant mechanisms of the two clones, we made efforts to isolate additional clones enriched in Pre-ARSIR tumors. However, despite our attempts, we were unable to identify further clones. Pre-ARSIR1 and pre-ARSIR2 are second and third most enriched clones in pre-graft (2.1% and 1.7% respectively).

      (4) The logic used in the final section requires further explanation

      In the final section, the authors suggest that a pre-ARSIR clone is able to cooperate with a pre-Intact clone to aid adaptive ARSI resistance. If this is true, then could it not be that rare, pre-resistant clones support adaptive resistance in established tumours? And, therefore, the mechanism underlying resistance could be through pre-existing resistant clones in both settings. The work would benefit from a discussion to clarify this discrepancy in the interpretation of the findings. This is particularly necessary given the strong wording the authors use regarding their findings, such as that they have provided 'conclusive evidence' for acquired resistance.

      We agree that rare, pre-resistant clones could support adaptive resistance (and therefore resistance in this adaptive setting could technically be called “pre-existing”), but it is critical to recognize that these rare, pre-resistant “helper” clones are vastly outnumbered by pre-Intact clones that “acquire” resistance through their “help.” We find this to be fascinating biology, and in the resubmission we will clarify this logic and outline future experimental approaches to unravel the mechanism.

    2. eLife assessment

      This important study provides new insight into the dynamics that underlie the development of therapy resistance in prostate cancer by revealing that divergent tumor evolutionary paths occur in response to different treatment timing and that these converge on common resistance mechanisms. The use of barcoded lineage tracing and characterization of isolated tumor clonal populations provides compelling evidence supporting the importance of clonal dynamics in a tumor ecosystem for treatment resistance. Several open questions remain, however, raising the possibility of alternative interpretations of the data set in its current form. Overall, the findings deepen our understanding of prostate cancer evolution and hold promising implications for how drug resistance can be addressed or prevented.

    3. Reviewer #1 (Public Review):

      Summary:

      Lee, Eugine et al. use in vivo barcoded lineage tracing to investigate the evolutionary paths to androgen receptor signaling inhibition (ARSI) resistance in two different prostate cancer clinical scenario models: measurable disease and minimal residual disease. Using two prostate cancer cell lines, LNCaP/AR and CWR22Pc, the authors find that in their minimal residual disease models, the outgrowth of pre-existing resistant clones gives rise to ARSI-resistant tumors. Interestingly, in their measurable disease model, or post-engraftment ARSI setting, these pre-existing resistant clones are depleted; instead, a subset of the clones that give rise to treatment-naïve tumors adapts to ARSI treatment and is enriched in resistant tumors. For the LNCaP/AR cell line, characterization of pre-existing resistant clones in treatment-naïve and ARSI treatment settings reveals increased baseline androgen receptor transcriptional output as well as baseline upregulation of glucocorticoid receptor (GR) as the primary driver of pre-existing resistance. Similarly, the authors found induction of high GR expression over long-term ARSI treatment in ARSI-sensitive clones for adaptive resistance to ARSI. For CWR22Pc cells, HER3/NRG1 signaling was the primary driver for ARSI resistance in both measurable disease and minimal residual disease models. Not only are these findings consistent with the authors' previous reports of GR and NRG1/Her3 as the molecular drivers of ARSI resistance in LNCaP/AR and CWR22Pc, respectively, but they also demonstrate conserved resistance mechanisms despite pre-existing or adaptive evolutionary paths to resistance. Lastly, the authors show adaptive ARSI resistance is dependent on interclonal cooperation, where the presence of pre-existing resistant clones or "helper" clones is required to promote adaptive resistance in ARSI-sensitive clones.

      Strengths:

      The authors employ DNA barcoding, a powerful tool already demonstrated by others to track the clonal evolution of tumor populations during resistance development, to study the effects of the timing of therapy as a variable on resistance evolution. The authors use barcoding in two cell line models of prostate cancer in two clinical disease scenarios to demonstrate divergent evolutionary paths converging on common resistance mechanisms. By painstakingly isolating clones with barcodes of interest to generate clonal cell lines from the treatment-naïve cell populations, the authors are able to not only characterize pre-existing resistance but also show cooperativity between resistant and drug-sensitive populations for adaptive resistance.

      Weaknesses:

      While the finding that different evolutionary paths result in common molecular drivers of ARSI resistance is novel and unexpected, this work primarily confirms the authors' previous published work identifying the resistance mechanisms in these cell lines. The impact of the work would be greater with additional studies understanding the specific molecular/genetic mechanisms by which cells become resistant or cooperate within a population to give rise to resistant population subclones.

      This study would also benefit from additional explanation or exploration of why the two resistance driver pathways described (GR and NRG1/Her3) are cell line specific and if there are genetic or molecular backgrounds in which specific resistance signaling is more likely to be the predominant driver of resistance.

    4. Reviewer #2 (Public Review):

      Summary

      The authors aimed to characterise the evolutionary dynamics that occur during the resistance to androgen receptor signalling inhibition, and how this differs in established tumours vs. residual disease, in prostate cancer. By using a barcoding method, they aimed to both characterise the distribution of clones that support therapy resistance in these settings, while also then being able to isolate said clones from the pre-graft population via single-cell cloning to characterise the mechanisms of resistance and dependency on cooperativity.

      While, interestingly, the timing of combination therapies has been shown to be critical to avoid cross-resistance, the timing of therapy has not been specifically considered as a factor dictating resistance pathways. Additionally, the role of residual disease and dormant populations in driving relapse is of increasing interest, yet a lot remains to be understood of these populations. The question of whether different clinical manifestations of therapy resistance follow similar evolutionary pathways to resistance is therefore interesting and relevant for the field.

      The methods applied are elegant and the body of work is substantial. The proposed divergent evolutionary pathways pose interesting questions, and the findings on cooperativity provide insight. However, whether the model truly reflects minimal residual disease to the extent that the authors suggest may limit the relevance of the findings at this stage. Certain patterns in the DNA barcoding results also call into question whether the results fully support the strong claims of the authors, or whether alternative explanations could exist. While the potential to isolate individual clones in the pre-graft setting is a great strength of the method applied and the isolation of these clones is a huge body of work in itself, the limited number of clones that could be isolated also somewhat limits the validation of the findings.

      Strengths

      • Very relevant and interesting question, clear clinical relevance, applying elegant methods that hold the potential to provide a novel understanding of multiple aspects of therapy resistance, through from evolutionary patterns to intracellular and cooperative mechanisms of resistance.

      • The text is clearly written, logical, and the structure is easy to follow.

      Weaknesses

      (1) The extent to which the model used truly mimics residual disease

      The main conclusions of the paper are built upon results using a model for minimal residual disease. However, the extent to which this truly recapitulates minimal residual disease, particularly with regard to their focus on the timings of therapy, could be discussed further. If in the clinical setting residual disease occurs following the existence of a tumour and its microenvironment, there might be many aspects of the process that are missed when coinciding treatment with engraftment of a xenograft tumour with pre-castration. If any characterisation of the minimal residual disease was possible (such as histologically or through RNA sequencing), this may help demonstrate in what ways this model recapitulates minimal residual disease.

      (2) Whether the observed enrichment of pre-resistant clones is truly that

      The authors strongly make the case that their barcoding experiments provide evidence for pre-existing resistance in the context of minimal residual disease. However, it seems that the clones enriched in the ARSIR tumours are consistently the most enriched clones in the pregraft. Is it possible that the high selective pressure in the pre-engraftment ARSI condition simply leads to an enrichment of the most populous clones from the pregraft? Whereas in the control setting, the reduced selective pressure at the point of engraftment allows for a wider variety of clones to establish in the tumour? Additionally, is there the possibility that the clones highly enriched in the pregraft are in fact a heterogeneous group of cells bearing the same barcode due to stochastic events in the process of viral transduction? Addressing these questions would greatly improve the study.

      (3) The robustness of the subsequent work based on 1-2 pre-resistant clones

      While appreciating the volume of work involved in isolating and culturing individual pre-resistant clones, given the previous point, the conclusions would benefit from very robust validations with these single-cell clones. There are only two clones, and the results seem to focus more on one than the other, for which the data is less convincing. For instance, the Enz IC50 data, which in the case for pre-ARSI R2 is restricted to the supplementary, compares the clones A-D. In Figure S8 B, pre-ARSI R2 is compared to clone B, which is, of the four clones shown in the main figure when compared to R1, the one with the lowest Enz IC50. Therefore, while the resistant clones seem to have a significantly higher Enz IC50, comparing both clones to clones A-D may not have achieved this significance. It would also be useful to know how abundant the resistant clones were in the original barcode experiments.

      (4) The logic used in the final section requires further explanation

      In the final section, the authors suggest that a pre-ARSIR clone is able to cooperate with a pre-Intact clone to aid adaptive ARSI resistance. If this is true, then could it not be that rare, pre-resistant clones support adaptive resistance in established tumours? And, therefore, the mechanism underlying resistance could be through pre-existing resistant clones in both settings. The work would benefit from a discussion to clarify this discrepancy in the interpretation of the findings. This is particularly necessary given the strong wording the authors use regarding their findings, such as that they have provided 'conclusive evidence' for acquired resistance.

    1. eLife assessment

      This valuable study demonstrates that genomic insertion of a G4-containing sequence can be sufficient to induce chromosome loops and alter gene expression. The evidence supporting the conclusions is convincing. Effects were shown by Hi-C as well as qPCR for chromatin modifications and expression, and the specificity of the effects was controlled by mutating the G4-containing sequence or treating with LNA probes to abolish G4 structure formation. The work will be of interest to researchers working on chromatin organization and gene regulation.

    2. Reviewer #1 (Public Review):

      In this manuscript, Chowdhury and co-workers provide interesting data to support the role of G4-structures in promoting chromatin looping and long-range DNA interactions. The authors achieve this by artificially inserting a G4-containing sequence in an isolated region of the genome using CRISPR-Cas9 and comparing it to a control sequence that does not contain G4 structures. Based on the data provided, the authors can conclude that G4-insertion promotes long-range interactions (measured by Hi-C) and affects gene expression (measured by qPCR) as well as chromatin remodelling (measured by ChIP of specific histone markers).

      In this revised version of the manuscript, G4 formation of the inserted sequence was validated by ChIP-qPCR, and the same G4-containing sequence was inserted at a second locus, and similar, though not identical, effects on chromatin and gene expression were observed.

      Strengths:

      This is the first attempt to connect genomics datasets of G4s and HiC with gene expression. The use of Cas9 to artificially insert a G4 is also very elegant.

    3. Reviewer #2 (Public Review):

      Roy et al. investigated the role of non-canonical DNA structures called G-quadruplexes (G4s) in long-range chromatin interactions and gene regulation. Introducing a G4 array into chromatin significantly increased the number of long-range interactions, both within the same chromosome (cis) and between different chromosomes (trans). G4s functioned as enhancer elements, recruiting p300 and boosting gene expression even 5 megabases away. The study reveals that G4s directly influence 3D chromatin organization via facilitating communication between regulatory elements and genes.

      Strengths:

      The authors' findings are valuable for understanding the role of G4-DNA in 3D genome organization and gene transcription. The authors provide convincing evidence to support their claims.

    4. Reviewer #3 (Public Review):

      Summary:

      This paper aims to demonstrate the role of G-quadruplex DNA structures in the establishment of chromosome loops. The authors introduced an array of G4s spanning 275 bp, naturally found within a very well characterized promoter region of the hTERT promoter, in an ectopic region devoid of G-quadruplex and annotated gene. As a negative control, they used a mutant version of the same sequence in which G4 folding is impaired. Due to the complexity of the region, 3 G4s on the same strand and one on the opposite strand, 12 point mutations were made simultaneously (G to T and C to A). Analysis of the 3D genome organization shows that the WT array establishes more contact within the TAD and throughout the genome than the control array. Additionally, a slight enrichment of H3K4me1 and p300, both enhancer markers, was observed locally near the insertion site. The authors tested whether the expression of genes located either nearby or away up to 5 Mb were up-regulated based on this observation. They found that four genes were up-regulated from 1.5 to 3 fold. An increased interaction between the G4 array compared to the mutant was confirmed by the 3C assay. For in-depth analysis of the long-range changes, they also performed Hi-C experiments and showed a genome-wide increase in interactions of the WT array versus the mutated form.

      Strengths:

      The experiments were well-executed and the results indicate a statistical difference between the G4 array inserted cell line and the mutated modified cell line.

      Weaknesses:

      (1) It would have been nice to have an internal control corresponding to a region known to be folded in several cell lines to compare the level of pG4 signal within their construct with a well-characterised control (for example, the KRAS promoter region).

      (2) The mutations introduced into the G4 sequence may also affect Sp1 or other transcription factor binding sites present in this region, and some of the observations may depend on these sites rather than G4 structures. While this is acknowledged in the text, the conclusion in the title of the paper seems an overstatement.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Chowdhury and co-workers provide interesting data to support the role of G4-structures in promoting chromatin looping and long-range DNA interactions. The authors achieve this by artificially inserting a G4-containing sequence in an isolated region of the genome using CRISPR-Cas9 and comparing it to a control sequence that does not contain G4 structures. Based on the data provided, the authors can conclude that G4-insertion promotes long-range interactions (measured by Hi-C) and affects gene expression (measured by qPCR) as well as chromatin remodelling (measured by ChIP of specific histone markers).

      Whilst the data presented is promising and partially supports the authors' conclusion, this reviewer feels that some key controls are missing to fully support the narrative. Specifically, validation of actual G4-formation in chromatin by ChIP-qPCR (at least) is essential to support the association between G4-formation and looping. Moreover, this study is limited to a genomic location and an individual G4-sequence used, so the findings reported cannot yet be considered to reflect a general mechanism/effect of G4-formation in chromatin looping.

      Strengths:

      This is the first attempt to connect genomics datasets of G4s and HiC with gene expression. The use of Cas9 to artificially insert a G4 is also very elegant.

      Weaknesses:

      Lack of controls, especially to validate G4-formation after insertion with Cas9. The work is limited to a single G4-sequence and a single G4-site, which limits the generalisation of the findings.

      In the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      To directly address the second point, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci.

      We next checked the chromatin state of the G4-array inserted at the 10M locus, and of its negative control. The histone marks H3K4Me1, H3K27Ac, H3K27Me3, H3K9Me3 and H3K4Me3 were tested at the G4-array insert and at the negative control insert. An increase in the enhancer histone marks was evident relative to the control sequence. This was largely similar to the 79M locus, supporting an enhancer-like state. Interestingly, here we further noted the presence of the H3K27Me3 histone mark. The presence of the H3K27Me3 repressor histone mark, along with the H3K4Me1/H3K27Ac enhancer histone marks, supports a poised enhancer-like status of the inserted G4 region, as has been observed earlier in other studies. Together, although data from the two distinct G4 insertion sites support the enhancer-like state, there are contextual differences, likely due to the sequence/chromatin of the sites adjacent to the inserted sequence.

      Activation of surrounding genes (within a 10 Mb window) was evident for most genes following G4-array insertion at the 10M locus, and not with the G4-mutant insert. This is consistent with the enhancer-like state of the G4-array insert and in line with the results for the 79M G4-array insert.

      These results have been added as the final section in the revised version, data is shown in Figure 7.

      Reviewer #2 (Public Review):

      Summary:

      Roy et al. investigated the role of non-canonical DNA structures called G-quadruplexes (G4s) in long-range chromatin interactions and gene regulation. Introducing a G4 array into chromatin significantly increased the number of long-range interactions, both within the same chromosome (cis) and between different chromosomes (trans). G4s functioned as enhancer elements, recruiting p300 and boosting gene expression even 5 megabases away. The study proposes a mechanism where G4s directly influence 3D chromatin organization, facilitating communication between regulatory elements and genes.

      Strength:

      The findings are valuable for understanding the role of G4-DNA in 3D genome organization and gene transcription.

      Weaknesses:

      The study would benefit from more robust and comprehensive data, which would add depth and clarity.

      (1) Lack of G4 Structure Confirmation: The absence of direct evidence for G4 formation within cells undermines the study's foundation. Relying solely on in vitro data and successful gene insertion is insufficient.

      Using the reported G4-specific antibody, BG4, we performed BG4 ChIP-qPCR at the 79M locus. In addition, a second G4-insertion site was created and BG4 ChIP-qPCR was used to validate intracellular G4 formation. A brief summary follows; more details are given in the response to Reviewer #1 above.

      In the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4-ChIP-qPCR was significant within the G4-array inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. Added in revised text in the second and the final sections of results, data shown in Figures 7, S4 and S8.

      (2) Alternative Explanations: The study does not sufficiently address alternative explanations for the observed results. The inserted sequences may not form G4s or other factors like G4-RNA hybrids may be involved.

      As mentioned in response to the previous comment, we confirmed that the inserted sequence indeed forms G4s inside the cells. RNA-DNA hybrid G4s can form within R-loops with two or more tandem G-tracks (G-rich sequences) on the nascent RNA transcript as well as the non-template DNA strand (Fay et al., 2017, 28554731). A recent study has observed that R-loop-associated G4 formation can enhance chromatin looping by strengthening CTCF binding (Wulfridge et al., 2023, 37552993). As pointed out by the reviewer, the possibility of G4-RNA hybrids remains; we have mentioned this possibility for readers in the second-to-last paragraph of the Discussion.

      (3) Limited Data Depth and Clarity: ChIP-qPCR offers limited scope and considerable variation in some data makes conclusions difficult.

      We noted variation with one of the primers in a few ChIP-qPCR experiments (in Figures 2 and 3D). The changes however were statistically significant across replicates, and consistent with the overall trend of the experiments (Figures 2, 3 and 4). Enhancer function, in addition to ChIP, was also confirmed using complementary assays like 3C and RNA expression.

      (4) Statistical Significance and Interpretation: The study could be more careful in evaluating the statistical significance and magnitude of the effects to avoid overinterpreting the results.

      We reconfirmed our statistical calculations from biological replicate experiments. We carefully looked at potential overinterpretations, and made appropriate changes in the manuscript (details of the changes given below in response to comment to authors).

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to demonstrate the role of G-quadruplex DNA structures in the establishment of chromosome loops. The authors introduced an array of G4s spanning 275 bp, naturally found within a very well-characterized promoter region of the hTERT promoter, in an ectopic region devoid of G-quadruplex and annotated gene. As a negative control, they used a mutant version of the same sequence in which G4 folding is impaired. Due to the complexity of the region, 3 G4s on the same strand and one on the opposite strand, 12 point mutations were made simultaneously (G to T and C to A). Analysis of the 3D genome organization shows that the WT array establishes more contact within the TAD and throughout the genome than the control array. Additionally, a slight enrichment of H3K4me1 and p300, both enhancer markers, was observed locally near the insertion site. The authors tested whether the expression of genes located either nearby or up to 5 Mb away was up-regulated based on this observation. They found that four genes were up-regulated from 1.5 to 3-fold. An increased interaction between the G4 array compared to the mutant was confirmed by the 3C assay. For in-depth analysis of the long-range changes, they also performed Hi-C experiments and showed a genome-wide increase in interactions of the WT array versus the mutated form.

      Strengths:

      The experiments were well-executed and the results indicate a statistical difference between the G4 array inserted cell line and the mutated modified cell line.

      Weaknesses:

      The control non-G4 sequence contains 12 point mutations, making it difficult to draw clear conclusions. These mutations not only alter the formation of G4, but also affect at least three Sp1 binding sites that have been shown to be essential for the function of the hTERT promoter, from which the sequence is derived. The strong intermingling of G4 and Sp1 binding sites makes it impossible to determine whether all the observations made are dependent on G4 or Sp1 binding. As a control, the authors used Locked Nucleic Acid probes to prevent the formation of G4. As for mutations, these probes also interfere with two Sp1 binding sites. Therefore, using this alternative method has the same drawback as point mutations. This major issue should be discussed in the paper. It is also possible that other unidentified transcription factor binding sites are affected in the presented point mutants.

      Since the sequence we used to test the effects of G4 structure formation is highly G-rich, we had to introduce at least 12 mutations to be sure that a stable G4 structure would not form in the mutated control sequence. Sp1 has been reported to bind to G4 structures (Raiber et al., 2012). Therefore, Sp1 binding is likely to be associated with the G4-dependent enhancer functions observed here. We also appreciate that apart from Sp1, other unidentified transcription factor binding sites might be affected by the mutations we introduced. We have discussed these possibilities in the fourth paragraph of the Discussion section in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Whilst the data presented is promising and partially supports the authors' conclusion, this reviewer feels that some key controls are missing to fully support the narrative used. Below are my main concerns:

      (1) The main thing missing in the current manuscript is to validate the actual formation of G4 in chromatin context for the repeat inserted by CRISPR-Cas. Whilst I appreciate this will form promptly a G4 in vitro, to fully support the conclusions proposed the authors would need to demonstrate actual G4-formation in cells after insertion. This could be done by ChIP-qPCR using the G4-selective antibody BG4 for example. This is an essential piece of evidence to be added to link with confidence G4-formation to chromatin looping.

      To address the concern regarding whether the inserted G4 sequence forms G4s in cells, as suggested, we used the G4-selective antibody BG4. PCR primers in the study were designed with several points in mind: primers should not bind to any site of G/C alteration in the mutated control insert; either the forward or the reverse primer should come from the adjacent region for specificity; the amplicons should cover adjacent regions so that any effects on chromatin can be studied; and the PCRs should be optimized for the repeats within the inserted sequence. Given these, primer pairs R1-R4 were chosen for further work following optimizations (Figure 2, top panel). For BG4 ChIP-qPCR we used primer pair R2, which covered >100 bases of the inserted G4-array, or the G4-mutated control. Significant BG4 binding was clear in the G4-array insert, and not in the G4-mutated insert, demonstrating formation of G4s by the inserted G4-array (Figure S4).

      In response to comment #3 below, we inserted the G4-forming sequence (or its mutated control) at a second locus. This insertion was near the 10 millionth position of chromosome 12 (10M insertion locus in text). Here also, BG4 binding was significant within the G4-array inserted region, and not in the negative control region (Figure S8). Together these demonstrate G4 formation by the inserted sequence at two different loci.

      (2) I found the LNA experiment very elegant. However, what would be the effect of LNA treatment on the control sequence that does not form G4s? This control is essential to disentangle the effect of LNA pairing to the sequence itself vs disrupting the G4-structure.

      As per the reviewer’s suggestion, we performed a control experiment where we treated the G4-mutated insert (control) cells with the G4-disrupting LNA probes. The changes in the expression of the surrounding genes in this case were not significant, indicating that the effects observed in the G4-array insert cells were possibly due to disruption of the inserted G4 structures. This data is presented in Figure S5.

      (3) The authors describe their work and present its conclusion as if this were a genome-wide study, whilst the work is focused on a specific genomic location, and the looping, along with the effect on histone acetylation and gene expression, is limited to this. The authors cannot conclude, therefore, that this is a generic effect and the discussion should be more focused on the specific G4s used and the genomic location investigated. Ideally, insertion of a different G4-forming sequence or of the same in a different genomic location is recommended to really claim a generic effect.

      To address this we inserted the G4-array sequence, or the G4-mutated control sequence, at another relatively isolated locus – at the 10 millionth position of chromosome 12 – denoted as 10M. Using BG4 ChIP-qPCR intracellular G4 formation was confirmed. We observed that the enhancer-like features in terms of enhancer histone marks and increase in the expression of surrounding genes were largely reproduced at the 10M locus on G4 insertion (Figure 7). These results are added as the final section under Results.

      Reviewer #2 (Recommendations For The Authors):

      The study proposes a mechanism where G4s directly influence 3D chromatin organization, facilitating communication between regulatory elements and genes.

      While the present manuscript presents an interesting hypothesis, it would benefit from enhanced novelty and more robust data. The study complements existing G4 research (e.g., PMID: 31177910). While the conclusions hold biological relevance, they largely reiterate established knowledge. Furthermore, the presented data appear preliminary and still lack depth and clarity.

      Hou et al., 2019 (PMID: 31177910) showed that the presence of potential G4-forming sequences correlated with TAD boundaries, along with enrichment of architectural proteins and transcription factor binding sites. Also, other studies noted enrichment of potential G4-forming sequences at enhancers along with nucleosome depletion and higher transcription factor binding (Hou et al., 2021; Williams et al., 2020). These studies proposed the role of G4s in chromatin/TAD states based on analysis of potential G4-forming sequences using correlative bioinformatics analyses. Here we sought to directly test causality. Insertion of a G4 sequence, and formation of intracellular G4s, in an isolated, G4-depleted region resulted in altered characteristics of chromatin; this was not seen with the negative control insertion that does not form G4s. These results, in contrast to earlier studies, directly demonstrate the causal role of G4s as functional elements that impact local and distant chromatin.

      Major concerns:

      (1) Lack of G4 Structure Confirmation: Implement G4-specific antibodies or fluorescent probes to verify G4 structures inside the cells.

      Detailed response given above. Briefly, in the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the G4-array inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. Added in revised text in the second and the final sections of results, data shown in Figures 7, S4 and S8.

      (2) Alternative Explanations: Explore the possibility that the sequences may not form G4s or that other factors like G4-RNA hybrids are involved.

      Response provided in the public reviews section.

      (3) Limited Data Depth and Clarity: ChIP-qPCR offers limited scope. Consider employing G4 ChIP-seq for genome-wide analysis of G4 association with histone modifications. Address inconsistencies in data like H3K27me3 variation and incomplete H3K9me3 data sets.

      A recent study performed G4 CUT&Tag (Lyu et al., 2022; PMID: 34792172) and observed G4 formation at both active promoters and active and poised enhancers. We have discussed this in the sixth paragraph of the Discussion. The H3K27Me3 occupancy at the 79M locus insertions did not show any significant G4-dependent changes. However, at the second insertion site at the 10M locus (introduced in the revised manuscript, Figure 7) there was a significant G4-dependent increase in H3K27Me3 occupancy along with the H3K4Me1 and H3K27Ac enhancer histone marks, indicating formation of a poised enhancer-like element.

      We completed the H3K9me3 data sets for both insertion sites.

      (4) Statistical Significance and Interpretation: Re-evaluate the statistical significance of results and interpret them in the context of relevant biological knowledge. Avoid overstating the impact of minor changes.     

      We revised several lines to avoid overstating results. Some of the revised sentences are as follows:

      - There was a relatively modest increase in the recruitment of p300 and a substantial increase in the recruitment of the more functionally active acetylated p300/CBP to the G4-array when compared against the mutated control.

      - As expected, although modest, a decrease in the H3K4Me1 and H3K27Ac enhancer histone modifications was evident within the insert upon the LNAs treatment.

      - Moreover, the enhancer marks were relatively reduced, although not markedly, when the inserted G4s were specifically disrupted.

      (5) Unexplored Aspects: Investigate the relationship between G4 DNA and R-loops, and consider the role of CTCF and cohesin proteins in mediating long-range interactions. Integrate existing research to build a more comprehensive framework and draw more robust conclusions.

      As mentioned in response to one of the earlier comments, a recent publication extensively studied the association between G4s, R-loops, and CTCF binding (Wulfridge et al., 2023). While here we focused on the primary features of a potential enhancer, further work will be necessary to establish how G4s influence the coordinated action between cohesin and CTCF and consequent chromatin looping. We have described this for readers in the second-to-last paragraph of the Discussion in the revised version.

      Minor Concern:

      (1) Enhancer Definition: The term "enhancer" requires specific criteria. Modify the section heading or provide evidence demonstrating the G4 sequence fulfills all conditions for being an enhancer, such as position independence and long-range effects.

      Although we checked some of the primary features of a potential enhancer, such as expression of surrounding genes, enhancer histone marks, chromosomal looping interactions, and recruitment of transcriptional coactivators, further aspects may need to be validated. As suggested, in the revised manuscript the section heading has been modified to ‘Enhancer-like features emerged upon insertion of G4s.’

      Reviewer #3 (Recommendations For The Authors):

      In addition to the points in my public review, I would like to mention some less significant points.

      The authors mention that "the array of G4-forming sequences used for insertion was previously reported to form stable G4s in human cells" (Lim et al., 2010; Monsen et al., 2020; Palumbo et al., 2009). However, upon reading the publications, I found that these observations were made in vitro. I may have missed something, but there are now several mappings of folded-G4 in human cells based on different approaches. It would be beneficial to investigate whether the hTERT promoter is a site of G-quadruplex formation in vivo. If confirmed, a similar analysis should be conducted on the 275 bp region inserted into the ectopic region to determine if it also has the ability to form a structured G4.

      We performed BG4 ChIP to confirm in vivo G4 formation by the inserted G4-array as suggested (Figures S4, S8). A detailed response is given above. Briefly, in the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. Added in revised text in the second and the final sections of results, data shown in Figures 7, S4 and S8.

      The inserted sequence originates from a well-characterized promoter. The authors suggest that placing it in an ectopic position creates an enhancer-like region, based on the observation of increased levels of H3K27Ac and H3K4me1 on the WT array. To provide a control that it is not a promoter, it would be useful to also analyze a specific mark of promoter activity, such as H3K4me3.

      As suggested by the reviewer, we also analysed the H3K4Me3 promoter activation mark at both the 79M and 10M (introduced in the revised manuscript, Figure 7) insertion loci. We did not observe any significant G4-dependent changes in the recruitment of H3K4Me3 (Figures 2, 7).

      In the discussion, the authors mention "it was proposed that inter-molecular G4 formation between distant stretches of Gs may lead to DNA looping". To investigate this further, it would be worthwhile to examine whether the promoter regions of activated genes (PAWR, PPP1R12A, NAV3, and SLC6A15) contain potentially forming G-quadruplexes (pG4). Additionally, sites that establish more contact with the G4 array described in Figure 6F could be analyzed for enrichment in pG4.

      Thank you for pointing this out. We found that the promoters of the four genes (PAWR, PPP1R12A, NAV3, and SLC6A15) harbour potential G4-forming sequences (pG4s). Also as suggested, we analysed the contact regions in Figure 6F, along with the whole locus, for pG4s. Relative enrichment in pG4s was seen, particularly within the significantly enhanced interacting regions, and at times extended beyond the interacting regions. This is shown in the lower panel of Figure 6F in the revised version. We have described this in the Discussion for readers.
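      A pG4 scan of the kind described above is often done with a simple sequence motif. The sketch below uses the commonly cited pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, searched on both strands. This is an illustration only: the response does not say which tool or motif definition the authors used, and the function names here are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nt; the C-rich
# pattern detects the same motif on the opposite strand.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
C4_MOTIF = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_pg4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping pG4 matches."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_MOTIF.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C4_MOTIF.finditer(seq)]
    return sorted(hits)


def pg4_density(seq):
    """Number of pG4 matches per kilobase of sequence."""
    return 1000.0 * len(find_pg4(seq)) / len(seq) if seq else 0.0
```

      Comparing `pg4_density` between interacting regions and the rest of the locus gives a rough enrichment measure; non-canonical G4s (longer loops, bulges, two-G runs) are not captured by this pattern.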

    1. eLife assessment

      This important study addresses the idea that defective lysosomal clearance might be causal to renal dysfunction in cystinosis. They observe that restoring expression of vATPase subunits and treatment with Astaxanthin ameliorate mitochondrial function in a model of renal epithelial cells, opening opportunities for translational application to humans. The data are convincing, but the description of methodologies is incomplete.

    2. Reviewer #1 (Public Review):

      Cystinosis is a rare hereditary disease caused by biallelic loss of the CTNS gene, encoding two cystinosin protein isoforms; the main isoform is expressed in lysosomal membranes where it mediates cystine efflux whereas the minor isoform is expressed at the plasma membrane and in other subcellular organelles. Sur et al proceed from the assumption that the pathways driving the cystinosis phenotype in the kidney might be identified by comparing the transcriptome profiles of normal vs CTNS-mutant proximal tubular cell lines. They argue that key transcriptional disturbances in mutant kidney cells might not be present in non-renal cells such as CTNS-mutant fibroblasts.

      Using cluster analysis of the transcriptomes, the authors selected a single vacuolar H+ATPase (ATP6V0A1) for further study, asserting that it was the "most significantly downregulated" vacuolar H+ATPase (about 58% of control) among a group of similarly downregulated H+ATPases. They then showed that exogenous ATP6V0A1 improved CTNS(-/-) RPTEC mitochondrial respiratory chain function and decreased autophagosome LC3-II accumulation, characteristic of cystinosis. The authors then treated mutant RPTECs with 3 "antioxidant" drugs, cysteamine, vitamin E, and astaxanthin (ATX). ATX (but not the other two antioxidant drugs) appeared to improve ATP6V0A1 expression, LC3-II accumulation, and mitochondrial membrane potential. Respiratory chain function was not studied. RPTEC cystine accumulation was not studied.

      The major strengths of this manuscript reside in its two primary findings.

      (1) Plasmid expression of exogenous ATP6V0A1 improves mitochondrial integrity and reduces aberrant autophagosome accumulation.

      (2) Astaxanthin partially restores suboptimal endogenous ATP6V0A1 expression.

      Taken together, these observations suggest that astaxanthin might constitute a novel therapeutic strategy to ameliorate defective mitochondrial function and lysosomal clearance of autophagosomes in the cystinotic kidney. This might act synergistically with the current therapy (oral cysteamine) which facilitates defective cystine efflux from the lysosome.

      There are, however, several weaknesses in the manuscript.

      (1) The reductive approach that led from transcriptional profiling to focus on ATP6V0A1 is not transparent and weakens the argument that potential therapies should focus on correction of this one molecule vs the other H+ ATPase transcripts that were equally reduced - or transcripts among the 1925 belonging to at least 11 pathways disturbed in mutant RPTECs.

      (2) A precise description of primary results is missing -- the Results section is preceded by or mixed with extensive speculation. This makes it difficult to dissect valid conclusions from those derived from less informative experiments (eg data on CDME loading, data on whole-cell pH instead of lysosomal pH, etc).

      (3) Data on experimental approaches that turned out to be uninformative (eg CDME loading, or data on whole-cell pH assessment with BCECF).

      (4) The rationale for the study of ATX is unclear and the mechanism by which it improves mitochondrial integrity and autophagosome accumulation is not explored (but does not appear to depend on its anti-oxidant properties).

      (5) Thoughtful discussion on the lack of effect of ATP6V0A1 correction on cystine efflux from the lysosome is warranted, since this is presumably sensitive to intralysosomal pH.

      (6) Comparisons between RPTECs and fibroblasts cannot take into account the effects of immortalization on cell phenotype (not performed in fibroblasts).

      This work will be of interest to the research community but is self-described as a pilot study. It remains to be clarified whether transient transfection of RPTECs with other H+ATPases could achieve results comparable to ATP6V0A1. Some insight into the mechanism by which ATX exerts its effects on RPTECs is needed to understand its potential for the treatment of cystinosis.

    3. Reviewer #2 (Public Review):

      Sur and colleagues investigate the role of ATP6V0A1 in mitochondrial function in cystinotic proximal tubule cells. They propose that loss of cystinosin downregulates ATP6V0A1 resulting in acidic lysosomal pH loss, and adversely modulates mitochondrial function and lifespan in cystinotic RPTECs. They further investigate the use of a novel therapeutic Astaxanthin (ATX) to upregulate ATP6V0A1 that may improve mitochondrial function in cystinotic proximal tubules.

      The new information regarding the specific proximal tubular injuries in cystinosis identifies potential molecular targets for treatment. As such, the authors are advancing the field in an experimental model for potential translational application to humans.

    4. Author response:

      eLife assessment

      This important study addresses the idea that defective lysosomal clearance might be causal to renal dysfunction in cystinosis. They observe that restoring expression of vATPase subunits and treatment with Astaxanthin ameliorate mitochondrial function in a model of renal epithelial cells, opening opportunities for translational application to humans. The data are convincing, but the description of methodologies is incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cystinosis is a rare hereditary disease caused by biallelic loss of the CTNS gene, encoding two cystinosin protein isoforms; the main isoform is expressed in lysosomal membranes where it mediates cystine efflux whereas the minor isoform is expressed at the plasma membrane and in other subcellular organelles. Sur et al proceed from the assumption that the pathways driving the cystinosis phenotype in the kidney might be identified by comparing the transcriptome profiles of normal vs CTNS-mutant proximal tubular cell lines. They argue that key transcriptional disturbances in mutant kidney cells might not be present in non-renal cells such as CTNS-mutant fibroblasts.

      Using cluster analysis of the transcriptomes, the authors selected a single vacuolar H+ATPase (ATP6V0A1) for further study, asserting that it was the "most significantly downregulated" vacuolar H+ATPase (about 58% of control) among a group of similarly downregulated H+ATPases. They then showed that exogenous ATP6V0A1 improved CTNS(-/-) RPTEC mitochondrial respiratory chain function and decreased autophagosome LC3-II accumulation, characteristic of cystinosis. The authors then treated mutant RPTECs with 3 "antioxidant" drugs, cysteamine, vitamin E, and astaxanthin (ATX). ATX (but not the other two antioxidant drugs) appeared to improve ATP6V0A1 expression, LC3-II accumulation, and mitochondrial membrane potential. Respiratory chain function was not studied. RPTEC cystine accumulation was not studied.

      In this manuscript, as an initial step, we have studied the first step in respiratory chain function by performing the Seahorse Mito Stress Test to demonstrate that the genetic manipulation (knocking out the CTNS gene and plasmid-mediated expression correction of ATP6V0A1) impacts mitochondrial energetics. We did not investigate the respirometry-based assays that can identify locations of electron transport deficiency, which we plan to address in a follow-up paper.

      We would like to draw attention to Figure 3D, where cystine accumulation has been studied. This figure demonstrates an increased intracellular accumulation of cystine.

      The major strengths of this manuscript reside in its two primary findings.

      (1) Plasmid expression of exogenous ATP6V0A1 improves mitochondrial integrity and reduces aberrant autophagosome accumulation.

      (2) Astaxanthin partially restores suboptimal endogenous ATP6V0A1 expression.

      Taken together, these observations suggest that astaxanthin might constitute a novel therapeutic strategy to ameliorate defective mitochondrial function and lysosomal clearance of autophagosomes in the cystinotic kidney. This might act synergistically with the current therapy (oral cysteamine) which facilitates defective cystine efflux from the lysosome.

      There are, however, several weaknesses in the manuscript.

      (1) The reductive approach that led from transcriptional profiling to focus on ATP6V0A1 is not transparent and weakens the argument that potential therapies should focus on correction of this one molecule vs the other H+ ATPase transcripts that were equally reduced - or transcripts among the 1925 belonging to at least 11 pathways disturbed in mutant RPTECs.

      The transcriptional profiling studies on ATP6V0A1 have been fully discussed and publicly shared. Table 2 lists the v-ATPase transcripts that are significantly downregulated in cystinosis RPTECs. We have also clarified and justified the choice of further studies on ATP6V0A1, where we state the following: "The most significantly perturbed member of the V-ATPase gene family found to be downregulated in cystinosis RPTECs is ATP6V0A1 (Table 2). Therefore, further attention was focused on characterizing the role of this particular gene in a human in vitro model of cystinosis."

      (2) A precise description of primary results is missing -- the Results section is preceded by or mixed with extensive speculation. This makes it difficult to dissect valid conclusions from those derived from less informative experiments (eg data on CDME loading, data on whole-cell pH instead of lysosomal pH, etc).

      We appreciate the reviewer highlighting areas for further improving the manuscript's readability. In our resubmission, we have revised the Results section to provide a more precise description of the primary findings and have restricted the inferences to the Discussion section only.

      (3) Data on experimental approaches that turned out to be uninformative (eg CDME loading, or data on whole-cell pH assessment with BCECF).

      We have provided the data regardless of whether they were informative or uninformative. Although a lysosome-specific pH measurement would have been important, it was not possible in our cells because they were very sick and the assay did not work. Hence we provide data on pH assessment with BCECF, which reports whole-cell pH, that is, the combined pH of the cytoplasm and organelles, and is therefore still informative.

      (4) The rationale for the study of ATX is unclear and the mechanism by which it improves mitochondrial integrity and autophagosome accumulation is not explored (but does not appear to depend on its anti-oxidant properties).

      We have provided the rationale for the study of ATX in the Introduction and Results sections, where we state the following: “correction of ATP6V0A1 in CTNS-/- RPTECs and treatment with antioxidants specifically, astaxanthin (ATX) increased the production of cellular ATP6V0A1, identified from a custom FDA-drug database generated by our group, partially rescued the nephropathic RPTEC phenotype. ATX is a xanthophyll carotenoid occurring in a wide variety of organisms. ATX is reported to have the highest known antioxidant activity and has proven to have various anti-inflammatory, anti-tumoral, immunomodulatory, anti-cancer, and cytoprotective activities both in vivo and in vitro”.

      We are still investigating the mechanism by which ATX improves mitochondrial integrity and this will be the focus of a follow-on manuscript.

      (5) Thoughtful discussion on the lack of effect of ATP6V0A1 correction on cystine efflux from the lysosome is warranted, since this is presumably sensitive to intralysosomal pH.

      We have provided a thoughtful discussion in the revised manuscript on some possible mechanisms that may result in an effect of ATP6V0A1 correction on cystine efflux from the lysosome.

      (6) Comparisons between RPTECs and fibroblasts cannot take into account the effects of immortalization on cell phenotype (not performed in fibroblasts).

      The purpose of examining different tissue sources of primary cells in nephropathic cystinosis was to assess whether any of the changes in these cells were tissue-source specific. We used primary cells isolated from patients with nephropathic cystinosis (RPTECs from patients' urine and fibroblasts from patients' skin); these cells are not immortalized and can therefore be compared. This is noted in the Results section: “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We next utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (7) This work will be of interest to the research community but is self-described as a pilot study. It remains to be clarified whether transient transfection of RPTECs with other H+ATPases could achieve results comparable to ATP6V0A1. Some insight into the mechanism by which ATX exerts its effects on RPTECs is needed to understand its potential for the treatment of cystinosis.

      In future studies we will further investigate the effect of ATX on RPTECs for the treatment of cystinosis; this will require the conduct of Phase 1 and Phase 2 clinical studies, which are beyond the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Sur and colleagues investigate the role of ATP6V0A1 in mitochondrial function in cystinotic proximal tubule cells. They propose that loss of cystinosin downregulates ATP6V0A1 resulting in acidic lysosomal pH loss, and adversely modulates mitochondrial function and lifespan in cystinotic RPTECs. They further investigate the use of a novel therapeutic Astaxanthin (ATX) to upregulate ATP6V0A1 that may improve mitochondrial function in cystinotic proximal tubules.

      The new information regarding the specific proximal tubular injuries in cystinosis identifies potential molecular targets for treatment. As such, the authors are advancing the field in an experimental model for potential translational application to humans.

    1. eLife assessment

      This study provides valuable findings that improve our understanding of the evolutionary conservation of the role of DDX6 in mRNA decay. The evidence supporting the authors' conclusions is convincing. This work will be of interest to molecular, cell biologists and biochemists, especially those studying RNA.

    2. Reviewer #1 (Public Review):

      Weber et al. investigated the role of human DDX6 in messenger RNA decay using CRISPR/Cas9-mediated knockout (KO) HEK293T cells. The authors showed that stretches of rare codons or codons known to cause ribosome stalling in reporter mRNAs lead to a DDX6-specific loss of mRNA decay. The authors moved on to show that there is a physical interaction between DDX6 and the ribosome. Using co-immunoprecipitation (co-IP) experiments, the authors determined that the FDF-binding surface of DDX6 is necessary for binding to the ribosome, the same domain which is necessary for binding several decapping factors such as EDC3, LSM14A, and PatL. However, they determine that the interaction between DDX6 and the ribosome is independent of the DDX6 interaction with the NOT1 subunit of the CCR4-NOT complex. Interestingly, the authors were able to determine that all known functional domains, including the ATPase activity of DDX6, are required for its effect on mRNA decay. Using ribosome profiling and RNA-sequencing, the authors were able to identify a group of 260 mRNAs that exhibit increased translational efficiency (TE) in DDX6 knockout cells, suggesting that DDX6 translationally represses certain mRNAs. The authors determined that this group of mRNAs has decreased GC content, which has been previously noted to coincide with low codon optimality; the authors thus conclude that DDX6 may translationally repress transcripts of low codon optimality. Furthermore, the authors identify 35 transcripts that are both upregulated in DDX6 KO cells and exhibit locally increased ribosome footprints (RBFs), suggestive of a ribosome stalling sequence. Lastly, the authors showed that both endogenous and tethering of DDX6 to reporter mRNAs with and without these translational stalling sequences leads to a relative increase in ribosome association to a transcript. Overall, this work confirms that the role of DDX6 in mRNA decay shares several conserved features with the yeast homolog Dhh1. Dhh1 is known to bind slow-moving ribosomes and lead to the differential decay of non-optimal mRNA transcripts (Radhakrishnan et al. 2016). The novelty of this work lies primarily in the identification of the physical interaction between DDX6 and the ribosome and the breakdown of which domains of DDX6 are necessary for this interaction. This work provides major insight into the role of the human DDX6 in the process of mRNA decay and emphasizes the evolutionary conservation of this process across Eukaryotes.
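      The translational efficiency (TE) comparison referred to in this summary is conventionally the ratio of ribosome footprint abundance to mRNA abundance, compared between conditions. The following is a minimal sketch of that arithmetic only. It is not the authors' pipeline, which is not described in this review; published analyses typically rely on dedicated statistical tools, and the function names and pseudocount here are assumptions for illustration.

```python
import math


def translational_efficiency(rpf, rna, pseudo=1.0):
    """Ratio of ribosome-footprint abundance to mRNA abundance for one gene.

    rpf and rna are normalized abundances (e.g. TPM); pseudo is added to
    both to avoid division by zero for lowly expressed genes.
    """
    return (rpf + pseudo) / (rna + pseudo)


def log2_te_change(ko, wt, pseudo=1.0):
    """log2 fold change in TE between knockout and wild type.

    ko and wt are (rpf, rna) tuples for the same gene.
    """
    te_ko = translational_efficiency(ko[0], ko[1], pseudo)
    te_wt = translational_efficiency(wt[0], wt[1], pseudo)
    return math.log2(te_ko / te_wt)
```

      A positive value for a gene means more footprints per unit of mRNA in the knockout than in wild type, which is the direction reported for the 260 mRNAs above.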

      Overall, the work done by Weber et al. is sound, with the proper controls. The authors expand significantly on the knowledge of what we know about DDX6 in the process of mRNA decay in humans, confirming the evolutionary conservation of the role of this factor across eukaryotes. The analysis of the RNA-seq and Ribo-seq data could be more in-depth, however, the authors were able to show with certainty that some transcripts containing known repetitive sequences or polybasic sequences exhibited a DDX6-mRNA decay effect.

    3. Reviewer #2 (Public Review):

      In the manuscript by Weber and colleagues, the authors investigated the role of a DEAD-box helicase DDX6 in regulating mRNA stability upon ribosome slowdown in human cells. The authors knocked out DDX6 in HEK293T cells and showed that the half-life of a reporter containing a rare codon repeat is elongated in the absence of DDX6. By analogy to the proposed function of budding yeast Dhh1p (DDX6 homolog) as a sensor for slow ribosomes, the authors demonstrated that recombinant DDX6 interacted with human ribosomes. The interaction with the ribosome was mediated by the FDF motif of DDX6 located in its RecA2 domain, and rescue experiments showed that DDX6 requires the FDF motif as well as its interaction with the CCR4-NOT deadenylase complex and ATPase activity for degrading a reporter mRNA with rare codons. To identify endogenous mRNAs regulated by DDX6, they performed RNA-Seq and ribosome footprint profiling. The authors focused on mRNAs whose stability is increased in DDX6 KO cells with high local ribosome density and validated that such mRNA sequences induced mRNA degradation in a DDX6-dependent manner.

      The experiments were well-performed, and the results clearly demonstrated the requirement of DDX6 in mRNA degradation induced by slowed ribosomes.

      [Editors' note: The authors have addressed the key points from the previous public reviews in their revised manuscript.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors fail to truly define codon optimality, rare codons, and stalling sequences in their work, all of which are distinct terminologies. They use reporters with rare codon usage but do not mention what metrics they use to determine this, such as cAI, codon usage bias, or tAI. The distinction between the type of codon sequences that DDX6 affects is very important to differentiate and should be done here as certain stretches of codons are known to lead to different quality control RNA decay pathways that are not reliant on canonical mRNA decay factors.

      We thank the reviewer for the feedback on our work. Clearly defining codon optimality, rare codons, and stalling sequences is indeed crucial. We will emphasize this distinction more in our revisions to help readers better understand our analysis and findings.
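      As one illustration of the metrics the reviewer names, the codon adaptation index (CAI) scores a coding sequence by the geometric mean of per-codon weights, where each weight is a codon's frequency in a reference gene set divided by the frequency of the most used synonymous codon. The sketch below is a generic implementation under stated assumptions, not the metric used in the manuscript: the reference set is supplied by the caller, Met, Trp and stop codons are skipped, and the pseudocount of 0.5 for unobserved codons is an arbitrary choice.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split a coding sequence into in-frame codons, dropping any trailing bases."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """Per-codon weights w = count / max count among synonymous codons."""
    counts = Counter(c for cds in reference_cds for c in codons(cds)
                     if c in CODON_TO_AA)
    weights = {}
    for aa in set(AMINO) - set("*MW"):  # skip stops and single-codon amino acids
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        fam = {c: counts[c] or pseudocount for c in family}
        top = max(fam.values())
        for c in family:
            weights[c] = fam[c] / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scored codons in cds."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

      CAI measures similarity to a reference codon usage, whereas tAI is based on tRNA gene copy number and wobble pairing; the two can rank the same codons differently, which is part of why the reviewer asks for the metric to be stated.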

      Likewise, the authors sort their Ribo-seq data to determine genes that might exhibit a DDX6 specific mRNA decay effect but fail to go into great depth about common features shared among these genes other than GO term analysis, GC content, and coding sequence (CDS) length. The authors then sort out 35 genes that are both upregulated at the mRNA level and have increased local ribosome footprint along the ORF. They are then able to show that 6 out of 9 of those genes had a DDX6-dependent mRNA decay effect. There was no comment or effort as to why 2 out of those 6 genes tested did not show as strong of a DDX6-dependent decay effect relative to the other targets tested. Thus, the efforts to identify mRNA features at a global level that exhibited DDX6-dependent mRNA decay effects are lacking in this analysis.

      We appreciate the reviewer's insightful comments regarding the need to further characterize the genes influenced by DDX6-mediated mRNA decay. To address this, we carried out additional analyses to identify potential traits of these genes. Our findings revealed that DDX6-regulated coding sequences tend to be longer and exhibit lower predicted mRNA stability scores compared to the average across the transcriptome. This observation indicates a possible connection to codon optimality. It suggests that DDX6 could play a role in regulating a specific subset of mRNAs with inherently lower stability, potentially shedding light on why some genes may exhibit varied decay patterns when DDX6 is depleted.

      Overall, the work done by Weber et al. is sound, with the proper controls. The authors expand significantly on what we know about DDX6 in the process of mRNA decay in humans, confirming the evolutionary conservation of the role of this factor across eukaryotes. The analysis of the RNA-seq and Ribo-seq data could be more in-depth; however, the authors were able to show with certainty that some transcripts containing known repetitive sequences or polybasic sequences exhibited a DDX6-mRNA decay effect.

      We appreciate the reviewer’s acknowledgment of the soundness of our work and the inclusion of proper controls. We are committed to refining our manuscript to meet your expectations and ensure the accuracy and depth of our findings.

      Reviewer #2 (Public Review):

      The experiments were well-performed, and the results clearly demonstrated the requirement of DDX6 in mRNA degradation induced by slowed ribosomes. However, in some cases, the authors interpreted their data in a biased way, possibly influenced by the yeast study, and drew too strong conclusions. In addition, the authors should have cited important studies about codon optimality in mammalian cells. This lack of information hinders placing their important discoveries in a correct context.

      (1) Although the authors concluded that DDX6 acts as a sensor of the slowed ribosome, it is not clear if DDX6 indeed senses the ribosome speed. What the authors showed is a requirement of DDX6 for mRNA decay induced by rare codons, and DDX6 binds to the ribosome to exert this role. For example, DDX6 may bridge the sensor and decay machinery on the ribosome. Without structural or biochemical data on the recognition of the slowed ribosome by DDX6, the role of DDX6 as a sensor remains one of the possible models. It should be described in the discussion section.

      We greatly appreciate the reviewer’s comments and suggestions. We agree that our study does not directly establish that DDX6 senses ribosome speed. We also agree that without structural or biochemical data demonstrating recognition of the slowed ribosome by DDX6, the role of DDX6 as a sensor remains one of the possible models. We will incorporate this point into the discussion section and acknowledge it as an important direction for future research.

      (2) It is not clear if DDX6 directly binds the ribosome. The authors used ribosomes purified by sucrose cushion, but ribosome-associating and FDF motif-interacting factors might remain on ribosomes, even after RNaseI treatment. Without structural or biochemical data of the direct interaction between the ribosome and DDX6, the authors should avoid description as if DDX6 directly binds to the ribosome.

      We agree with the reviewer’s perspective that, even after RNase I treatment, factors associated with the ribosome and interacting with the FDF motif might still remain on the ribosomes that were purified via a sucrose cushion. In the revised manuscript, we will describe the relationship between DDX6 and the ribosome more cautiously, avoiding the depiction of DDX6 directly binding to the ribosome.

      (3) Although the authors performed rigorous reporter assays recapitulating the effect of ribosome-retardation sequences on mRNA stability, this is not the first report showing that codon optimality determines mRNA stability in human cells. The authors did not cite important previous studies, such as Wu et al., 2019 (PMID: 31012849), Hia et al., 2019 (PMID: 31482640), Narula et al., 2019 (PMID: 31527111), and Forrest et al., 2020 (PMID: 32053646). These milestone papers should be cited in the Introduction, Results, and Discussion.

      Thank you for the reviewer’s correction. We apologize for the oversight in our references. In the revised manuscript, we will ensure these key studies are appropriately cited.

      (4) While both DDX6 and deadenylation by the CCR4-NOT complex were required for mRNA decay by the slowed ribosome, whether DDX6 is required for deadenylation was not investigated. Given that the CCR4-NOT deadenylase complex directly interacts with the empty ribosome E-site in yeast and humans (Buschauer et al., 2020 PMID: 32299921 and Absmeier et al., 2023 PMID: 37653243), whether the loss of DDX6 also affected the action of the CCR4-NOT complex is an important point to investigate, or at least should be discussed in this paper.

      We sincerely appreciate the reviewer's valuable suggestions. This point is indeed crucial, and we have addressed it in the revised version of our manuscript. We have included experimental results confirming that the knockout of DDX6 does not impact the CCR4-NOT complex’s deadenylation function. This addition will contribute to a more comprehensive discussion of the relevant issues and refine our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors should explain what they use to determine rare codons in their system and distinguish this feature from codon optimality. Codon optimality is a distinct feature from rare codon usage, and both should be defined better in the context of the paper. The authors interchange between the use of codon optimality, rare codon usage, and translation stalling sequences frequently and should explain and clarify these terms or consider only referring to translation stalling sequences for their discussion.

      We appreciate the reviewer's valuable feedback, which has helped us improve the clarity and rigor of the relevant statements in the manuscript. In the revised manuscript, we have provided more explicit and detailed explanations of how rare codons are defined and used, and have differentiated this from codon optimality, to help readers better understand the basis of our analysis and findings. Furthermore, we now refer exclusively to 'translation stalling sequences' in our discussion to provide greater clarity.

      Reviewer #2 (Recommendations For The Authors):

      Interestingly, the translation efficiency of zinc-finger domain mRNAs was increased in DDX6 KO cells. This finding is consistent with the previous study reporting that mRNAs encoding zinc-finger domains are enriched with non-optimal codons and unstable. (Diez et al., 2022 PMID: 35840631). The authors might want to cite this paper and mention the consistency of the two studies.

      Thank you for noting the relevance of the increased translation efficiency of zinc-finger domain mRNAs in DDX6 KO cells. We will reference the study by Diez et al. (2022) and emphasize the consistency between their findings and ours, which supports the idea that DDX6 is involved in regulating the translation of mRNAs with these characteristics.

      A mutagenesis analysis of the poly-basic residues of BMP2 would further strengthen the authors' claim that this sequence is a primal cause of ribosome slowdown and mRNA decay.

      We greatly appreciate the reviewer’s suggestion to conduct a mutagenesis analysis of the poly-basic residues of BMP2. We agree that such an analysis could potentially strengthen our claim. However, given the constraints we currently face and the substantial evidence our study already provides in support of our findings, we believe that this analysis is not the most immediate priority at this stage of our research. We will consider undertaking a mutagenesis analysis in future studies to further validate our conclusions.

      In the Introduction, RQC is not commonly referred to as "ribosome-based quality control." Please consider the use of "ribosome-associated quality control."

      We appreciate the reviewer providing this suggestion. During the revision process, we corrected the relevant terminology to ensure more precise and appropriate usage.

      In the Introduction, the authors should avoid introducing NMD as a part of RQC. NMD was discovered and defined independently of RQC.

      Thank you for pointing out this important distinction. We recognize that NMD was discovered and defined independently from RQC, and should not be presented as an integral part of the RQC process. In the revised manuscript, we have made sure to avoid introducing nonsense-mediated decay (NMD) as a component of ribosome-associated quality control (RQC).

    1. eLife assessment

      This study presents a useful description of RNA in extracellular vesicles (EV-RNAs) and highlights the potential to develop biomarkers for the early detection of colorectal cancer (CRC) and precancerous adenoma (AA). The data were analysed using overall solid methodology and would benefit from further validation of predicted lncRNAs and biomarker validation at each stage of CRC/AA to evaluate the potential application to early detection of CRC and AA.

    2. Joint Public Review:

      Detection of early-stage colorectal cancer is of great importance. Laboratory scientists and clinicians have reported different exosomal biomarkers to identify colorectal cancer patients. This is a proof-of-principle study of whether exosomal RNAs, and particularly predicted lncRNAs, are potential biomarkers of early-stage colorectal cancer and its precancerous lesions.

      Strengths:

      The study provides a valuable dataset of the whole-transcriptomic profile of circulating sEVs, including miRNA, mRNA, and lncRNA. This approach adds to the understanding of sEV-RNAs' role in CRC carcinogenesis and facilitates the discovery of potential biomarkers.

      The developed 60-gene t-SNE model successfully differentiated T1a stage CRC/AA from normal controls with high specificity and sensitivity, indicating the potential of sEV-RNAs as diagnostic markers for early-stage colorectal lesions.

      The study combines RNA-seq, RT-qPCR, and modelling algorithms to select and validate candidate sEV-RNAs, maximising the performance of the developed RNA signature. The comparison of different algorithms and consideration of other factors enhance the robustness of the findings.

      Weaknesses:

      Validation in larger cohorts would be required to establish these sEV-RNAs as biomarkers and to demonstrate whether the predicted lncRNAs implicated in these biomarkers are indeed present and whether they are robustly predictive/prognostic.

      The following points were noted during preprint review:

      (1) Lack of analysis on T1-only patients in the validation cohort: While the study identifies key sEV-RNAs associated with T1a stage CRC and AA, only about half of the patients in the validation cohort are T1 (25 out of 49). It would be better to do an analysis using only the T1 patients in the validation cohort, so that the conclusion is not affected by the T2-T3 patients.

      (2) Lack of performance analysis across different demographic and tumor pathology factors listed in Supplementary Table 12. It's important to know if the sEV-RNAs identified in the study work better/worse in different age/sex/tumor size/Yamada subtypes etc.

      (3) The authors tested their models in a medium size population of 124 individuals, which is not enough to obtain an accurate evaluation of the specificity and sensitivity of the biomarkers proposed here. External validation would be required.

      (4) Depicting the full RNA landscape of circulating exosomes is still quite challenging. The authors annotated 58,333 RNA species in exosomes, most of which were lncRNAs, with annotation methods briefly described in Suppl Methods.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Detection of early-stage colorectal cancer is of great importance. Laboratory scientists and clinicians have reported different exosomal biomarkers to identify colorectal cancer patients. This is a proof-of-principle study of whether exosomal RNAs, and particularly predicted lncRNAs, are potential biomarkers of early-stage colorectal cancer and its precancerous lesions.

      Strengths:

      The study provides a valuable dataset of the whole-transcriptomic profile of circulating sEVs, including miRNA, mRNA, and lncRNA. This approach adds to the understanding of sEV-RNAs' role in CRC carcinogenesis and facilitates the discovery of potential biomarkers.

      The developed 60-gene t-SNE model successfully differentiated T1a stage CRC/AA from normal controls with high specificity and sensitivity, indicating the potential of sEV-RNAs as diagnostic markers for early-stage colorectal lesions.

      The study combines RNA-seq, RT-qPCR, and modelling algorithms to select and validate candidate sEV-RNAs, maximising the performance of the developed RNA signature. The comparison of different algorithms and consideration of other factors enhance the robustness of the findings.

      Weaknesses:

      Validation in larger cohorts would be required to establish these sEV-RNAs as biomarkers, and to demonstrate whether the predicted lncRNAs implicated in these biomarkers are indeed present, and whether they are robustly predictive/prognostic.

      Thank you for your careful evaluation and suggestions, which have provided valuable guidance for improving our paper. In response to your feedback, we have implemented the following improvements.

      (1) More detail about how lncRNA and miRNA candidates were defined, and how this compares to previously published miRNA and lncRNA predictions. The Suppl Methods section for lncRNAs does not describe in detail how the "CPC/CNCI/Pfam" "methods" were combined to define lncRNAs here.

      Author response and action taken: Thanks for your comments. In the Supplementary Methods section titled "Selection of Predictive Biomarkers", we have provided a more detailed description of the screening process for candidate RNA biomarkers. The revised section is as follows: To ensure the predictive performance of the sEV-RNA signature, candidate sEV-RNAs were ultimately selected based on their fold change in colorectal cancer/precancerous advanced adenoma, absolute abundance, and module attribution. In detail, we initially selected the top 10 RNAs from each category (mRNA, miRNA, and lncRNA) with a fold change greater than 4. In cases where fewer than 10 RNAs met this criterion, all RNAs with a fold change greater than 4 were included. Subsequently, we filtered out RNAs with low abundance, and we selected the top-ranked RNAs from each module based on the fold change ranking for inclusion in the final model.
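
      The screening steps described above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function name, the record fields, the abundance cutoff `min_abundance`, and the choice of one RNA per module (`per_module=1`) are all assumptions, since the response does not state these values.

```python
from collections import defaultdict


def select_candidates(rnas, fc_cutoff=4.0, top_n=10, min_abundance=1.0, per_module=1):
    """Hypothetical sketch of the three-step candidate screen.

    rnas: list of dicts with keys name, category, fold_change, abundance, module.
    min_abundance and per_module are assumed values, not taken from the source.
    """
    # Step 1: within each RNA category, keep RNAs with fold change > cutoff,
    # then take at most top_n of them ranked by fold change.
    by_category = defaultdict(list)
    for rna in rnas:
        if rna["fold_change"] > fc_cutoff:
            by_category[rna["category"]].append(rna)
    shortlisted = []
    for items in by_category.values():
        items.sort(key=lambda r: r["fold_change"], reverse=True)
        shortlisted.extend(items[:top_n])

    # Step 2: drop RNAs with low absolute abundance.
    abundant = [r for r in shortlisted if r["abundance"] >= min_abundance]

    # Step 3: within each co-expression module, keep the top-ranked RNAs.
    by_module = defaultdict(list)
    for rna in abundant:
        by_module[rna["module"]].append(rna)
    selected = []
    for items in by_module.values():
        items.sort(key=lambda r: r["fold_change"], reverse=True)
        selected.extend(items[:per_module])
    return sorted(selected, key=lambda r: r["fold_change"], reverse=True)


demo = [
    {"name": "a", "category": "miRNA", "fold_change": 8.0, "abundance": 5.0, "module": "M1"},
    {"name": "b", "category": "miRNA", "fold_change": 6.0, "abundance": 5.0, "module": "M1"},
    {"name": "c", "category": "mRNA", "fold_change": 3.0, "abundance": 9.0, "module": "M2"},
    {"name": "d", "category": "lncRNA", "fold_change": 5.0, "abundance": 0.1, "module": "M2"},
    {"name": "e", "category": "mRNA", "fold_change": 4.5, "abundance": 2.0, "module": "M2"},
]
print([r["name"] for r in select_candidates(demo)])
```

      In the toy input, `c` fails the fold-change cutoff, `d` fails the assumed abundance cutoff, and `b` is outranked by `a` within its module, so only `a` and `e` remain.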

      Compared to most previous studies on EV biomarkers, the overall discriminative performance of the biomarker model we constructed is considerable, holding clinical value for practical application. In contrast, the supplementary merit of this study lies in uncovering the heterogeneity at the whole transcriptome level among samples of different categories, providing a more comprehensive insight into the dynamic changes of biological states. For instance, we inferred the cell subtypes of EV origins through ssGSEA and correlated them with the tumor microenvironment status. The regulatory relationships among different RNA categories were delineated, and their impacts on biological signaling pathways were analyzed, a feat challenging to accomplish solely through sequencing of a single RNA category.

      In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we have provided a more detailed explanation regarding how the "CPC/CNCI/Pfam" methods were combined to define lncRNAs. The revised section is as follows: Three computational approaches including CPC (Coding Potential Calculator)/CNCI (Coding-Non-Coding Index)/Pfam were combined to sort non-protein coding RNA candidates from putative protein-coding RNAs in the unknown transcripts. CPC is a sequence alignment-based tool used to assess protein-coding capacity. By aligning transcripts with known protein databases, CPC evaluates the biological sequence characteristics of each coding frame of the transcript to determine its coding potential and identify non-coding RNAs (1). CNCI analysis is a method used to distinguish between coding and non-coding transcripts based on adjacent nucleotide triplets. This tool does not rely on known annotation files and can effectively predict incomplete transcripts and antisense transcript pairs (2). Pfam divides protein domains into different protein families and establishes statistical models for the amino acid sequences of each family through protein sequence alignment (3). Transcripts that can be aligned are considered to have a certain protein domain, indicating coding potential, while transcripts without alignment results are potential lncRNAs. Putative protein-coding RNAs were filtered out using a minimum length and exon number threshold. Transcripts above 200 nt with more than two exons were selected as lncRNA candidates and further screened by CPC/CNCI/Pfam. We distinguished lncRNAs from protein-coding genes by intersecting the results of the three determination methods mentioned above.
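
      The intersection logic described above can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the per-tool results as already-computed sets of transcript IDs rather than invoking CPC, CNCI, or Pfam, and the function name and record fields are hypothetical. The thresholds follow the wording of the response (longer than 200 nt, more than two exons).

```python
def call_lncrnas(transcripts, cpc_noncoding, cnci_noncoding, pfam_no_domain,
                 min_length=200, min_exons=2):
    """Hypothetical sketch: structural filter, then intersect three non-coding calls.

    transcripts: list of dicts with keys id, length (nt), exons.
    cpc_noncoding, cnci_noncoding: IDs each tool classified as non-coding.
    pfam_no_domain: IDs with no Pfam domain alignment.
    """
    # Structural filter: strictly longer than min_length and strictly more
    # than min_exons exons, following the wording of the response.
    structural = {
        t["id"] for t in transcripts
        if t["length"] > min_length and t["exons"] > min_exons
    }
    # A transcript is kept only if all three methods agree it lacks coding potential.
    return structural & set(cpc_noncoding) & set(cnci_noncoding) & set(pfam_no_domain)


demo_transcripts = [
    {"id": "t1", "length": 500, "exons": 3},
    {"id": "t2", "length": 150, "exons": 4},   # too short
    {"id": "t3", "length": 900, "exons": 2},   # too few exons
    {"id": "t4", "length": 1200, "exons": 5},  # has a Pfam domain hit
]
lncrnas = call_lncrnas(
    demo_transcripts,
    cpc_noncoding={"t1", "t2", "t4"},
    cnci_noncoding={"t1", "t3", "t4"},
    pfam_no_domain={"t1", "t2", "t3"},
)
print(sorted(lncrnas))
```

      Taking the intersection rather than the union is the conservative choice: a transcript flagged as coding by any one of the three methods is excluded.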

      (2) The role and function of many lncRNAs are unknown, and some lncRNA species may simply be the product of pervasive transcription. Although this is an exploratory and descriptive study of potential biomarkers, it would benefit from some discussion of potential mechanisms because the proposed prediction models include lncRNAs. Do the authors have a hypothesis as to why lncRNAs were informative and predictive in this study? Are these lncRNAs well-studied and/or known to be functional? Or are they markers for pervasive transcription, for example?

      Author response and action taken: Thanks for your comments. Whole transcriptome sequencing results facilitate the discussion of regulatory mechanisms between different biomarkers, supplying evidence for future investigations. Among the three lncRNAs involved in this study, lnc-MKRN2-42:1 is involved in the occurrence and development of Parkinson's disease (4). The other two lncRNAs, however, lack relevant reports. Therefore, we cannot confirm that these lncRNAs have specific biological functions. In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we acknowledge the limited understanding of sEV-lncRNAs in current research. In contrast, many miRNAs in the model have been proven to participate in the occurrence and development of colorectal cancer, such as miR-3615 (5), miR-425-5p (6), and miR-106b-3p (7). These data provide biological support for the performance of the model, which is particularly valuable for model prediction.

      (3) In the Results section "Cell-specific features of the sEV-RNA profile indicated the different proportion of cells of sEV origin among different groups", the sEV-RNA profiles were correlated with existing transcriptome profiles from specific cell types (ssGSEA) and used to estimate "tumour microenvironment-associated scores". This transcriptomic correlation is a valuable observation, but there is no further evidence provided that the sEV-RNAs profiles truly reflect differential cell types of sEV origin between the sample subgroups.

      Could the authors clarify the strength of evidence for the cells-of-origin estimates, which are based only on sEV-RNA transcriptome profiles? Would sEV-RNA-derived cells-of-origin be expected to correlate with histopath-derived scores (tumour microenvironment; immune infiltrate) for example? Or is this section intended as an exploratory description of sEV-RNAs, perhaps a check on the plausibility of the sEV-RNA profiles, rather than an accurate estimation of cells-of-origin in each subgroup?

      Author response: Thanks for your comments. This section explores the proportional distribution of EVs from different cellular subgroups solely based on transcriptome profiles and algorithms, rather than providing precise estimates of cellular origins within each subgroup.

      (4) Software and R package version numbers should be provided.

      Author response and action taken: Thanks for your comments. We have added version information for relevant R packages at the first mention in the original text (e.g., WGCNA (version 1.61), Rtsne (version 0.15), GSVA (version 1.42.0), ESTIMATE (version 1.0.13), DOSE (version 3.8.0)).

      References

      (1) Kong L, et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345-349 (2007).

      (2) Sun L, et al. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 41, e166 (2013).

      (3) Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-230 (2014).

      (4) Wang Q, et al. Integrated analysis of exosomal lncRNA and mRNA expression profiles reveals the involvement of lnc-MKRN2-42:1 in the pathogenesis of Parkinson's disease. CNS Neurosci Ther. 26, 527-537 (2020).

      (5) Zheng G, et al. Identification and validation of reference genes for qPCR detection of serum microRNAs in colorectal adenocarcinoma patients. PLoS One. 8, e83025 (2013).

      (6) Liu D, Zhang H, Cui M, Chen C, Feng Y. Hsa-miR-425-5p promotes tumor growth and metastasis by activating the CTNND1-mediated β-catenin pathway and EMT in colorectal cancer. Cell Cycle. 19, 1917-1927 (2020).

      (7) Liu H, et al. Colorectal cancer-derived exosomal miR-106b-3p promotes metastasis by down-regulating DLC-1 expression. Clin Sci (Lond). 134, 419-434 (2020).

    1. eLife assessment

      In this study, Ger and colleagues present a valuable new technique that uses recurrent neural networks to distinguish between model misspecification and behavioral stochasticity when interpreting cognitive-behavioral model fits. Simulations provide solid evidence for the validity of this technique and broadly support the claims of the paper, although more work is needed to understand its applicability to real behavioral experiments. This technique addresses a long-standing problem that is likely to be of interest to researchers pushing the limits of cognitive computational modeling.

    2. Reviewer #1 (Public Review):

      Summary:

      Ger and colleagues address an issue that often impedes computational modeling: the inherent ambiguity between stochasticity in behavior and structural mismatch between the assumed and true model. They propose a solution to use RNNs to estimate the ceiling on explainable variation within a behavioral dataset. With this information in hand, it is possible to determine the extent to which "worse fits" result from behavioral stochasticity versus failures of the cognitive model to capture nuances in behavior (model misspecification). The authors demonstrate the efficacy of the approach in a synthetic toy problem and then use the method to show that poorer model fits to 2-step data in participants with low IQ are actually due to an increase in inherent stochasticity, rather than systemic mismatch between model and behavior.

      Strengths:

      Overall I found the ideas conveyed in the paper interesting and the paper to be extremely clear. The method itself is clever and intuitive and I believe it could potentially be useful in certain circumstances, particularly ones where the sources of structure in behavioral data are unknown. Support for the method from synthetic data is clear and compelling. The flexibility of the method means that it could potentially be applied to different types of behavioral data - without any hypotheses about the exact behavioral features that might be present in a given task.

      Weaknesses:

      That said, I have some concerns with the manuscript in its current form, largely related to the applicability of the proposed methods for problems of importance in computational cognitive neuroscience. This concern stems from the fact that the toy problem explored in the manuscript is somewhat simple, and the theoretical problem addressed in it could have been identified through other means (for example through use of posterior predictive checking for model validation), and the actual behavioral data analyzed were interpreted as a null result (failure to reject the behavioral stochasticity hypothesis), rather than actual identification of model misspecification. Thus, in my opinion, the jury is still out on whether this method could be used to identify a case of model misspecification that actually affects how individual differences are interpreted in a real cognitive task. Furthermore, the method requires considerable data for pretraining, well beyond what would be collected in a typical behavioral study, raising further questions about its applicability in problems of practical relevance. I expand on these primary concerns and raise several smaller points below.

      A primary concern I have about this work is that it is unclear whether the method described could provide any advantage for real cognitive modeling problems beyond what is typically done to minimize the chance of model misspecification (in particular, posterior predictive checking). The toy problem examined in the manuscript is pretty extreme (two of the three synthetic agents are very far from what a human would do on the task, and the models deviate from one another to a degree that detecting the difference should not be difficult for any method). The issue posed in the toy data would easily be identified by following good modeling practices, which include using posterior predictive checking over summary measures to identify model insufficiencies, which in turn would call for the need for a broader set of models (See Wilson & Collins 2019). In this manuscript descriptive analyses are not performed (which, to me, feels a bit problematic for a paper that aims to improve cognitive modeling practices); however, I think it is almost certain that the differences between the toy models would be evident by eye in standard summary measures of two-step task data. The primary question posed in the analysis of the empirical data is whether fit differences related to IQ might reflect systematic differences in the model across individuals, but in this case application of the newly developed method provides little evidence for structural (model) differences. Thus, it remains unclear whether the method could identify model misspecification in real world data, and even more so whether it could reveal misspecification in situations where standard posterior predictive checking techniques would fall short.

      The rebuttal highlighted the better fit of the RNN on the empirical data as providing positive evidence for the ability of the method to identify model insufficiency, but I see this result as having limited epistemological value, given that there is no follow-up to explore what the insufficiency actually was, or why accounting for it might be important. The authors list many of the points above as limitations in their discussion section, but in my opinion, they are relatively major ones.

      The manuscript now mentions in the discussion that the newly developed methods should be seen as being just one tool in the larger toolkit of the computational cognitive modeler. However, one practical consideration here is that, since other existing tools such as simulation and descriptive analyses can be combined to 1) identify model insufficiency and 2) motivate specific model changes that can fix the problem, it is not exactly clear what the value added from the proposed method is.

      One final practical limitation of the method is that it requires extensive pretraining (on >500 participants in the existing study), limiting its applicability for most use cases.

    3. Reviewer #2 (Public Review):

      SUMMARY:

      In this manuscript, Ger and colleagues propose two complementary analytical methods aimed at quantifying the model misspecification and irreducible stochasticity in human choice behavior. The first method involves fitting recurrent neural networks (RNNs) and theoretical models to human choices and interpreting the better performance of RNNs as providing evidence of the misspecifications of theoretical models. The second method involves estimating the number of training iterations for which the fitted RNN achieves the best prediction of human choice behavior in a separate, validation data set, following an approach known as "early stopping". This number is then interpreted as a proxy for the amount of explainable variability in behavior, such that fewer iterations (earlier stopping) correspond to a higher amount of irreducible stochasticity in the data. The authors validate the two methods using simulations of choice behavior in a two-stage task, where the simulated behavior is generated by different known models. Finally, the authors use their approach in a real data set of human choices in the two-stage task, concluding that low-IQ subjects exhibit greater levels of stochasticity than high-IQ subjects.
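
      The early-stopping idea summarized above can be sketched as follows. This is only an illustration of the general technique, not the authors' implementation: the function name and the toy loss curves are hypothetical, and the sketch assumes the validation loss has already been recorded once per training epoch.

```python
def optimal_epoch(validation_losses):
    """Return (epoch, loss) for the epoch with the lowest validation loss.

    Epochs are counted from 1. Ties are resolved in favor of the earliest epoch.
    """
    if not validation_losses:
        raise ValueError("at least one validation loss is required")
    best_epoch, best_loss = 1, validation_losses[0]
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss


# Hypothetical curves: the first keeps improving for a while before overfitting,
# the second starts overfitting almost immediately.
structured_agent = [0.70, 0.66, 0.64, 0.65, 0.69]
noisy_agent = [0.69, 0.70, 0.71, 0.72, 0.74]

print(optimal_epoch(structured_agent))
print(optimal_epoch(noisy_agent))
```

      Under the interpretation described in the summary, the second curve, whose optimum arrives earlier, would be read as reflecting more irreducible stochasticity, subject to the caveats raised under WEAKNESSES below.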

      STRENGTHS:

      The manuscript explores an extremely important topic to scientists interested in characterizing human decision-making. While it is generally acknowledged that any computational model of behavior will be limited in its ability to describe a particular data set, one should hope to understand whether these limitations arise due to model misspecification or due to irreducible stochasticity in the data. Evidence for the former suggests that better models ought to exist; evidence for the latter suggests they might not.

      To address this important topic, the authors elaborate carefully on the rationale of their proposed approach. They describe a variety of simulations -- for which the ground truth models and the amount of behavioral stochasticity are known -- to validate their approaches. This enables the reader to understand the benefits (and limitations) of these approaches when applied to the two-stage task, a task paradigm commonly used in the field. Through a set of convincing analyses, the authors demonstrate that their approach is capable of identifying situations where an alternative, untested computational model can outperform the set of tested models, before applying these techniques to a realistic data set.

      WEAKNESSES:

      The most significant weakness is that the paper rests on the implicit assumption that the fitted RNNs explain as much variance as possible, an assumption that is likely incorrect and which can result in incorrect conclusions. While in low-dimensional tasks RNNs can predict behavior as well as the data-generating models, this is not always the case, and the paper itself illustrates (in Figure 3) several cases where the fitted RNNs fall short of the ground-truth model. In such cases, we cannot conclude that a subject exhibiting a relatively poor RNN fit necessarily has a relatively high degree of behavioral stochasticity. Instead, it is at least conceivable that this subject's behavior is generated precisely (i.e., with low noise) by an alternative model that is poorly fit by an RNN -- e.g., a model with long-term sequential dependencies, which RNNs are known to have difficulties in capturing.

      These situations could lead to incorrect conclusions for both of the proposed methods. First, the model mis-specification analysis might show equal predictive performance for a particular theoretical model and for the RNN. While a scientist might be inclined to conclude that the theoretical model explains the maximum amount of explainable variance and therefore that no better model should exist, the scenario in the previous paragraph suggests that a superior model might nonetheless exist. Second, in the early-stopping analysis, a particular subject may achieve optimal validation performance with fewer epochs than another, leading the scientist to conclude that this subject exhibits higher behavioral noise. However, as before, this could again result from the fact that this subject's behavior is produced with little noise by a different model. The possibility of such scenarios does not mean that such scenarios are common, and the conclusions drawn in the paper are likely appropriate for the particular examples analyzed. However, it is much less obvious that the RNNs will provide optimal fits in other types of tasks, particularly those with more complex rules and long-term sequential dependencies, and in such scenarios, an ill-advised scientist might end up drawing incorrect conclusions from the application of the proposed approaches. The authors acknowledge this limitation in their discussion, but it remains a significant caveat that readers should be aware of when using the technique proposed.

      In addition to this general limitation, the relationship between the number of optimal epochs and behavioral stochasticity may not hold for every task and every subject. For example, Figure 4 highlights the relationship between the optimal epochs and agent noise. Yet, it is nonetheless possible that the optimal epoch is influenced by model parameters other than inverse temperature (e.g., hyperparameters such as learning rate, etc). This could again lead to invalid conclusions, such as concluding that low-IQ is associated with optimal epoch when an alternative account might be that low-IQ is associated with low learning rate, which in turn is associated with optimal epoch. Additional factors such as the deep double-descent (Nakkiran et al., ICLR 2020) can also influence the optimal epoch value as computed by the authors. These concerns are partially addressed by the authors in the revised manuscript, where they show that the number of optimal epochs is primarily sensitive to the amount of true underlying noise, assuming the number of trials and network size are constant. The authors also acknowledge, in the discussion section, that many factors can affect the number of optimal epochs, and that inferring behavioral stochasticity from this number should be done with caution.

      APPRAISAL AND DISCUSSION:

      Overall, the authors propose a novel method that aims to solve an important problem, but since the evidence provided refers to a single task and to a single dataset, it is not clear that the method would be appropriate in general settings. In the future, it would be beneficial to test the proposed approach in a broader setting, including simulations of different tasks, different model classes, and different model parameters. Nonetheless, even without such additional work, the proposed methods are likely to be used by cognitive scientists and neuroscientists interested in assessing the quality and limits of their behavioral models.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      In this study, Ger and colleagues present a valuable new technique that uses recurrent neural networks to distinguish between model misspecification and behavioral stochasticity when interpreting cognitive-behavioral model fits. Evidence for the usefulness of this technique, which is currently based primarily on a relatively simple toy problem, is considered incomplete but could be improved via comparisons to existing approaches and/or applications to other problems. This technique addresses a long-standing problem that is likely to be of interest to researchers pushing the limits of cognitive computational modeling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ger and colleagues address an issue that often impedes computational modeling: the inherent ambiguity between stochasticity in behavior and structural mismatch between the assumed and true model. They propose a solution to use RNNs to estimate the ceiling on explainable variation within a behavioral dataset. With this information in hand, it is possible to determine the extent to which "worse fits" result from behavioral stochasticity versus failures of the cognitive model to capture nuances in behavior (model misspecification). The authors demonstrate the efficacy of the approach in a synthetic toy problem and then use the method to show that poorer model fits to 2-step data in participants with low IQ are actually due to an increase in inherent stochasticity, rather than systemic mismatch between model and behavior.

      Strengths:

      Overall I found the ideas conveyed in the paper interesting and the paper to be extremely clear and well-written. The method itself is clever and intuitive and I believe it could be useful in certain circumstances, particularly ones where the sources of structure in behavioral data are unknown. In general, the support for the method is clear and compelling. The flexibility of the method also means that it can be applied to different types of behavioral data - without any hypotheses about the exact behavioral features that might be present in a given task.

      Thank you for taking the time to review our work and for the positive remarks regarding the manuscript. Below is a point-by-point response to the concerns raised.

      Weaknesses:

      That said, I have some concerns with the manuscript in its current form, largely related to the applicability of the proposed methods for problems of importance in computational cognitive neuroscience. This concern stems from the fact that the toy problem explored in the manuscript is somewhat simple, and the theoretical problem addressed in it could have been identified through other means (for example through the use of posterior predictive checking for model validation), and the actual behavioral data analyzed were interpreted as a null result (failure to reject the behavioral stochasticity hypothesis), rather than actual identification of model misspecification. I expand on these primary concerns and raise several smaller points below.

      A primary question I have about this work is whether the method described would actually provide any advantage for real cognitive modeling problems beyond what is typically done to minimize the chance of model misspecification (in particular, post-predictive checking). The toy problem examined in the manuscript is pretty extreme (two of the three synthetic agents are very far from what a human would do on the task, and the models deviate from one another to a degree that detecting the difference should not be difficult for any method). The issue posed in the toy data would easily be identified by following good modeling practices, which include using posterior predictive checking over summary measures to identify model insufficiencies, which in turn would call for the need for a broader set of models (See Wilson & Collins 2019). Thus, I am left wondering whether this method could actually identify model misspecification in real world data, particularly in situations where standard posterior predictive checking would fall short. The conclusions from the main empirical data set rest largely on a null result, and the utility of a method for detecting model misspecification seems like it should depend on its ability to detect its presence, not just its absence, in real data.

      Beyond the question of its advantage above and beyond data- and hypothesis-informed methods for identifying model misspecification, I am also concerned that if the method does identify a model insufficiency, then you still would need to use these other methods in order to understand what aspect of behavior deviated from model predictions in order to design a better model. In general, it seems that the authors should be clear that this is a tool that might be helpful in some situations, but that it will need to be used in combination with other well-described modeling techniques (posterior predictive checking for model validation and guiding cognitive model extensions to capture unexplained features of the data). A general stylistic concern I have with this manuscript is that it presents and characterizes a new tool to help with cognitive computational modeling, but it does not really adhere to best modeling practices (see Collins & Wilson, eLife), which involve looking at data to identify core behavioral features and simulating data from best-fitting models to confirm that these features are reproduced. One could take away from this paper that you would be better off fitting a neural network to your behavioral data rather than carefully comparing the predictions of your cognitive model to your actual data, but I think that would be a highly misleading takeaway since summary measures of behavior would just as easily have diagnosed the model misspecification in the toy problem, and have the added advantage that they provide information about which cognitive processes are missing in such cases.

      As a more minor point, it is also worth noting that this method could not distinguish behavioral stochasticity from the deterministic structure that is not repeated across training/test sets (for example, because a specific sequence is present in the training set but not the test set). This should be included in the discussion of method limitations. It was also not entirely clear to me whether the method could be applied to real behavioral data without extensive pretraining (on >500 participants) which would certainly limit its applicability for standard cases.

      The authors focus on model misspecification, but in reality, all of our models are misspecified to some degree since the true process generating behavior almost certainly deviates from our simple models (i.e., as George Box is frequently quoted, "all models are wrong, but some of them are useful"). It would be useful to have some more nuanced discussion of situations in which misspecification is and is not problematic.

      We thank the reviewer for these comments and have made changes to the manuscript to better describe these limitations. We agree with the reviewer and accept that fitting a neural network is by no means a substitute for careful and dedicated cognitive modeling. Cognitive modeling is aimed at describing the latent processes that are assumed to generate the observed data, and we agree that careful description of the data-generating mechanisms, including posterior predictive checks, is always required. However, even a well-defined cognitive model might still have little predictive accuracy, and it is difficult to know how many resources should be put into testing and developing new cognitive models to describe the data. We argue that RNNs can lead to some insights regarding this question, and highlight the following limitations that were mentioned by the reviewer:

      First, we accept that it is important to provide positive evidence for the existence of model misspecification. In that sense, a result where the network shows dramatic improvement over the best-fitting theoretical model is easier to interpret compared to when the network shows no (or very little) improvement in predictive accuracy. This is because there is always an option that the network, for some reason, was not flexible enough to learn the data-generating model, or because the data-generating mechanism has changed from training to test. We have now added this more clearly in the limitation section. However, when it comes to our empirical results, we would like to emphasize that the network did in fact improve the predictive accuracy for all participants. The result shows support in favor of a "null" hypothesis in the sense that we seem to find evidence that the change in predictive accuracy between the theoretical model and RNN is not systematic across levels of IQ. This allows us to quantify evidence (use Bayesian statistics) for no systematic model misspecification as a function of IQ. While it is always possible that a different model might systematically improve the predictive accuracy of low vs high IQ individuals' data, this seems less likely given the flexibility of the current results.  

      Second, we agree that our current study only applies to the RL models that we tested. In the context of RL, we have used a well-established and frequently applied paradigm and models. We emphasize in the discussion that simulations are required to further validate other uses for this method with other paradigms.  

      Third, we also accept that posterior predictive checks should always be capitalized on when possible, which is now emphasized in the discussion. However, we note that these are not always easy to interpret in a meaningful way and may not always provide details regarding model insufficiencies as described by the reviewer. It is very hard to determine what should be considered a good prediction, and since the generative model is always unknown, sometimes very low predictive accuracy can still be at the peak of possible model performance. This is because the data might be generated from a very noisy process, capping the possible predictive accuracy at a very low point. However, when strictly using theoretical modeling, it is very hard to determine what predictive accuracy to expect. Also, predictive checks are not always easy to interpret visually or otherwise. For example, in two-armed bandit tasks where there are only two actions, the prediction of choices is easier to understand in our opinion when described using a confusion matrix that summarizes the model's ability to predict the empirical behavior (which becomes similar to the predictive estimation we describe in eq 22).
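      As a rough illustration of the confusion-matrix summary mentioned above, the following minimal Python sketch tabulates observed against predicted choices in a two-action task. The function name `confusion_matrix_2x2` and the example choice sequences are hypothetical and are not taken from the manuscript; the sketch is not meant to reproduce the estimate in eq 22.

```python
def confusion_matrix_2x2(observed, predicted):
    """Count (observed, predicted) pairs for a two-action task.

    Rows index the observed action (0 or 1); columns index the action
    the model predicted. Both inputs are sequences of 0/1 choices.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    counts = [[0, 0], [0, 0]]
    for obs, pred in zip(observed, predicted):
        counts[obs][pred] += 1
    return counts


def accuracy(counts):
    """Proportion of trials on which the predicted action matched the observed one."""
    total = sum(sum(row) for row in counts)
    return (counts[0][0] + counts[1][1]) / total


# Made-up choices for illustration only (not data from the study).
observed_choices = [0, 0, 1, 1, 1, 0]
predicted_choices = [0, 1, 1, 1, 0, 0]
matrix = confusion_matrix_2x2(observed_choices, predicted_choices)
```

      The diagonal cells count trials on which the model's predicted action matched the participant's choice, so the accuracy is the diagonal sum divided by the number of trials.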

      Finally, this approach indeed requires a large dataset, with at least three sessions for each participant (training, validation, and test). Further studies might shed more light on the use of optimal epochs as a proxy for noise/complexity that can be used with less data (i.e., training and validation, without a test set).

      Please see our changes at the end of this document.  

      Reviewer #2 (Public Review):

      SUMMARY:

      In this manuscript, Ger and colleagues propose two complementary analytical methods aimed at quantifying the model misspecification and irreducible stochasticity in human choice behavior. The first method involves fitting recurrent neural networks (RNNs) and theoretical models to human choices and interpreting the better performance of RNNs as providing evidence of the misspecifications of theoretical models. The second method involves estimating the number of training iterations for which the fitted RNN achieves the best prediction of human choice behavior in a separate, validation data set, following an approach known as "early stopping". This number is then interpreted as a proxy for the amount of explainable variability in behavior, such that fewer iterations (earlier stopping) correspond to a higher amount of irreducible stochasticity in the data. The authors validate the two methods using simulations of choice behavior in a two-stage task, where the simulated behavior is generated by different known models. Finally, the authors use their approach in a real data set of human choices in the two-stage task, concluding that low-IQ subjects exhibit greater levels of stochasticity than high-IQ subjects.
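      The early-stopping idea summarized above can be sketched in a few lines of Python. This is only an illustration under the assumption that a validation loss has been recorded after each training epoch; the function name `optimal_epoch` and the loss values are hypothetical and do not come from the paper.

```python
def optimal_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss.

    Ties are resolved in favour of the earliest epoch.
    """
    if not val_losses:
        raise ValueError("val_losses must be non-empty")
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1


# Made-up validation-loss curves for two hypothetical subjects.
# The first curve bottoms out early; the second keeps improving for longer.
noisy_subject = [0.70, 0.68, 0.69, 0.71, 0.74]
structured_subject = [0.70, 0.62, 0.55, 0.50, 0.48, 0.47, 0.49]

noisy_best = optimal_epoch(noisy_subject)
structured_best = optimal_epoch(structured_subject)
```

      Under the interpretation described in the summary, the earlier minimum for the first curve would be read as a sign of more irreducible stochasticity, which is exactly the inference the weaknesses below call into question.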

      STRENGTHS:

      The manuscript explores an extremely important topic to scientists interested in characterizing human decision-making. While it is generally acknowledged that any computational model of behavior will be limited in its ability to describe a particular data set, one should hope to understand whether these limitations arise due to model misspecification or due to irreducible stochasticity in the data. Evidence for the former suggests that better models ought to exist; evidence for the latter suggests they might not.

      To address this important topic, the authors elaborate carefully on the rationale of their proposed approach. They describe a variety of simulations - for which the ground truth models and the amount of behavioral stochasticity are known - to validate their approaches. This enables the reader to understand the benefits (and limitations) of these approaches when applied to the two-stage task, a task paradigm commonly used in the field. Through a set of convincing analyses, the authors demonstrate that their approach is capable of identifying situations where an alternative, untested computational model can outperform the set of tested models, before applying these techniques to a realistic data set.

      Thank you for reviewing our work and for the positive tone. Please find below a point-by-point response to the concerns you have raised.

      WEAKNESSES:

      The most significant weakness is that the paper rests on the implicit assumption that the fitted RNNs explain as much variance as possible, an assumption that is likely incorrect and which can result in incorrect conclusions. While in low-dimensional tasks RNNs can predict behavior as well as the data-generating models, this is not *always* the case, and the paper itself illustrates (in Figure 3) several cases where the fitted RNNs fall short of the ground-truth model. In such cases, we cannot conclude that a subject exhibiting a relatively poor RNN fit necessarily has a relatively high degree of behavioral stochasticity. Instead, it is at least conceivable that this subject's behavior is generated precisely (i.e., with low noise) by an alternative model that is poorly fit by an RNN - e.g., a model with long-term sequential dependencies, which RNNs are known to have difficulties in capturing.

      These situations could lead to incorrect conclusions for both of the proposed methods. First, the model misspecification analysis might show equal predictive performance for a particular theoretical model and for the RNN. While a scientist might be inclined to conclude that the theoretical model explains the maximum amount of explainable variance and therefore that no better model should exist, the scenario in the previous paragraph suggests that a superior model might nonetheless exist. Second, in the early-stopping analysis, a particular subject may achieve optimal validation performance with fewer epochs than another, leading the scientist to conclude that this subject exhibits higher behavioral noise. However, as before, this could again result from the fact that this subject's behavior is produced with little noise by a different model. Admittedly, the existence of such scenarios *in principle* does not mean that such scenarios are common, and the conclusions drawn in the paper are likely appropriate for the particular examples analyzed. However, it is much less obvious that the RNNs will provide optimal fits in other types of tasks, particularly those with more complex rules and long-term sequential dependencies, and in such scenarios, an ill-advised scientist might end up drawing incorrect conclusions from the application of the proposed approaches.

      Yes, we understand and agree. A negative result where RNN is unable to overcome the best fitting theoretical model would always leave room for doubt regarding the fact that a different approach might yield better results. In contrast, a dramatic improvement in predictive accuracy for RNN is easier to interpret since it implies that the theoretical model can be improved. We have made an effort to make this issue clear and more articulated in the discussion. We specifically and directly mention in the discussion that “Equating RNN performance with the generative model should be avoided”.   

      However, we would like to note that our empirical results provided a somewhat more nuanced scenario where we found that the RNN generally improved the predictive accuracy of most participants. Importantly, this improvement was found to be equal across participants with no systematic benefits for low vs high IQ participants. We understand that there is always the possibility that another model would show a systematic benefit for low vs. high IQ participants, however, we suggest that this is less likely given the current evidence. We have made an effort to clearly note these issues in the discussion.  

      In addition to this general limitation, the paper also makes a few additional claims that are not fully supported by the provided evidence. For example, Figure 4 highlights the relationship between the optimal epochs and agent noise. Yet, it is nonetheless possible that the optimal epoch is influenced by model parameters other than inverse temperature (e.g., learning rate). This could again lead to invalid conclusions, such as concluding that low-IQ is associated with optimal epoch when an alternative account might be that low-IQ is associated with low learning rate, which in turn is associated with optimal epoch. Yet additional factors such as the deep double-descent (Nakkiran et al., ICLR 2020) can also influence the optimal epoch value as computed by the authors.

      An additional issue is that Figure 4 reports an association between optimal epoch and noise, but noise is normalized by the true minimal/maximal inverse-temperature of hybrid agents (Eq. 23). It is thus possible that the relationship does not hold for more extreme values of inverse-temperature such as beta=0 (extremely noisy behavior) or beta=inf (deterministic behavior), two important special cases that should be incorporated in the current study. Finally, even taking the association in Figure 4 at face value, there are potential issues with inferring noise from the optimal epoch when their correlation is only r~=0.7. As shown in the figures, upon finding a very low optimal epoch for a particular subject, one might be compelled to infer high amounts of noise, even though several agents may exhibit a low optimal epoch despite having very little noise.
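      The two special cases named above can be made concrete with a short Python sketch, assuming a standard softmax choice rule over action values with inverse temperature beta. The function name `softmax_choice_probs` and the action values are hypothetical; the manuscript's exact parameterization may differ.

```python
import math


def softmax_choice_probs(q_values, beta):
    """Softmax choice probabilities with inverse temperature beta.

    The maximum value is subtracted before exponentiating for numerical
    stability. beta must be finite: an infinite beta would produce
    inf * 0 = nan for the best action.
    """
    max_q = max(q_values)
    exps = [math.exp(beta * (q - max_q)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up action values for a two-action choice.
q = [1.0, 0.0]

random_probs = softmax_choice_probs(q, beta=0.0)          # beta = 0: values are ignored
near_greedy_probs = softmax_choice_probs(q, beta=50.0)    # large beta approximates beta = inf
```

      With beta = 0 both actions are chosen with probability 0.5 regardless of their values, while a large finite beta puts nearly all probability on the higher-valued action, approximating the deterministic limit.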

      Thank you for these comments. Indeed, there is much we do not yet fully understand about the factors that influence optimal epochs. Currently, it is clear to us that the number of optimal epochs is influenced by a variety of factors, including network size, the data size, and other cognitive parameters, such as the learning rate. We hope that our work serves as a proof-of-concept, suggesting that, in certain scenarios, the number of epochs can be utilized as an empirical estimate. Moreover, we maintain that, at least within the context of the current paradigm, the number of optimal epochs is primarily sensitive to the amount of true underlying noise, assuming the number of trials and network size are constant. We are therefore hopeful that this proof-of-concept will encourage research that will further examine the factors that influence the optimal epochs in different behavioral paradigms.

      To address the reviewer's justified concerns, we have made several amendments to the manuscript. First, we added an additional version of Figure 4 in the Supplementary Information material, where the noise parameter values are not scaled. We hope this adjustment clarifies that the parameters were tested across a broad spectrum of values (e.g., 0 to 10 for the hybrid model), spanning the two extremes of complete randomness and high determinism. Second, we included a linear regression analysis showing the association of all model parameters (including noise) with the optimal number of epochs. As anticipated by the reviewer, the learning rate was also found to be associated with the number of optimal epochs. Nonetheless, the noise parameter appears to maintain the most substantial association with the number of optimal epochs. We have also added a specific mention of these associations in the discussion, to inform readers that the association between the number of optimal epochs and model parameters should be examined using simulation for other paradigms/models. Lastly, we acknowledge in the discussion that the findings regarding the association between the number of optimal epochs and noise warrant further investigation, considering other factors that might influence the determination of the optimal epoch point and the fact that the correlation with noise is strong, but not perfect (in the range of 0.7).

      The discussion now includes the following:

      “Several limitations should be considered in our proposed approach. First, fitting a data-driven neural network is evidently not enough to produce a comprehensive theoretical description of the data generation mechanisms. Currently, best practices for cognitive modeling \citep{wilson2019ten} require identifying under what conditions the model struggles to predict the data (e.g., using posterior predictive checks), and describing a different theoretical model that could account for these disadvantages in prediction. However, identifying conditions where the model shortcomings in predictive accuracy are due to model misspecifications rather than noisier behavior is a challenging task. We propose leveraging data-driven RNNs as a supplementary tool, particularly when they significantly outperform existing theoretical models, followed by refined theoretical modeling to provide insights into what processes were mis-specified in the initial modeling effort.

      Second, although we observed a robust association between the optimal number of epochs and true noise across varying network sizes and dataset sizes (see Fig.~\ref{figS2}), additional factors such as network architecture and other model parameters (e.g., learning rate, see Fig.~\ref{figS7}) might influence this estimation. Further research is required to allow us to better understand how and why different factors change the number of optimal epochs for a given dataset before it can be applied with confidence to empirical investigations.

      Third, the empirical dataset used in our study consisted of data collected from human participants at a single time point, serving as the training set for our RNN. The test set data, collected with a time interval of approximately $\sim6$ and $\sim18$ months, introduced the possibility of changes in participants' decision-making strategies over time. In our analysis, we neglected any possible changes in participants' decision-making strategies during that time, changes that may lead to poorer generalization performance of our approach. Thus, further studies are needed to eliminate such possible explanations.

      Fourth, our simulations, albeit illustrative, were confined to known models, necessitating in-silico validation before extrapolating the efficacy of our approach to other model classes and tasks. Our aim was to showcase the potential benefits of using a data-driven approach, particularly when faced with unknown models. However, whether RNNs will provide optimal fits for tasks with more complex rules and long-term sequential dependencies remains uncertain.

      Finally, while positive outcomes where RNNs surpass theoretical models can prompt insightful model refinement, caution is warranted in directly equating RNN performance with that of the generative model, as seen in our simulations (e.g., Figure 3). We highlight that our empirical findings depict a more complex scenario, wherein the RNN enhanced the predictive accuracy for all participants uniformly. Notably, we also provide evidence supporting a null effect among individuals, with no consistent difference in RNN improvement over the theoretical model based on IQ. Although it remains conceivable that a different data-driven model could systematically heighten the predictive accuracy for individuals with lower IQs in this task, such a possibility seems less probable in light of the current findings.”

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Is the t that gets fed as input to RNN just timestep?

      t = last transition type (rare/common), not the timestep.

      Line 378: what does "optimal epochs" mean here?

      The number of optimal training epochs that minimize both underfitting and overfitting (defined around line 300).

      Line 443: I don't think "identical" is the right word here - surely the authors just mean that there is not an obvious systematic difference in the distributions.

      Fixed

      I was expecting to see ~500 points in Figure 7a, but there seem to be only 50... why weren't all datasets with at least 2 sessions used for this analysis?

      We used the ~500 subjects (only 2 datasets) to pre-train the RNN, and then fine-tuned the pre-trained RNN on the other 54 subjects that have 3 datasets. The correlation of IQ and optimal epoch also holds for the 500 subjects as shown below.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      Figure 3b: despite spending a long time trying to understand the meaning of each cell of the confusion matrix, I'm still unsure what they represent. Would be great if you could spell out the meaning of each cell individually, at least for the first matrix in the paper.

      We added a clarification to the Figure caption. 

      Figure 5: Why didn't the authors show this exact scenario using simulated data? It would be much easier to understand the predictions of this figure if they had been demonstrated in simulated data, such as individuals with different amounts of behavioral noise or different levels of model misspecifications.

      In Figure 5 the x-axis represents IQ. Replacing the x-axis with true noise would make what we present now as Figure 4. We have made an effort to emphasize the meaning of the axes in the caption. 

      Line 195 ("...in the action selection. Where"). Typo? No period is needed before "where".

      Fixed

      Line 213 ("K dominated-hand model"). I was intrigued by this model, but wasn't sure whether it has been used previously in the literature, or whether this is the first time it has been proposed.

      This is the first time that we know of that this model is used.  

      Line 345 ("This suggests that RNN is flexible enough to approximate a wide range of different behavioral models"): Worth explaining why (i.e., because the GRUs are able to capture dependencies across longer delays than a k-order Logistic Regression model).

      Line 356 ("We were interested to test"): Suggestion: "We were interested in testing".

      Fixed

      Line 389 ("However, as long as the number of observations and the size of the network is the same between two datasets, the number of optimal epochs can be used to estimate whether the dataset of one participant is noisier compared with a second dataset."): This is an important claim that should ideally be demonstrated directly. The paper only illustrates this effect through a correlation and a scatter plot, where higher noise tends to predict a lower optimal epoch. However, is the claim here that, in some circumstances, optimal epoch can be used to *deterministically* estimate noise? If so, this would be a strong result and should ideally be included in the paper.

      We have now omitted this sentence and toned down our claims, suggesting that while we did find a strong association between noise and optimal epochs, future research is required to establish to what extent this could be differentiated from other factors (i.e., network size, amount of observations).

    1. Reviewer #1 (Public Review):

      In this paper the authors provide a characterisation of auditory responses (tones, noise, and amplitude modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group have previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC, in both awake and anaesthetised mice, appear to be more responsive to more complex sounds (amplitude modulated noise) compared to pure tones and that under anaesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anaesthetised mice somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little in the matrix. The authors now compare bimodal interactions under anaesthesia and awake states and find that effects differ in some cases between the awake and anaesthetised states - particularly related to bimodal suppression and enhancement in the modules.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      The manuscript is improved by the response to reviewers. The authors have addressed my comments by adding new figures and panels, streamlining the analysis between awake and anaesthetised data (which has led to a more nuanced, and better supported conclusion), and adding more examples to better understand the underlying data. In streamlining the analyses between anaesthetised and awake data I would probably have opted for bringing these results into merged figures to avoid repetitiveness and aid comparison, but I acknowledge that that may be a matter of style. The added discussions of differences between awake and anaesthesia in the findings and the discussion of possible reasons why these differences are present help broaden the understanding of what the data looks like and how anaesthesia can affect these circuits.

      As mentioned in my previous review, the strength of this study is in its demonstration of using prism 2p imaging to image the lateral shell of IC to gain access to its neurochemically defined subdivisions, and they use this method to provide a basic description of the auditory and multisensory properties of lateral cortex IC subdivisions (and compare it to dorsal cortex of IC). The added analysis, information and figures provide a more convincing foundation for the descriptions and conclusions stated in the paper. The description of the basic functionality of the lateral cortex of the IC are useful for researchers interested in basic multisensory interactions and auditory processing and circuits. The paper provides a technical foundation for future studies (as the authors also mention), exploring how these neurochemically defined subdivisions receiving distinct descending projections from cortex contribute to auditory and multisensory based behaviour.

      Minor comment:
      - The authors have now added statistics and figures to support their claims about tonotopy in DC and LC. This is what I asked for, and I think it allows readers to better understand the tonotopical organisation in these areas. One of the conclusions by the authors is that the quadratic fit is a better fit than a linear fit in DCIC. Given the new plots shown and previous studies this is likely true, though it is worth highlighting that adding parameters to a fitting procedure (as in the case when moving from linear to quadratic fit) will likely lead to a better fit due to the increased flexibility of the fitting procedure.
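
      One way to account for the extra flexibility of a quadratic fit is to compare the two models with a measure that penalises added parameters. The sketch below is illustrative only and is not the authors' analysis: it fits linear and quadratic polynomials to made-up (position, BF) pairs and reports R², adjusted R², and the F statistic for the added quadratic term. The data values and all function names are assumptions introduced for this example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (lowest order first)."""
    k = degree + 1
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    aty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(ata, aty)


def fit_summary(x, y, degree):
    """R^2, adjusted R^2 and residual sum of squares for a polynomial fit."""
    coeffs = polyfit(x, y, degree)
    pred = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n = len(y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 shrinks R^2 according to the number of predictors (degree).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - degree - 1)
    return {"r2": r2, "adj_r2": adj_r2, "ss_res": ss_res}


# Hypothetical data: normalised position along one axis and BF in octaves,
# arranged in a high-low-high pattern with small fixed offsets.
offsets = [0.05, -0.03, 0.02, -0.04, 0.01, 0.0, -0.02, 0.03, -0.01, 0.04, -0.05]
position = [i / 10 for i in range(11)]
bf_octaves = [8 * (p - 0.5) ** 2 + o for p, o in zip(position, offsets)]

linear = fit_summary(position, bf_octaves, degree=1)
quadratic = fit_summary(position, bf_octaves, degree=2)

# F statistic for the single extra (quadratic) parameter in the nested model.
n = len(position)
f_stat = (linear["ss_res"] - quadratic["ss_res"]) / (quadratic["ss_res"] / (n - 3))

print(linear, quadratic, f_stat)
```

      The raw R² of the quadratic model can never be lower than that of the linear model fitted to the same data, so a comparison is more informative when it rests on adjusted R² or on a test of the extra term.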

    2. Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      A major achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons) and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it and the writing is not quite as precise as it could be.

    3. Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were overall more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was different in the awake prep, where modular neurons became more responsive to somatosensory stimuli. Thus, to this reviewer, one of the most intriguing results of the present study is the extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggest that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, the limitations of two-photon imaging for tracking neural activity are acknowledged, and appropriate statistical tests were used.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important new insights into how multisensory information is processed in the lateral cortex of the inferior colliculus, a poorly understood part of the auditory midbrain. By developing new imaging techniques that provide the first optical access to the lateral cortex in a living animal, the authors provide convincing in vivo evidence that this region contains separate subregions that can be distinguished by their sensory inputs and neurochemical profiles, as suggested by previous anatomical and in vitro studies. Additional information and analyses are needed, however, to allow readers to fully appreciate what was done, and the comparison of multisensory interactions between awake and anesthetized mice would benefit from being explored in more detail.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a characterisation of auditory responses (tones, noise, and amplitude-modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher-order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group has previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from the auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC of both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude-modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low-frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice, somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little (possibly not at all) in the matrix. However, bimodal interactions in LC may differ between the awake and anesthetized states, which warrants deeper investigation by the authors: They find, under anesthesia, more bimodal enhancement in modules of LC compared to the matrix of LC and bimodal suppression dominating the matrix of LC. In contrast, under awake conditions bimodal enhancement is almost exclusively found in the matrix of LC, and bimodal suppression dominates both matrix and modules of LC.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher-order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      Strengths:

      The major strength of this study is undoubtedly the fact that the authors for the first time provide optical access to a subcortical region (the lateral cortex of the inferior colliculus, i.e. the higher order auditory midbrain) which we know (from previous work by the same group) has optically identifiable subdivisions with unique inputs and neurotransmitter release, and which plays a central role in auditory and multisensory processing. A description of basic auditory and multisensory properties of this structure is therefore very useful for understanding auditory processing and multisensory interactions in subcortical circuits.

      Weaknesses:

      I have divided my comments about weaknesses and improvements into major and minor comments. I believe all of them can be addressed by the authors to provide a clearer picture of their characterisation of the higher-order auditory midbrain.

      Major comment:

      (1) The multisensory interactions in LC in anaesthetised and awake preparations appear to be qualitatively different, though the authors claim they are similar (see also minor comment related to figure 10H for further explanation of what I mean). However, the findings in awake and anaesthetised conditions are summarised differently, and the plotting of similar findings in the awake figures and anaesthetised figures is different - and different statistics are used for the same comparisons. This makes it very difficult to assess how multisensory integration in LC is different under awake and anaesthetised conditions. I suggest that the authors plot (and test with similar statistics) the summary plots in Figure 8 (i.e. Figure 8H-K) for awake data in Figure 10, and also make similar plots to Figures 10G-H for anaesthetised data. This will help the readers understand the differences between bimodal stimulation effects on awake and anaesthetised preparations - which, in their current form, look very distinct. In general, it is unclear to me why the awake data related to Figures 9 and 10 is presented in a different way for similar comparisons. Please streamline the presentation of results for anaesthetised and awake results to aid the comparison of results in different states, and explicitly state and discuss differences under awake and anaesthetised conditions.

      We thank the reviewer for the valuable suggestion. We highlighted only the similarities between the data obtained from anesthetized and awake preparations in order to show that the technique can be reproduced in awake animals for future assessment. Those similarities were identified by comparing modules vs matrix, or LC vs DC, within each experimental setup (awake or anesthetized). Therefore, the statistics were chosen separately for each setup based on the number of subjects (n) within each experimental preparation. However, we agree with the reviewer’s comment that there are differences between the anesthetized and awake data. To examine these differences, we ran the same statistics for Figure 5 (tonotopy of LC vs DC, anesthetized animals) and Figure 9 (tonotopy of LC vs DC, awake animals). In addition, we added a new figure after Figure 9 to separate the statistical analysis from the maps. Accordingly, Figures 4 and 5 (maps and analysis, respectively; anesthetized animals) now match Figures 9 and 10 (maps and analysis, respectively; awake animals). We did the same for Figure 7 (microprism imaging of the LC, anesthetized animals), Figure 8 (imaging of the LC from the dorsal surface, anesthetized animals), and Figure 11, the old Figure 10 (microprism imaging of the LC, awake animals), to address the similarities and differences in the multisensory data between awake and anesthetized animals. We edited the text accordingly in the Results and Discussion sections.

      (2) The claim about the degree of tonotopy in LC and DC should be aided by summary statistics to understand the degree to which tonotopy is actually present. For example, the authors could demonstrate whether or not it is possible to predict, above chance, a cell's BF based on the group of other cells in the area. This will help understand to what degree the tonotopy is topographic vs salt and pepper. Also, it would be good to know if the GABAergic modules have a higher propensity for particular BFs or tonotopic structure compared to matrix regions in LC, and also if general tuning properties (e.g. tuning width) are different from the matrix cells and the ones in DC.
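
      The kind of test suggested here can be sketched as a leave-one-out prediction with a shuffle control. The example below is a hypothetical illustration, not an analysis from the paper: each cell's BF (in octaves) is predicted from the mean BF of its k nearest neighbours, and the prediction error is compared with the error obtained after shuffling BFs across cell positions. The cell positions, BF values, function names, and the choice of k are all assumptions made for this sketch.

```python
import math
import random


def loo_neighbor_error(positions, log_bfs, k=3):
    """Mean absolute leave-one-out error (octaves) when each cell's BF is
    predicted from the mean BF of its k nearest neighbours."""
    errors = []
    for i, (pos, bf) in enumerate(zip(positions, log_bfs)):
        others = [(math.dist(pos, positions[j]), log_bfs[j])
                  for j in range(len(positions)) if j != i]
        others.sort(key=lambda item: item[0])
        predicted = sum(b for _, b in others[:k]) / k
        errors.append(abs(bf - predicted))
    return sum(errors) / len(errors)


def shuffle_test(positions, log_bfs, k=3, n_shuffles=500, seed=0):
    """Compare the observed error with errors from spatially shuffled BFs."""
    rng = random.Random(seed)
    observed = loo_neighbor_error(positions, log_bfs, k)
    shuffled = list(log_bfs)
    at_least_as_good = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if loo_neighbor_error(positions, shuffled, k) <= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value above zero.
    p_value = (at_least_as_good + 1) / (n_shuffles + 1)
    return observed, p_value


# Hypothetical field of view: a 6 x 6 grid of cells whose BF (log2 kHz)
# increases along the first axis, with small random jitter.
rng = random.Random(1)
positions = [(float(x), float(y)) for x in range(6) for y in range(6)]
log_bfs = [2.0 + 0.5 * x + rng.uniform(-0.1, 0.1) for x, _ in positions]

observed_error, p_value = shuffle_test(positions, log_bfs)
print(observed_error, p_value)
```

      A small p-value indicates that neighbouring cells predict a cell's BF better than chance, which would point to topographic order; a salt-and-pepper arrangement would give an error close to that of the shuffled maps.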

      Thank you for the reviewer’s suggestion. We have examined the tonotopy of LC and DC using two regression models (linear and quadratic polynomial) between the BFs of the cells and their location on the anatomical axis. Tonotopy is therefore indicated by a significant regression fit with a high R² between the BFs of the cells and their location within each structure. For the DC, there was a significant regression fit between the BFs of the cells and their locations along the rostromedial to caudolateral axis. Additionally, the R² of the quadratic polynomial fit was higher than that of the linear fit, which indicates a nonlinear distribution of cells based on their BFs and is consistent with the presence of high-low-high tuning over the DC surface. Given that the microprism cannot image the whole area of the LC, and that it images a slightly different area in each animal, it was very difficult to obtain a consistent map for the LC or to reach a solid conclusion about LC tonotopy. However, we have examined the regression fit between the BFs of cells and their location along the four main anatomical axes of the field of view obtained from each animal (dorsal to ventral, rostral to caudal, dorsocaudal to ventrorostral, and dorsorostral to ventrocaudal). Unlike the DC, the LC imaged via microprism showed a lower R² for both linear and quadratic regression, mostly in the dorsoventral axis. We show the fitting curves of these regressions in Figure 4-figure supplement 1 (anesthetized data) and Figure 9-figure supplement 1 (awake data). Despite the inconsistent tonotopy of the LC imaged via microprism, the modules were found to have a higher median BF (10 kHz) than the matrix (7.1 kHz), which was consistent across the anesthetized and awake animals. We have added these results at the corresponding places in the results section (lines 193-197 and 361-364).

      We have examined the tuning width using the binarized receptive field sum (RFS) method, in which each neuron was given a value of 1 if it responded to a single frequency (narrow RF), and a higher value if it responded to more neighboring frequencies (wide RF). We did this calculation across all the sound levels. Both DC and LC of the anesthetized animals had a higher RFS mean and median than those of awake animals, given that ketamine is known to broaden the RF. However, in both preparations (anesthetized and awake), the DC had a higher RFS mean than the LC, which could be consistent with the finding that the DC had a relatively lower SMI than the LC. To show these new data, we made a new Figure 10-figure supplement 1, and we edited the text accordingly [lines 372-379 & 527-531].
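
      As a rough illustration of a binarized receptive field sum, the sketch below counts, for each sound level, the number of frequencies that evoke a response above a threshold, and then averages those counts across levels. The fixed threshold, the averaging across levels, the function name, and the response values are assumptions made for this example; the response criterion and the exact aggregation across sound levels used in the paper are not specified here.

```python
def binarized_rfs(responses, threshold):
    """Return (per-level counts, mean count) for one neuron.

    responses[level][freq] holds the response amplitude at each sound level
    and frequency; a response counts as 1 when it exceeds `threshold`.
    """
    per_level = [sum(1 for r in row if r > threshold) for row in responses]
    return per_level, sum(per_level) / len(per_level)


# Hypothetical response amplitudes (rows: sound levels, columns: frequencies).
narrow_cell = [
    [0.0, 0.1, 0.9, 0.1, 0.0],
    [0.0, 0.2, 1.1, 0.1, 0.0],
]
broad_cell = [
    [0.1, 0.8, 0.9, 0.7, 0.1],
    [0.6, 0.9, 1.2, 0.9, 0.7],
]

narrow_counts, narrow_rfs = binarized_rfs(narrow_cell, threshold=0.5)
broad_counts, broad_rfs = binarized_rfs(broad_cell, threshold=0.5)
print(narrow_counts, narrow_rfs)  # [1, 1] 1.0
print(broad_counts, broad_rfs)    # [3, 5] 4.0
```

      Under these assumptions, a narrowly tuned cell scores 1 and the score grows as the cell responds to more frequencies, which matches the narrow versus wide RF distinction described above.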

      (3) Throughout the paper more information needs to be given about the number of cells, sessions, and animals used in each panel, and what level was used as n in the statistical tests. For example, in Figure 4 I can not tell if the 4 mice shown for LC imaging are the only 4 mice imaged, and used in the Figure 4E summary or if these are just examples. In general, throughout the paper, it is currently not possible to assess how many cells, sessions, and animals the data shown comes from.

      Thank you for the reviewer’s comment. We apologize for not including this information. We have now added the size of the statistical sample (number of cells or number of animals used) for every test outcome. To preserve the flow of the text, we added the details of the statistical tests to the figure legends.

      (4) Throughout the paper, to better understand the summary maps and plots, it would be helpful to see example responses of the different components investigated. For example, given that module cells appear to have more auditory offset responses, it would be helpful to see what the bimodal, sound-only, and somatosensory responses look like in example cells in LC modules. This also goes for just general examples of what the responses to auditory and somatosensory inputs look like in DC vs LC. In general example plots of what the responses actually look like are needed to better understand what is being summarised.

      Thank you for the reviewer’s comment and suggestion. We modified Figure 6 and the text accordingly to include all the significant examples of cells discussed throughout the work.

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      The main achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons), and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it. The writing is not as precise as it could be. Consequently, the manuscript is unclear in some places. For instance, the text is somewhat confusing as to whether there is a difference in the pattern (modules vs matrix) of somatosensory-auditory suppression between anesthetized and awake animals. Furthermore, there are aspects of the results which are potentially very interesting but have not been explored. For example, there is a remarkable degree of clustering of response properties evident in many of the maps included in the paper. Taking Figure 7 for instance, rather than a salt and pepper organization we can see auditory responsive neurons clumped together and non-responsive neurons clumped together and in the panels below we can see off-responsive neurons forming clusters (although it is not easy to make out the magenta dots against the black background). This degree of clustering seems much stronger than expected and deserves further attention.

      Thank you for the reviewer’s comment. We apologize if some areas of the manuscript were imprecisely written. For the anesthetized and awake data, we had emphasized only the similarities between the two setups to show that the microprism can be used in awake animals for future assessment. To highlight the differences between anesthetized and awake animals, we have now run uniform statistics for all the data collected from both setups. Accordingly, we have edited Figures 4 and 5 (tonotopy, anesthetized) to match Figure 9 and the new Figure 10 (tonotopy, awake). We also edited Figures 7 and 8 (multisensory, anesthetized) to match Figure 11, the old Figure 10 (multisensory, awake). We edited the text accordingly in the results section and discussed the possible differences between anesthetized and awake data in the discussion section [lines 521-553].

      We agree with the reviewer’s comment that the cells were topographically clustered based on their responses. Some of these clusters include the somatosensory responsive cells, which were located mostly in the modules (Figures 7D and 8E). Also, the auditory responsive cells with offset responses were clustered mostly in the modules (Figures 7C and 8F). Accordingly, we have edited the text to emphasize this finding.

      We also noticed that some cells that responded to the tested stimuli were surrounded by nonresponsive cells. Comparing the responses of the cells to different stimuli, we found that while Figures 7 and 11 (old Figure 10) showed only the responses to auditory stimulation (unmodulated broadband noise at 80 dB) and somatosensory stimulation (whisker deflection), some cells that did not respond to these specific stimuli did respond to pure tones of different frequencies and amplitudes. As an indicator of cell viability, we additionally examined the spontaneous activity of the nonresponsive cells across different data sets. We note that spontaneous activity was rare for all cells, even among those that responded to sound or somatosensory stimulation. This could be because 2P imaging of calcium signals may not be sensitive enough to track spontaneous activity that originates from single spikes. However, in some data sets, we found that cells that did not respond to any tested stimulus showed spontaneous activity when no stimulation was given, indicating that those cells were viable. We have addressed the activity of the nonresponsive cells in the text along with a new Figure 11-figure supplement 1.

      We changed the magenta into a green color to be suitable for the dark background. Also, we have completely changed the color palette of all of our images to be suitable for color-blind readers as suggested by reviewer 1.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were far more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was reversed in the awake prep, where modular neurons became more responsive to somatosensory stimuli than auditory stimuli. Thus, to this reviewer, the most intriguing result of the present study is the dramatic extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggest that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, and the limitations of two-photon imaging for tracking neural activity are acknowledged. Appropriate statistical tests were used. There are three main issues the authors should address, but otherwise, this study represents an important advance in the field.

      (1) Please address whether the Thy1 mouse evenly expresses jRGECO1a in all LC neurons. It is known that these mice express jRGECO1a in subsets of neurons in the cerebral cortex, and similar biases in the LC could have biased the results here.

      Thank you for the reviewer’s comment. In the work published by Dana et al., the expression of jRGECO1a in all Thy1 mouse lines was determined by the brightness of jRGECO1a in the soma. Given that some cells do not show a detectable level of jRGECO1a fluorescence until activated, the differences in expression reported for different brain regions could be related to the level of neuronal activity at the time of sample processing and not to the expression level of the indicator itself. To the best of our knowledge, there is no antibody for jRGECO1a that can be used to detect the expression level of the indicator regardless of neuronal activity. To test the hypothesis that DC and LC have different levels of jRGECO1a, we examined the expression of jRGECO1a after perfusing the mice with high-potassium saline to elicit a general neuronal depolarization in the whole brain. We then immunostained against NeuN (the neuronal marker) to quantify the percentage of neurons expressing jRGECO1a relative to the total number of neurons (indicated by NeuN). For a fair comparison, we restricted our analysis to the areas imaged by 2P, as some regions, such as the deep ventral regions of the LC, were not accessible via the microprism. We found a similar percentage of cells expressing jRGECO1a in DC and LC. As expected, the neurons expressing jRGECO1a were exclusively non-GABAergic cells. We addressed these findings in the new Figure 3-figure supplement 1 as well as the corresponding text in the results [lines 178-184] and methods sections [lines 878-892].

      (2) I suggest adding a paragraph or two to the discussion to address the large differences observed between the anesthetized and awake preparations. For example, somatosensory responses in the modules increased dramatically from 14.4% in the anesthetized prep to 63.6% in the awake prep. At the same time, auditory responses decreased from 52.1% to 22%. (Numbers for the anesthetized prep include auditory responses and somatosensory + auditory responses.) In addition, the tonotopy of the DC shifted in the awake condition. These are intriguing changes that are not entirely expected from the switch to an awake prep and therefore warrant discussion.

      Thank you for the reviewer’s comment. To determine if differences exist between anesthetized and awake data, we have now used the same statistics and edited Figures 4, 5, 7, 8, 9, and 10 as well as added a new Figure 11. Accordingly, we have edited the result section and added a paragraph addressing the possible differences between the two preparations in the Discussion section [lines 521-553].

      (3) For somatosensory stimuli, the authors used whisker deflection, but based on the anatomy, this is presumably not the only somatosensory stimulus that affects LC. The authors could help readers place the present results in a broader context by discussing how other somatosensory stimuli might come into play. For example, might a larger percentage of modular neurons be activated by somatosensory stimuli if more diverse stimuli were used?

      We agree with the reviewer’s point. Indeed, the modules are receiving different inputs from different somatosensory sources such as somatosensory cortex and dorsal column nuclei, which could indicate that the activity of the cells in the modular areas could be evoked by different types of somatosensory stimulations, which is an open area for future studies. We have discussed this point in the revised Discussion section [lines 516-520].

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3H: The lateral surface seems quite damaged by the prism. An example slice of the imaging area of each mouse would help the reader better understand the extent of damage the prism leaves in the area of interest.

      Thank you for the reviewer’s comment. We have already included such images in Figures 4A, 7A, and 9A to present the field of view of all prism experiments. However, we need to clarify the point of tissue damage. The insertion of the microprism may be associated with some tissue damage as a result of making the pocket for the microprism to be inserted, but it is not possible to get neuronal signals from a damaged field of view. Therefore, we do not believe that there is tissue damage to the parts of the LC imaged by microprism. However, there may be some areas where the microprism is not in direct contact with the LC surface. These areas are located mostly in the periphery of the field of view, and they are completely black as they are out of focus (i.e., the left side of Figure 3B). The right side of Figure 3B as well as Figure 3A have some black areas, which represent the vasculature, where there are no red signals because of the lack of jRGECO1a expression in those areas.

      (2) In relation to the data shown in Figure 4E it is claimed that LC is tuned to higher frequencies (lines 195-196). However, the majority of cells appear to be tuned to frequencies below 14kHz (with a median of 7.5 kHz), which is quite low for the mouse. I assume that the authors mean frequencies that are relatively higher than the DC, but it is worth mentioning in the text that the BFs found in the LC are quite low-frequency responses for the mouse.

      Thank you for the reviewer’s comment, which we agree with. We edited this part by acknowledging that around 50% of the LC cells had a low-frequency bias to 5 and 7.1 kHz. Then we mentioned that most of the LC cells are tuned to relatively higher frequencies than those of the DC [lines 215-218].

      (3) Figure 5A-C: Is it the tone-responsive cells plus an additional ~22% of cells that respond to AM, or are there also cells that respond to tones that do not respond to AM. Please break down to which degree the tone and AM responsive cells are overlapping.

      Thank you for the reviewer’s comment and suggestion. We broke down the responsive cells into cells responsive only to pure tone (tone selective cells or Tone-sel) or to only AM-noise (noise selective cells or Noise-sel) as well as cells responding to both sounds (nonselective cells or Non-sel). We examined the fractions of these categories of cells in both LC and DC within all responsive neurons. Accordingly, we have edited Figure 5A-C as well as the text [lines 229-243].

      (4) Figure 5D. It is unclear to me how a cell is classified as SMI or TMI responsive after computing the SMI or TMI for each cell. What statistic was used to determine if the cell was responsive or not?

      Thank you for the reviewer’s comment. We do apologize for the confusion caused by Figures 5D and E. These figures do not show the values of SMI or TMI, respectively. Rather, the figures show the percentage of the spectrally or temporally modulated cells, respectively. At each sound level, the cells were categorized into two main types. The spectrally modulated cells are those responsive to pure tones or unmodulated noise, so they can detect the spectral features of the sound (old Figure 5D or new Figure 5E). The temporally modulated cells are those responsive to AM-noise, so they can detect the temporal features of the sound of complex spectra like the broadband noise (old Figure 5E or new Figure 5F). To clear this confusion, we removed the words SMI and TMI from the figures, and then we renamed the x-axis label into “% of spectrally modulated cells” and “% of temporally modulated cells” for Figures 5D (new 5E) and E (new 5F), respectively.

      (5) Figure 5 D, E: Is the decrease in SMI and TMI modulated cells in the modules a result of simply lower sensitivity to sounds (i.e. higher response thresholds)? If a cell responds to neither tone, AM, or noise it will have a low SMI and TMI index. If this is the case that affects the interpretation, as it is then not a decrease in sensitivity to spectral or temporal modulation, but instead a difference in overall sound sensitivity.

      Thank you for the reviewer’s comment. We apologize for the confusion about Figures 5E and D, which did not show the SMI and TMI values. Rather, they show the percentage of spectrally or temporally modulated cells, respectively, as explained in our previous response. Therefore, Figure 5D shows the percentage of cells that can detect the spectral features of sound, while Figure 5E shows the percentage of cells that can detect the temporal features of sounds of complex spectra like broadband noise. Accordingly, Figures 5D and E show the sensitivity to different features of sound and not the overall sound sensitivity.

      (6) Figure 7 and 8: What is the false positive rate expected of the responsive cells using the correlation cell flagging criteria? Especially given that the fraction of cells responsive to somatosensory stimulation in LC (matrix) is 0.88% and 1.3% in DC, it is important to know what the expected false positive rate is in order to be able to state that there are actually somatosensory responses there or if this is what you would expect from false positives given the inclusion test used. Please provide an estimate of the false positive rate given your inclusion test and show that the rate found is statistically significantly above that level - and show this rate with a line in Figure 7 H, I.

      Thank you for the reviewer’s comment. To test the efficiency of the correlation method to determine the responsive cells, we initially ran an ROC curve comparing the automated method to a blinded human interpretation. The AUC of the ROC curve was 0.88. This high AUC value indicates that the correlation method ranks a randomly chosen responsive cell higher than a randomly chosen nonresponsive cell with high probability. At the correlation coefficient of 0.4, which was the cutoff value to determine the responsive cells for somatosensory stimulation, the specificity was 87%, the sensitivity was 72%, the positive predictive value was 73%, and the negative predictive value was 86%. Although the above percentages indicate the efficiency of the correlation method, we excluded all the false responsive cells from the analysis. Therefore, the fractions of cells in the graphs are the true responsive cells with no contamination of the non-responsive cells. We also modified Figures 7H and I to match the other data sets obtained from awake animals. Therefore, Figures 7H and I no longer show the average of the responsive cells. Instead, they show the % of different fractions of responsive cells within each cellular motif (modules and matrix). Accordingly, we believe that there is no need to include a rate line on the graph. We added the section describing the validation part to the methods section [lines 808-815].
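
      As a minimal sketch of how the validation figures quoted above relate to one another, the standard confusion-matrix definitions can be written out as follows. The function name and the counts are hypothetical and chosen only so that the ratios land near the reported percentages; they are not the study's data, and the response does not state the underlying counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    tp/fp/tn/fn are counts of cells where the automated flag is compared
    against a reference label (here, a blinded human interpretation).
    """
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "ppv": tp / (tp + fp),                  # positive predictive value
        "npv": tn / (tn + fn),                  # negative predictive value
    }


# Hypothetical counts for illustration only (NOT data from the study).
metrics = classification_metrics(tp=72, fp=26, tn=174, fn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

      Under these definitions, the false positive rate the reviewer asks about is one minus the specificity, so a specificity of 87% would correspond to a false positive rate of about 13% before any manual exclusion.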

      (7) Figure 7: Please clarify what is meant by a cell responding to 'both responding to somatosensory and auditory stimulation'. Does it mean that the cell has responses to both auditory and somatosensory stimulation when presented individually or if it responds to both presented together? If it is the former, I don't understand how the number to both can be higher than the number of somatosensory alone (as both requires it also to respond to somatosensory alone). If it is the latter (combined auditory and somatosensory) then it seems that somatosensory inputs remove the responsiveness of most cells that were otherwise responsive to auditory alone (e.g. in the module while 42% respond to sound alone, combined stimulation would leave only 10% of cells responsive). Please clarify what exactly the authors are plotting and stating here.

      Thank you for the reviewer’s comment. The responsive cells in Figure 7 are divided into three categories. Each category has a completely different group of cells. The first category is for the cells responding only to auditory stimulation (auditory-selective cells or Aud-sel). The second category is for the cells that respond only to somatosensory stimulation (somatosensory selective cells or Som-sel). The third category is for the cells that respond to both auditory and somatosensory stimulations when both stimulations are presented individually (auditory/somatosensory nonselective cells or Aud/Som-nonsel). Accordingly, the number of cells may be different across all these categories. We have clarified this part in the text [lines 299-303]. We have modified Figures 7, 8, and 11 (old Figure 10) to match the data from anesthetized and awake animals, so Figures 7H and I now show the collective % of the cells from all animals within modules vs matrix.
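
      The three mutually exclusive categories described above can be sketched as a simple labeling rule. This is an illustration only: the function name and the boolean inputs are hypothetical, and it assumes each cell has already been flagged as responsive or not to each stimulus presented individually.

```python
from collections import Counter


def classify_cell(responds_auditory, responds_somatosensory):
    """Assign a cell to exactly one response category."""
    if responds_auditory and responds_somatosensory:
        return "Aud/Som-nonsel"  # responds to each stimulus presented alone
    if responds_auditory:
        return "Aud-sel"         # responds to auditory stimulation only
    if responds_somatosensory:
        return "Som-sel"         # responds to somatosensory stimulation only
    return "nonresponsive"


# Hypothetical flags for five cells: (auditory, somatosensory).
flags = [(True, False), (True, True), (True, True), (False, True), (False, False)]
counts = Counter(classify_cell(aud, som) for aud, som in flags)
print(counts)
```

      Because each cell receives exactly one label, the Aud/Som-nonsel count is not a subset of the Som-sel count and can exceed it, as it does in this toy example.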

      (8) Why are the inferential statistics used in Figure 9F (chi-square test) and Figure 5A-C (t-test) when it tests the same thing (the only difference is one is anaesthetised data and the other awake)? Indeed, all Figure 9 and 10 (awake data figures) plots use chi-square tests to test differences in percentages instead of t-tests used in earlier (anaesthetised data figures) plots to test differences in percentages between groups. Please clarify the reason for this change in statistics used for similar comparisons.

      Thank you for the reviewer’s comment. Imaging the LC via microprism from awake animals confirmed the ability to run this technique with no interference to the ambulatory functions of the animals. Therefore, the main goal was to highlight the similarities between the data obtained from awake and anesthetized setups by highlighting the comparison between the LC and DC or between modules and matrix within each preparation (anesthetized vs awake). Accordingly, the statistics used to run these comparisons were chosen based on the number of the tested animals at each setup (7 anesthetized animals and 3 awake animals for prism insertion). The low number of animals used for awake data made us use the number of cells collectively from all animals instead of the number of animals, so we used the Chi-square test to examine the differences in percentages.
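
      For readers unfamiliar with the pooled-cell approach, a chi-square test on a 2x2 table of cell counts can be sketched as below. This is a sketch under stated assumptions: it uses Pearson's statistic without a continuity correction (the response does not say whether one was applied), the function name is hypothetical, and the example counts are invented for illustration.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows could be regions (e.g. modules, matrix) and columns could be
    responsive vs. nonresponsive cells pooled across animals.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one cell")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical pooled counts (NOT data from the study).
chi2, p = chi_square_2x2(30, 10, 10, 30)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

      Pooling cells in this way treats each cell as an independent observation, which is the trade-off implied by testing with a small number of animals.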

      (9) Figure 10H: The main text describes the results shown here as similar to what was seen in anaesthetised animals. But it looks to me like the results in awake animals are qualitatively different from the multisensory interaction seen in anaesthetised animals. In anaesthetised animals the authors find that there is a higher chance of auditory responses being enhanced by somatosensory inputs when cells are in the modules compared to in the matrix. However, in awake data, this relationship is flipped, with more bimodal enhancement found in the matrix compared to the modules. Furthermore, almost all cells in the modules are suppressed by combined somatosensory input which looks like it is different from what is found in anaesthestised mice and what is described in the discussion: 'we observed that combined auditory-somatosensory stimulation generally suppressed neural responses to auditory stimuli and that this suppression was most prominent in the LC matrix'.

      Thank you for the reviewer’s comment. Our statement was meant to show how the data obtained from awake and anesthetized animals were generally similar. However, we agree that the statement may not be suitable due to the possible differences between awake and anesthetized animals. To address a fair comparison between the anesthetized and awake preparations, we ran similar statistics and graphs for Figures 7, 8, and 11 (old Figure 10). Given that the areas occupied by modules and matrix are different across animals due to the irregular shape of the modules, we chose to run a chi-square test for all the data to quantify the collective % of responding cells within modules vs matrix from all tested animals for each experimental setup (anesthetized vs awake). The anesthetized and awake animals similarly showed that modules and matrix had higher fractions of auditory responsive cells. However, matrix had more cells responding to auditory stimulations than modules, while modules had more cells responding to somatosensory stimulation than matrix. In contrast, while the anesthetized animals showed higher fractions of offset auditory-responsive cells, which were mostly clustered in the modules, the offset auditory-responsive cells were very rare in awake animals (6 cells/one animal).

      Based on the fractions of cells with suppressed or enhanced auditory response induced by bimodal stimulation, the data obtained from anesthetized and awake animals showed that the auditory response in the matrix was suppressed more than enhanced by bimodal stimulation. In contrast, modules had different profiles across the experimental setups and locations. For instance, the modules imaged via microprism in the anesthetized and awake animals showed suppressed more than enhanced auditory responses, but modules imaged from the dorsal surface in anesthetized animals showed enhanced more than suppressed auditory responses. Additionally, modules had less suppressed and more enhanced auditory responses compared to matrix in the anesthetized animals regardless of the location of the modules (microprism or dorsal surface). Yet, modules from awake animals had more suppressed and less enhanced auditory responses compared to matrix. We have addressed these differences in the results and discussion section.

      Additional minor comments that I think the authors could use to aid their manuscript clarity:

      (1) The figure colour selection - especially in Figures 7 and 8 - is really hard to tell apart. Please choose more distinct colours, and a colour scheme that is appropriate for colour blind readers.

      Thank you for the reviewer’s suggestion. We have noticed that the magenta color assigned for the cells with offset responses was very difficult to distinguish from the black background. We have changed the magenta color to green to be different from the color of other cells. Using Photoshop, we chose a color scheme that is suitable for color-blind readers in all our maps.

      (2) The sentence in lines 331-334 should be rephrased for clarity.

      Thank you for the reviewer’s suggestion. We have rephrased the statement for clarity [lines 364-371].

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review the strong clustering evident in some of the maps (some of which may be related to module/matrix differences but certainly not all of it) seems worth scrutinizing further. Would we expect such a strong spatial segregation of auditory responsive and non-responsive neurons? Would we expect response properties (e.g. off-responsiveness) other than frequency tuning to show evidence of a topographic arrangement in the IC? In addressing this it would, of course, be important to rule out that this clustering is not down to some trivial experimental variables and truly reflects functional organization. For instance, are the patches of non-responsive neurons found in parts of the field of view with poor visibility, poor labelling, etc which may explain why it is difficult to pick up responses there? Are the neurons in non-responsive areas otherwise active (i.e. do they show spontaneous activity) or could they be 'dead'? Could the way neuropil signals are dealt with play a role here (it is weighted by 0.4 which strikes me as quite low)? In relation to this, I am also wondering to what extent the extreme overrepresentation (Figure 4) of neurons with a BF of 5kHz (some of this is, of course, down to the fact that the lower end of the frequency range was 5kHz and that the step size was 0.5 octaves), especially in the DC, is to be interpreted.

      Thank you for the reviewer’s comment. Before analysis, the ROIs of all cells were set around the cell bodies using the jRGECO1a signals as a reference, so all cells (responsive and nonresponsive) were collected from areas of good visibility of jRGECO1a signals. In other words, no cells were collected from regions having poor jRGECO1a signals. In Figures 7, 8, and 11 (old Figure 10), the cells showed a response only to either unmodulated broadband noise at 80 dB as an auditory stimulus or whisker deflection with specific speed and power as a somatosensory stimulus. Given that the two stimuli above had specific parameters, the remaining nonresponsive cells may respond to auditory or somatosensory stimulations with other features. For instance, some cells that were nonresponsive to the unmodulated broadband noise responded to pure tones with different amplitudes and frequencies or to AM-noise with different amplitudes and modulation frequencies. Also, these nonresponsive cells may not respond to any of our tested stimuli and may respond to other sensory stimulations. Some of the nonresponsive cells showed spontaneous activity when no stimulations were presented. However, we cannot rule out the possibility that some of these nonresponsive cells may not be viable. We have addressed the clustering properties in the revised version of the manuscript in the corresponding parts of the results and discussion sections. We have added a new supplementary figure (Figure 11-figure Supplement 1) to show how the cells that were nonresponsive to the unmodulated noise may respond to other types of sound and to show the spontaneous activity of some nonresponsive cells.

      For the neuropil, previous reports used the contamination factor (r) in a range of 0.3-0.7 (we referenced these studies in the method section [line 776]) based on the tissue or cells imaged, the vasculature, and the objective used for imaging. Therefore, we optimized the contamination factor (r) to be 0.4 through a preliminary analysis based on the tissue we imaged (LC) and the objective used (16x with NA = 0.8 and a working distance of 3 mm).
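
      For context, a contamination factor of this kind is commonly applied as a weighted subtraction of the surrounding neuropil trace from the somatic ROI trace. The sketch below assumes that common form, F_corrected = F_roi - r * F_neuropil, with r = 0.4; the function name and the sample values are hypothetical, and the exact pipeline used in the manuscript may differ.

```python
def subtract_neuropil(f_roi, f_neuropil, r=0.4):
    """Return the neuropil-corrected trace F_roi - r * F_neuropil.

    f_roi and f_neuropil are equal-length sequences of raw fluorescence
    values for a cell body ROI and its surrounding neuropil region.
    """
    if len(f_roi) != len(f_neuropil):
        raise ValueError("traces must have the same number of samples")
    return [roi - r * npil for roi, npil in zip(f_roi, f_neuropil)]


# Hypothetical fluorescence samples (arbitrary units).
corrected = subtract_neuropil([10.0, 12.0, 11.0], [5.0, 10.0, 5.0])
print(corrected)
```

      A smaller r removes less of the surrounding signal, which is why the reviewer asks whether a low weight could leave residual neuropil contamination in the somatic traces.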

      We agree that there is an overrepresentation of 5 kHz as the best tuning frequency for DC cells. The previous report (A. B. Wong & Borst, 2019) showed a large zone of the DC where cells were tuned to (2-8 kHz). Given that 5kHz was the lowest tested frequency in our experiment, we think that the low-frequency bias of the DC surface is consistent between studies. This finding also could be supported by the electrophysiology data obtained by spanning the recording electrodes through the IC tissue along the dorsoventral axis. In those experiments, the cells were tuned to lower frequencies at the dorsal surface of the IC.

      We have changed the magenta-colored cells to green ones, so it will be easier to identify the cells. As required by another reviewer, we changed the color palettes of some images and cellular maps to be suitable for color-blind readers.

      The manuscript would benefit from more precise language in a number of places, especially in the results section.

      Line 220/221, for instance: "... a significant fraction of cells that did not respond to pure tones did respond to AM-noise" Strictly speaking, this sentence suggests that you considered here only the subset of neurons that did not respond to pure tones and then ran a test on that subset. The test that was done seems to suggest though that the authors tested whether the percentage of responsive cells was greater for pure tones or for AM noise.

      Thank you for the reviewer’s comment. We do apologize for the confusion. In the revised manuscript, we categorized the cells according to their response into cells responding to pure tone only (tone-selective cells or Tone-sel), AM-noise only (noise-selective cells or Noise-sel), and to both pure tone and AM-noise (nonselective cells or Non-sel). We have modified Figure 5 accordingly. We did the same thing for the data obtained from awake animals and showed that in a new figure to easily match the analysis done for the anesthetized animals.

      Please refer to the figure panels in the text in consecutive order. 2B, for instance, is mentioned after 2H.

      Thank you for the reviewer’s comment. Throughout the paper, we kept the figure panels in consecutive order within each figure so that they flow smoothly with the text. Figure 2 is the only exception, for a good reason. Figure 2 is a complex one that includes many panels to show a parallel comparison between LC imaged via microprism and DC through single photon images, two-photon images, validating laser lesioning, and histology. Accordingly, we navigated many panels of the figure to efficiently highlight the aspects of this comparison. We prefer to keep Figure 2 as one figure with its current format to show this parallel comparison between LC and DC.

      The legend for Figure 2 could be clearer. For instance, there are two descriptions for panel D. Line 1009: "(C-E)" [i.e. C, D, E] and line 1010: "(D and F)".

      Thank you for the reviewer’s comment. It should be C and E, not C-E. We have fixed the mistake [line 1224].

      Line 275: What does 'with no preference' mean?

      Thank you for the reviewer’s comment. We do apologize for the confusion. There are three categories of cells. Some cells respond only to auditory stimulation, while others respond to only somatosensory stimulation. However, there is another group of cells that respond nonselectively to auditory and somatosensory stimulations or Aud/Som-nonsel cells. We edited the sentence to be clearer [lines 303-304].

      Line 281 (and other places): What does 'normalized against modules' mean?

      Thank you for the reviewer’s comment. This normalization was done by dividing the number of responsive cells of the same response type in the matrix by that in the modules. Therefore, the value for modules was always 1, and the value for the matrix was expressed relative to 1. Accordingly, the value for matrix would be > 1 if matrix had more cells than modules and < 1 if matrix had fewer cells than modules. In the revised version, we used this normalization method to make the revised Figures 5C and 10C to describe the cell fractions responding to pure tone only, AM-noise only, or to both stimuli in the matrix vs modules.
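
      The normalization described above amounts to a single ratio per response type. A minimal sketch, with a hypothetical function name and invented counts, is:

```python
def normalize_against_modules(matrix_count, module_count):
    """Express the matrix cell count relative to the modules count.

    By construction the modules value is 1, so a result above 1 means the
    matrix had more cells of that response type and below 1 means fewer.
    """
    if module_count == 0:
        raise ValueError("modules count must be nonzero to normalize")
    return matrix_count / module_count


# Hypothetical counts for one response type in one animal.
print(normalize_against_modules(matrix_count=30, module_count=20))
```

      The ratio is undefined when no cells of a given response type are found in the modules, which this sketch surfaces as an error rather than silently returning a value.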

      Sentence starting on line 288. I don't find that point to be as obvious from the figures as the sentences seem to suggest. Are we to compare magenta points (auditory off cells) from 7C with green points in 7F?

      Thank you for the reviewer’s comment. We came to this conclusion based on our visual comparison of magenta points (now green in the revised version to increase the visibility) representing the auditory offset cells in Figure 7C and the green points in Figure 7F representing the cells responding to both somatosensory and auditory stimulations. In the revised manuscript, we statistically examined whether the percentages of onset and offset auditory responses differ among the cells responding to both somatosensory and auditory stimulations in the modules vs matrix. We have found that most of the cells responding to both somatosensory and auditory stimulations inside the modules had offset auditory responses, which could indicate a level of multisensory integration between somatosensory input and the offset auditory responses in these cells. We have added the statistical results to the revised manuscript to address this effect [lines 312-317].

      Lines 300-302: "These data suggest that the module/matrix system permits preservation of distinct multimodal response properties in the face of massive integration of inputs in the LC". First, I'm not quite sure what that sentence means. Second, it would be more appropriate for the discussion. Third, the fact that we are more likely to find response enhancement in the modules than in the matrix is nicely consistent with the idea (supported by work from the senior author's lab and others) that excitatory somatosensory input predominantly targets neurons in the modules (which is why we see mostly response enhancement in the modules) and that this input targets GABAergic neurons which then project to and inhibit neurons both outside and inside of their module. Therefore, I would recommend that the authors replace the aforementioned sentence with one that interprets these results in light of what we know about this somatosensory-auditory circuitry.

      Thank you for the reviewer’s comment. Despite the massive multimodal inputs that the LC receives from auditory and nonauditory regions, the module/matrix system is a platform for distinct multimodal responses, as indicated by more somatosensory-responsive cells in modules versus more auditory-responsive cells in matrix, which matches the anatomical differences that were reported before. We edited the sentence in light of the comparison between the data obtained from awake and anesthetized animals and moved it to the discussion section [lines 503-506].

      The term 'LC imaged via microprism' is used dozens of times throughout the manuscript. Replacing it with a suitable acronym or initialism could improve the flow of the text and would make some of the sentences less cumbersome.

      Thank you for the reviewer’s suggestion. We changed the term “LC imaged via microprism” into LC (microprism) throughout the revised manuscript.

      5A-C: It is unclear what is being compared here. What are the Ns? Different animals?

      Thank you for the reviewer’s comment. We do apologize for this missing information. We have added the number of subjects used in every statistical test in each corresponding figure legend.

      5G: minus symbol missing on the y-axis.

      Thank you for the reviewer’s comment. We gladly have fixed that.

      Figure 6: Are these examples or population averages?

      Thank you for the reviewer’s question. Every figure panel of the old Figure 6 represents a single trace of an example cell. However, we modified Figure 6 to include more examples of cells showing different responses complying with another reviewer’s suggestion. Each panel of the new Figure 6 represents the average response of 5 stimulations of the corresponding stimulus type. We preferred to show the average signal because it was the one used for the subsequent analysis.

      How are module borders defined?

      Thank you for the reviewer’s question. The modules were defined based on the intensity of the green channel, which shows the expression of GFP. The boundaries of the modules were drawn at the transition between high and low GFP signal. This step was done before data analysis to avoid any bias.

      7JKL: How are these to be interpreted? Does panel 7K, for instance, indicate that the fraction of neurons showing 'on' responses was roughly twice as large in the matrix than in the modules and that the fraction of neurons showing 'off' responses was roughly 10 times larger in the modules than in the matrix (the mean seems to be at about 1/10).

      Thank you for the reviewer’s comment. The data represented by Figures 7J-L defined the normalization of the number of cells of the same response type in the matrix against the modules. This normalization was done per animal, and then the data of the matrix were plotted against the normalization line at 1 representing the modules. The matrix will be claimed to have more cells than modules if the median of the matrix values > 1. In contrast, the matrix will be claimed to have fewer cells than the modules if the median of the matrix values < 1. Finally, if the median of matrix values = 1, this means there is no difference between matrix and modules. However, to match the data obtained from anesthetized animals (Figures 7 and 8) with those obtained from awake animals (Figure 11 or old Figure 10), we ran all data through the Chi-square test in the revised manuscript. Therefore, the format of Figures 7K-L was changed in the revised manuscript, so they became new Figures 7I-K.

      10A suggests that significantly more than half the neurons shown here are not auditory responsive. Perhaps I am misinterpreting something here but isn't that in contrast to what is shown in panel 9F?

      Thank you for the reviewer’s comment. The data shown in Figure 10A (or revised Figure 11A) represents the cellular response to only one stimulus (broadband noise at 80 dB with no modulation frequency), while Figure 9F (revised 10B) represents the cells responding to varieties of auditory stimulations of different combinations of frequencies and amplitudes (pure tones) as well as to AM-noise of different amplitudes and modulation frequencies. Accordingly, the old Figure 9F or revised Figure 10B shows different cell types based on their responses. For instance, some cells respond only to pure tone. Others respond only to AM-noise or to both pure tones and AM-noise. This may also support that the nonresponsive cells in Figure 10A (revised 11A) can respond to other types of sound features.

      The way I understood panels 7L and 8K there were more suppressed neurons in the matrix than in the modules (line 296: "cells in the modules had a higher odds of having an enhancement response to bimodal stimulation than matrix, while cells in the matrix had a higher odds of having a suppressive response to bimodal stimulation"). Now, panel 10F indicates that in awake mice there is a greater proportion of suppressed neurons in the modules than in the matrix. I may very well have overlooked or misread something but I may not be the only reader confused by this so please clarify.

      Thank you for the reviewer’s comment. We do apologize for this confusion. The ambiguity between Figures 7 and 8 (anesthetized animals) as well as Figure 10 (awake animals) comes from the fact that different statistics have been used for each preparation. In the revised version, we have fixed that by running the same statistics for all the data, and we accordingly revised Figures 7, 8, and 10 (new Figure 11). In brief, the matrix preserves a higher percentage of cells with suppressed auditory responses than those with enhanced auditory responses induced by bimodal stimulation in all conditions (anesthetized vs awake). In contrast, modules act differently across all tested conditions. While modules had more cells with enhanced auditory responses induced by bimodal interaction in anesthetized animals, they had more cells with suppressed response in awake animals indicating that modules could be sensitive to the effect of anesthesia compared to matrix. We addressed this effect in the discussion of the revised manuscript [lines 521-553].

      Line 438: ...as early AS...

      Thank you for the reviewer’s comment. We gladly fixed that [line 512].  

      Reviewer #3 (Recommendations For The Authors):

      My minor recommendations for the authors are as follows:

      (1) The text can be a bit difficult to follow in places. This is partly due to the convoluted nature of the results, but I suggest a careful read-through to look for opportunities to improve the prose. In particular, there is a tendency to use long sentences and long paragraphs. For example, the third paragraph of the introduction runs for almost fifty lines.

      Thank you for the reviewer’s comment and suggestion. We have fixed that.

      (2) This might be due to journal compression, but some of the bar graphs in the figures are difficult to read. For example, the individual data points, especially when filled with striped background colors get lost. Axes can become invisible, like the y-axis in 7L, and portions of bars, like in 7F, are not always rendered correctly. Error bars are sometimes hidden behind data points, as in 5C. Increasing line thickness and shifting individual data points away from error bars might help with this.

      Thank you for the reviewer’s comment and suggestion. We made all the data points with black color and filled circles to make the data points visible. We put all the data points behind the main columns, so they don’t block the error bars. We have fixed figures 7 and 5.

      (3) Throughout the manuscript, the authors use a higher SMI to indicate a preference of cells for auditory stimuli with "greater spectral... complexity" (e.g., lines 219 and 372). I find this interpretation a bit challenging since SMI compares a neuron's preference for tones over noise, and to me, tones seem like the least spectrally complex of all auditory stimuli. Perhaps some clarification of what the authors mean by this would help. For example, is the assumption that a neuron that prefers tones over noise is, either directly or indirectly, receiving input sculpted by inhibitory processes?

      Thank you for the reviewer’s comment. In general, higher SMI values indicate an increase in the preference of the cells to respond to pure tones than noise with no modulation (less spectral complexity). We will clarify this statement throughout the manuscript. However, the SMI value was not mentioned in lines 219 and 372. The statement mentioned in line 219 describes the revised figure 5C (old 5B), where more cells in matrix specifically respond to AM-noise compared to modules, which indicates the preference of the matrix to respond to sounds of greater spectral and temporal complexity. The statement in 372 in the discussion section refers to the finding in revised figures 5E and F (old 5D and E). In the revised figure 5E or old 5D, the data show that matrix has more cells responding to pure tones or noise with no modulation than modules, so matrix has a lower threshold to detect the spectral features of sound (revised figure 5E or old 5D). In the revised figure 5F or old 5E, the data show that matrix has more cells responding to AM-noise than modules, which indicates that matrix functions more to process the temporal features of sound. As explained above, all findings were related to the percentage of cells responding to specific sound stimuli and not the exact SMI values. We have revised the figures accordingly by removing the terms SMI and TMI from the figures, and we have clarified that in the text.

      (4) Lines 250-253: How does a decrease in SMI correspond to "an increase in pure tone responsiveness?" Doesn't a decrease suggest the opposite?

      Thank you for the reviewer’s comment, which we agree with. We do apologize for that. We have fixed this statement [lines 275-277] and any related findings accordingly.

      (5) Line 304: Add "imaged via microprism" or similar after "response profiles with the LC.".

      Thank you for the reviewer’s suggestion. We have fixed that. However, we changed the term “LC imaged via microprism” into “LC(microprism)” for simplicity as suggested by another reviewer [line 330].

      (6) Figure 5A and C: Both plots show that more neurons responded to AM-noise than tones, but it would be interesting to know how much the tone-responsive and AM-noise responsive populations overlapped. Were all tone-responsive neurons also responsive to AM-noise?

      Thank you for the reviewer’s comment. We have categorized the cells based on their response to pure tone only, AM-only, and both pure tone and AM-noise when each stimulus is presented individually. We have modified Figures 5A and C, and they are now Figures 5B and D.

      (7) Figure 5G: Missing negative sign before "0.5.".

      Thank you for the reviewer’s suggestion. We gladly have fixed that. However, old Figure 5G became a revised Figure 5H.  

      (8) Figure 7 legend, Line 1102: Missing period after "(C and E)".

      Thank you for the reviewer’s suggestion. We think that the period should be placed before (C and E) at the end of “respectively”. The parentheses refer to the statements after them. We gladly fixed that. [line 1394]

    1. eLife assessment

      This useful study aimed to examine the relationship of spatial frequency selectivity of single macaque inferotemporal (IT) neurons to category selectivity. There are some interesting findings in this report but some of these findings were difficult to evaluate because several critical details of the analysis are incomplete. The conclusion that single-unit spatial frequency selectivity can predict object coding needs further evidence to confirm.

    2. Reviewer #1 (Public Review):

      This study reports that spatial frequency representation can predict category coding in the inferior temporal cortex. The original conclusion was based on likely problematic stimulus timing (33 ms which was too brief). Now the authors claim that they also have a different set of data on the basis of longer stimulus duration (200 ms).

      One big issue in the original report was that the experiments used a stimulus duration that was too brief and could have weakened the effects of high spatial frequencies and confounded the conclusions. Now the authors provided a new set of data on the basis of a longer stimulus duration and made the claim that the conclusions are unchanged. These new data and the data in the original report were collected at the same time as the authors report.

      The authors may provide an explanation why they performed the same experiments using two stimulus durations and only reported one data set with the brief duration. They may also explain why they opted not to mention in the original report the existence of another data set with a different stimulus duration, which would otherwise have certainly strengthened their main conclusions.

      I suggest the authors upload both data sets and analyzing codes, so that the claim could be easily examined by interested readers.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly. They related this weak spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli they employed to assess category selectivity.

      The authors revised their manuscript and provided some clarifications regarding their experimental design and data analysis. They responded to most of my comments but I find that some issues were not fully or poorly addressed. The new data they provided confirmed my concern about low responses to their scrambled stimuli. Thus, this paper shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly (see main comments below). They related this (weak) spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli to assess category selectivity.

      Main points.

      (1) They have provided now the responses of their neurons in spikes/s and present a distribution of the raw responses in a new Figure. These data suggest that their scrambled stimuli were driving the neurons rather poorly and thus it is unclear how well their findings will generalize to more effective stimuli. Indeed, the mean net firing rate to their scrambled stimuli was very low: about 3 spikes/s. How much can one conclude when the stimuli are driving the recorded neurons that poorly? Also, the new Figure 2- Appendix 1 shows that the mean modulation by spatial frequency is about 2 spikes/s, which is a rather small modulation. Thus, the spatial frequency selectivity the authors describe in this paper is rather small compared to the stimulus selectivity one typically observes in IT (stimulus-driven modulations can be at least 20 spikes/s).

      (2) Their new Figure 2-Appendix 1 does not show net firing rates (baseline-subtracted; as I requested) and thus is not very informative. Please provide distributions of net responses so that the readers can evaluate the responses to the stimuli of the recorded neurons.

      (3) The poor responses might be due to the short stimulus duration. The authors report now new data using a 200 ms duration which supported their classification and latency data obtained with their brief duration. It would be very informative if the authors could also provide the mean net responses for the 200 ms durations to their stimuli. Were these responses as low as those for the brief duration? If so, the concern of generalization to effective stimuli that drive IT neurons well remains.

      (4) I still do not understand why the analyses of Figures 3 and 4 provide different outcomes on the relationship between spatial frequency and category selectivity. I believe they refer to this finding in the Discussion: "Our results show a direct relationship between the population's category coding capability and the SF coding capability of individual neurons. While we observed a relation between SF and category coding, we have found uncorrelated representations. Unlike category coding, SF relies more on sparse, individual neuron representations.". I believe more clarification is necessary regarding the analyses of Figures 3 and 4, and why they can show different outcomes.

      (5) The authors found a higher separability for faces (versus scrambled patterns) for neurons preferring high spatial frequencies. This is consistent for the two monkeys but we are dealing here with a small amount of neurons. Only 6% of their neurons (16 neurons) belonged to this high spatial frequency group when pooling the two monkeys. Thus, although both monkeys show this effect I wonder how robust it is given the small number of neurons per monkey that belong to this spatial frequency profile. Furthermore, the higher separability for faces for the low-frequency profiles is not consistent across monkeys which should be pointed out.

      (6) I agree that CNNs are useful models for ventral stream processing but that is not relevant to the point I was making before regarding the comparison of the classification scores between neurons and the model. Because the number of features and trial-to-trial variability differs between neural nets and neurons, the classification scores are difficult to compare. One can compare the trends but not the raw classification scores between CNN and neurons without equating these variables.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study reports that IT neurons have biased representations toward low spatial frequency (SF) and faster decoding of low SFs than high SFs. High SF-preferred neurons, and low SF-preferred neurons to a lesser degree, perform better category decoding than neurons with other profiles (U and inverted U shaped). SF coding also shows more sparseness than category coding in the earlier phase of the response and less sparseness in the later phase. The results are also contrasted with predictions of various DNN models.

      Strengths:

      The study addressed an important issue on the representations of SF information in a high-level visual area. Data are analyzed with LDA which can effectively reduce the dimensionality of neuronal responses and retain category information.

      We would like to express our sincere gratitude for your insightful and constructive comments which greatly contributed to the refinement of the manuscript. We appreciate the time and effort you dedicated to reviewing our work and providing suggestions. We have carefully considered each of your comments and addressed the suggested revisions accordingly.

      Weaknesses:

      The results are likely compromised by improper stimulus timing and unmatched spatial frequency spectrums of stimuli in different categories.

      The authors used a very brief stimulus duration (35ms), which would degrade the visual system's contrast sensitivity to medium and high SF information disproportionately (see Nachmias, JOSAA, 1967). Therefore, IT neurons in the study could have received more degraded medium and high SF inputs compared to low SF inputs, which may be at least partially responsible for higher firing rates to low SF R1 stimuli (Figure 1c) and poorer recall performance with medium and high SF R3-R5 stimuli in LDA decoding. The issue may also to some degree explain the delayed onset of recall to higher SF stimuli (Figure 2a), preferred low SF with an earlier T1 onset (Figure 2b), lower firing rate to high SF during T1 (Figure 2c), somewhat increased firing rate to high SF during T2 (because weaker high SF inputs would lead to later onset, Figure 2d).

      We appreciate your concern regarding the coarse-to-fine nature of SF processing in the vision hierarchy and the short exposure time of our paradigm. In response to your comment, we repeated the analysis of SF representation with a 200 ms exposure time, as illustrated in Appendix 1 - Figure 4. Our recordings include the 200 ms exposure condition for all neurons in the main phase. As can be seen, the results are similar to those obtained in the 33 ms experiments.

      Next, we bring your attention to the following observations:

      (1) According to Figure 2d, the average firing rate of IT neurons for HSF could be higher than for LSF in the late response phase. Therefore, IT neurons receive as much HSF input as LSF input; however, the impact of the HSF input on the IT response is observable only in the later phase of the response. Thus, the LSF preference reflects the temporal advantage of LSF processing rather than contrast sensitivity.

      (2) According to Figure 3a, 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170 ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Furthermore, the highest separability index also belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input.

      (3) Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4) for longer duration times.

      Our results suggest that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the vision hierarchy.

      To address this issue in the manuscript:

      Appendix 1 - Figure 4 is added to the manuscript and shows the recall value and onset for R1-R5 with 200 ms of exposure time.

      We added the following description to the discussion:

      “To rule out the degraded contrast sensitivity of the visual system to medium and high SF information because of the brief exposure time, we repeated the analysis with 200ms exposure time as illustrated in Appendix 1 - Figure 4 which indicates the same LSF-preferred results. Furthermore, according to Figure 2, the average firing rate of IT neurons for HSF could be higher than LSF in the late response phase. It indicates that the amount of HSF input received by the IT neurons in the later phase is as much as LSF, however, its impact on the IT response is observable in the later phase of the response. Thus, the LSF preference is because of the temporal advantage of the LSF processing rather than contrast sensitivity. Next, according to Figure 3(a), 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Additionally, the highest SI belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input. Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4). Therefore, our results show that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the IT cortex.”

      Figure 3b shows greater face coding than object coding by high SF and to a lesser degree by low SF neurons. Only the inverted-U-shaped neurons displayed slightly better object coding than face coding. Overall the results give an impression that IT neurons are significantly more capable of coding faces than coding objects, which is inconsistent with the general understanding of the functions of IT neurons. The problem may lie with the selection of stimulus images (Figure 1b). To study SF-related category coding, the images in two categories need to have similar SF spectrums in the Fourier domain. Such efforts are not mentioned in the manuscript, and a look at the images in Figure 1b suggests that such efforts are likely not properly made. The ResNet18 decoding results in Figure 6C, in that IT neurons of different profiles show similar face and object coding, might be closer to reality.

      Because of the limited number of stimuli in our experiments, it is hard to discuss the category selectivity, which needs a higher number of stimuli. To overcome the limited number of stimuli in our experiment, we fixed 60% (nine out of 15 stimuli) while varying the remaining stimuli to reduce the selective bias. To check the coding capability of the IT neurons for face and non-face objects, we evaluated the recall of face vs. non-face classification in intact stimuli (similar to classifiers stated in the manuscript). Results show that at the population level, the recall value for objects is 90.45%, and for faces is 92.45%. However, the difference is not significant (p-value=0.44). On the other hand, we note that a large difference in the SI value does not translate directly to the classification accuracy, rather it illustrates the strength of representation.

      Regarding the SF spectrums, after matching the luminance and contrast of the images, we matched the power of the images across SF bands and categories. Powers are calculated using the sum of the absolute value of the Fourier transform of the image. Considering all stimuli, the ANOVA analysis shows that various SF bands have similar power (one-way ANOVA, p-value=0.24). Furthermore, comparing the power of faces and objects in all SF bands (including intact) and both unscrambled and scrambled images indicates no significant difference between face and object (p-value > 0.1). Therefore, the result of Figure 3b suggests that IT employs various SF bands for the recognition of various objects.

      Comparing the results of CNNs and IT shows that the CNNs do not capture the complexities of the IT cortex in terms of SF. One source of this difference may be the behavioral saliency of face stimuli in the training of the primate visual system.

      To address this issue in the manuscript:

      The following description is added to the discussion:

      “… the decoding performance of category classification (face vs. non-face) in intact stimuli is 94.2%. The recall value for objects vs. scrambled is 90.45%, and for faces vs. scrambled is 92.45% (p-value=0.44), which indicates the high level of generalizability and validity characterizing our results.”

      The following description is added to the method section, SF filtering.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (p-value > 0.1). SF power is calculated as the sum of the square value of the image coefficients in the Fourier domain.”
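
      The power measure described in the Methods text above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the radial band edges passed as `low` and `high`, and the rescaling step are assumptions made for the example. It follows the Methods wording (sum of squared Fourier coefficients) rather than the sum of absolute values mentioned in the response.

```python
import numpy as np

def radial_frequency(shape):
    """Radial spatial frequency (cycles/image) of every FFT coefficient."""
    h, w = shape
    fy = np.fft.fftfreq(h) * h  # cycles per image along the vertical axis
    fx = np.fft.fftfreq(w) * w  # cycles per image along the horizontal axis
    return np.hypot(*np.meshgrid(fy, fx, indexing="ij"))

def band_power(image, low, high):
    """Sum of squared Fourier magnitudes with low <= radius < high."""
    spectrum = np.fft.fft2(image)
    radius = radial_frequency(image.shape)
    mask = (radius >= low) & (radius < high)
    return float(np.sum(np.abs(spectrum[mask]) ** 2))

def scale_to_power(image, target_power):
    """Hypothetical equalization step: remove the mean, then rescale the
    image so that its total Fourier power equals target_power."""
    centered = image - image.mean()
    current = np.sum(np.abs(np.fft.fft2(centered)) ** 2)
    return centered * np.sqrt(target_power / current)
```

      Once every stimulus has been scaled to a common target, the per-band values returned by `band_power` could be compared across conditions with a one-way ANOVA, as reported in the response.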

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and they show that at least some IT neurons show a sensitivity for spatial frequency and interestingly show a tendency for coarse-to-fine processing.

      We extend our sincere appreciation for your thoughtful and constructive feedback on our paper. We are grateful for the time and expertise you invested in reviewing our work. Your detailed suggestions have been instrumental in addressing several key aspects of the paper, contributing to its clarity and scholarly merit. We have carefully considered each of your comments and have made revisions accordingly.

      Weaknesses and requested clarifications:

      (1) It is unclear whether the effects described in this paper reflect a sensitivity to spatial frequency, i.e. in cycles/ deg (depends on the distance from the observer and changes when rescaling the image), or is a sensitivity to cycles /image, largely independent of image scale. How is it related to the well-documented size tolerance of IT neuron selectivity?

      Our stimuli are filtered in cycles/image and, knowing the distance of the subject to the monitor, we can calculate the corresponding cycles/degree. To the best of our knowledge, this is also the case for all other SF-related studies. To relate the observations to cycles/image versus cycles/degree, one should keep one of them fixed while changing the other; for example, changing the subject's distance to the monitor will change the SF content in terms of cycles/degree. With our current data, we cannot discriminate this effect. To address this issue, we added the following description to the discussion:

      “Finally, since our experiment maintains a fixed SF content in terms of both cycles per degree and cycles per image, further experiments are needed to discern whether our observations reflect sensitivity to cycles per degree or cycles per image.”
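
      The relationship between the two units can be made concrete with a short sketch. It assumes a flat image centered on the line of sight; the function names and the example sizes are hypothetical and do not come from the manuscript.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an image of the given size at the given
    viewing distance (both in the same unit)."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def cycles_per_degree(cycles_per_image, image_size_cm, distance_cm):
    """Convert SF in cycles/image to cycles/degree for one viewing geometry."""
    return cycles_per_image / visual_angle_deg(image_size_cm, distance_cm)

# Hypothetical geometry: the same 10 cycles/image stimulus viewed from two
# distances has the same cycles/image but a different cycles/degree.
near = cycles_per_degree(10, image_size_cm=10, distance_cm=50)
far = cycles_per_degree(10, image_size_cm=10, distance_cm=100)
```

      Because moving the observer changes cycles/degree while leaving cycles/image fixed, varying viewing distance (or image size) is one way the two accounts could be dissociated in a follow-up experiment.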

      (2) The authors band-pass filtered phase scrambled images of faces and objects. The original images likely differed in their spatial frequency amplitude spectrum and thus it is unclear whether the differing bands contained the same power for the different scrambled images. If not, this could have contributed to the frequency sensitivity of the neurons.

      After equalizing the luminance and contrast of the images, we equalized their power across SF bands and categories. The powers were calculated using the sum of the absolute values of the Fourier transform of the images. The results of the ANOVA analysis across all stimuli indicate that various SF bands exhibit similar power (one-way ANOVA, p-value = 0.24). Additionally, a comparison of power between faces and objects in all SF bands (including intact), for both unscrambled and scrambled images, reveals no significant differences (p-value > 0.1). To clarify this point, we have incorporated the following information into the Methods section.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (ANOVA, p-value > 0.1).”

      (3) How strong were the responses to the phase-scrambled images? Phase-scrambled images are expected to be rather ineffective stimuli for IT neurons. How can one extrapolate the effect of the spatial frequency band observed for ineffective stimuli to that for more effective stimuli, like objects or (for some neurons) faces? A distribution should be provided, of the net responses (in spikes/s) to the scrambled stimuli, and this for the early and late windows.

      The sample neuron in Figure 1c is chosen to be a good indicator of the recorded neurons. In the early response phase, the average firing rate to scrambled stimuli is 26.3 spikes/s, which is significantly higher than the 23.4 spikes/s observed between -50 and 50 ms. In comparison, the mean response to intact face stimuli is 30.5 spikes/s, while object stimuli elicit an average response of 28.8 spikes/s. Moving to the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5, 19.4, and 22.4 spikes/s, respectively. Moreover, when the classification accuracy for SF exceeds chance levels, it indicates a significant impact of SF bands on the IT response. This raises a direct question about the explicit coding for SF bands in the IT cortex observed for ineffective stimuli and how it relates to complex and effective stimuli, such as faces. To show the strength of neuron responses to the SF bands in scrambled images, we added Appendix 1 - Figure 2 and, according to comment 4, Appendix 1 - Figure 1, which shows the average and standard deviation of the responses to all SF bands. The following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 - Figure 2.”

      (4) The strength of the spatial frequency selectivity is unclear from the presented data. The authors provide the result of a classification analysis, but this is in normalized units so that the reader does not know the classification score in percent correct. Unnormalized data should be provided. Also, it would be informative to provide a summary plot of the spatial frequency selectivity in spikes/s, e.g. by ranking the spatial frequency bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials. Thus, the reader can appreciate the strength of the spatial frequency selectivity, considering trial-to-trial variability. Also, a plot should be provided of the mean response to the stimuli for the two analysis windows of Figure 2c and 2d in spikes/s so one can appreciate the mean response strengths and effect size (see above).

      The normalization of the classification result is obtained simply by subtracting the chance level, which is 0.2, from all values. Therefore, the values can still be interpreted in percent, as we did in the results section. To make this clear, we removed the “a.u.” from the figure and added the following description to the results section.

      “The accuracy value is normalized by subtracting the chance level (0.2).”
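
      As a small worked illustration of this normalization, the sketch below assumes the chance level of 0.2 corresponds to five equally likely SF labels (R1-R5); the function name and the example accuracy are hypothetical.

```python
import math

def accuracy_above_chance(accuracy, n_classes):
    """Subtract the chance level (1 / n_classes) from a raw accuracy."""
    return accuracy - 1.0 / n_classes

# Hypothetical raw accuracy of 0.45 with five SF bands (chance = 0.2).
normalized = accuracy_above_chance(0.45, n_classes=5)
percentage_points = 100 * normalized  # about 25 points above chance
```

      Reading the normalized value as percentage points above chance, rather than as an arbitrary unit, is what allows the raw percent-correct score to be recovered by adding the chance level back.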

      Regarding the selectivity of the neurons, as suggested by your comment, we added a new figure in the appendix section, Appendix 1 - Figure 2. This figure shows the strength of SF selectivity, considering trial-to-trial variability. The following description is added to the results section:

      “The strength of SF selectivity, considering the trial-to-trial variability, is provided in Appendix 1 - Figure 2, by ranking the SF bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials.”
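
      The split-half ranking procedure described above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation; the array layout (SF bands by trials), the random half split, and the function name are assumptions.

```python
import numpy as np

def split_half_ranked_response(rates, rng):
    """Cross-validated SF tuning for one neuron.

    rates: array of shape (n_bands, n_trials) holding firing rates in spikes/s.
    Bands are ranked on one random half of the trials, and the mean response
    of the held-out half is returned in that rank order (best band first).
    """
    n_trials = rates.shape[1]
    order = rng.permutation(n_trials)
    half_a, half_b = order[: n_trials // 2], order[n_trials // 2:]
    ranking = np.argsort(-rates[:, half_a].mean(axis=1))  # descending mean
    return rates[ranking][:, half_b].mean(axis=1)

# Usage sketch with simulated Poisson responses for five SF bands.
rng = np.random.default_rng(0)
simulated = rng.poisson(lam=[[20], [25], [30], [22], [18]], size=(5, 40))
ranked = split_half_ranked_response(simulated.astype(float), rng)
```

      Because ranking and evaluation use disjoint trials, a neuron without genuine SF selectivity should yield a flat held-out curve on average, whereas a monotonic decline across ranks reflects selectivity that survives trial-to-trial variability.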

      The firing rates of Figures 2c and 2d are normalized for better illustration since the variation in firing rates is high across neurons, as can be observed in Appendix 1 - Figure 1. Since we seek trends in the response, the absolute values are not important (since the baseline firing rates of neurons are different), but the values relative to the baseline firing rate determine the trend. To address the mean response and the strength of the SF response, the following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 - Figure 2.”

      Furthermore, we added a figure, Appendix 1 - Figure 3, to illustrate the strength of SF selectivity in our profiles. The following is added to the results section:

      “To check the robustness of the profiles, considering the trial-to-trial variability, the strength of SF selectivity in each profile is provided in Appendix 1 - Figure 3, by forming the profile of each neuron based on half of the trials and then plotting the average SF responses with the other half of the trials.”

      (5) It is unclear why such brief stimulus durations were employed. Will the results be similar, in particular the preference for low spatial frequencies, for longer stimulus durations that are more similar to those encountered during natural vision?

      Please refer to the first comment of Reviewer 1.

      (6) The authors report that the spatial frequency band classification accuracy for the population of neurons is not much higher than that of the best neuron (line 151). How does this relate to the SNC analysis, which appears to suggest that many neurons contribute to the spatial frequency selectivity of the population in a non-redundant fashion? Also, the outcome of the analyses should be provided (such as SNC and decoding (e.g. Figure 1D)) in the original units instead of undefined arbitrary units.

      The population accuracy is approximately 5% higher than that of the best neuron. However, we have no reference against which to compare the effect size (the value is roughly similar for face vs. object, while the chance levels are different). Moreover, as stated in Methods, SNC is calculated for two label modes (LSF and HSF), so it cannot be directly compared to the best-neuron accuracy. Regarding the unit of SNC, it can be read directly as a percentage by multiplying by 100. We removed the “a.u.” to prevent misunderstanding and modified the results section for clarity.

      “… SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category … (average SNC for SF=0.51%±0.02 and category=0.1%±0.04 …”

      (7) To me, the results of the analyses of Figure 3c,d, and Figure 4 appear to disagree. The latter figure shows no correlation between category and spatial frequency classification accuracies while Figure 3c,d shows the opposite.

      In Figure 3c,d, following what we observed in Figure 3a,b about the category coding capabilities of the population based on the profile of the single neurons, we tested a similar idea: whether the coding capability of single neurons for SF/category could predict the coding capability of a population of neurons for category/SF. Therefore, both analyses investigate a relation between a characteristic of single neurons and the coding capability of a population of similar neurons. In Figure 4, on the other hand, the idea is to check the characteristics of the coding mechanisms behind SF and category coding. In Figure 4a, we check whether there is any relation between category and SF coding capability within the activity of a single neuron, without the impact of other neurons, to investigate the idea that SF coding may be a byproduct of an object recognition mechanism. In Figure 4b, we investigated the contribution of all neurons to the population decision, again to check whether the mechanisms behind SF and category coding are the same. This analysis shows how individual neurons contribute to SF or category coding at the population level. Therefore, the experiments in Figures 3 and 4 differ in both the analysis method and what they were designed to investigate, and we cannot directly compare the results.

      (8) If I understand correctly, the "main" test included scrambled versions of each of the "responsive" images selected based on the preceding test. Each stimulus was presented 15 times (once in each of the 15 blocks). The LDA classifier was trained to predict the 5 spatial frequency band labels and they used 70% of the trials to train the classifier. Were the trained and tested trials stratified with respect to the different scrambled images? Also, LDA assumes a normal distribution. Was this the case, especially because of the mixture of repetitions of the same scrambled stimulus and different scrambled stimuli?

      In response to your inquiry regarding the stratification of trials, both the training and testing data were representative of the entire spectrum of scrambled images used in our experiment. To address your concern about the assumption of a normal distribution, especially given the mixture of repetitions of the same scrambled stimulus and different stimuli, our analysis of firing rates reveals a slightly left-skewed normal distribution. While there is a deviation from a perfectly normal distribution, we are confident that this skewness does not compromise the robustness of the LDA classifier.
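      One way to make a 70/30 split explicitly stratified, as the reviewer asks, is sketched below. It is an assumption-laden illustration, not the authors' pipeline: trials are represented as dictionaries, and the stratum key (SF band plus image identity), the field names, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(trials, key, train_fraction=0.7, seed=0):
    """Split trials into train/test sets so that every stratum
    (e.g. each SF band x image combination) keeps the same proportion."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for trial in trials:
        groups[key(trial)].append(trial)

    train, test = [], []
    for stratum in sorted(groups):          # sorted for reproducibility
        members = groups[stratum]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test


# Hypothetical layout: 5 bands x 3 images x 10 repetitions.
trials = [
    {"band": band, "image": image, "rep": rep}
    for band in ["R1", "R2", "R3", "R4", "R5"]
    for image in range(3)
    for rep in range(10)
]
train, test = stratified_split(trials, key=lambda t: (t["band"], t["image"]))
print(len(train), len(test))
```

      Splitting within each stratum guarantees that every scrambled image appears in both sets, which a plain random split only achieves on average.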

      (9) The LDA classifiers for spatial frequency band (5 labels) and category (2 labels) have different chance and performance levels. Was this taken into account when comparing the SNC between these two classifiers? Details and SNC values should be provided in the original (percent difference) instead of arbitrary units in Figure 5a. Without such details, the results are impossible to evaluate.

      For both SNC and CMI calculations in SF, we considered two labels: HSF (R4 and R5) and LSF (R1 and R2). This was mentioned in the Methods section, after equation (5). Following your comment, we also added this description to the results section.

      “… illustrates the SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category (face vs. non-face) … conditioned on the label, SF (LSF (R1 and R2) vs. HSF (R4 and R5)) or category, to assess the information.”

      The value of SNC can also be converted directly to a percentage by multiplying by 100. To make this clear, we removed “a.u.” from the y-axis.

      (10) Recording locations should be described in IT, since the latter is a large region. Did their recordings include the STS? A/P and M/L coordinate ranges of recorded neurons?

      We appreciate your suggestion regarding the recording location. Nevertheless, given the complexities associated with neurophysiological recordings and the limitations imposed by our methodologies, we cannot precisely determine whether each unit is located in the STS. To address your comment, we added Appendix 1 - Figure 5, which shows the SF and category coding capability of neurons along their recorded locations.

      (11) The authors should show in Supplementary Figures the main data for each of the two animals, to ensure the reader that both monkeys showed similar trends.

      We added Appendix 2 which shows the consistency of the main results in the two monkeys.

      (12) The authors found that the deep nets encoded better the spatial frequency bands than the IT units. However, IT units have trial-to-trial response variability and CNN units do not. Did they consider this when comparing IT and CNN classification performance? Also, the number of features differs between IT and CNN units. To me, comparing IT and CNN classification performances is like comparing apples and oranges.

      Deep convolutional neural networks are currently considered the state-of-the-art models of the primate visual pathway. However, as you mentioned and based on our results, they do not yet capture various complexities of the visual ventral stream. Nevertheless, studying the similarities and differences between CNNs and brain regions such as the IT cortex is an active area of research; see, for example:

      a. Kubilius, Jonas, et al. "Brain-like object recognition with high-performing shallow recurrent ANNs." Advances in neural information processing systems 32 (2019).

      b. Xu, Yaoda, and Maryam Vaziri-Pashkam. "Limits to visual representational correspondence between convolutional neural networks and the human brain." Nature Communications, 12.1 (2021).

      c. Jacob, Georgin, et al. "Qualitative similarities and differences in visual object representations between brains and deep networks." Nature Communications, 12.1 (2021).

      Therefore, we believe comparing IT and CNN, despite all of the differences in terms of their characteristics, can help both fields grow faster, especially in introducing brain-inspired networks.

      (13) The authors should define the separability index in their paper. Since it is the main index to show a relationship between category and spatial frequency tuning, it should be described in detail. Also, results should be provided in the original units instead of undefined arbitrary units. The tuning profiles in Figure 3A should be in spikes/s. Also, it was unclear to me whether the classification of the neurons into the different tuning profiles was based on an ANOVA assessing per neuron whether the effect of the spatial frequency band was significant (as should be done).

      Based on your comment, we added the description of the separability index to the methods section. However, since the separability index is defined as the division of two dispersion matrices, it has no units by nature. The tuning profiles in Figure 3a are normalized for better illustration since the variation in firing rates is high. Since we seek trends in the response, the absolute values are not important. Regarding the SF profile formation, we updated the methods section to better present the SF profile assignment. Furthermore, the strength of responses for scrambled stimuli can be observed in Appendix 1 - Figures 1 and 2.

      (14) As mentioned above, the separability analysis is the main one suggesting an association between category and spatial frequency tuning. However, they compute the separability of each category with respect to the scrambled images. Since faces are a rather homogeneous category I expect that IT neurons have on average a higher separability index for faces than for the more heterogeneous category of objects, at least for neurons responsive to faces and/or objects. The higher separability for faces of the two low- and high-pass spatial frequency neurons could reflect stronger overall responses for these two classes of neurons. Was this the case? This is a critical analysis since it is essential to assess whether it is category versus responsiveness that is associated with the spatial frequency tuning. Also, I do not believe that one can make a strong claim about category selectivity when only 6 faces and 3 objects (and 6 other, variable stimuli; 15 stimuli in total) are employed to assess the responses for these categories (see next main comment). This and the above control analysis can affect the main conclusion and title of the paper.

      We appreciate your concern about whether the SF profiles reflect category selectivity or responsiveness. First, we note that we used SI because it overcomes the limitations of the accuracy and recall metrics, which are discrete and can saturate. We cannot directly calculate face vs. object discrimination with SI, since this index reports only one value for the whole discrimination task. Therefore, we have to calculate the SI for face/object vs. scrambled to obtain a value per category. However, as you suggested, this raises the question of whether we are assessing how well the neural responses distinguish actual images (faces or objects) from their scrambled versions, or merely assessing responsiveness. Based on Figure 3b, since we have face-selective profiles (LSF and HSF preferred), an object-selective profile (inverse U), and the U profile, where SI is the same for both face and object, we believe the SF profile is associated with category selectivity; otherwise we would have the same face/object recall in all profiles, as we do in the U-shaped profile.

      To analyze this issue further, we counted the face- and object-selective neurons in the 70-170 ms window. We found 43 face-selective neurons and 36 object-selective neurons (FDR-corrected p-value < 0.05), so the two counts are similar. Next, we checked the selectivity of the neurons within each profile. The number of face/object-selective neurons is LP=13/3, HP=6/2, IU=3/9, and U=14/13; the remaining neurons belong to the NP group. The results show more face-selective neurons in LP and HP and more object-selective neurons in the IU class. The U class contains roughly the same number of face- and object-selective neurons. This observation supports the relationship between category selectivity and profiles.

      Next, we examined the average neuron response to faces and objects in each profile. The difference between the face and object firing rates was not significant in any profile (rank-sum test, significance level 0.05). The average firing rates (spikes/s) for face/object are LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, and U=38.48/36.28. While the differences are not significant, they support a relationship between profiles and categories rather than responsiveness.

      The following description is added to the results section to cover this point of view.

      “To assess whether the SF profiles distinguish category selectivity or merely evaluate the neuron's responsiveness, we quantified the number of face/non-face selective neurons in the 70-170ms time window. Our analysis shows a total of 43 face-selective neurons and 36 non-face-selective neurons (FDR-corrected p-value < 0.05). The results indicate a higher proportion of face-selective neurons in LP and HP, while a greater number of non-face-selective neurons is observed in the IU category (number of face/non-face selective neurons: LP=13/3, HP=6/2, IU=3/9). The U category exhibits a roughly equal distribution of face and non-face-selective neurons (U=14/13). This finding reinforces the connection between category selectivity and the identified profiles. We then analyzed the average neuron response to faces and non-faces within each profile. The difference between the firing rates for faces and non-faces in none of the profiles is significant (face/non-face average firing rate (Hz): LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, U=38.48/36.28, Ranksum with significance level of 0.05). Although the observed differences are not statistically significant, they provide support for the association between profiles and categories rather than mere responsiveness.”
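      The response reports FDR-corrected p-values but does not name the correction procedure. Assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something stated in the text), selecting the significant neurons from a list of per-neuron p-values could be sketched as:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below k, even if an
    # intermediate p-value missed its own threshold (the step-up property).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five neurons.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9]))
```

      Counting the `True` entries separately for face-preferring and object-preferring neurons would yield selectivity counts of the kind reported above.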

      About the low number of stimuli, please check the next comment.

      (15) For the category decoding, the authors employed intact, unscrambled stimuli. Were these from the main test? If yes, then I am concerned that this represents a too small number of stimuli to assess category selectivity. Only 9 fixed + 6 variable stimuli = 15 were in the main test. How many faces/ objects on average? Was the number of stimuli per category equated for the classification? When possible use the data of the preceding selectivity test which has many more stimuli to compute the category selectivity.

      We used only the data recorded in the main phase, which contains 15 images in each session. Each image results in 12 stimuli (intact, R1-R5, and phase-scrambled version). Thus, there is a total of 180 unique stimuli in each session. Increasing the number of images would have increased the recording time. We compensated for this limitation by increasing the diversity of images in each session by picking the most responsive ones from the selectivity phase. On average, 7.54 of the stimuli were faces in each session. We added this information to the Methods section. Furthermore, as mentioned in the discussion, the number of samples per category is equalized for each classification run. We note that we cannot use the selectivity data for analysis, since the SF-related stimuli are filtered in different bands.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors double-check their results by performing control experiments with longer stimulus duration and SF-spectrum-matched face and object stimuli.

      Thanks for your suggestion. Following your comment, we added Appendix 1 - Figure 3.

      In addition, I had a very difficult time understanding the differences between Figure 3c and Figure 4a. Please rewrite the descriptions to clarify.

      Thanks for your suggestion. We have revised the description of these two figures. The following description is added to the results section for Figure 3c.

      “Next, to examine the relation between the SF (category) coding capacity of the single neurons and the category (SF) coding capability of the population level, we calculated the correlation between coding performance at the population level and the coding performance of single neurons within that population (Figure 3 c and d). In other words, we investigated the relation between single and population levels of coding capabilities between SF and category. The SF (or category) coding performance of a sub-population of 20 neurons that have roughly the same single-level coding capability of the category (or SF) is examined.”

      Lines 147-148: The text states that 'The maximum accuracy of a single neuron was 19.08% higher than the chance level'. However, in Figure 4, the decoding accuracies of individual neurons for category and SF range were between 49%-90% and 20%-40%, respectively.

      Please explain the discrepancies.

      The first number is reported according to chance level which is 20%, thus the unnormalized number is 39% which is consistent with the SF accuracy in Figure 4. We added the following description to prevent any misunderstanding.

      “… was 19.08% higher than the chance level (unnormalized accuracy is 49.08%, neuron #193, M2).”

      Lines 264-265: Should 'the alternative for R3 and R4' be 'the alternative for R4 and R5'?

      Thanks for your attention, it's “R4 and R5”. We corrected that mistake.

      Lines 551-562: The labels for SF classification are R1-R5. Is it a binary or a multi-classification task?

      It’s a multi-label classification. We made it clear in the text.

      “… labels were SF bands (R1, R2, ..., R5, multi-label classifier).”

      Figure 4b: Neurons in SF/category decoding exhibit both positive and negative weights. However, in the analysis of sparse neuron weights in Equation 6, only the magnitude of the weights is considered. Is the sign of weight considered too?

      We used the absolute value of the neuron weight to calculate sparseness. We also corrected Equation 6.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 52: what do the authors mean by coordinate processing in object recognition?

      To avoid any potential misunderstanding, we used the exact phrase from Saneyoshi and Michimata (2015). It is, in fact, coordinate relations processing. Coordinate relations specify the metric information of the relative locations of objects.

      (2) About half of the Introduction is a summary of the Results. This can be shortened.

      Thanks for your suggestion.

      (3) Line 134: Peristimulus time histogram instead of Prestimulus time histogram.

      Thanks for your attention. We corrected that.

      (4) Line 162: the authors state that R1 is decoded faster than R5, but the reported statistic is only for R1 versus R2.

      It was a typo; the p-value is reported only for R1 and R5.

      (5) Line 576: which test was used to assess the statistical significance?

      The test is Wilcoxon signed-rank. We added it to the text.

      (6) How can one present a 35 ms long stimulus with a 60 Hz frame rate (the stimuli were presented on a 60Hz monitor (line 470))? Please correct.

      Thanks for your attention. We corrected that. The time of stimulus presentation is 33ms and the monitor rate is 120Hz.
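      The reviewer's point rests on the fact that a stimulus can only be shown for a whole number of refresh frames. A small check of that arithmetic is sketched below; the function name and the 0.5 ms tolerance are illustrative assumptions.

```python
def frames_for_duration(duration_ms, refresh_hz, tolerance_ms=0.5):
    """Return (frame count, achievable duration in ms, within tolerance?)
    for the whole number of frames closest to the requested duration."""
    frame_ms = 1000.0 / refresh_hz
    n_frames = round(duration_ms / frame_ms)
    achieved_ms = n_frames * frame_ms
    return n_frames, achieved_ms, abs(achieved_ms - duration_ms) <= tolerance_ms


# Original report: 35 ms on a 60 Hz monitor (frame period ~16.67 ms).
print(frames_for_duration(35, 60))
# Corrected report: 33 ms on a 120 Hz monitor (frame period ~8.33 ms).
print(frames_for_duration(33, 120))
```

      Under these assumptions, four frames at 120 Hz last about 33.3 ms, which matches the corrected value, whereas no whole number of 60 Hz frames comes close to 35 ms.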

    1. Author response:

      The following is the authors’ response to the original reviews.

      These are valuable findings that support a link between low-dimensional brain network organization, patterns of ongoing thought, and trait-level personality factors, making it relevant for researchers in the field of spontaneous cognition, personality, and neuropsychiatry. While this link is not entirely new, the paper brings to bear a rich dataset and a well-conducted study, to approach this question in a novel way. The evidence in support of the findings is convincing.

      We thank the reviewers and editors for their time, feedback, and recommendations for improvement. We have revised the manuscript with those recommendations in mind and provide a point-by-point description of the revisions below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors ran an explorative analysis in order to describe how a "tri-partite" brain network model could describe the combination of resting fMRI data and individual characteristics. They utilized previously obtained fMRI data across four scanning runs in 144 individuals. At the end of each run, participants rated their patterns of thinking on 12 statements (short multi-dimensional experience sampling-MDES) using a 0-100% visual analog scale. Also, 71 personality traits were obtained on 21 questionnaires. The authors ran two separate principal component analyses (PCA) to obtain low dimensional summaries of the two individual characteristics (personality traits from questionnaires, and thought patterns from MDES). The dimensionality reduction of the fMRI data was done by means of gradient analysis, which was combined with Neurosynth decoding to visualize the functional axis of the gradients. To test the reliability of thought components across scanning time, intra-class correlation coefficients (ICC) were calculated for the thought patterns, and discriminability indices were calculated for whole gradients. The relationship between individual differences in traits, thoughts, and macro-scale gradients was tested with multivariate regression.

      The authors found: a) reliability of thought components across the one hour of scanning, b) Gradient 1 differentiated between visual regions and DMN, Gradient 2 dissociated somatomotor from visual cortices, Gradient 3 differentiated the DMN from the fronto-parietal system, c) the associations between traits/thought patterns and brain gradients revealed significant effects of "introversion" and "specific internal" thought: "Introversion" was associated with variant parcels on the three gradients, with most of parcels belonging to the VAN and then to the DMN; and "Specific internal thought" was associated with variant parcels on the three gradients with most of parcels belonging to the DAN and then the visual. The authors conclude that interactions between attention systems and the DMN are important influences on ongoing thought at rest.

      Strengths:

      The study's strength lies in its attempt to combine brain activity with individual characteristics using state-of-the-art methodologies.

      Weaknesses:

      The study protocol in its current form restricts replicability. This is largely due to missing information on the MRI protocol and data preprocessing. The article refers the reader to the work of Mendes et al 2019 which is said to provide this information, but the paper should rather stand alone with all this crucial material mentioned here, as well. Also, effect sizes are provided only for the multiple multivariate regression of the inter-class correlations, which makes it difficult to appreciate the power of the other obtained results. 

      Thank you for these comments. We have addressed both issues by adding effect sizes for the reported trait- and thought-related effects within the results table (Table 3, Line 427) and by providing more information about the fMRI protocol and preprocessing steps (Lines 162-188).

      Reviewer #2 (Public Review):

      The authors set out to draw further links between neural patterns observed at "rest" during fMRI, with their related thought content and personality traits. More specifically, they approached this with a "tri-partite network" view in mind, whereby the ventral attention network (VAN), the dorsal attention network (DAN), and the default mode network (DMN) are proposed to play a special role in ongoing conscious thought. They used a gradients approach to determine the low dimensional organisation of these networks. In concert, using PCA they reduced thought patterns captured at four time points during the scan, as well as traits captured from a large battery of questionnaires.

      The main findings were that specific thought and trait components were related to variations in the organisation of the tri-partite networks, with respect to cortical gradients.

      Strengths of the methods/results: Having a long (1 hr) resting state MRI session, which could be broken down into four separate scanning/sampling components is a strength. Importantly, the authors could show (via intra-class correlation coefficients) the similarity of thoughts and connectivity gradients across the entire session. Not only did this approach increase the richness of the data available to them, it speaks in an interesting way to the stability of these measures. The inclusion of both thought patterns during scanning along with trait-level dispositional factors is most certainly a strength, as many studies will often include either/or of these, rather than trying to reconcile across. Of the two main findings, the finding that detailed self-generated thought was associated with a decoupling of regions of DAN from regions in DMN was particularly compelling, in light of mounting literature from several fields that support this.

      Weaknesses of the methods/results: Considering the richness of the thought and personality data, I was a little surprised that only two main findings emerged (i.e., a relationship with trait introversion, and a relationship with the "specific internal" thought pattern). I wondered whether, at least in part and in relation to traits, this might stem from the large and varied set of questionnaires used to discern the traits. These questionnaires mostly comprised personality/mood, but some sampled things that do not fall into that category (e.g., musicality, internet addiction, sleep), and some related directly to spontaneous thought properties (e.g., mind wandering, musical imagery). It would be interesting to see what relationships would emerge by being more selective in the traits measured, and in the tools to measure them.

      We agree that being more selective in trait measures and measuring tools could lead to more insights into trait – brain relationships. In part the emergence of only two main findings could also be a trade-off of multiple comparison corrections inherent in our current approach (i.e. 400 separate models for all parcels). Furthermore, we have adjusted the text in the discussion in this revision to highlight that more targeted measures of personality (e.g. self-consciousness) could provide a more nuanced view of the relationship between traits and patterns of thought at rest. (Line 532):

      “In the future it may also be important to consider measures of traits that could have relationships to both neural activity and/or experience at rest (e.g. self-consciousness de Caso et al., 2017, or autistic tendencies, Turnbull et al., 2020a).”

      Taken together, the main findings are interesting enough. However, the real significance of this work, and its impact, lie in the richness of the approach: combining across fMRI, spontaneous thought, and trait-level factors. Triangulating these data has important potential for furthering our understanding of brain-behaviour relationships across different levels of organisation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      - Frame the study objectives more clearly. If it's about which theoretical framework best supports the data, you might need to argue why the tri-partite approach is a more efficient framework than others. If not, the argument will beg the question: you will find an effect on this model, so you will claim that this is an informative model. For example, if the focus is on these three RSNs and thought reporting, the authors might want to contextualize it historically, like how from two networks (DMN-antagonistic; Vanhaudenhuyse JCognNeurosci 2012; Demertzi et al, NetwNeurosci 2022) we ended up with three and why this is a more suitable approach. What about whole-brain connectomic approaches, such as the work by Amico et al?

      We have expanded on the objectives and rationale of the study by editing/ expanding the introduction as follows (Lines 84-87): 

      “Traditionally, the DMN was thought to have an antagonistic relationship with systems linked to external processing (Fox et al., 2005). However, according to the ‘tri-partite’ network accounts the relationship between the DMN and other brain systems is more nuanced. From this perspective key hubs of the ventral attention network, such as the anterior insula and dorso-lateral prefrontal cortex, help gate access to conscious experience, influencing regardless of the focus of attention. This is hypothesised to occur because the VAN influences interactions between the DAN, which is more important for external mental content (Corbetta and Shulman, 2002), and the DMN which is important when states (including tasks) rely more on internal representations (Smallwood et al., 2021a).”  (… and Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought derived from studies focused on understanding conscious experience, provides a useful organizing framework for understanding the relation between observed brain activity at rest and patterns of cognition/ personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network; Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organization of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      - As there was no explicit description of the adopted design and the fMRI procedure, I deduced that it was about a within-subject design, 1-hour scanning session, comprised of four runs, each lasting 15 min, can that be correct? In any case, an explicit description of the design and the fMRI procedure eases the reading and replicability. 

      Thank you for pointing this out. We have now restructured and edited the relevant text to state those details clearly and to explain the MDES part of the procedure in the same section. It now reads (Lines 162:167):

      “Resting state fMRI with Multidimensional Experience Sampling (MDES)

      The current sample includes one hour of fully pre-processed rs-fMRI data from 144 participants (four scans from 135 participants, and three scans from nine participants whose data were missing or incomplete). The rs-fMRI was performed in four adjacent 15-minute sessions each immediately followed by MDES which retrospectively measured various dimensions of spontaneous thought during the scan.”

      - Was there a control to the analysis, such as a gradient which also associated with these characteristics? Anything else?

      In our analyses we explore multiple gradients and how they link to traits and thoughts at rest. While there is no explicit control, each analysis provides a constraint on the interpretation of the other analyses. We have added the following text to expand on this point (Line 372):

      “To this end, we performed a multiple multivariate regression with thoughts, traits, and nuisance variables (motion, age and gender) as independent variables, with whole brain functional organisation, as captured by the first three gradients, as dependent variables. In this analytic approach, relationships between cognition along one gradient but not along another help identify which relationships between brain systems are most likely to relate to the feature of cognition in question (i.e. each gradient acts as a control for the other).”
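As a rough illustration of the kind of model described in the quoted text, the sketch below fits an ordinary least-squares regression with several predictors and several outcome columns. It is not the authors' pipeline: the function name `fit_multivariate_regression`, the four predictor columns, and all data are synthetic assumptions. Only the sample size (144) and the use of three gradients as outcomes come from the text, and the real analysis would presumably be run per parcel with appropriate significance testing.

```python
import numpy as np

def fit_multivariate_regression(X, Y):
    """Ordinary least squares with several predictors and several outcomes.

    X: (n_subjects, n_predictors) thoughts, traits and nuisance variables.
    Y: (n_subjects, n_outcomes) one column per gradient score.
    Returns coefficients of shape (n_predictors + 1, n_outcomes);
    row 0 holds the intercepts.
    """
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 144  # sample size reported in the text; everything else is synthetic
X = rng.normal(size=(n, 4))  # hypothetical columns: thought, trait, motion, age

# Hypothetical ground truth: predictor 0 relates to gradient 1 only.
true_beta = np.zeros((4, 3))
true_beta[0, 0] = 0.5
Y = X @ true_beta + 0.1 * rng.normal(size=(n, 3))

beta = fit_multivariate_regression(X, Y)
print(np.round(beta[1], 2))  # effect of predictor 0 on each of the three gradients
```

In this synthetic case the predictor shows an effect along the first gradient but not the other two, which is the sense in which one gradient can serve as a control for another.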

      - I feel that Table 1 (list of tests) carries less information compared to Supplementary Table 1 (how spontaneous thought was reported and scored). I would suggest swapping them, unless Table 1 further contains which outcome measures per test were used for the analysis.  

      Thank you for this suggestion. The table showing the MDES questions has now been moved to the main text (Table 1, Line 194). However, as there is no other description of the questionnaires in the main text, we have also retained the table listing the personality/trait questionnaires (Table 2, Line 200).

      - Ten group-level gradients were calculated out of which three were shown on the basis of previous work. Please, visualize all 10 gradients as complementary material to inform potential future works on how these look.  

      Thank you for this suggestion. Supplementary figure 3 now shows all 10 gradients.

      - Please provide more information on preprocessing, especially with motion artifacts and how the global signal was processed.  

      Thank you for pointing this out. We have now included the following text, summarized from Mendes et al., 2019, to describe the preprocessing in brief (Line 171:188): 

      “Motion correction parameters were derived by rigid-body realignment of the time series to the first volume (after discarding the first five volumes) with FSL MCFLIRT (Jenkinson et al., 2002). Parameters for distortion correction were calculated by rigidly registering a temporal mean image of this time series to the fieldmap magnitude image using FSL FLIRT (Jenkinson and Smith, 2001), which was then unwarped using FSL FUGUE (Jenkinson et al., 2012). Transformation parameters were derived by coregistering the unwarped temporal mean to the subject’s structural scan using FreeSurfer’s boundary-based registration algorithm (Greve and Fischl, 2009). All three spatial transformations were then combined and applied to each volume of the original time series in a single interpolation step. The time series was residualised against the six motion parameters, their first derivatives, and “outliers” identified by Nipype’s rapidart algorithm (https://nipype.readthedocs.io/en/latest/interfaces/). A CompCor (Behzadi et al., 2007) approach was implemented to remove physiological noise from the residual time series, which included the first six principal components from all the voxels identified as white matter or cerebrospinal fluid. The denoised time series were temporally filtered to a frequency range between 0.01 and 0.1 Hz using FSL, mean centered and variance normalized using Nitime (Rokem et al., 2009). Imaging and pre-processing protocols are described in detail in Mendes et al (Mendes et al., 2019).”
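Two of the steps named in the quoted text, residualising against motion parameters with their first derivatives and mean-centring with variance normalisation, can be sketched in a few lines. This is a simplified illustration on synthetic data, not the FSL/Nipype/Nitime pipeline itself: the helper names `residualise` and `standardise` are hypothetical, and the outlier regressors, CompCor components, and band-pass filter are deliberately omitted.

```python
import numpy as np

def residualise(timeseries, motion):
    """Remove motion-related variance from each column of a time series.

    timeseries: (n_volumes, n_voxels); motion: (n_volumes, 6) realignment parameters.
    Confounds are an intercept, the six parameters, and their first
    temporal derivatives (the derivative of the first volume is set to zero).
    """
    derivatives = np.vstack([np.zeros((1, motion.shape[1])), np.diff(motion, axis=0)])
    confounds = np.column_stack([np.ones(len(motion)), motion, derivatives])
    beta, *_ = np.linalg.lstsq(confounds, timeseries, rcond=None)
    return timeseries - confounds @ beta

def standardise(timeseries):
    """Mean-centre and variance-normalise each column."""
    centred = timeseries - timeseries.mean(axis=0)
    return centred / centred.std(axis=0)

rng = np.random.default_rng(1)
motion = rng.normal(size=(200, 6))    # synthetic realignment parameters
signal = rng.normal(size=(200, 3))    # synthetic "neural" signal for 3 voxels
contaminated = signal + motion @ rng.normal(size=(6, 3))

cleaned = standardise(residualise(contaminated, motion))
```

After these two steps each column has zero mean and unit variance and is orthogonal to the motion regressors.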

      - Please, describe the duration of the whole process, and when the questionnaire data were collected.

      We apologize for the lack of clarity. The “Data” section of the Methods has now been edited to explain this more clearly. It now reads (Line 146:154):

      “The dataset used here is part of the MPI-Leipzig Mind-Brain-Body (MPILMBB) database (Mendes et al., 2019). The complete dataset consists of a battery of self-reported personality measures, measures of spontaneous thought, task data, and structural and resting-state functional MRI (one hour, divided into four adjacent 15-min sessions) from participants between 20 and 75 years of age. Data were collected over a period of five days, with the MRI sessions always falling on day 3. The questionnaires were completed by participants before and after this day, using Limesurvey (https://www.limesurvey.org: version 2.00+) at their own convenience and using pen-and-paper on-site. A detailed description of the participants, measures, and data acquisition protocol has been previously published along with the dataset (Mendes et al., 2019).”

      - In light of the discussion about sample sizes and the power of the correlations, can you indicate the effect sizes of the reported results?  

      Thank you for pointing this out. Effect sizes have been added to the results table (Table 3, Line 427).

      Minor corrections to the text and figures

      - Introduction: "Our sample was a cohort....states were explanatory variables": Better move this part to Methods. Ideally, provide the hypotheses here, the ways you wanted to test them, and how you would negate them. What would it mean that you got the hypotheses confirmed? What would the opposite outcome mean? 

      We have added the following text before this part to clarify and expand on the objective of the study (Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought, derived from studies focused on understanding conscious experience, provides a useful organising framework for understanding the relation between observed brain activity at rest and patterns of cognition/personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network; Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organisation of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      We have refrained from listing hypotheses, as the analyses we performed were data-driven rather than hypothesis-driven, so as to include all possible associations between large-scale connectivity patterns and individual state- and trait-level differences in personality and thought/experience. We hope that the added text provides more context to understand this rationale.

      - Please, clarify whether "conscious thought" means "reportable".

      Thank you for this suggestion. We have now edited the first reference to thought patterns in the Discussion to read “self-reports of ongoing thought”, instead of just “ongoing thought” (Line 432).

      - Please, clarify whether "experience" and "thought" are used interchangeably. This is because experience can also be ineffable, beyond thought reporting. 

      To clarify this in the context of the current study, we have edited the first reference to “ongoing experience” in the introduction to “self-reports of ongoing experience” (Line 75).

      - To ease reading comprehension for each Figure, communicate the main findings first, before describing the figures. 

      We believe this lack of clarity is caused by including the figure reference in the heading of the results subsections. We hope this issue is fixed by editing the text in the following manner (Line 381):

      “Trait Introversion 

      Along the first gradient, a parcel within the right orbitofrontal cortex (within the executive control network, shown in orange) showed more similarity with transmodal regions for individuals high on introversion. Six parcels within the ventral attention network, including anterior insula, operculum and cingulate cortex were closer to the somatomotor end along gradient two (shown in purple). The same regions showed lower scores along the third gradient in participants with higher introversion scores, indicating stronger integration with the default mode network. A parcel within posterior cingulate cortex (control) was also more segregated from the visual end of gradient two in participants with higher introversion scores. Associations between trait “introversion” and brain-wide activity are shown in Figure 4.”

    2. eLife assessment

      These are important findings that support a link between low-dimensional brain network organisation, patterns of ongoing thought, and trait-level personality factors, making them relevant for researchers in the field of spontaneous cognition, personality, and neuropsychiatry. While this link is not entirely new, the paper brings to bear a rich dataset and a well-conducted study to approach this question in a novel way. The evidence in support of the findings is convincing.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors ran an explorative analysis in order to describe how a "tri-partite" brain network model could describe the combination between resting fMRI data and individual characteristics. They utilized previously obtained fMRI data across four scanning runs in 144 individuals. At the end of each run, participants rated their patterns of thinking on 12 statements (short multi-dimensional experience sampling-MDES) using a 0-100% visual analog scale. Also, 71 personality traits were obtained from 21 questionnaires. The authors ran two separate principal component analyses (PCAs) to obtain low dimensional summaries of the two individual characteristics (personality traits from questionnaires, and thought patterns from MDES). The dimensionality reduction of the fMRI data was done by means of gradient analysis, which was combined with Neurosynth decoding to visualize the functional axis of the gradients. To test the reliability of thought components across scanning time, intra-class correlation coefficients (ICC) were calculated for the thought patterns, and discriminability indices were calculated for whole gradients. The relationship between individual differences in traits, thoughts, and macro-scale gradients was tested with multivariate regression. The authors found: a) reliability of thought components across the one hour of scanning, b) Gradient 1 differentiated between visual regions and DMN, Gradient 2 dissociated somatomotor from visual cortices, Gradient 3 differentiated the DMN from the fronto-parietal system, c) the associations between traits/thought patterns and brain gradients revealed significant associations with "introversion" and "specific internal" thought: "Introversion" was associated with variant parcels on the three gradients, with most parcels belonging to the VAN and then to the DMN; and "Specific internal thought" was associated with variant parcels on the three gradients, with most parcels belonging to the DAN and then the visual network. 
The authors conclude that interactions between attention systems and the DMN are important influences on ongoing thought at rest.
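To make the reliability measure mentioned in the summary concrete, the sketch below computes an intra-class correlation for scores measured on several occasions per participant. It assumes a one-way random-effects ICC(1,1); the variant actually used in the paper is not stated here, and the function name `icc_oneway` and the toy scores are hypothetical.

```python
from statistics import mean

def icc_oneway(scores):
    """One-way random-effects ICC(1,1).

    scores: list of per-participant lists, one value per measurement occasion
    (e.g. a thought-component score after each of four runs).
    """
    n = len(scores)       # participants
    k = len(scores[0])    # occasions per participant
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((v - m) ** 2
                    for row, m in zip(scores, row_means)
                    for v in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Toy data: participants differ but are perfectly stable across four runs.
stable = [[1, 1, 1, 1], [5, 5, 5, 5], [9, 9, 9, 9]]
# Toy data: all variance lies within participants, none between them.
unstable = [[1, 5, 9, 5], [5, 9, 1, 5], [9, 1, 5, 5]]

print(icc_oneway(stable), icc_oneway(unstable))
```

Values near 1 indicate that differences between participants are preserved across runs, whereas values near or below 0 indicate that scores fluctuate within participants as much as they differ between them.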

      Strengths:

      The study's strength lies in its attempt to combine brain activity with individual characteristics using state-of-the-art methodologies.

      Weaknesses:

      The study protocol in its current form restricts replicability. This is largely due to missing information on the MRI protocol and data preprocessing. The article refers the reader to the work of Mendes et al 2019 which is said to provide this information, but the paper should rather stand alone with all this crucial material mentioned here, as well. Also, effect sizes are provided only for the multiple multivariate regression of the inter-class correlations, which makes it difficult to appreciate the power of the other obtained results.

    4. Reviewer #2 (Public Review):

      The authors set out to draw further links between neural patterns observed at "rest" during fMRI, with their related thought content and personality traits. More specifically, they approached this with a "tri-partite network" view in mind, whereby the ventral attention network (VAN), the dorsal attention network (DAN) and the default mode network (DMN) are proposed to play a special role in ongoing conscious thought. They used a gradient approach to determine the low dimensional organisation of these networks. In concert, using PCA they reduced thought patterns captured at four time points during the scan, as well as traits captured from a large battery of questionnaires.

      The main findings were that specific thought and trait components were related to variations in the organisation of the tri-partite networks, with respect to cortical gradients.

      Strengths of the methods/results: Having a long (1 hour) resting state MRI session, which could be broken down into four separate scanning/sampling components is a strength. Importantly, the authors could show (via intra-class correlation coefficients) similarity of thoughts and connectivity gradients across the entire session. Not only did this approach increase the richness of the data available to them, it speaks in an interesting way to the stability of these measures. The inclusion of both thought patterns during scanning along with trait-level dispositional factors is most certainly a strength, as many studies will often include either/or of these, rather than trying to reconcile across. Of the two main findings, the finding that detailed self-generated thought was associated with a decoupling of regions of DAN from regions in DMN was particularly compelling, in light of mounting literature from several fields that support this.

      Weaknesses of the methods/results: Considering the richness of the thought and personality data, I was a little surprised that only two main findings emerged (i.e., a relationship with trait introversion, and a relationship with the "specific internal" thought pattern). I wondered whether, at least in part and in relation to traits, this might stem from the large and varied set of questionnaires used to discern the traits. These questionnaires mostly comprised personality/mood, but some sampled things that do not fall into that category (e.g., musicality, internet addiction, sleep) and some related directly to spontaneous thought properties (e.g., mind wandering, musical imagery). It would be interesting to see what relationships would emerge by being more selective in the traits measured, and in the tools to measure them.

      Taken together, the main findings are interesting enough. However, the real significance of this work, and its impact, lies in the richness of the approach: combining across fMRI, spontaneous thought, and trait-level factors. Triangulating across these data has important potential for furthering our understanding of brain-behaviour relationships across different levels of organisation.

    1. eLife assessment

      This is a fundamental study examining the role of prediction error in state allocation of memories. The data provided are convincing and largely support the conclusion that a gradual change between acquisition and extinction maintains the memory state of acquisition and thus results in extinction that is resistant to restoration. This paper is of interest to behavioural and neuroscience researchers studying learning, memory, and the neural mechanisms of those processes as well as to clinicians using extinction-based therapies in treating anxiety-based disorders.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Kennedy et al examine how new information is organized in memory. They tested an idea based on latent-cause theory that suggests that large prediction error leads to the formation of a new memory, whereas small prediction error leads to memory updating. They directly tested the prediction by extinguishing fear conditioned rats with gradual extinction. For their experiment, gradual extinction was carried out by progressively reducing the intensity of shocks that were co-terminated with the CS, until the CS was presented alone. Doing so resulted in diminished spontaneous recovery and reinstatement compared to Standard Extinction. The results are compelling and have important implications for the field of fear learning and memory as well as translation to anxiety-related disorders.

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. It seems that their reinstatement test was more robust, and showed significant differences between the Gradual and Standard Extinction groups.

      The authors carried out important controls which enable proper contextualization of the findings. They included a "Home" group, in which rats received fear conditioning, but not an extinction manipulation. Relative to this group, the Gradual and Standard extinction groups showed a reduction in freezing.

      In Experiments 3 and 4, the authors essentially carried out clever controls which served to examine whether shock devaluation (Experiment 4) and reduction in shock intensity (rather than a gradual decrease in shock intensity) (Experiment 3) would also yield a decrease in the return of fear. In line with a latent-cause updating account of the Gradual Extinction effect, they did not.

      In Experiment 5, the authors examined whether a prediction error produced by a change of context might contribute interference to the latent cause updating afforded by the Gradual Extinction. Such a prediction would align with a more flexible interpretation of a latent-cause model, such as those proposed by Redish (2007) and Gershman et al (2017), but not the latent-cause interpretation put forth by the Cochran-Cisler model (2019). Their findings showed that whereas Gradual Extinction carried out in the same context as acquisition resulted in less return of fear than Standard Extinction, it actually yielded a greater degree of return of fear when carried out in a different context, in support of the Redish and Gershman accounts, but not Cochran-Cisler.

      Experiment 6 extended the findings from Experiment 5 in a different state-splitting modality: timing. In this experiment, the authors tested whether a shift in temporal context also influenced the gradual extinction effect. They thus carried out the extinction sessions 21 days after conditioning. They found that while Gradual Extinction was indeed effective when carried out one day after fear conditioning, it was not when conducted 21 days later.

      The authors next carried out an omnibus analysis which included all the data from their 6 experiments, and found that overall, Gradual Extinction resulted in diminished return of fear relative to Standard Extinction. I thought the omnibus analysis was a great idea, and an appropriate way to do their data justice.

      Strengths: Compelling findings. The data support the conclusions. 6 rigorous experiments were conducted which included clever controls. Data include male and female rats. I really liked the omnibus analysis.

      Weaknesses: None noted

    3. Reviewer #2 (Public Review):

      Summary:

      The present article describes a series of experiments examining how a gradual reduction in unconditional stimulus intensity facilitates fear reduction and reduces relapse (spontaneous recovery and reinstatement) relative to a standard extinction procedure. The experiments provide compelling, if somewhat inconsistent, evidence of this effect and couch the results in a scholarly discussion surrounding how mechanisms of prediction error contribute to this effect.

      Strengths:

      The experiments are theoretically motivated and hypothesis-driven, well-designed, and appropriately conducted and analyzed. The results are clear and appropriately contextualized into the broader relevant literature. Further, the results are compelling and ask fundamental questions regarding how to persistently weaken fear behavior, which has both strong theoretical and real-world implications. I found the 'scrambled' experiment especially important in determining the mechanism through which this reduction in shock intensity persistently weakens fear behavior.

      Weaknesses:

      Overall, I found very few weaknesses with this paper. Although some might view the somewhat inconsistent effects on relapse between experiments as a substantial weakness, I appreciate the authors directly confronting this and using it as an opportunity to aggregate data to look at general trends. Further, while Experiment 1 only used males, this was corrected in the rest of the experiments and therefore is not a substantial concern.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript examined the role of large versus small prediction errors (PEs) in creating a state-based memory distinction between acquisition and extinction. The premise of the paper is based on theoretical claims and empirical findings that gradual changes between acquisition and extinction would lead to the potential overwriting of the acquisition memory with extinction, resulting in a more durable reduction in conditioned responding (i.e. more durable extinction effect). The paper tests the hypotheses in a series of elegant experiments in which the shock intensity is decreased across extinction sessions before non-reinforced CS presentations are given. Additional manipulations include context change, shock devaluation, and controlling for lower shock intensity exposure. The critical comparison was standard non-reinforced extinction training. The critical tests were done in spontaneous recovery and reinstatement.

      Strengths:

      The findings are of tremendous importance in understanding how memories can be updated and reveal a well-defined role of PE in this process. It is well-established that PE is critical for learning, so delineating how PE is critical for generating memory states and the role it serves in keeping memories dissociable (or not) is exciting and clever. As such the paper addresses a fundamental question in the field.

      The studies test clear and defined predictions derived from simulations of the state-belief model of Cochran & Cisler (2019). The designs are excellent: well-controlled and address the question.

      The authors have done an excellent job at explaining the value of the latent state models.

      The authors have studied both sexes in the studies presented, providing generality across the sexes in their findings. The figures depict the individual data points for males and females allowing the reader to see the responses for each sex.

      The authors have addressed the previously raised weaknesses. They noted that procedurally it would be difficult to provide independent evidence that delivering a lower intensity shock will generate a smaller PE than say no shock. The differences in the data obtained based on error vs shock devaluation are convincing, although direct evidence for shock devaluation would have strengthened the argument.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In "Prediction error determines how memories are organized in the brain: a study of Pavlovian fear extinction in rats", Kennedy et al examine how new information is organized in memory. They tested an idea based on latent-cause theory that suggests that a large prediction error leads to the formation of a new memory, whereas a small prediction error leads to memory updating. They directly tested the prediction by extinguishing fear-conditioned rats with gradual extinction. For their experiment, gradual extinction was carried out by progressively reducing the intensity of shocks that were co-terminated with the CS, until the CS was presented alone. Doing so resulted in diminished spontaneous recovery and reinstatement compared to Standard Extinction. The results are compelling, and have important implications for the field of fear learning and memory as well as translation to anxiety-related disorders.

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. It seems that their reinstatement test was more robust, and showed significant differences between the Gradual and Standard Extinction groups.

      The authors carried out important controls that enable proper contextualization of the findings. They included a "Home" group, in which rats received fear conditioning, but not extinction manipulation. Relative to this group, the Gradual and Standard extinction groups showed a reduction in freezing.

      In Experiments 3 and 4, the authors essentially carried out clever controls that served to examine whether shock devaluation (Experiment 4) and reduction in shock intensity (rather than a gradual decrease in shock intensity) (Experiment 3) would also yield a decrease in the return of fear. In line with a latent-cause updating account of the Gradual Extinction effect, they did not.

      In Experiment 5, the authors examined whether a prediction error produced by a change of context might contribute interference to the latent cause updating afforded by the Gradual Extinction. Such a prediction would align with a more flexible interpretation of a latent-cause model, such as those proposed by Redish (2007) and Gershman et al (2017), but not the latent-cause interpretation put forth by the Cochran-Cisler model (2019). Their findings showed that whereas Gradual Extinction carried out in the same context as acquisition resulted in less return of fear than Standard Extinction, it actually yielded a greater degree of return of fear when carried out in a different context, in support of the Redish and Gershman accounts, but not Cochran-Cisler.

      Experiment 6 extended the findings from Experiment 5 in a different state-splitting modality: timing. In this experiment, the authors tested whether a shift in temporal context also influenced the gradual extinction effect. They thus carried out the extinction sessions 21 days after conditioning. They found that while Gradual Extinction was indeed effective when carried out one day after fear conditioning, it was not when conducted 21 days later.

      The authors next carried out an omnibus analysis which included all the data from their 6 experiments, and found that overall, Gradual Extinction resulted in diminished return of fear relative to Standard Extinction. I thought the omnibus analysis was a great idea and an appropriate way to do their data justice.

      Strengths:

      Compelling findings. The data support the conclusions. 6 rigorous experiments were conducted which included clever controls. Data include male and female rats. I really liked the omnibus analysis.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      None noted

      Reviewer #2 (Public Review):

      Summary:

      The present article describes a series of experiments examining how a gradual reduction in unconditional stimulus intensity facilitates fear reduction and reduces relapse (spontaneous recovery and reinstatement) relative to a standard extinction procedure. The experiments provide compelling, if somewhat inconsistent, evidence of this effect and couch the results in a scholarly discussion surrounding how mechanisms of prediction error contribute to this effect.

      Strengths:

      The experiments are theoretically motivated and hypothesis-driven, well-designed, and appropriately conducted and analyzed. The results are clear and appropriately contextualized into the broader relevant literature. Further, the results are compelling and ask fundamental questions regarding how to persistently weaken fear behavior, which has both strong theoretical and real-world implications. I found the 'scrambled' experiment especially important in determining the mechanism through which this reduction in shock intensity persistently weakens fear behavior.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      Overall, I found very few weaknesses in this paper. Although some might view the somewhat inconsistent effects on relapse between experiments as a substantial weakness, I appreciate the authors directly confronting this and using it as an opportunity to aggregate data to look at general trends. Further, while Experiment 1 only used males, this was corrected in the rest of the experiments and therefore is not a substantial concern.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript examined the role of large versus small prediction errors (PEs) in creating a state-based memory distinction between acquisition and extinction. The premise of the paper is based on theoretical claims and empirical findings that gradual changes between acquisition and extinction would lead to the potential overwriting of the acquisition memory with extinction, resulting in a more durable reduction in conditioned responding (i.e. more durable extinction effect). The paper tests the hypotheses in a series of elegant experiments in which the shock intensity is decreased across extinction sessions before non-reinforced CS presentations are given. Additional manipulations include context change, shock devaluation, and controlling for lower shock intensity exposure. The critical comparison was standard non-reinforced extinction training. The critical tests were done in spontaneous recovery and reinstatement.

      Strengths:

      The findings are of tremendous importance in understanding how memories can be updated and reveal a well-defined role of PE in this process. It is well-established that PE is critical for learning, so delineating how PE is critical for generating memory states and the role it serves in keeping memories dissociable (or not) is exciting and clever. As such the paper addresses a fundamental question in the field.

      The studies test clear and defined predictions derived from simulations of the state-belief model of Cochran & Cisler (2019). The designs are excellent: well-controlled and address the question.

      The authors have done an excellent job of explaining the value of the latent state models.

      The authors have studied both sexes in the study presented, providing generality across the sexes in their findings. However, depicting the individual data points in the bar graphs and noting which data represent males and which represent females would be of great value.

      We thank the reviewer for their positive comments. We have included individual data points in the bar graphs and indicated which represent males and females.

      Weaknesses:

      (1) While it seems obvious that delivering a lower intensity shock will generate a smaller PE than say no shock, it would have been nice to see data from say a compound testing procedure that confirms this.

      It would be great if we could provide independent evidence that shifting from a 0.8 mA shock to a 0.4 mA shock (first session of gradual extinction) produces a smaller prediction error than shifting from a 0.8 mA shock to no shock at all (first session of standard extinction). In theory, this could be assessed using Rescorla’s (2000) compound test procedure. However, application of this procedure requires the use of a within-subject design and latent state theories would not predict the gradual extinction effect in such a design (as all prediction errors generated in such a design would affect the state-splitting process). That is, the between-subject design used to generate the gradual extinction effect is not amenable to application of the compound test procedure; and the within-subject design in which the compound test procedure could be applied is unlikely to generate the gradual extinction effect. Thus, we instead rely on the high degree of similarity between our results and those predicted by Cochran & Cisler (2019) to argue that the gradual extinction protocol produces a series of smaller prediction errors than does the standard extinction protocol: hence the present pattern of results.
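The contrast between the two protocols described above can be illustrated with a toy threshold rule. This is only a sketch under stated assumptions: it is not the Cochran & Cisler (2019) model and not an analysis from the paper. The function `run_sessions`, the splitting threshold, the learning rate, the number of trials, and the intermediate 0.2 mA step are all hypothetical; only the 0.8 mA, 0.4 mA and 0.1 mA intensities come from the text.

```python
def run_sessions(shock_per_session, threshold=0.5, learning_rate=0.5, trials=10):
    """Toy state-splitting rule, for illustration only.

    Each state stores an expected shock intensity (mA). On every trial the
    prediction error is the delivered shock minus the active state's
    expectation. An absolute error above `threshold` opens a new state that
    starts at the observed outcome; a smaller error updates the active state
    in place. Returns the expectation held by every state at the end.
    """
    states = [0.8]   # acquisition memory: the CS predicts a 0.8 mA shock
    active = 0
    for shock in shock_per_session:
        for _ in range(trials):
            error = shock - states[active]
            if abs(error) > threshold:
                states.append(shock)           # large error: split off a new state
                active = len(states) - 1
            else:
                states[active] += learning_rate * error  # small error: update in place
    return states

standard = run_sessions([0.0, 0.0, 0.0, 0.0])   # shock omitted from the first session
gradual = run_sessions([0.4, 0.2, 0.1, 0.0])    # hypothetical gradual schedule

print(standard)  # two states: the acquisition memory survives at 0.8
print(gradual)   # one state whose expectation has been driven towards 0
```

Under these assumed parameters, standard extinction leaves the original 0.8 mA expectation intact in a separate state, whereas the gradual schedule overwrites it, which mirrors the qualitative argument made in the response.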

      (2) The devaluation experiment is quite clever, but it also would be strengthened if there was evidence in the paper that this procedure does indeed lead to shock devaluation.

      The aim of Experiment 3 was to determine whether the gradual extinction effect is due to prediction error-based memory updating or shock devaluation. If the effect was due to shock devaluation, the group that received the gradual extinction treatment should have displayed the same low level of spontaneous recovery as the group that only experienced the shock at its lowest (0.1 mA) intensity (i.e., the shock devaluation group). Contrary to this prediction, the results showed that the gradually extinguished group displayed less spontaneous recovery than the shock devaluation group. That is, in this experiment, the slow and progressive reduction in shock intensity was processed differently to the repeated 0.1 mA shock exposures but the results were inconsistent with any shock devaluation effect. Hence, we conclude that the gradual extinction effect does not involve shock devaluation but instead is due to prediction error-based memory updating.

      (3) It would have been very exciting to see even more parametric examinations of this idea, like maintaining shock intensity but gradually reducing shock duration, which would have increased the impact of the paper.

      We appreciate the reviewer’s point. As each shock was presented for just 0.5 s, we are not confident that rats would detect gradual and progressive changes in its duration in the same way as they can obviously detect gradual and progressive changes in its intensity. We are, however, investigating the effects of gradual extinction in a second order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. In our second-order conditioning protocol, rats are first exposed to pairings of a 10 s S1 and a 0.5 s foot shock US; and then exposed to pairings of a 30 s S2 and the 10 s S1. Across the latter pairings, rats acquire second-order conditioned fear responses to S2. Importantly, these responses can be extinguished through repeated presentations of the S2 in the absence of its S1-associate; and the duration of the S1 can be progressively and gradually reduced from 10 s to 0 s across the shift to this extinction. These experiments are currently in progress and will eventually represent an extension of the present findings.

      (4) Individual data points should be represented in the test figures (see above also).

      We have updated the figures to show these data points.

      Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26(4), 428.

      Reviewing Editor (Recommendations For The Authors):

      The eLife assessment relates to the present form of the paper. However, following a discussion with the reviewers, the significance of the findings could be bolstered to fundamental if you decided to revise the current manuscript by scaling up the investigation to examine a wider set of parameters and conditions under which error can influence state allocation of memories. One way of doing this, but not limited to this, is suggested in the reviews (e.g. maintaining shock intensity, reducing its duration). Relatedly, a more extensive discussion of the Gershman et al. (2013) paper would be relevant.

      As noted in our response to Reviewer 3, we are currently investigating the effects of gradual extinction in a second order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. These experiments are in-progress and will eventually represent an extension of the present findings. They are not, however, ready to be included as part of the present study.

      We have further referenced the Gershman et al., (2013) paper as well as the related Bouton et al., (2004) paper on the effects of gradually reducing the frequency of the US across extinction. This appears in the fifth paragraph of the Discussion: “The present study adds to a growing body of evidence that manipulations applied across the shift from CS-US pairings to presentations of the CS alone can influence the effectiveness of extinction. For example, Gershman et al., (2013) and Bouton et al., (2004) showed that gradually reducing the proportion of reinforced CS presentations results in less spontaneous recovery and slower reacquisition, respectively; though both studies left open fundamental questions about the basis of their findings (see also Woods & Bouton, 2007).”

      Reviewer #1 (Recommendations For The Authors):

      I don't have any strong recommendations. I think the paper is really great as is.

      One minor suggestion to consider:

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. This is perhaps not entirely surprising, since their extinction test was conducted 2 weeks post-extinction, and not all rats show spontaneous recovery within that timeframe. The authors mention that the lack of SR might be due to the low level of freezing reported in their test, but since they are showing group mean data, they might consider showing the individual data points to showcase the range of SR freezing as an additional way to make sense of the variability (ie, maybe a few rats that showed very low freezing carried the mean down in the Standard Extinction group, while others showed return of fear).

      We agree and have included individual data points for test results in Figures 2D, 2F, 3D, 3H, 4D and 4H. Hence, these figures now reflect both group and individual freezing levels.

      Reviewer #2 (Recommendations For The Authors):

      Overall, I thought this was an exceptional paper. Aside from the comments listed above which I'm not sure are inherently addressable, the only real changes I would like to see are that individual data points should be depicted in the main testing figures, as is becoming more conventional in the field.

      We thank the reviewer for their positive comments. As indicated in our response to the other reviewers, we have added individual data points to the histograms showing test results.

      Reviewer #3 (Recommendations For The Authors):

      Figures

      (1) The test data are presented as bars, but I did wonder if there were differences between the groups from the start of testing or if those emerged across testing (SR vs extinction savings).

      We have added two new figures to the supplementary section, Figures 8 and 9. These display the trial-by-trial data from spontaneous recovery and reinstatement tests in each experiment. The data clearly show that the between-group differences in freezing were very stable across the test sessions.

      (2) While I understand the importance of presenting the last extinction session, I felt depicting the entire CS session would be more informative. Alternatively, removing this altogether and leaving the information from the extinction session in the supplemental would focus the reader on the key test data.

      We appreciate the reviewer’s point. It is important to show that the groups displayed equivalent freezing in the final extinction session prior to testing. Given that the test data are conveniently and best presented in a histogram, we have chosen to present the data from the final extinction session in the same way. The full, trial-by-trial trajectory of freezing across conditioning and extinction, as well as the analyses of these data, are presented in the supplementary A.

      (3) I did not find the figures to be very aesthetically pleasing (in part because some panels were unnecessarily large). For example, I found it rather odd that the simulation panels were split in Figure 1. One suggestion of how this figure could look better is to keep the size of panels B, C, and D the same and align them on the same row with the design figure above them. The other option is to have the design figure above the test figure and the two simulation figures above each other and next to the design and test. Also, there are grey lines that appear around the simulation figures on my PDF.

      We have updated the figures so that they are consistent across experiments and more aesthetically pleasing. Specifically, we have consistently: 1) inserted the simulations of Cochran & Cisler (2019) next to the design schematic; 2) inserted the extinction and test data beneath the design schematic; and 3) made the sizing of figures more uniform across Experiments 1-6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory. 

      Public Reviews: 

      Summary: 

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 stage larvae do not show sleep rhythm and long-term memory (LTM); however, L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in third instar and not in second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae.

      Strengths: 

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors can show that low quality food exposure and enhanced feeding during larval stage of Drosophila affects the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM are only possible under well fed and balanced nutrition in fly larvae. Non-sleep larvae were fed in low sugar conditions and indeed, the authors also find glucose metabolic genes to be required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons.

      Weaknesses: 

      The authors tried to induce sleep in younger L2 larvae, however the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae. Thus, they cannot show that sleep during L2 stage would be sufficient to form LTM. 

      We agree that the experiments with Gaboxadol feeding in L2 did not perfectly mimic L3 sleep behaviors. However, genetic induction of sleep in L2 was effective in increasing sleep duration and depth similar to that observed in normal L3. As noted below in response to specific reviewer comments, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the gaboxadol manipulation did cause a significant decrease in arousal threshold compared to control larvae. Together these approaches support the hypothesis that sleeping more/more deeply is not sufficient to promote LTM in L2.

      The authors suggest that larval Dh44 neurons may integrate "information about the nutritional environment through the direct sensing of glucose levels to modulate sleep-wake rhythm development". They identify glucose metabolism genes (e.g., Glut1) in the downstream DH44 neurons as being required for the organization of the sleep-wake-feeding rhythm, and CCHa signaling from DN1a neurons to the DH44 cells via the receptor. However, how this is connected is not well explained. Do the authors think that the nutrient sensing is only occurring in the DH44 neurons and not in DN1a or other neurons? Would not knocking down glucose metabolism in any neuron lead to a functional defect? What is the evidence that Dh44 neurons are specific sensors of nutritional state? For example, do the authors think that e.g. the overexpression of Glut1 in Dh44 neurons, a manipulation that can increase transport of glucose into cells, would rescue the effects of low-sugar food?

      We thank the reviewer for these suggestions and have added the experiment proposed. We found that knockdown of Hex-C in DN1a neurons did not disrupt sleep-wake rhythms (Fig. S4G-I) suggesting that Dh44 neurons are specialized in requiring glucose metabolism to drive sleep-wake rhythms. We have also added further clarification in the text regarding the existing evidence that Dh44 neurons act as nutrient sensors.

      Some of the genetic controls seem to be inconsistent suggesting some genetic background effects. In Figure 2B, npf-gal4 flies without the UAS show no significant circadian change in sleep duration, whereas UAS-TrpA flies do. The genetic control data in Figure 2D are also inconsistent. Npf-Gal4 seems to have some effect by itself without the UAS. The same is not seen with R76G11-Gal4. Suppl Fig 2: Naïve OCT and AM preference in L3 expressing various combinations of the transgenes show significant differences. npf-Gal4 alone seems to influence preference. 

      The sleep duration and bout number/length data are highly variable. 

      All experiments are performed in an isogenized background, so the variability seen in genetic controls likely reflects the stochastic nature of behavioral experiments. Indeed, adult sleep data also show a great deal of variability within the same genetic background (PMID: 29228366). We agree it is an important point, and we attempt to minimize variability as much as possible with backcrossing of flies and tight control of environmental conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Low sugar exposure and activation of NPF neurons might not induce the same behavioral changes. LS exposure does not enhance mouth hook movements, but overall food intake. NPF activation seems to enhance mouth hook movements, but the data for food intake is not shown. This information would be necessary to compare the two different manipulations. 

      We thank the reviewer for this suggestion. However, we elected not to perform food intake experiments with the NPF activation experiments. Since we are not directly comparing the low sugar and NPF manipulations to each other, we think that both experiments together support the conclusion that immature food acquisition strategies (whether food intake or feeding rate) limit LTM performance. 

      The authors write that the larval feeding assays run for 4 hours, can they explain why that long? Larvae should already have processed food within 4 hours, so that the measurement would not include all eaten food.

      We clarified the rationale for doing 4 hour feeding assays in the results section. We did 4 hours on blue dyed food because initial experiments of 1 hour with control L3 at CT1-4 were difficult to interpret. The measurement does not include all of the eaten food in the 4 hours but does reflect more long-term changes in food intake.

      Sleep induction with Gaboxadol seems to not really work - sleep duration, bout number and length are not enhanced, and arousal threshold is only slightly lower. Thus, the authors should not use this data as an example for inducing sleep behaviour. 

      We agree this approach did not have a large effect in larvae. However, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the Gaboxadol manipulation did cause a mild (but significant) decrease in arousal threshold compared to control larvae. Gaboxadol feeding also caused a significant decrease in total body weight compared to control larvae indicating that even slightly deeper sleep could be detrimental to younger animals.

      Activation of R76G11 with TrpA1 seems to work better for inducing sleep like behaviour. However, the authors describe that they permanently activated neurons. To induce a "normal" sleep pattern, the authors might try to only activate these neurons during the normal enhanced sleep time in L3 (CT13?) and not during the whole day. This might also allow larvae to eat during day time and gain more weight. 

      We apologize that this point was not clearer, but we did do acute activation of R76G11(+) neurons, as proposed by the reviewer. We have clarified the text to make this point.

      It would be interesting to see how larvae fed with high sucrose and low protein diet would behave in this assay. Do the authors suggest that sugar is most important for the development of sleep behaviour or that it is a combination of sugar and protein that might be required? 

      We agree that feeding larvae a high sucrose and low protein diet would be interesting. However, we initially tried a low protein diet and observed significant developmental delays. Therefore, we are concerned that developmental defects on a high sucrose and low protein diet would confound behavioral results. Additionally, the Dh44 manipulations (glucose & GCN2 signaling) suggest that sugar is the most important for the development of sleep behaviors.

      Reviewer #3 (Recommendations For The Authors): 

      The authors could discuss whether the interaction between DN1a clock neurons and Dh44 neurons is mediated synaptically or by volume transmission following the extracellular release of the CCHa1 neuropeptide. They write that "the development of Dh44 neuronal competency to receive clock-driven cues" and that "DN1a clock neurons anatomically and functionally connect to Dh44" but a discussion about volume vs. synaptic signalling would be of interest.

      We thank the reviewer for this suggestion. We revised the discussion to address this point.

      line 223 " demonstrating that post-synaptic processes likely". It would be interesting to read a discussion on whether it is known if these are postsynaptic or peptide-mediated volume effects? 

      We added additional text to the discussion to address these points.

      - The authors may want to include a schematic of the circuit and its position in the general anatomy of the fly larva.

      We thank the reviewer for this suggestion. We have added a model figure to Fig. S6.

      "Dh44 neurons act through glucose metabolic genes" - consider rewording e.g. require glucose metabolic genes 

      We revised the text.

      - line 45 "Early in development, young animals must obtain enough nutrients to ensure proper growth" - this is too general, many animals do not feed in early life-cycle stages (e.g. lecithotrophic development), consider rewording

      We revised the text to be more specific.

      - line 90 "however, L3 at CT1 consume more than L3 at CT12 (Figure S1A)" - typo CT13, also consider rewording to match the structure of the sentence before 'however, L3 consumed more at CT1 than at CT13' 

      We revised the text to fix this error.

      - Line 111 "and loss of deep sleep" - how is deep sleep defined and measured in the larvae? It is not clear from the data or the text. 

      We revised the text to define deep sleep in the results section. We also have a description of how arousal threshold is calculated in the methods.

      - In Figure 3B and G the individual data points are not shown 

      We did not show individual data points for those graphs because we are plotting the average percentage of 4 biological replicates.

      Typo: 

      Figure 1 legend "F, n= n=100-172 " 

      We revised the text to fix this typo.

    2. eLife assessment

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory.

    3. Joint Public Review:

      Summary:

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 stage larvae do not show sleep rhythm and long-term memory (LTM); however, L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in the third instar and not in the second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae.

      Strengths:

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors can show that low quality food exposure and enhanced feeding during larval stage of Drosophila affects the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM are only possible under well fed and balanced nutrition in fly larvae. Non-sleep larvae were fed in low sugar conditions and indeed, the authors also find glucose metabolic genes to be required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons.

      Weaknesses:

      The authors tried to induce sleep in younger L2 larvae with Gaboxadol feeding, however, the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae.

      Some of the genetic controls seem to be inconsistent. Given that the experiments were carried out in isogenized background, this is likely due to the high variability of some of the behaviours.

    1. eLife assessment

      This important study provides new insights into the maturation of ribbon synapses in zebrafish neuromast hair cells. Convincing evidence, based on live-cell imaging and pharmacological and genetic manipulations, is provided to show that the formation of this synaptic organelle is a dynamic process involving the fusion of presynaptic elements and microtubule transport. These findings will be of interest to neuroscientists studying synapse formation and function and should inspire further research into the molecular basis for synaptic ribbon maturation.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells.

      Strengths:

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent.

      Weaknesses:

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish?

      Impact:

      Synaptogenesis in auditory sensory cells remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics.

      Strengths:

      (1) The manuscript offers a comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans).

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor.

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion?

      Weaknesses:

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that:

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone;

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses.

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths:

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting.

      Weaknesses:

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility.

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread even in movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting).

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells. 

      Strengths: 

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent. 

      Weaknesses: 

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish? 

      We have examined functional deficits in kif1aa mutants in another paper (David et al., 2024; in submission, preprint available):

      https://www.biorxiv.org/content/10.1101/2024.05.20.595037v1

      In addition to playing a role in ribbon fusions, Kif1aa is also responsible for enriching glutamate-filled secretory vesicles at the presynaptic active zone. In kif1aa mutants (and crispants), vesicles are no longer localized to the hair cell base, and there is a reduction in the number of vesicles associated with presynaptic ribbons. Kif1aa mutants also have functional defects including reductions in spontaneous vesicle release and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow.  Since our paper focuses on microtubule-associated ribbon movement and dynamics early in hair cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window in this paper. In our revision, we will reference this recently submitted work.

      Impact: 

      Synaptogenesis in auditory sensory cells remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process.

      Reviewer #2 (Public Review): 

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics. 

      Strengths: 

      (1) The manuscript offers comprehensive Introduction and Discussion sections that will inform generalists and specialists.

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans). 

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor. 

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion? 

      These are important strengths and we do plan to investigate adaptors and how hair cell activity impacts ribbon fusion and transport in the future!

      Weaknesses: 

      (1) Neither the data nor the Discussion addresses a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point, and we are working to create a transgenic line with fluorescently labelled Kif1aa to directly visualize its association with ribbons. At present, we have not obtained a transgenic line, and localization of Kif1aa and ribbons in live hair cells is beyond the scope of this paper. In our revision we will discuss this caveat.

      (2) Neither the data nor the Discussion addresses the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 weaknesses.  

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary. 

      This is correct and is a caveat of our Kif1aa and drug experiments. However, to mitigate this in the pharmacological experiments, we have done the drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The faster experiment done after 30 min drug treatment is where we observe reduced directional motion and fusions. This latter experiment should not be affected by any long-term changes or developmental defects that could be caused by the drugs, as hair cell development occurs over 8-12 hrs. However, we acknowledge that these treatments and genetic experiments could have secondary phenotypic defects that are not hair-cell specific. In our revision, we will discuss these issues.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that 

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone; 

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses. 

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots.

      Strengths: 

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel.

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel.

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting.

      (4) The quality of the data is extremely high and the results are interesting. 

      Weaknesses: 

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility. 

      We agree that overexpression in transgenic lines is a common issue and would have loved to do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. We originally characterized several transgenic Ribeye lines in the past to ensure they have normal ribbon numbers and size (myo6b:ribb-mcherry, myo6b:riba-tagRFP and myo6b:riba-GFP) - in 2014. Unfortunately, we no longer have the raw data from this analysis. In our revision, we will repeat our immunolabel on myo6b:riba-tagRFP transgenic fish and examine ribbon numbers and size and show what impact (or not) exogenous Ribeye expression has on ribbon formation.

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified. 

      We attempted a co-localization study between microtubules and ribbons but decided not to move forward with it due to several issues:

      (1)  Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and hence co-localization is not meaningful because the distances are so small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in these hair cells due to highly varying filament intensities, and diffuse cytoplasmic tubulin signal.

      Therefore, we decided that a better measure of ribbon-microtubule association would be a demonstration that individual ribbons keep their association with microtubules over time (in our time lapses), rather than a co-localization study. We see that ribbons localize to microtubules in all our timelapses, including the examples shown. We observed that if a ribbon dissociates, it is just to switch from one filament to another. We have not observed free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk, together with errors in fits to imperfect data, will yield some spread in alpha even for movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1).

      As we have stated in the paper, we only see a small subset of the ribbon precursors moving directionally. The majority of the ribbons are stationary. We cannot say for sure what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion. This idea is supported by the fact that we have seen ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses. The ribbons that are stationary may not have enough motors attached, or they may be in a sort of ‘seeding’ phase where the ribeye protein could be condensing on the ribbon. We have discussed the possibility of ribbons being biomolecular condensates in our Discussion.

      In our revision we will discuss why ribbon transport does not resemble typical motor-driven transport (also see response to point 4 below). We will also reexamine our MSD data in more detail as suggested by Reviewer 3 and provide distributions of alpha values in our revision.

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting). 

      One major difference between axonal and ribbon transport is that microtubules are very stable and linear in axonal transport. Therefore, the directed motion observed is ‘canonical’. In hair cells, the microtubules are extremely dynamic, especially towards the hair cell base. Within a single time frame (60-100 s), we see the network changing (moving and branching). This dynamic network adds another layer of complexity onto the motion of the ribbon, as the filament track itself is changing. Therefore, we see a lot of stalling, filament switching, and reversals of ribbon movement in our movies. However, we have demonstrated in our movies as well as using MSD analysis, that a subset of ribbons exhibit directional motion. In our revision we will discuss why directed motion in hair cells does not resemble canonical motor-driven transport in axons.

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete.

      When using Nocodazole, it is important to optimize the concentration of the drug such that there is minimal cytotoxicity, while still being effective. Microtubules in the apical region of hair cells are very stable and do not respond well to Nocodazole treatment at concentrations that are tolerable to hair cells. While a few stable filaments remain largely at the cell apex, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, Nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We will add additional images and quantification in our revision to illustrate these points.
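
      The concern raised in point (3) above, that a fitted alpha > 1 does not by itself establish directed motion, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' analysis: the track length (30 frames), the number of lags (10), the unit step size, the drift value, and all function names are assumptions chosen for illustration. It fits alpha as the slope of log(MSD) against log(lag) for purely Brownian tracks and for tracks with an added constant drift, and reports how many Brownian tracks exceed alpha = 1 by chance.

```python
import math
import random


def simulate_track(n_steps, step_sd, drift, rng):
    """2D random walk; `drift` is a constant displacement along x per frame."""
    x, y = 0.0, 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += drift + rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        track.append((x, y))
    return track


def msd(track, max_lag):
    """Time-averaged mean squared displacement for lags 1..max_lag."""
    values = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        values.append(sum(squared) / len(squared))
    return values


def fit_alpha(msd_values):
    """Least-squares slope of log(MSD) vs log(lag): alpha in MSD ~ lag**alpha."""
    xs = [math.log(lag) for lag in range(1, len(msd_values) + 1)]
    ys = [math.log(v) for v in msd_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def alpha_distribution(n_tracks, n_steps, drift, seed):
    """Fitted alpha for many independent simulated tracks."""
    rng = random.Random(seed)
    return [
        fit_alpha(msd(simulate_track(n_steps, 1.0, drift, rng), max_lag=10))
        for _ in range(n_tracks)
    ]


# Hypothetical parameters: short tracks, as is typical for live imaging.
brownian = alpha_distribution(n_tracks=200, n_steps=30, drift=0.0, seed=1)
directed = alpha_distribution(n_tracks=200, n_steps=30, drift=1.0, seed=2)

frac_above_one = sum(a > 1.0 for a in brownian) / len(brownian)
print(f"Brownian: mean alpha = {sum(brownian) / len(brownian):.2f}, "
      f"fraction with alpha > 1 = {frac_above_one:.2f}")
print(f"Directed: mean alpha = {sum(directed) / len(directed):.2f}")
```

      Comparing the full distribution of fitted alpha values against a simulated Brownian null of the same track length, rather than applying a fixed threshold of 1, is one way to judge whether a subset of tracks is more directed than chance would predict.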

    1. Reviewer #1 (Public Review):

      Summary:

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.

      Strengths:

      Overall this is a well-performed study, well presented, with conclusions mostly supported by the data. The use of knockout cell lines and rescue experiments is convincing.

      Weaknesses:

      In some cases, additional controls and quantification would be needed, in particular regarding cell cycle and centriole elongation stages, to make the data and conclusions more robust.

    2. eLife assessment

      The study by Pudlowski et al. shows that a protein complex composed of delta- and epsilon-tubulin together with TEDC1 and TEDC2, which was previously identified, functions in generating centriolar triplet microtubules, and that this is crucial for the proper formation of centriolar subdomains and the stability of centrioles throughout the cell cycle. The findings are valuable for a better understanding of centriole biogenesis and structure and are largely supported by solid evidence based on knockout cell lines, immunoprecipitation, and ultrastructure expansion microscopy. The work is of interest to cell biologists, in particular researchers with interest in centrosome biology.

    3. Reviewer #2 (Public Review):

      Summary:

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that only have singlet microtubules that disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype with abnormal centrioles that also disintegrate in mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.

      Strengths:

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: despite a signal that's not very homogeneous, it's clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1- TUBE1 heterodimer.

      Weaknesses:

      The phenotypes observed in U-ExM on cartwheel elongation merit further quantification, enabling the field to appreciate better what is happening at the level of this structure.

    4. Reviewer #3 (Public Review):

      Summary:

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have these identical phenotypes. They use genetics to knockout TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin or epsilon-tubulin and TEDC1 and TEDC2 form one complex and that TEDC1 and TEDC2 can interact independently of tubulins. Finally, they show that the localization of the four proteins is mutually dependent.

      Strengths:

      The results presented here are mostly convincing, the study is exciting and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.

      Weaknesses:

      The ultrastructural characterization of TEDC1 and TEDC2 obtained by U-ExM is inconclusive. Improving the quality of the signals is paramount for this manuscript.

    1. Reviewer #2 (Public Review):

      Summary:

      This study looks at sex differences in alcohol drinking behaviour in a well-validated model of binge drinking. They provide a comprehensive analysis of drinking behaviour within and between sessions for males and females, as well as looking at the calcium dynamics in neurons projecting from the anterior insula cortex to the dorsolateral striatum.

      Strengths:

      Examining specific sex differences in drinking behaviour is important. This research question is currently a major focus for preclinical researchers looking at substance use. Although we have made a lot of progress over the last few years, there is still a lot that is not understood about sex-differences in alcohol consumption and the clinical implications of this.

      Identifying the lateralisation of activity is novel, and has fundamental importance for researchers investigating functional anatomy underlying alcohol-driven behaviour (and other reward-driven behaviours).

      Weaknesses:

      Very small and unequal sample sizes, especially females (9 males, 5 females). This is probably ok for the calcium imaging, especially with the G-power figures provided, however, I would be cautious with the outcomes of the drinking behaviour, which can be quite variable.

      For female drinking behaviour, rather than this being labelled "more efficient", could this just be that female mice (being substantially smaller than male mice) don't need to consume as much liquid to reach the same g/kg? In that case, the interpretation might not be so much that females are more efficient, as that mice are very good at titrating their intake to achieve the desired dose of alcohol.
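
      The arithmetic behind this point can be made explicit. The following is a minimal sketch with hypothetical values only: the body weights (20 g and 25 g), the 20% v/v ethanol concentration, and the consumed volume are assumptions for illustration and are not taken from the manuscript. The ethanol density of roughly 0.789 g/ml is the one physical constant used.

```python
ETHANOL_DENSITY_G_PER_ML = 0.789  # approximate density of pure ethanol


def ethanol_dose_g_per_kg(volume_ml, ethanol_fraction_vv, body_weight_g):
    """Ethanol dose (g/kg) from the volume of solution consumed."""
    ethanol_g = volume_ml * ethanol_fraction_vv * ETHANOL_DENSITY_G_PER_ML
    return ethanol_g / (body_weight_g / 1000.0)


def volume_for_dose_ml(dose_g_per_kg, ethanol_fraction_vv, body_weight_g):
    """Volume of solution (ml) needed to reach a given dose (g/kg)."""
    ethanol_g = dose_g_per_kg * (body_weight_g / 1000.0)
    return ethanol_g / (ethanol_fraction_vv * ETHANOL_DENSITY_G_PER_ML)


# Hypothetical animals: a 20 g mouse and a 25 g mouse, both drinking 20% v/v ethanol.
small_g, large_g, fraction = 20.0, 25.0, 0.20

# Same volume consumed: the smaller animal receives the larger dose.
dose_small = ethanol_dose_g_per_kg(0.5, fraction, small_g)
dose_large = ethanol_dose_g_per_kg(0.5, fraction, large_g)
print(f"Same 0.5 ml: {dose_small:.2f} g/kg vs {dose_large:.2f} g/kg")

# Same target dose: the smaller animal needs less volume.
vol_small = volume_for_dose_ml(4.0, fraction, small_g)
vol_large = volume_for_dose_ml(4.0, fraction, large_g)
print(f"Same 4.0 g/kg: {vol_small:.2f} ml vs {vol_large:.2f} ml")
```

      Under these assumptions, identical licking (and hence identical volume) gives different g/kg doses, while identical g/kg doses require different volumes, which is why the unit in which "intake" is reported matters for the interpretation.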

      I may be mistaken, but is ANCOVA, with sex as the covariate, the appropriate way to test for sex differences? My understanding was that with an ANCOVA, the covariate is a continuous variable that you are controlling for, not looking for differences in. In that regard, given that sex is not continuous, can it be used as a covariate? I note that in the results, sex is defined as the "grouping variable" rather than the covariate. The analysis strategy should be clarified.
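
      As background to this question: in a general linear model, a two-level categorical variable such as sex is usually entered as a dummy-coded (0/1) factor, and its coefficient is then the difference between the group means. The following is a minimal sketch on synthetic data, not the authors' analysis; the group sizes mirror the 9 males and 5 females mentioned above, but every value is randomly generated and the function name is hypothetical.

```python
import random


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic outcome values (arbitrary units); NOT data from the manuscript.
rng = random.Random(0)
males = [rng.gauss(10.0, 2.0) for _ in range(9)]
females = [rng.gauss(12.0, 2.0) for _ in range(5)]

sex_dummy = [0] * len(males) + [1] * len(females)  # 0 = male, 1 = female
outcome = males + females

slope, intercept = ols_slope_intercept(sex_dummy, outcome)
mean_m = sum(males) / len(males)
mean_f = sum(females) / len(females)

print(f"coefficient for sex = {slope:.3f}")
print(f"difference in group means = {mean_f - mean_m:.3f}")
```

      With only the dummy-coded factor in the model, the coefficient equals the raw group difference and the intercept equals the mean of the reference group. If a continuous covariate were added, the same coefficient would become the covariate-adjusted group difference, which is the usual sense in which a grouping variable is tested in an ANCOVA.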

    2. eLife assessment

      This valuable manuscript describes evidence of sex differences in specific corticostriatal projections during alcohol consumption, and this is noteworthy given the increasing rates/levels of drinking in females and the liability for Alcohol Use Disorder. The authors provide solid evidence of the lateralisation of the activity of the circuit, but other evidence is incomplete, particularly with regard to its description of the drinking measure and how this relates to intoxication. The analyses of the histology data are not complete, and there are further inconsistencies that make it difficult to reconcile the photometry and behavioral data. The findings will be of partial interest to researchers investigating functional circuitry underlying alcohol-driven behaviors.

    3. Reviewer #1 (Public Review):

      Summary:

      This paper uses a model of binge alcohol consumption in mice to examine how the behaviour and its control by a pathway from the anterior insular cortex (AIC) to the dorsolateral striatum (DLS) may differ between males and females. Photometry is used to measure the activity of AIC terminals in the DLS when animals are drinking and this activity seems to correspond to drink bouts in males but not females. The effects appear to be lateralized with inputs to the left DLS being of particular interest.

      Strengths:

      Increasing alcohol intake in females is of concern and the consequences for substance use disorder and brain health are not fully understood, so this is an area that needs further study. The attempt to link fine-grained drinking behaviour with neural activity has the potential to enrich our understanding of the neural basis of behaviour, beyond what can be gleaned from coarser measures of volumes consumed etc.

      Weaknesses:

      The introduction to the drinking in the dark (DID) paradigm is rather narrow in scope (starting line 47). This would be improved if the authors framed this in the context of other common intermittent access paradigms and gave due credit to important studies and authors that were responsible for the innovation in this area (particularly studies by Wise, 1973 and returned to popular use by Simms et al 2010 and related papers; e.g., Wise RA (1973). Voluntary ethanol intake in rats following exposure to ethanol on various schedules. Psychopharmacologia 29: 203-210; Simms, J., Bito-Onon, J., Chatterjee, S. et al. Long-Evans Rats Acquire Operant Self-Administration of 20% Ethanol Without Sucrose Fading. Neuropsychopharmacol 35, 1453-1463 (2010).) The original drinking in the dark demonstrations should also be referenced (Rhodes et al., 2005). Line 154 Theile & Navarro 2014 is a review and not the original demonstration.

      When sex differences in alcohol intake are described, more care should be taken to be clear about whether this is in terms of volume (e.g. ml) or blood alcohol levels (BAC, or at least g/kg as a proxy measure). This distinction was often lost when lick responses were being considered. If licking is similar (assuming a single lick from a male and female brings in a similar volume?), this might mean males and females consume similar volumes, but females due to their smaller size would become more intoxicated so the implications of these details need far closer consideration. What is described as identical in one measure, is not in another.

      No conclusions regarding the photometry results can be drawn based on the histology provided. Localization and quantification of viral expression are required at a minimum to verify the efficacy of the dual virus approach (the panel in Supplementary Figure 1 is very small and doesn't allow terminals to be seen, and there is no quantification). Whether these might differ by sex is also necessary before we can be confident about any sex differences in neural activity.

      While the authors have some previous data on the AIC to DLS pathway, there are many brain regions and pathways impacted by alcohol and so the focus on this one in particular was not strongly justified. Since photometry is really an observational method, it's important to note that no causal link between activity in the pathway and drinking has been established here.

      It would be helpful if the authors could further explain whether their modified lickometers actually measure individual licks. While in some systems contact with the tongue closes a circuit which is recorded, the interruption of a photobeam was used here. It's not clear to me whether the nose close to the spout would be sufficient to interrupt that beam, or whether a tongue protrusion is required. This detail is important for understanding how the photometry data is linked to behaviour. The temporal resolution of the GCaMP signal is likely not good enough to capture individual licks but I think more caution or detail in the discussion of the correspondence of these events is required.

      Even if the pattern of drinking differs between males and females, the use of the word "strategy" implies a cognitive process that was never described or measured.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript by Haggerty and Atwood, the authors use a repeated binge drinking paradigm to assess how water and ethanol intake changes in male and female mice as well as measure changes in anterior insular cortex to dorsolateral striatum terminal activity using fiber photometry. They find that males and females have similar overall water and ethanol intake, but females appear to be more efficient alcohol drinkers. Using fiber photometry, they show that the anterior insular cortex (AIC) to dorsolateral striatum (DLS) projections have sex, fluid, and lateralization differences. The male left circuit was most robust when aligned to ethanol drinking, and water was somewhat less robust. Male right, and female left and right, had essentially no change in photometry activity. To some degree, the changes in terminal activity appear to be related to fluid exposure over time, as well as within-session differences in trial-by-trial intake. Overall, the authors provide an exhaustive analysis of the behavioral and photometric data, thus providing the scientific community with a rich information set to continue to study this interesting circuit. However, although the analysis is impressive, there are a few inconsistencies regarding specific measures (e.g., AUC, duration of licking) that do not quite fit together across analytic domains. This does not reduce the rigor of the work, but it does somewhat limit the interpretability of the data, at least within the scope of this single manuscript.

      Strengths:

      - The authors use high-resolution licking data to characterize ingestive behaviors.

      - The authors account for a variety of important variables, such as fluid type, brain lateralization, and sex.

      - The authors provide a nice discussion on how this data fits with other data, both from their laboratory and others'.

      - The lateralization discovery is particularly novel.

      Weaknesses:

      - The volume of data and number of variables provided makes it difficult to find a cohesive link between data sets. This limits interpretability.

      - The authors describe a clear sex difference in the photometry circuit activity. However, I am curious about whether female mice that drink more similarly to males (e.g., less efficiently?) also show increased activity in the left circuit, similar to males. Oppositely, do very efficient males show weaker calcium activity in the circuit? Ultimately, I am curious about how the circuit activity maps to the behaviors described in Figures 1 and 2.

      - What does the change in water-drinking calcium imaging across time in males mean? Especially considering that alcohol-related signals do not seem to change much over time, I am not sure what it means to have water drinking change.

    1. eLife assessment

      Using electrophysiological recordings in freely moving rats, this valuable study investigates the role of different gamma frequency bands in the development of spatial representations in the hippocampus. However, the evidence is incomplete as the methods and data analysis need significant improvement. Critically, alternative interpretations and analyses must be provided, especially regarding the nature of gamma oscillations in the hippocampus and their interaction with neuronal firing dynamics and theta sequence features. This study will be of interest to neuroscientists working in the field of spatial navigation and neuronal dynamics.

    2. Reviewer #1 (Public Review):

      Hippocampal place cells display a sequence of firing activities when the animal travels through a spatial trajectory at a behavioral time scale of seconds to tens of seconds. Interestingly, parts of the firing sequence also occur at a much shorter time scale: ~120 ms within individual cycles of theta oscillation. These so-called theta sequences were originally thought to naturally result from the phenomenon of theta phase precession. However, there is evidence that theta sequences do not always occur even when theta phase precession is present, for example, during the early experience of a novel maze. The question is then how they emerge with experience (theta sequence development). This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, may play a key role in theta sequence development.

      The authors analyzed place cells, LFPs, and theta sequences as rats traveled a circular maze in repeated laps. They found that a group of place cells were significantly tuned to a particular phase of fast-gamma (FG-cells), in contrast to others that did not show such tuning (NFG-cells). The authors then omitted FG-cells, or the same number of NFG-cells, in their algorithm of theta sequence detection and found that the quality of theta sequences, quantified by a weighted correlation, was worse with the FG-cell omission, compared to that with the NFG-cell omission, during later laps, but not during early laps. What made the FG-cells special for theta sequences? The authors found that FG-cells, but not NFG-cells, displayed phase precession to slow-gamma (25 - 45 Hz) oscillations (within theta cycles) during early laps (both FG- and NFG-cells showed slow-gamma phase precession during later laps). Overall, the authors conclude that FG-cells contribute to theta sequence development through slow-gamma phase precession during early laps.

      How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The identification of FG-cells in this study is straightforward. Evidence is also presented for the role of these cells in theta sequence development. However, given several concerns elaborated below, whether the evidence is sufficiently strong for the conclusion needs further clarification, perhaps, in future studies.

      (1) The results in Figure 3 and Figure 8 seem contradictory. In Figure 8, all theta sequences displayed a seemingly significant weighted correlation (above 0) even in early laps, which was mostly due to FG-cell sequences but not NFG-cell sequences (correlation for NFG-sequences appeared below 0). However, in Figure 3H, omitting FG-cells and omitting NFG-cells did not produce significant differences in the correlation. Conversely, FG-cell and NFG-cell sequences were similar in later laps in Figure 8 (NFG-cell sequences appeared even better than FG-cell sequences), yet omitting NFG-cells produced a better correlation than omitting FG-cells. This confusion may be related to how "FG-cell-dominant sequences" were defined, which is unclear in the manuscript. Nevertheless, the different results are not easy to understand.

      (2) The different contributions between FG-cells and NFG-cells to theta sequences are supposed not to be caused by their different firing properties (Figure 5). However, Figure 5D and E showed a large effect size (Cohen's D = 0.7, 0.8), although not significant (P = 0.09, 0.06). But the seemingly non-significant P values could be simply due to smaller N's (~20). In other parts of the manuscript, the effect sizes were comparable or even smaller (e.g. D = 0.5 in Figure 7B), but interpreted as positive results: P values were significant with large N's (~480 in Fig. 7B). Drawing a conclusion purely based on a P value while N is large often renders the conclusion only statistical, with unclear physical meaning. Although this is common in neuroscience publications, it makes more sense to at least make multiple inferences using similar sample sizes in the same study.

      (3) In supplementary Figure 2 - S2, FG-cells displayed stronger theta phase precession than NFG-cells, which could be a major reason why FG-cells impacted theta sequences more than NFG cells. Although factors other than theta phase precession may contribute to or interfere with theta sequences, stronger theta phase precession itself (without the interference of other factors), by definition, can lead to stronger theta sequences.

      (4) The slow-gamma phase precession of FG-cells during early laps is supposed to mediate or contribute to the emergence of theta sequences during late laps (Figure 1). The logic of this model is unclear. The slow-gamma phase precession was present in both early and late laps for FG-cells, but only present in late laps for NFG-cells. It seems more straightforward to hypothesize that the difference in theta sequences between early and later laps is due to the difference in slow-gamma phase precession of NFG cells between early and late laps. Although this is not necessarily the case, the argument presented in the manuscript is not easy to follow.

      (5) There are several questions on the description of methods, which could be addressed to clarify or strengthen the conclusions.

      (i) Were the identified fast- and slow-gamma episodes mutually exclusive?

      (ii) Was the task novel when the data were acquired? How many days (from the 1st day of the task) were included in the analysis? When the development of the theta sequence was mentioned, did it mean the development in a novel environment, in a novel task, or purely in a sense of early laps (Lap 1, 2) on each day?

      (iii) How were the animals' behavioral parameters equalized between early and later laps? For example, speed or head direction could potentially produce the differences in theta sequences.
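      The sample-size concern in point (2) can be made concrete with the standard relation between Cohen's d and the two-sample t statistic, t = d * sqrt(n1 * n2 / (n1 + n2)). The sketch below is illustrative only: the group sizes are hypothetical splits chosen to be of the same order as the N's mentioned above, not values taken from the manuscript, and `t_from_cohens_d` is a helper name introduced here.

```python
import math


def t_from_cohens_d(d, n1, n2):
    """Two-sample (pooled-variance) t statistic implied by Cohen's d."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Hypothetical group sizes, used only to show how t scales with N.
t_small_n = t_from_cohens_d(0.7, 10, 10)     # larger effect, about 20 cells in total
t_large_n = t_from_cohens_d(0.5, 240, 240)   # smaller effect, about 480 cells in total

print(f"d = 0.7, n = 10 + 10   -> t = {t_small_n:.2f}")
print(f"d = 0.5, n = 240 + 240 -> t = {t_large_n:.2f}")
```

      Under these assumed group sizes the smaller effect yields the much larger t statistic, which is why comparing conclusions across analyses with very different N's can mislead.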

    3. Reviewer #2 (Public Review):

      This manuscript addresses an important question that has not yet been solved in the field, what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision-making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested with theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever which makes it harder to understand the design of the study.

      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, which typically report it in the range of 20-50 %. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to, instead of just analyzing the phase locking in two discrete frequency bands, compute the phase locking with all LFP frequencies from 25-100 Hz. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as the pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.

      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be a different one from that on which the analyzed spikes were recorded. This is the better practice to reduce the contribution of spike leakage that could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.

      • The initial framework of the authors of classifying cells into fast gamma and not fast gamma modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous studies have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).

      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.

      • It is not clear to me how the analysis in Figure 6 was performed. In Figure 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would be the interpretation of the results.
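      For the pairwise phase consistency suggestion above, a minimal sketch of why the metric is less sensitive to spike count than mean vector length. It uses the closed form PPC = (|sum of unit phase vectors|^2 - N) / (N (N - 1)), which equals the average cosine of the phase difference over all spike pairs. The function names and the uniform-phase simulation are assumptions made for illustration, not the authors' pipeline.

```python
import cmath
import math
import random


def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def pairwise_phase_consistency(phases):
    """Average cos(phase_i - phase_j) over all pairs i != j."""
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant_sq = abs(sum(cmath.exp(1j * p) for p in phases)) ** 2
    return (resultant_sq - n) / (n * (n - 1))


# Cells with NO phase locking and only 10 spikes each (simulated, seeded).
rng = random.Random(0)
n_cells, n_spikes = 2000, 10
mvl_values, ppc_values = [], []
for _ in range(n_cells):
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_spikes)]
    mvl_values.append(mean_vector_length(phases))
    ppc_values.append(pairwise_phase_consistency(phases))

mean_mvl = sum(mvl_values) / n_cells   # biased upward when spikes are few
mean_ppc = sum(ppc_values) / n_cells   # centred on zero
print(f"mean MVL = {mean_mvl:.3f}, mean PPC = {mean_ppc:.3f}")
```

      With unlocked spikes the average MVL stays well above zero purely because few spikes enter the estimate, whereas the average PPC stays near zero.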

    4. Reviewer #3 (Public Review):

      [Editors' note: This review contains many criticisms that apply to the whole sub-field of slow/fast gamma oscillations in the hippocampus, as opposed to this particular paper. In the editors' view, these comments are beyond the scope of any single paper. However, they represent a view that, if true, should contextualise the interpretation of this paper and all papers in the sub-field. In doing so, they highlight an ongoing debate within the broader field.]

      Summary:

      The authors aimed to elucidate the role of dynamic gamma modulation in the development of hippocampal theta sequences, utilizing the traditional framework of "two gammas," a slow and a fast rhythm. This framework is currently being challenged, necessitating further analyses to establish and secure the assumed premises before substantiating the claims made in the present article.

      The results are too preliminary and need to integrate contemporary literature. New analyses are required to address these concerns. However, by addressing these issues, it may be possible to produce an impactful manuscript.

      I. Introduction
      Within the introduction, multiple broad assertions are conveyed that serve as the premise for the research. However, equally important citations that are not mentioned potentially contradict the ideas that serve as the foundation. Instances of these are described below:

      (1) Are there multiple gammas? The authors launched the study on the premise that two different gamma bands are communicated from CA3 and the entorhinal cortex. However, recent literature suggests otherwise, offering that the slow gamma component may be related to theta harmonics:

      From a review by Etter, Carmichael and Williams (2023):
      "Gamma-based coherence has been a prominent model for communication across the hippocampal-entorhinal circuit and has classically focused on slow and fast gamma oscillations originating in CA3 and medial entorhinal cortex, respectively. These two distinct gammas are then hypothesized to be integrated into hippocampal CA1 with theta oscillations on a cycle-to-cycle basis (Colgin et al., 2009; Schomburg et al., 2014). This would suggest that theta oscillations in CA1 could serve to partition temporal windows that enable the integration of inputs from these upstream regions using alternating gamma waves (Vinck et al., 2023). However, these models have largely been based on correlations between shifting CA3 and medial entorhinal cortex to CA1 coherence in theta and gamma bands. In vivo, excitatory inputs from the entorhinal cortex to the dentate gyrus are most coherent in the theta band, while gamma oscillations would be generated locally from presumed local inhibitory inputs (Pernía-Andrade and Jonas, 2014). This predominance of theta over gamma coherence has also been reported between hippocampal CA1 and the medial entorhinal cortex (Zhou et al., 2022). Another potential pitfall in the communication-through-coherence hypothesis is that theta oscillations harmonics could overlap with higher frequency bands (Czurkó et al., 1999; Terrazas et al., 2005), including slow gamma (Petersen and Buzsáki, 2020). The asymmetry of theta oscillations (Belluscio et al., 2012) can lead to harmonics that extend into the slow gamma range (Scheffer-Teixeira and Tort, 2016), which may lead to a misattribution as to the origin of slow-gamma coherence and the degree of spike modulation in the gamma range during movement (Zhou et al., 2019)."

      And from Benjamin Griffiths and Ole Jensen (2023):
      "That said, in both rodent and human studies, measurements of 'slow' gamma oscillations may be susceptible to distortion by theta harmonics [53], meaning open questions remain about what can be attributed to 'slow' gamma oscillations and what is attributable to theta."

      This second statement should be heavily considered as it is from one of the original authors who reported the existence of slow gamma.

      Yet another instance from Schomburg, Fernández-Ruiz, Mizuseki, Berényi, Anastassiou, Christof Koch, and Buzsáki (2014):
      "Note that modulation from 20-30 Hz may not be related to gamma activity but, instead, reflect timing relationships with non-sinusoidal features of theta waves (Belluscio et al., 2012) and/or the 3rd theta harmonic."

      One of this manuscript's authors is Fernández-Ruiz, a contemporary proponent of the multiple gamma theory. Thus, the modulation to slow gamma offered in the present manuscript may actually be related to theta harmonics.

      With the above emphasis from proponents of the slow/fast gamma theory on disambiguating harmonics from slow gamma, our first suggestion to the authors is that they A) address these statements (citing the work of these authors in their manuscript) and B) demonstrably quantify theta harmonics in relation to slow gamma prior to making assertions of phase relationships (methodological suggestions below). As the frequency of theta harmonics can extend as high as 56 Hz (PMID: 32297752), overlapping with the slow gamma range defined here (25-45 Hz), it will be important to establish an approach that decouples the two phenomena using an approach other than an arbitrary frequency boundary.

      (2) Can gammas be segregated into different lamina of the hippocampus? This idea appears to be foundational in the premise of the research but is also undergoing revision.

      As discussed by Etter et al. above, the initial theory of gamma routing was launched on coherence values. However, the values reported by Colgin et al. (2009) lean more towards incoherence (a value of 0) rather than coherence (1), suggesting a weak to negligible interaction. Nevertheless, this theory is coupled with the idea that the different gamma frequencies are exclusive to the specific lamina of the hippocampus.

      Recently, Douchamps et al. (2024) suggested a broader, more nuanced understanding of gamma oscillations than previously thought, emphasizing their wide range and variability across hippocampal layers. This perspective challenges the traditional dichotomy of gamma sub-bands (e.g., slow vs. medium gamma) and their associated cognitive functions based on a more rigid classification according to frequency and phase relative to the theta rhythm. Moreover, they observed all frequencies across all layers.

      Similarly, the current source density plots from Belluscio et al. (2012) suggest that SG and FG can be observed in both the radiatum and lacunosum-moleculare.

      Therefore, if the initial coherence values are weak to negligible and both slow and fast gamma are observed in all layers of the hippocampus, can the different gammas be exclusively related to either anatomical inputs or psychological functions (as done in the present manuscript)? Do these observations challenge the authors' premise of their research? At the least, please discuss.

      (3) Do place cells, phase precession, and theta sequences require input from afferent regions? It is offered in the introduction that "Fast gamma (~65-100Hz), associated with the input from the medial entorhinal cortex, is thought to rapidly encode ongoing novel information in the context (Fernandez-Ruiz et al., 2021; Kemere, Carr, Karlsson, & Frank, 2013; Zheng et al., 2016)".

      Relevant publications include Ipshita Zutshi, Manuel Valero, Antonio Fernández-Ruiz, and György Buzsáki (2022), "CA1 place cells and assemblies persist despite combined mEC and CA3 silencing", which reports that CA1 place fields remain fairly intact following MEC inactivation, and Hadas E Sloin, Lidor Spivak, Amir Levi, Roni Gattegno, Shirly Someck, and Eran Stark (2024): "These findings are incompatible with precession models based on inheritance, dual-input, spreading activation, inhibition-excitation summation, or somato-dendritic competition. Thus, a precession generator resides locally within CA1."

      These publications, at the least, challenge the inheritance model by which the afferent input controls CA1 place field spike timing. The research premise offered by the authors is couched in the logic of inheritance, when the effect that the authors are observing could be governed by local intrinsic activity (e.g., phase precession and gamma are locally generated, and the attribution to routed input is perhaps erroneous). Certainly, it is worth discussing these manuscripts in the context of the present manuscript.

      II. Results

      (1) Figure 2-
      a. There is a bit of a puzzle here that should be discussed. If slow and fast frequencies modulate 25% of neurons, how can these rhythms serve as mechanisms of communication/support psychological functions? For instance, if fast gamma is engaged in rapid encoding (line 72) and slow gamma is related to the integration processing of learned information (line 84), and these are functions of the hippocampus, then why do these rhythms modulate so few cells? Is this to say 75% of CA1 neurons do not listen to CA3 or MEC input?

      b. Figure 2. It is hard to know if the mean vector lengths presented are large or small. Moreover, one can expect to find significance due to chance. For instance, it is challenging to find a frequency in which modulation strength is zero (please see Figure 4 of PMID: 30428340 or Figure 7 of PMID: 31324673).

      i. Please construct the histograms of Mean Vector Length as in the above papers, using 1 Hz filter steps from 1-120Hz and include it as part of Figure 2 (i.e., calculate the mean vector length for the filtered LFP in steps of 1-2 Hz, 2-3 Hz, 3-4 Hz,... etc). This should help the authors portray the amount of modulation these neurons have relative to the theta rhythm and other frequencies. If the theta mean vector length is higher, should it be considered the primary modulatory influence of these neurons (with slow and fast gammas as a minor influence)?

      ii. It is possible to infer a neuron's degree of oscillatory modulation without using the LFP. For instance, one can create an ISI histogram as done in Figure 1 here (https://www.biorxiv.org/content/10.1101/2021.09.20.461152v3.full.pdf+html; "Distinct ground state and activated state modes of firing in forebrain neurons"). The reciprocal of the ISI values would be "instantaneous spike frequency". In favor of the Douchamps et al. (2024) results, the figure of the bioRxiv paper implies that there is a single gamma frequency of modulation, as there is only a single bump in the ISIs in the 10^-1.5 to 10^-2 range. Therefore, to vet the slow gamma results and the premise of two gammas offered in the introduction, it would be worth including this analysis as part of Figure 2.
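      The two analyses requested in (b) can be sketched as follows, in Python with only the standard library. Assumptions are stated loudly: the band phase is obtained from a brick-wall selection of DFT bins rather than a proper band-pass filter and Hilbert transform, the LFP and spike train are synthetic, and all function names are introduced here for illustration only.

```python
import cmath
import math


def band_phases_at(lfp, fs, f_lo, f_hi, samples):
    """Phase (radians) of the f_lo <= f < f_hi component of lfp at given sample indices.

    Brick-wall band built directly from DFT bins (assumes f_hi < fs / 2).
    """
    n = len(lfp)
    bins = range(max(1, math.ceil(f_lo * n / fs)), math.ceil(f_hi * n / fs))
    coeffs = [(k, sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(lfp)))
              for k in bins]
    return [cmath.phase(sum(c * cmath.exp(2j * math.pi * k * s / n) for k, c in coeffs))
            for s in samples]


def mean_vector_length(phases):
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def mvl_by_band(lfp, fs, spike_samples, band_starts, width_hz=1.0):
    """MVL of spike phases for each band [f, f + width_hz); e.g. band_starts=range(1, 120)."""
    return {f: mean_vector_length(band_phases_at(lfp, fs, f, f + width_hz, spike_samples))
            for f in band_starts}


def log_isi_histogram(spike_times_s, bins_per_decade=10, lo_exp=-3, hi_exp=1):
    """Counts of inter-spike intervals in log10-spaced bins from 10**lo_exp to 10**hi_exp s."""
    counts = [0] * ((hi_exp - lo_exp) * bins_per_decade)
    for a, b in zip(spike_times_s, spike_times_s[1:]):
        if b > a:
            idx = math.floor((math.log10(b - a) - lo_exp) * bins_per_decade)
            if 0 <= idx < len(counts):
                counts[idx] += 1
    return counts


# Synthetic 4 s LFP: 8 Hz plus a weaker 30 Hz component, sampled at 250 Hz.
fs, n = 250, 1000
lfp = [math.cos(2 * math.pi * 8 * i / fs) + 0.5 * math.cos(2 * math.pi * 30 * i / fs)
       for i in range(n)]
# A synthetic cell firing once per 8 Hz cycle, close to the peak.
spike_samples = [round(31.25 * m) for m in range(32)]
spectrum = mvl_by_band(lfp, fs, spike_samples, band_starts=[8, 30])
print({f: round(v, 3) for f, v in spectrum.items()})
```

      On real recordings the same loop would run over every 1 Hz band from 1 to 120 Hz, and the histogram of reciprocal ISIs would be inspected for one versus two bumps in the gamma range.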

      c. There are some things generally concerning about Figure 2.

      i. First, the raw trace does not seem to have clear theta epochs (it is challenging to ascertain the start and end of a theta cycle). Certainly, it would be worth highlighting the relationship between theta and the gammas and picking a nice theta epoch.

      ii. Also, in panel A, there looks to be a declining amplitude relationship between the raw, fast, and slow gamma traces, assuming that the scale bars represent 100 µV in all three traces. The raw trace is significantly larger than the fast gamma. However, this relationship does not seem to be the case in panel B (in which both the raw and unfiltered examples of slow and fast gamma appear to be equal; the right panels of B suggest that fast gamma is larger than slow, appearing to contradict the A= 1/f organization of the power spectral density). Please explain why this occurs. Including the power spectral density (see below) should resolve some of this.

      iii. Within the example of spiking to phase in the left side of Panel B (fast gamma example)- the neuron appears to fire near the trough twice, near the peak twice, and somewhere in between once. A similar relationship is observed for the slow gamma epoch. One would conclude from these plots that the interaction of the neuron with the two rhythms is the same. However, the mean vector lengths and histograms below these plots suggest a different story in which the neuron is modulated by FG but not SG. Please reconcile this.

      iv. For calculating the MVL, it seems that the number of spikes that the neuron fires would play a significant role. Working towards our next point, there may be a bias of finding a relationship if there are too few spikes (spurious clustering due to sparse data) and/or higher coupling values for higher firing rate cells (cells with higher firing rates will clearly show a relationship), forming a sort of inverse Yerkes-Dodson curve. Also, without understanding the magnitude of the MVL relative to other frequencies, it may be that these values are indeed larger than zero, but not biologically significant.

      - Please provide a scatter plot of Neuron MVL versus the Neuron's Firing Rate for 1) theta (7-9 Hz), 2) slow gamma, and 3) fast gamma, along with their line of best fit.

      - Please run a shuffle control where the LFP trace is shifted by random values between 125-1000ms and recalculate the MVL for theta, slow, and fast gamma. Often, these shuffle controls are done between 100-1000 times (see cross-correlation analyses of Fujisawa, Buzsaki et al.).

      - To establish that firing rate does not play a role in uncovering modulation, it would be worth conducting a spike number control, reducing the number of spikes per cell so that they are all equal before calculating the phase plots/MVL.
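      The shifted-LFP shuffle and the spike-number control requested above can be sketched as follows. This is a minimal illustration under stated assumptions: the "theta" phase series is simulated with a cycle-by-cycle frequency drawn between 6 and 10 Hz, the spikes are simulated to fall in the first quarter of each cycle, and the function names are introduced here. Note that the shift only decorrelates spikes from the LFP because the oscillation frequency drifts; for a perfectly stationary sinusoid a constant shift would leave the MVL unchanged.

```python
import cmath
import math
import random


def mean_vector_length(phases):
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def shifted_lfp_null(phase_series, spike_samples, fs, rng, n_shuffles=200,
                     min_shift_s=0.125, max_shift_s=1.0):
    """Null MVLs from circularly shifting the LFP phase series relative to the spikes."""
    n = len(phase_series)
    null = []
    for _ in range(n_shuffles):
        shift = rng.randint(int(min_shift_s * fs), int(max_shift_s * fs))
        null.append(mean_vector_length([phase_series[(s + shift) % n] for s in spike_samples]))
    return null


def spike_matched_mvl(phases, n_spikes, rng, n_resamples=200):
    """Mean MVL over random subsamples containing exactly n_spikes spikes."""
    if len(phases) < n_spikes:
        raise ValueError("cell has fewer spikes than the matching target")
    return sum(mean_vector_length(rng.sample(phases, n_spikes))
               for _ in range(n_resamples)) / n_resamples


def simulate_theta(duration_s, fs, rng):
    """Simulated phase series with variable cycle length, plus phase-locked spike samples."""
    phases, spikes = [], []
    n = int(duration_s * fs)
    while len(phases) < n:
        cycle_len = int(fs / rng.uniform(6.0, 10.0))
        spikes.append(len(phases) + rng.randrange(cycle_len // 4))
        phases.extend(2.0 * math.pi * i / cycle_len for i in range(cycle_len))
    return phases[:n], [s for s in spikes if s < n]


rng = random.Random(1)
fs = 1000
phase_series, spike_samples = simulate_theta(60, fs, rng)
spike_phases = [phase_series[s] for s in spike_samples]

observed = mean_vector_length(spike_phases)
null = shifted_lfp_null(phase_series, spike_samples, fs, rng)
p_value = (sum(v >= observed for v in null) + 1) / (len(null) + 1)
matched = spike_matched_mvl(spike_phases, n_spikes=30, rng=rng)
print(f"observed MVL = {observed:.3f}, shuffle p = {p_value:.3f}, 30-spike MVL = {matched:.3f}")
```

      In practice the same spike count would be applied to every cell before comparing MVLs across theta, slow gamma, and fast gamma.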

      (2) Something that I anticipated to see addressed in the manuscript was the study from Grosmark and Buzsaki (2016): "Cell assembly sequences during learning are "replayed" during hippocampal ripples and contribute to the consolidation of episodic memories. However, neuronal sequences may also reflect preexisting dynamics. We report that sequences of place-cell firing in a novel environment are formed from a combination of the contributions of a rigid, predominantly fast-firing subset of pyramidal neurons with low spatial specificity and limited change across sleep-experience-sleep and a slow-firing plastic subset. Slow-firing cells, rather than fast-firing cells, gained high place specificity during exploration, elevated their association with ripples, and showed increased bursting and temporal coactivation during postexperience sleep. Thus, slow- and fast-firing neurons, although forming a continuous distribution, have different coding and plastic properties."

      My concern is that much of the reported results in the present manuscript appear to recapitulate the observations of Grosmark and Buzsaki, but without accounting for differences in firing rate. A parsimonious alternative explanation for what is observed in the present manuscript is that high firing rate neurons, more integrated into the local network and orchestrating local gamma activity (PING), exhibit more coupling to theta and gamma. In this alternative perspective, it's not something special about how the neurons are entrained to the routed fast gamma, but that the higher firing rate neurons are better able to engage and entrain their local interneurons and thus modulate local gamma. However, this interpretation challenges the discussion around the importance of fast gamma routed from the MEC.

      a. Please integrate the Grosmark & Buzsaki paper into the discussion.

      b. Also, please provide data that refutes or supports the alternative hypothesis in which the high firing rate cells are just more gamma modulated as they orchestrate local gamma activity through monosynaptic connections with local interneurons (e.g., Marshall et al., 2002, Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge). Otherwise, the attribution to an MEC-routed fast gamma seems tenuous.

      c. It is mentioned that fast-spiking interneurons were removed from the analysis. It would be worth including these cells, calculating the MVL in 1 Hz increments as well as the reciprocal of their ISIs (described above).

      (3) Methods - Spectral decomposition and Theta Harmonics.

      a. It is challenging to interpret the exact parameters that the authors used for their multi-taper analysis in the methods (lines 516-526). Tallon-Baudry et al., (1997; Oscillatory γ-Band (30-70 Hz) Activity Induced by a Visual Search Task in Humans) discuss a time-frequency trade-off where frequency resolution changes with different temporal windows of analysis. This trade-off between time and frequency resolution is well known as the uncertainty principle of signal analysis, transcending all decomposition methods. It is not only a function of wavelet or FFT, and multi-tapers do not directly address this. (The multitaper method, by using multiple specially designed tapers -like the Slepian sequences- smooths the spectrum. This smoothing doesn't eliminate leakage but distributes its impact across multiple estimates). Given the brevity of methods and the issues of theta harmonics as offered above, it is worth including some benchmark trace testing for the multi-taper as part of the supplemental figures.

      i. Please spectrally decompose an asymmetric 8 Hz sawtooth wave showing the trace and the related power spectral density using the multiple taper method discussed in the methods.

      ii. Please also do the same for an elliptical oscillation (perfectly symmetrical waves, but also capable of casting harmonics). Matlab code on how to generate this time series is provided below:

      A = 1; % Amplitude
      T = 1/8; % Period corresponding to 8 Hz frequency
      omega = 2*pi/T; % Angular frequency
      C = 1; % Wave speed
      m = 0.9; % Modulus for the elliptic function (0
      x = linspace(0, 2*pi, 1000); % temporal domain
      t = 0; % Time instant

      % Calculate B based on frequency and speed
      B = sqrt(omega/C);

      % Cnoidal wave equation using the Jacobi elliptic function
      u = A .* ellipj(B.*(x - C*t), m).^2;

      % Plotting the cnoidal wave
      figure;
      plot(x./max(x), u);
      title('8 Hz Cnoidal Wave');
      xlabel('time (x)');
      ylabel('Wave amplitude (u)');
      grid on;

      The Symbolic Math Toolbox needs to be installed and accessible in your MATLAB environment to use ellipj. Otherwise, I trust that, rather than plotting a periodic orbit around a circle (sine wave), the authors can trace the movement around an ellipse with significant eccentricity (the distance between the two foci should be twice the distance between the co-vertices).
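      The sawtooth benchmark requested in item i can be sketched as follows. This is a stand-in, not the authors' multitaper method: it uses a plain single-bin DFT from the Python standard library, and the rise fraction of 0.2, the 1 kHz sampling rate, and the function names are assumptions chosen for illustration.

```python
import cmath
import math


def asymmetric_sawtooth(freq_hz, fs, duration_s, rise_fraction=0.2):
    """Wave spanning -1..1 that rises over rise_fraction of each cycle and falls over the rest."""
    out = []
    for i in range(int(duration_s * fs)):
        ph = (i * freq_hz / fs) % 1.0
        if ph < rise_fraction:
            out.append(-1.0 + 2.0 * ph / rise_fraction)
        else:
            out.append(1.0 - 2.0 * (ph - rise_fraction) / (1.0 - rise_fraction))
    return out


def power_at(signal, fs, freq_hz):
    """Squared magnitude of the normalised DFT coefficient at the bin nearest freq_hz."""
    n = len(signal)
    k = round(freq_hz * n / fs)
    coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return abs(coeff / n) ** 2


fs = 1000
wave = asymmetric_sawtooth(8, fs, duration_s=2)   # 2 s, as suggested for the raw spectra
for f in (8, 16, 24, 32, 36):
    print(f"{f:>3} Hz: {power_at(wave, fs, f):.3e}")
```

      For this purely 8 Hz waveform the 4th harmonic at 32 Hz falls inside the 25-45 Hz band defined as slow gamma, while a non-harmonic frequency such as 36 Hz carries essentially no power.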

      iii. Line 522: "The power spectra across running speeds and absolute power spectrum (both results were not shown)...". Given the potential complications of multi-taper discussed above, and as each convolution further removes one from the raw data, it would be the most transparent, simple, and straightforward to provide power spectra using the simple fft.m code in Matlab (We imagine that the authors will agree that the results should be robust against different spectral decomposition methods. Otherwise, it is concerning that the results depend on the algorithm implemented and should be discussed. If gamma transience is a concern, the authors should trigger to 2-second epochs in which slow/fast gamma exceeds 3-7 std. dev. above the mean, comparing those resulting power spectra to 2-second epochs with ripples - also a transient event). The time series should be at least 2 seconds in length (to avoid spectral leakage issues and the issues discussed in Tallon-Baudry et al., 1997 above).

      Please show the unmolested power spectra (Y-axis units in mV²/Hz, X-axis units as Hz) as a function of running speed (increments of 5 cm/s) for each animal. I imagine three of these PSDs for 3 of the animals will appear in supplemental methods while one will serve as a nice manuscript figure. With this plot, please highlight the regions that the authors are describing as theta, slow, and fast gamma. Also, any issues should be addressed should there be notable differences in power across animals or tetrodes (issues with locations along proximal-distal CA1 in terms of MEC/LEC input and using a local reference electrode are discussed below).

      iv. Schomburg and colleagues (2014) suggested that the modulation of neurons in the slow gamma range could be related to theta harmonics (see above). Harmonics can often extend almost indefinitely as they regress into the 1/f background (contributing to power, but without a peak above the power spectral density slope), making arbitrary frequency limits inappropriate. Therefore, in order to support the analyses and assertions regarding slow gamma, it seems necessary to calculate a "theta harmonic/slow gamma ratio". Aru et al. (2015; Untangling cross-frequency coupling in neuroscience) offer that: "The presence of harmonics in the signal should be tested by a bicoherence analysis and its contribution to CFC should be discussed." Please test both the synthetic signals above and the raw LFP, using temporal windows of greater than 4 seconds (again, the large window optimizes for frequency resolution in the time-frequency trade-off) to calculate the bicoherence. As harmonics are integer multiples of theta coupled to itself and slow gamma is also coupled to theta, a nice illustration and contribution to the field would be a method that uses the bispectrum to isolate and create a "slow gamma/harmonic" ratio.
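      A minimal sketch of the bicoherence test, in Python with only the standard library. Assumptions: the squared-bicoherence normalisation below is one common choice (it is bounded between 0 and 1), segments are 4 s long and untapered because the synthetic frequencies sit exactly on DFT bins, and the two test signals and all function names are hypothetical. It does not implement the proposed "slow gamma/harmonic" ratio.

```python
import cmath
import math
import random


def dft_bin(segment, fs, freq_hz):
    """DFT coefficient of one segment at the bin nearest freq_hz."""
    n = len(segment)
    k = round(freq_hz * n / fs)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(segment))


def bicoherence(segments, fs, f1, f2):
    """Squared bicoherence at (f1, f2), estimated across segments."""
    num, den_a, den_b = 0j, 0.0, 0.0
    for seg in segments:
        x1, x2, x12 = dft_bin(seg, fs, f1), dft_bin(seg, fs, f2), dft_bin(seg, fs, f1 + f2)
        num += x1 * x2 * x12.conjugate()
        den_a += abs(x1 * x2) ** 2
        den_b += abs(x12) ** 2
    return abs(num) ** 2 / (den_a * den_b)


def make_segments(n_segments, fs, seg_s, harmonic, rng):
    """8 Hz wave plus a 16 Hz component that is either a true harmonic or independent."""
    segments = []
    for _ in range(n_segments):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        second = 2.0 * theta if harmonic else rng.uniform(0.0, 2.0 * math.pi)
        segments.append([math.cos(2 * math.pi * 8 * i / fs + theta)
                         + 0.5 * math.cos(2 * math.pi * 16 * i / fs + second)
                         for i in range(int(seg_s * fs))])
    return segments


rng = random.Random(2)
fs = 250
b_harmonic = bicoherence(make_segments(20, fs, 4, True, rng), fs, 8, 8)
b_independent = bicoherence(make_segments(20, fs, 4, False, rng), fs, 8, 8)
print(f"harmonic: {b_harmonic:.3f}  independent: {b_independent:.3f}")
```

      A 16 Hz component that is phase-locked to twice the 8 Hz phase gives a value near 1, whereas an independent 16 Hz oscillation of the same amplitude gives a value near 1 divided by the number of segments. Real LFP segments would additionally need tapering and a sweep over (f1, f2) pairs.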

      (4) I appreciate the inclusion of the histology for the 4 animals. Knierim and colleagues describe a difference in MEC projection along the proximal-distal axis of the CA1 region (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866456/): "There are also differences in their direct projections along the transverse axis of CA1, as the LEC innervates the region of CA1 closer to the subiculum (distal CA1), whereas the MEC innervates the region of CA1 closer to CA2 and CA3 (proximal CA1)". From the histology, it looks like some of the electrodes are in the part of CA1 that would be dominated by LEC input while a few are closer to where the MEC would project.

      a. How do the authors control for these differences in projections? Wouldn't this change whether or not fast gamma is observed in CA1?

      b. I am only aware of one manuscript that describes slow gamma in the LEC, which appeared in contrast to fast gamma from the MEC (https://www.science.org/doi/10.1126/science.abf3119). One would surmise that the authors of the present manuscript would have varying levels of fast gamma in their CA1 recordings depending on the location of the electrodes along the proximal-distal axis, to the extent that some of the more medial tetrodes may need to be excluded (as they should not have fast gamma; rather, they should be exclusively dominated by slow gamma). Alternatively, the authors may find that there is equal fast gamma power across the entire proximal-distal axis. However, this would pose a significant challenge to the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. and require reconciliation/discussion.

      c. Is there a difference in neuron modulation to these frequencies based on electrode location in CA1?

      (5) Given a comment in the discussion (see below), it will be worth exploring changes in theta, theta harmonic, slow gamma, and fast gamma power with running speed, as no changes were observed across theta sequences or lap number. Notably, Czurko et al. (1999) report an increase in theta and harmonic power with running speed, while Ahmed and Mehta (2012) report a similar effect for gamma.

      a. Please determine whether the power and frequency of the rhythms discussed above change with running speed, using the same parameters applied in the present manuscript. The specific concern is that the way the authors calculate running speed is not sensitive enough to detect changes.

      b. It is astounding that animals ran as fast as they did in what appears to be the first lap (Figure 3F), especially as rats' natural proclivity is thigmotaxis and inquisitive exploration in novel environments. Can the authors expand on why they believe their rats ran so quickly on the first lap in a novel environment and how to replicate this? Also, please include the individual values for each animal on the same plot.

      c. Can the authors explain how the statistics on line 169 (F(4,44)) work? Specifically, it is challenging to determine how the degrees of freedom were calculated in this case and throughout if there were only 4 animals (reported in methods) over 5 laps (depicted in Figure 3F. Given line 439, it looks like trials and laps are used synonymously). Four animals over 5 laps should have a DOF of 16.

      (6) Throughout the manuscript, I am concerned about an inflation of statistical power. For example, on line 162, F(2,4844). The large degrees of freedom indicate that the unit of analysis was individual theta sequences or cells. Since multiple observations were obtained from the same animal, the statistical assumption of independence is violated. Therefore, the stats need to be conducted using a nested model as described in Aarts et al. (2014; https://pubmed.ncbi.nlm.nih.gov/24671065/). A statistical consult may be warranted.
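
      One common way to see why nesting matters is the design effect: when observations within an animal are correlated, the effective sample size is smaller than the raw count. The sketch below (plain Python; the function names and the toy numbers are hypothetical, and it is an illustration of the problem rather than a substitute for the multilevel model recommended above) estimates the intraclass correlation from a balanced one-way layout and converts it into an effective N:

```python
from statistics import mean


def icc_oneway(groups):
    """ICC(1) from a balanced one-way ANOVA.

    groups: list of equal-length lists, one list of observations per animal.
    """
    k = len(groups)
    m = len(groups[0])
    if k < 2 or m < 2 or any(len(g) != m for g in groups):
        raise ValueError("need >= 2 balanced groups with >= 2 observations each")
    grand = mean(v for g in groups for v in g)
    # Between-animal and within-animal mean squares.
    msb = m * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def effective_n(n_total, cluster_size, icc):
    """Effective sample size under the design effect 1 + (m - 1) * ICC."""
    return n_total / (1.0 + (cluster_size - 1) * max(icc, 0.0))
```

      With an ICC near 1, thousands of theta sequences from 4 animals carry roughly the information of 4 independent observations; with an ICC near 0, the raw count is approximately right.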

      (7) It is stated that one tetrode served as a quiet recording reference. The "quiet" part is an assumption when often, theta and gamma can be volume conducted to the cortex (e.g., Sirota et al., 2008; This is often why laboratories that study hippocampal rhythms use the cerebellum for the differential recording electrode and not an electrode in the corpus callosum). Generally, high frequencies propagate as well as low frequencies in the extracellular milieu (https://www.eneuro.org/content/4/1/ENEURO.0291-16.2016). For transparency, the authors should include a limitation paragraph in their discussion that describes how their local tetrode reference may be inadvertently diminishing and/or distorting the signal that they are trying to isolate. Otherwise, it would be worth hearing an explanation as to how the authors' approach avoids this issue.

      Apologetically, this review is already getting long. Moreover, I have substantial concerns that should be resolved prior to delving into the remainder of the analyses. For example, the analyses related to Figures 3-5 assert that FG cells are important for sequences. However, the relationship to gamma may be secondary to either their relationship to theta or, based on the Grosmark and Buzsaki paper, it may just be a phenomenon coupled to the fast-firing cells (fast-firing cells showing higher gamma modulation due to a local PING dynamic). Moreover, the observation of slow gamma is being challenged as theta harmonics, even by the major proponents of the slow/fast gamma theory. Therefore, the report of slow gamma precession would come as an unsurprising extension should they be revealed to be theta harmonics (however, no control for harmonics was implemented; suggestions were made above). Following these amendments, I would be grateful for the opportunity to provide further feedback.

      III. Discussion.

      a. Line 330- it was offered that fast gamma encodes information while slow gamma integrates in the introduction. However, in a task such as circular track running (from the methods, it appears that there is no new information to be acquired within a trial), one would guess that after the first few laps, slow gamma would be the dominant rhythm. Therefore, one must wonder why there are so few neurons modulated by slow gamma (~3.7%).

      b. Line 375: The authors contend that: "...slow gamma, related to information compression, was also required to modulate fast gamma phase-locked cells during sequence development. We replicated the results of slow gamma phase precession at the ensemble level (Zheng et al., 2016), and furthermore observed it at late development, but not early development, of theta sequences." In relation to the idea that slow gamma may be coupled to - if not a distorted representation of - theta harmonics, it has been observed that there are changes in theta relative to novelty.

      i. A. Jeewajee, C. Lever, S. Burton, J. O'Keefe, and N. Burgess (2008) report a decrease in theta frequency in novel circumstances that disappears with increasing familiarity.

      ii. One could surmise that this change in frequency is associated with alterations in theta harmonics (observed here as slow gamma), challenging the authors' interpretation.

      iii. Therefore, the authors have a compelling opportunity to replicate the results of Jeewajee et al., characterizing changes of theta along with the development of slow gamma precession, as the environment becomes familiar. It will become important to demonstrate, using bicoherence as offered by Aru et al., how slow gamma can be disambiguated from theta harmonics. Specifically, we anticipate that the authors will be able to quantify A) theta harmonics (the number, and their respective frequencies and amplitudes), B) the frequency and amplitude of slow gamma, and C) how they can be quantitatively decoupled. Through this, their discussion of oscillatory changes with novelty-familiarity will garner a significant impact.

      c. Broadly, it is interesting that the authors emphasize the gamma frequency throughout the discussion. Given that the power spectral density of the Local Field Potential (LFP) exhibits a log-log relationship between amplitude and frequency, as described by Buzsáki (2005) in "Rhythms of the Brain," and considering that the LFP is primarily generated through synaptic transmembrane currents (Buzsáki et al., 2012), it seems parsimonious to consider that the bulk of synaptic activity occurs at lower frequencies (e.g., theta). Since synaptic transmission represents the most direct form of inter-regional communication, one might wonder why gamma (characterized by lower amplitude rhythms) is esteemed so highly compared to the higher amplitude theta rhythm. Why isn't the theta rhythm, instead, regarded as the primary mode of communication across brain regions? A discussion exploring this question would be beneficial.

    1. eLife assessment

      This useful study explores the relationship between the sequence of prokaryotic promoter elements and their activity using mutagenesis to generate thousands of mutant sequences. The evidence supporting these findings is incomplete, and would benefit from additional experiments, clarification of methods, and a more detailed discussion of related literature. This work will appeal to those interested in bacterial genetics, genome evolution, and gene regulation.

    2. Reviewer #1 (Public Review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations interact within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol that allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expressions of many different backgrounds in both directions in extremely high throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      Their approach to randomly mutagenizing promoters allowed them to find many anecdotal examples of different types of evolution that may occur to increase or decrease promoter activity. However, the lack of validation of these phenomena in more controlled backgrounds may require us to further scrutinize their results. That is, the effects of mutations that lead to or obviate promoter activity may be due to interactions with other elements in the 'messy' backgrounds, rather than to the mechanisms proposed.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affects the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. While this overarching conclusion is supported by the manuscript, the proposed mechanisms for explaining changes in promoter activity are not sufficiently validated to be taken for absolute truth. There is not sufficient description of the strength of emergent promoter motifs or their specific spacings from existing motifs within the sequence. Furthermore, they do not define a systematic process by which mutations are assigned to different categories (e.g. box shifting, tandem motifs, etc.) which may imply that the specific examples are assigned based on which is most convenient for the narrative.

      Impact of the work on the field, and the utility of the methods and data to the community:

      From this study, we are more aware of different types of ways promoters can evolve and devolve, but do not have a better ability to predict when mutations will lead to these effects. Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset that may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulatory elements.

      Any additional context to understand the significance of the work:

      The task of computationally predicting whether a sequence drives promoter activity is difficult. By learning what types of mutations create or destroy promoters from this study, we are better equipped for this task.

    3. Reviewer #2 (Public Review):

      Summary:

      Fuqua et al investigated the relationship between prokaryotic box motifs and the activation of promoter activity using a mutagenesis sequencing approach. From generating thousands of mutant daughter sequences from both active and non-active promoter sequences they were able to produce a fantastic dataset to investigate potential mechanisms for promoter activation. From these large numbers of mutated sequences, they were able to generate mutual information with gene expression to identify key mutations relating to the activation of promoter island sequences.

      Strengths:

      The data generated from this paper is an important resource to address this question of promoter activation. Being able to link the activation of gene expression to mutational changes in previously nonactive promoter regions is exciting and allows the potential to investigate evolutionary processes relating to gene regulation in a statistically robust manner. Alongside this, the method of identifying key mutations using mutual information in this paper is well done and should be standard in future studies for identifying regions of interest.
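
      For readers unfamiliar with the approach, the idea of scoring each position by mutual information with expression can be sketched as follows. This is a simple plug-in estimator in plain Python; the function names, the binary mutated/unmutated coding, and the discrete expression bins are assumptions made for illustration, and the authors' exact estimator may differ:

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def position_mi(parent, daughters, expr_bins):
    """MI between 'position i is mutated' and the expression bin, for each i.

    parent: parent sequence; daughters: equal-length mutant sequences;
    expr_bins: one discrete expression label per daughter (hypothetical bins).
    """
    return [
        mutual_information([int(d[i] != base) for d in daughters], expr_bins)
        for i, base in enumerate(parent)
    ]
```

      Positions whose mutation status tracks the expression bin score close to the entropy of the bins, while positions mutated independently of expression score near zero.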

      Weaknesses:

      While the generation of the data is superb, the focus only on these mutational hotspots removes a lot of the information available to the authors to generate robust conclusions. For instance:

      (1) The linear regression in S5 used to demonstrate that the number of mutational hotspots correlates with the likelihood of a mutation causing promoter activation is driven by three extreme points.

      (2) Many of the arguments also rely on the number of mutational hotspots being located near box motifs. The context-dependent likelihood of this occurring is not taken into account, given that these sequences are inherently box motif rich. Something like an enrichment test is needed to identify how likely these hotspots are to form in or next to motifs by chance.

      (3) The link between changes in expression and mutations in surrounding motifs is assessed with two-sided Mann-Whitney U tests. This method assumes that the sequence motifs are independent of one another, but the hotspots of interest occur in groups of 0, 3, 4, or 5 per sequence. There is therefore no sequence where these hotspots can be independent, and the correlation-causation argument for the effect of motif change on expression is weakened.

      (4) The distance between -10 and -35 was mentioned briefly but not taken into account in the analysis.
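
      As one possible form of the enrichment test suggested in point (2), a hypergeometric tail probability asks how often at least k of n hotspots would fall on motif-covered positions by chance. The sketch below is plain Python; the function name and all numbers are hypothetical, and it treats hotspot positions as independent draws, which is itself an assumption that clustered hotspots would violate:

```python
from math import comb


def motif_overlap_pvalue(k, seq_len, motif_positions, n_hotspots):
    """P(X >= k): chance that at least k of n_hotspots land on motif positions.

    seq_len: number of positions in the sequence.
    motif_positions: number of positions in or adjacent to a box motif.
    """
    if not 0 <= motif_positions <= seq_len or not 0 <= n_hotspots <= seq_len:
        raise ValueError("counts must lie between 0 and seq_len")
    total = comb(seq_len, n_hotspots)
    upper = min(motif_positions, n_hotspots)
    hits = sum(
        comb(motif_positions, i) * comb(seq_len - motif_positions, n_hotspots - i)
        for i in range(max(k, 0), upper + 1)
    )
    return hits / total
```

      For a hypothetical 150 bp sequence in which 120 positions lie in or next to a motif, all 5 of 5 hotspots overlapping motifs happens by chance roughly a third of the time, so such overlap alone would not be evidence of enrichment.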

      The authors propose mechanisms of promoter activation based on a few observations that are treated independently but occur concurrently. It would be more robust to address this with complementary approaches, such as an analysis that first identifies important motifs (for example, using a GLM lasso regression to identify significant motifs) and then combines these with the mutational hotspot information. Other elements known to be involved in promoter activation, including TGn or UP elements, were not investigated or discussed.

    4. Reviewer #3 (Public Review):

      Summary:

      Like many papers in the last 5-10 years, this work brings a computational approach to the study of promoters and transcription, but unfortunately disregards or misrepresents much of the existing literature and makes unwarranted claims of novelty. My main concerns with the current paper are outlined below although the problems are deeply embedded.

      Strengths:

      The data could be useful if interpreted properly, taking into account i) the role of translation ii) other promoter elements, and iii) the relevant literature.

      Weaknesses:

      (1) Incorrect assumptions and oversimplification of promoters.

      - There is a critical error on line 68 and Figure 1A. It is well established that the -35 element consensus is TTGACA but the authors state TTGAAA, which is also the sequence represented by the sequence logo shown and so presumably the PWM used. It is essential that the authors use the correct -35 motif/PWM/consensus.

      - Likely, the authors have made this mistake because they have looked at DNA sequence logos generated from promoter alignments anchored by either the position of the -10 element or transcription start site (TSS), most likely the latter. The distance between the TSS and -10 varies. Fewer than half of E. coli promoters have the optimal 7 bp separation, with distances of 8, 6, and 5 bp not being uncommon (PMID: 35241653). Furthermore, the distance between the -10 and -35 elements is also variable (16, 17, and 18 bp spacings are all frequently found, PMID: 6310517). This means that alignments, used to generate sequence logos, have misaligned -35 hexamers. Consequently, the true consensus is not represented. If the alignment discrepancies are corrected, the true consensus emerges. This problem seems to permeate the whole study since this obviously incorrect consensus/motif has been used throughout to identify sequences that resemble -35 hexamers.

      - An uninformed person reading this paper would be led to believe that prokaryotic promoters have only two sequence elements: the -10 and -35 hexamers. This is because the authors completely ignore the role of the TG motif, UP element, and spacer region sequence. All of these can compensate for the lack of a strong -35 hexamer and it's known that appending such elements to a lone -10 sequence can create an active promoter (e.g. PMIDs 15118087, 21398630, 12907708, 16626282, 32297955). Very likely, some of the mutations, classified as not corresponding to a -10 or -35 element in Figure 2, target some of these other promoter motifs.

      - The model in Figure 4C is highly unlikely. There is no evidence in the literature that RNAP can hang on with one "arm" in this way. In particular, structural work has shown that sequence-specific interactions with the -10 element can only occur after the DNA has been unwound (PMID: 22136875). Further, -10 elements alone, even if a perfect match to the consensus, are non-functional for transcription. This is because RNAP needs to be directed to the -10 by other promoter elements, or transcription factors. Only once correctly positioned, can RNAP stabilise DNA opening and make sequence-specific contacts with the -10 hexamer. This makes the notion that RNAP may interact with the -10 alone, using only domain 2 of sigma, extremely unlikely.
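
      To illustrate the points above about the -35 consensus and the variable spacer, the following sketch scans a sequence for a -35-like hexamer followed, 16 to 18 bp later, by a -10-like hexamer. It is plain Python and deliberately crude: counting matches to the consensus stands in for PWM scoring, the match threshold is an arbitrary assumption, and the TG motif, UP element, and spacer sequence are ignored entirely:

```python
MINUS_35 = "TTGACA"  # -35 consensus as stated in the review
MINUS_10 = "TATAAT"  # standard -10 consensus


def matches(hexamer, consensus):
    """Number of positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


def scan_promoters(seq, spacers=(16, 17, 18), min_matches=4):
    """Find -35/-10 pairs separated by an allowed spacer length.

    Returns tuples (pos35, spacer, pos10, matches35, matches10), 0-based.
    min_matches is a hypothetical threshold, not a validated cutoff.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = matches(seq[i:i + 6], MINUS_35)
        if m35 < min_matches:
            continue
        for s in spacers:
            j = i + 6 + s
            if j + 6 > len(seq):
                continue
            m10 = matches(seq[j:j + 6], MINUS_10)
            if m10 >= min_matches:
                hits.append((i, s, j, m35, m10))
    return hits
```

      Because the spacer is allowed to vary, the -35 hexamer is located relative to the -10 element rather than at a fixed offset, which is the alignment issue raised above; note also that TTGAAA scores as a 5/6 match to TTGACA under this scheme.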

      (2) Reinventing the language used to describe promoters and binding sites for regulators.

      - The authors needlessly complicate the narrative by using non-standard language. For example, on page 1 they define a motif as "a DNA sequence computationally predicted to be compatible with TF binding". They distinguish this from a binding site "because binding sites refer to a location where a TF binds the genome, rather than a DNA sequence". First, these definitions are needlessly complicated; why not just say "putative binding sites" and "known binding sites" respectively? Second, there is an obvious problem with the definitions; many "motifs" will also be "binding sites". In fact, by the time the authors state their definitions, they have already fallen foul of this conflation; in the prior paragraph they stated: "controlled by DNA sequences that encode motifs for TFs to bind". The same issue reappears throughout the paper.

      - The authors also use the terms "regulatory" and "non-regulatory" DNA. These terms are not defined by the authors and make little sense. For instance, I assume the authors would describe promoter islands lacking transcriptional activity (itself an incorrect assumption, see below) as non-regulatory. However, as horizontally acquired sections of AT-rich DNA, these will all be bound by H-NS and subject to gene silencing, affecting both promoters for mRNA synthesis and spurious promoters inside genes that create untranslated RNAs. Hence, regulation is occurring.

      - Line 63: "In prokaryotes, the primary regulatory sequences are called promoters". Promoters are not generally considered regulatory. Rather, it is adjacent or overlapping sites for TFs that are regulatory. There is a good discussion of the topic here (PMID: 32665585).

      (3) The authors ignore the role of translation.

      - The authors' assay does not measure promoter activity alone, this can only be tested by measuring the amount of RNA produced. Rather, the assay used measures the combined outputs of transcription and translation. If the DNA fragments they have cloned contain promoters with no appropriately positioned Shine-Dalgarno sequence then the authors will not detect GFP or RFP production, even though the promoter could be making an RNA (likely to be prematurely terminated by Rho, due to a lack of translation). This is known for promoters in promoter islands (e.g. Figure 1 in PMID: 33958766).

      - In Figure S6 it appears that there is a strong bias for mutations resulting in RFP expression to be close to the 3' end of the fragment. Very likely, this occurs because this places the promoter closer to RFP and there are fewer opportunities for premature termination by Rho.

      (4) Ignoring or misrepresenting the literature.

      - As alluded to above, promoter islands are large sections of horizontally acquired, high AT-content DNA. It is well known that such sequences are i) packed with promoters driving the expression of RNAs that aren't translated, ii) silenced, albeit incompletely, by H-NS, and iii) targeted by Rho, which terminates untranslated RNA synthesis (PMIDs: 24449106, 28067866, 18487194). None of this is taken into account anywhere in the paper and it is highly likely that most, if not all, of the DNA sequences the authors have used contain promoters generating untranslated RNAs.

      - The authors state that GC content does not correlate with the emergence of new promoters. It is known that GC content does correlate with the emergence of new promoters because promoters are themselves AT-rich DNA sequences (e.g. see Figure 1 of PMID: 32297955). There are two reasons the authors see no correlation in this work. First, the DNA sequences they have used are already very AT-rich (between 65 % and 78 % AT-content). Second, they have only examined a small range of different AT-content DNA (i.e. between 65 % and 78 %). The effect of AT-content on promoter emergence is most clearly seen between around 40 % and 60 % AT-content. Above that level, the strong positive correlation plateaus.

      - Once these authors better include and connect their results to the previous literature, they can also add some discussion of how previous papers in recent years may have also missed some of this important context.

      (5) Lack of information about sequences used and mutations.

      - To properly assess the work any reader will need access to the sequences cloned at the start of the work, where known TSSs are within these sequences (ideally +/- H-NS, which will silence transcription in the chromosomal context but may not when the sequences are removed from their natural context and placed in a plasmid). Without this information, it is impossible to assess the validity of the authors' work.

      - The authors do not account for the possibility that DNA sequences in the plasmid, on either side of the cloned DNA fragment, could resemble promoter elements. If this is the case, then mutations in the cloned DNA will create promoters by "pairing up" with the plasmid sequences. There is insufficient information about the DNA sequences cloned, the mutations identified, or the plasmid, to determine if this is the case. It is possible that this also accounts for mutational hotspots described in the paper.

      (6) Overselling the conclusions.

      Line 420: The paper claims to have generated important new insights into promoters. At the same time, the main conclusion is that "Our study demonstrates that mutations to -10 and -35 boxes motifs are the primary paths to create new promoters and to modulate the activity of existing promoters". This isn't new or unexpected. People have been doing experiments showing this for decades. Of course, mutations that make or destroy promoter elements create and destroy promoters. How could it be any other way?

    1. eLife assessment

      This useful work provides a risk-prediction tool, in the form of a nomogram, for practitioners and elderly patients with non-metastatic colon cancer using data from the SEER registry. The unique contribution of this work is the focus on conditional survival. However, the underlying statistical approach is suboptimal and therefore incomplete, which substantially lessens the potential impact of this work. The analysis could use a more rigorous consideration of competing risks.

    2. Reviewer #1 (Public Review):

      Summary:

      This study assessed conditional survival in elderly patients with non-metastatic colon cancer who underwent colectomy. The study found that 5-year conditional overall survival rates exhibited a slight increase initially, followed by a decrease over time. In contrast, 5-year conditional colon-specific survival rates consistently improved over the same period. Nomograms were developed to predict survival probabilities at baseline and for patients surviving 1, 3, and 5 years post-diagnosis, with good predictive performance. The study concludes that conditional survival offers valuable insights into medium- and long-term survival probabilities for these patients.

      Strengths:

      The strengths of this study include robust study design, methodology, statistical analysis, and interpretation of the findings. Utilizing a well-known database for the analysis is another strength. Differentiating overall survival and colon-specific survival rates could be another one. Focusing on elderly patients with this condition is another major point. Providing nomograms for an easier implication of the findings in real-world clinical practice is a major strength of the study.

      Weaknesses:

      Relying on a single database and restricting the population to elderly patients who underwent colectomy are weaknesses: the findings are less generalizable to other populations, and more diverse databases were not included. In addition, the predictive performance of the developed tools is good but could be improved to excellent.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors assessed the conditional survival of elderly patients with non-metastatic colon cancer who had survived a certain length of time after colectomy. They used data from the Surveillance, Epidemiology, and End Results (SEER) registry to conduct a conditional survival analysis providing estimates of conditional survival rates as well as an analysis of which variables were most important for survival at baseline, one year, three years, and five years.

      Strengths:

      - The authors used SEER data, providing them with long-term follow-up, and thoroughly considered a wide range of variables related to cancer mortality.

      - The authors did a thorough job of assessing the predictive ability of their models.

      - The authors used conditional survival, providing estimates of survival that are meaningful for patients/physicians, making them useful for clinical practice.

      Weaknesses:

      - The paper would have benefited from a more thorough explanation of why the methods were improvements on existing approaches.

      - This study was primarily interested in cancer mortality, and compared it to the secondary outcome of death from any cause. The study would have benefited from modeling death from non-cancer causes (the competing risk) in addition to death from colon cancer, rather than comparing only to the composite endpoint of death from any cause.

      - When considering a cause-specific hazard, as done with cancer survival in this paper, it would be better to consider the cumulative incidence function rather than Kaplan-Meier, since it does not assume the independence of the events as Kaplan-Meier does. For this reason, the paper would benefit from focusing on the results of the adjusted cause-specific hazard models (rather than the unadjusted conditional survival estimates based on the Kaplan-Meier estimates shown in Figure 1) and conducting a parallel analysis for death from other causes.

      - The authors mention that they consider disparities using a log-rank test. For the same reason as above, this is not the best approach when dealing with competing risks, as it depends on Kaplan-Meier curves. The log-rank test may be fine if there is no strong dependence between the two causes of death, but the paper would benefit from some discussion of that choice, or a sensitivity analysis by comparison to other approaches.

      - The variables for the adjusted models were chosen with univariate Cox regression analysis, with any variables having a p-value less than 0.05 being included in the adjusted model. Another approach, which may have made the models more easily comparable, would be to choose the variables that are relevant based on prior literature and include them in the multivariate model regardless of significance. The paper would benefit from a discussion of what is gained by excluding some variables from some models.

    4. Reviewer #3 (Public Review):

      Summary:

      This article uses a subset of data from the SEER cancer registry to develop nomograms, a patient-facing risk prediction tool, for predicting overall and cancer-specific survival in elderly patients who underwent colectomy for the treatment of non-metastatic colon cancer. A unique contribution is the intent to provide conditional predictions, i.e. given that you have survived for x years from your diagnosis, what is your probability of survival for an additional y years? Although the goal is a useful one, the approach is unfortunately hampered by some important weaknesses.
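
      The conditional prediction described here reduces to a ratio of survival probabilities, CS(y | x) = S(x + y) / S(x). As a small sketch in plain Python (the function name and the survival values used to exercise it are hypothetical, not taken from the manuscript):

```python
def conditional_survival(surv, x, y):
    """Probability of surviving y more years given survival to year x.

    surv: dict mapping years since diagnosis to cumulative survival S(t).
    """
    if surv[x] <= 0:
        raise ValueError("S(x) must be positive to condition on survival to x")
    return surv[x + y] / surv[x]
```

      For instance, if S(1) were 0.9 and S(6) were 0.6, the 5-year conditional survival for a patient alive at 1 year would be 0.6 / 0.9, which is higher than the unconditional S(5) whenever early deaths have already been removed from the denominator.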

      Strengths:

      Predicting conditional overall survival is a useful, patient-oriented goal.

      The data source is the high-quality SEER cancer registry.

      Weaknesses:

      Using Kaplan-Meier methodology to estimate the survival distribution for a time-to-event in the presence of another competing time-to-event (in this case: estimating colon-specific survival in the presence of death from other causes) will generally over-estimate the event rate. The reported colon-specific survival probabilities are probably biased downwards from their true values. See https://pubmed.ncbi.nlm.nih.gov/10204198/

      A similar concern would apply to the use of the cause-specific Cox model, and thus also the nomogram, to predict absolute (conditional) survival.
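
      The direction of the bias can be illustrated with a toy data set. The sketch below is an assumption-laden illustration, not the authors' analysis: it compares the complement of a Kaplan-Meier estimate that treats competing deaths as censored with a cumulative incidence estimate of the Aalen-Johansen form. The records and the function name are hypothetical.

```python
def km_complement_and_cif(records, cause=1):
    """Return (1 - KM, CIF) for `cause` at the last observed time.

    records: list of (time, status); status 0 = censored,
    1 = death from the cause of interest, 2 = death from a competing cause.
    """
    times = sorted({t for t, _ in records})
    at_risk = len(records)
    km_cause = 1.0   # KM that treats competing deaths as censored
    surv_all = 1.0   # all-cause KM survival just before the current time
    cif = 0.0        # cumulative incidence of the cause of interest
    for t in times:
        d_cause = sum(1 for u, s in records if u == t and s == cause)
        d_other = sum(1 for u, s in records if u == t and s not in (0, cause))
        leaving = sum(1 for u, _ in records if u == t)
        if at_risk > 0:
            cif += surv_all * d_cause / at_risk
            km_cause *= 1 - d_cause / at_risk
            surv_all *= 1 - (d_cause + d_other) / at_risk
        at_risk -= leaving
    return 1 - km_cause, cif


# Hypothetical cohort: 4 cancer deaths, 4 other deaths, 2 censored at the end.
records = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2),
           (6, 1), (7, 2), (8, 1), (9, 0), (10, 0)]

naive, cif = km_complement_and_cif(records)
print(f"1 - KM = {naive:.3f}, CIF = {cif:.3f}")  # 1 - KM = 0.594, CIF = 0.400
```

      In this toy cohort 4 of 10 patients die of the cause of interest, which the cumulative incidence estimate recovers, whereas the Kaplan-Meier complement reports a higher event probability because it removes competing deaths from the risk set as if those patients could still experience the event.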

      The p-value-based methodology for determining which predictors should be included in the nomogram is rudimentary. More modern variable selection methods, e.g. the Lasso, would have been preferred.

      Related to the above comment, some predictors are present for the conditional survival nomogram for time t, then absent for time t+1, then present again for time t+2. Cancer site is an example of such a predictor. From a face validity perspective, this doesn't really make sense. Ideally, predictors would not enter, then leave, and then re-enter a model.

      Many observations were excluded due to missingness in predictors, e.g. >10000 were excluded due to unknown CEA (Supplementary Figure 1). Given the number of observations dropped due to missingness in the predictors, ideally an attempt would have been made to incorporate the partial information available in these data.

      Details are lacking on how the AUCs and Brier scores were calculated in the presence of censoring / competing events, which limits the reader's understanding of the results.

      It is not clear why a nomogram would be preferred to an online risk prediction calculator.

    1. eLife assessment

      The work is important and of potential value to areas other than the bone field because it supports a role and mechanism for beta-catenin that is novel and unusual. The findings are significant in that they support the presence of another anabolic pathway in bone that can be productively targeted for therapeutic goals. The data for the most part are convincing. The work could be strengthened by better characterizing the osteoclast KO of Malat1 related to the Lys cre model and by including biochemical markers of bone turnover from the mice.

    2. Reviewer #1 (Public Review):

      Summary

      The authors were trying to discover a novel bone remodeling network system. They found that the lncRNA Malat1 plays a central role in the remodeling by binding to β-catenin and functioning through the β-catenin-OPG/Jagged1 pathway in osteoblasts and chondrocytes. In addition, Malat1 significantly promotes bone regeneration in fracture healing in vivo. Their findings suggest a new concept of Malat1 function in the skeletal system. One significantly different finding between this manuscript and the competing paper pertains to the role of Malat1 in the osteoclast lineage, specifically, whether or not Malat1 functions intrinsically in the osteoclast lineage.

      Strengths:

      This study provides strong genetic evidence demonstrating that Malat1 acts intrinsically in osteoblasts while suppressing osteoclastogenesis in a non-autonomous manner, whereas the other group did not utilize relevant conditional knockout mice. As shown in the results, Malat1 knockout mice exhibited abnormal bone remodeling and turnover. Furthermore, the authors elucidated the molecular function of Malat1, which is sufficient to understand the phenotype in vivo.

      Weaknesses:

      Discussing the differences between the previous paper and the present study would be highly informative and beneficial for the field, as it would help elucidate the underlying mechanisms.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors investigated the roles of the lncRNA Malat1 in bone homeostasis, which was initially believed to be non-functional for physiology. They found that both Malat1 KO and conditional KO in the osteoblast lineage exhibit significant osteoporosis due to decreased osteoblast bone formation and increased osteoclast resorption. More interestingly, they found that deletion of Malat1 in osteoclast lineage cells does not affect osteoclast differentiation and function. Mechanistically, they found that Malat1 acts as a co-activator of β-catenin, directly regulating osteoblast activity and indirectly regulating osteoclast activity via mediating OPG, but not RANKL, expression in osteoblasts and chondrocytes. Their discoveries establish a previously unrecognized paradigm model of Malat1 function in the skeletal system, providing novel mechanistic insights into how a lncRNA integrates cellular crosstalk and molecular networks to fine-tune tissue homeostasis and remodeling.

      Strengths:

      The authors generated global and conditional KO mice in osteoblast and osteoclast lineage cells and carefully analyzed the role of Malat1 with both in vivo and in vitro systems. The conclusion of this paper is mostly well supported by data.

      Weaknesses:

      More objective biological and biochemical analyses are required.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Qin and colleagues study the role of Malat1 in bone biology. This topic is interesting given the role of lncRNAs in multiple physiologic processes. A previous study (PMID 38493144) suggested a role for Malat1 in osteoclast maturation. However, the role of this lncRNA in osteoblast biology was previously not explored. Here, the authors note osteopenia with increased bone resorption in mice lacking Malat1 globally and in osteoblast lineage cells. At the mechanistic level, the authors suggest that Malat1 controls beta-catenin activity. These results advance the field regarding the role of this lncRNA in bone biology.

      Strengths:

      The manuscript is well-written and data are presented in a clear and easily understandable manner. The bone phenotype of osteoblast-specific Malat1 knockout mice is of high interest. The role of Malat1 in controlling beta-catenin activity and OPG expression is interesting and novel.

      Weaknesses:

      The lack of a bone phenotype when Malat1 is deleted with LysM-Cre is of interest given the previous report suggesting a role for this lncRNA in osteoclasts. However, to interpret the findings here, the authors should investigate the deletion efficiency of Malat1 in osteoclast lineage cells in their model. The data in the fracture model in Figure 8 seems incomplete in the absence of a more complete characterization of callus histology and a thorough time course. The role of Malat1 and OPG in chondrocytes is unclear since the osteocalcin-Cre mice (which should retain normal Malat1 levels in chondrocytes) have similar bone loss as the global mutants.

    1. eLife assessment

      In this valuable study, Guo, Hue et al. describe how two poorly understood rhabdomyosarcoma fusion-oncogenes, VGLL2::NCOA2 and TEAD1::NCOA2, function at the genomic, transcriptional, and proteomic levels in multiple systems. They generated solid data that support that these fusion-oncogenes leverage TEAD transcriptional signatures, in a mechanism that is independent of YAP/TAZ, and that this activity potentially contributes to tumorigenesis. This work offers new mechanistic insights into oncogenic gene fusion events identified in cancer patients and reveals potential therapeutic strategies for the treatment of rhabdomyosarcomas.

    2. Reviewer #1 (Public Review):

      Summary:

      Guo, Hue et al. focused on understanding the epigenetic activity and functional dependencies for two different fusions found in infantile rhabdomyosarcoma, VGLL2::NCOA2 and TEAD1::NCOA2. They use a variety of models and methods; specifically, ectopic expression of the fusions in human 293T cells to perform RNAseq (both fusions), CUT&RUN (VGLL2::NCOA2), and BioID mass spec (both fusions). These data identify that the VGLL2::NCOA2 fusion has peaks that are enriched for TEAD motifs. Further, CBP/p300 CUT&RUN supports an enrichment of binding sites and three TEAD targets in VGLL2::NCOA2 and TEAD1::NCOA2 expressing cells. They also functionally evaluated genetic and chemical dependencies (TEAD inhibition), and found this was only effective for the VGLL2::NCOA2 fusion, and not for TEAD1::NCOA2. Using complementary biochemical approaches they suggest (with other supporting data) that the fusions regulate TEAD transcriptional outputs via a YAP/TAZ independent mechanism. Further, they expand into a C2C12 myoblast model and show that TEAD1::NCOA2 is transforming in colony formation assays and in mouse allografts. This is consistent with previously published strategies using VGLL2::NCOA2. Importantly, they show that a small molecule inhibitor of CBP/p300 (a binding partner found in their BioID mass spec) suppresses tumor formation in this mouse allograft model, and that the tumors are less proliferative and have a reduction in transcription of three TEAD target genes. Generally, the data is interesting and suggests new biology for these fusion-oncogenes. However, the choice of 293T for the majority of the transcriptional, epigenetic, and proteomic studies makes the findings difficult to interpret in the context of the human disease, and the rationale for the choice of an epithelial-like kidney cell line is not discussed. Further, details are missing from the figures, figure legends, and methods that make the data difficult to interpret, and should be added to improve the reader's understanding. Overall, the breadth of methods used in this study, and the comparison of the two fusion-oncogenes' biology is of interest to the fusion-oncogene, pediatric sarcoma, and epigenetic therapeutic targeting fields.

      Strengths:

      (1) Multiple experimental approaches were used to understand the biology of the fusion-oncogenes, including genomic, proteomic, chemical, and genetic inhibition. These approaches identify potential new mechanisms of convergent fusion-oncogene activity, around TEAD transcriptional targeting (that is YAP/TAZ independent) and reveal CBP/p300 as a functional dependency.

      (2) Complementary models were used, including cell-based assays and mouse allograft models to show the dependency on CBP/P300.

      (3) Co-IPs were clear and convincing and showed direct interaction of the fusion-oncogene with ectopic and endogenous TEAD1/pan-TEAD, but not YAP/TAZ.

      (4) Potential to follow up on additional targets/mechanisms of tumorigenesis. For example, in the BioID proteomics screen, a unique VGLL2::NCOA2 and TEAD1::NCOA2 interactor is P53, which also is an enriched pathway in Figure 4C in the p300 CUT&RUN peaks in the VGLL2::NCOA2 and TEAD1::NCOA2 expressing cells. Is this indicative of the toxicity of the fusion-oncogenes, or do you think this informs potential mechanisms for transformation?

      Weaknesses:

      (1) The rationale for performing genomics, transcriptional, and proteomics work in 293T cells is not discussed. Further, there are no functional readouts mentioned in the 293T cells with expression of the fusion-oncogenes. Did these cells have any phenotypes associated with fusion-oncogene expression (proliferation differences, morphological changes, colony formation capacity)? Further, how similar are the gene expression signatures from RNA-seq to rhabdomyosarcoma? This would help the reader interpret how similar these cell models are to human disease.

      (2) TEAD1::NCOA2 fusion-oncogene model was not credentialed past H&E, and expression of Desmin. Is the transcriptional signature in C2C12 or 293T similar to a rhabdomyosarcoma gene signature?

      (3) For the fusion-oncogenes, did the HA, FLAG, or V5 tag impact fusion-oncogene activity? Was the tag on the 3' or 5' of the fusion? This was not discussed in the methods.

      (4) Generally, the lack of details in the figures, figure legends, and methods make the data difficult to interpret. A few examples are below:

      a. Individual data points are not shown for figure bar plots (how many technical or biological replicates are present and how many times was the experiment repeated?).
      b. What exons were included in the fusion-oncogenes from VGLL2 and NCOA2 or TEAD1 and NCOA2?
      c. For how long were the colony formation experiments performed? Two weeks?
      d. In Figure 2D, what concentration of CP1 was used and for how long?
      e. How was A485 resuspended for cell culture and mouse experiments, what is the percentage of DMSO?
      f. How many replicates were done for RNA-seq, CUT&RUN, and ATACseq experiments?

    3. Reviewer #2 (Public Review):

      In the manuscript entitled "VGLL2 and TEAD1 fusion proteins drive YAP/TAZ-independent transcription and tumorigenesis by engaging p300", Guo et al. studied two Hippo pathway-related gene fusion events (i.e., VGLL2-NCOA2, TEAD1-NCOA2) in spindle cell rhabdomyosarcoma (scRMS) and showed that their fusion proteins can activate Hippo downstream gene transcription independent of YAP/TAZ. Using BioID-based mass spectrometry analysis, the authors revealed the histone acetyltransferases CBP/p300 as specific binding proteins for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Pharmacologically targeting p300 inhibited the fusion protein-induced Hippo downstream gene transcription and tumorigenic events.

      Overall, this study provides mechanistic insights into the scRMS-associated gene fusions in tumorigenesis and reveals potential therapeutic targets for cancer treatment. The manuscript is well-written and easy to follow.

      Here, several suggestions are made for the authors to improve their study.

      Main points

      (1) The authors majorly focused on the Hippo downstream gene transcription in this study, while a significant portion of genes regulated by the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins are non-Hippo downstream genes (Figure 3). The authors should investigate whether the altered Hippo pathway transcription is essential for VGLL2-NCOA2 and TEAD1-NCOA2-induced cell transformation and tumorigenesis. Specifically, they should test if treatment with the TEAD inhibitor can reverse the cell transformation and tumorigenesis caused by VGLL2-NCOA2 but not TEAD1-NCOA2. In addition, it is important to examine whether YAP-5SA expression can rescue the inhibitory effects of A485 on VGLL2-NCOA2 and TEAD1-NCOA2-induced colony formation and tumor growth. This will help clarify whether Hippo downstream gene transcription is important for the oncogenic activities of these two fusion proteins.

      (2) The rationale for selecting CBP/p300 for functional studies needs to be provided. The BioID-MS experiment identified many interacting proteins for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins (Table S4). The authors should explain the scoring system used to identify the high-interacting proteins for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Were CBP/p300 the top candidates on the list? Providing this information will help justify the focus on CBP/p300 and validate their importance in this study.

      (3) p300 was revealed as a key driver for the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins-induced transcriptome alteration and tumorigenesis. To strengthen the point, the authors should identify the p300 binding region on VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins. Mutants with defects in p300 binding/recruitment should be generated and included as a control in the related q-PCR and tumorigenic studies. This work will help confirm the crucial role of p300 in mediating the oncogenic effects of these two fusion proteins.

      (4) Another major issue is the overexpression system extensively used in this study. It is important to determine whether the VGLL2-NCOA2 and TEAD1-NCOA2 fusion genes are also amplified in cancer. If not, the expression levels of the VGLL2-NCOA2 and TEAD1-NCOA2 fusion proteins should be adjusted to endogenous levels to assess their oncogenic effects on gene transcription and tumorigenesis. This approach would make the study more relevant to the pathological conditions observed in scRMS cancer patients.

    1. eLife assessment

      Approaches for quantifying synaptic activity events are currently limited, and recent advances in AI and deep learning provide an opportunity to develop powerful new ways to automate this process. In this study, the authors have generated a valuable tool, miniML, that uses open-source software that convincingly enables rapid, automated, and accurate quantification of synaptic events from a variety of systems and approaches. This software will be of significant utility to a variety of neuroscience researchers.

    2. Reviewer #1 (Public Review):

      O'Neill et al. have developed a software analysis application, miniML, that enables the quantification of electrophysiological events. They utilize a supervised deep learning-based method to optimize the software. miniML is able to quantify and standardize the analyses of miniature events, using both voltage and current clamp electrophysiology, as well as optically driven events using iGluSnFR3, in a variety of preparations, including the cerebellum, calyx of Held, Golgi cells, human iPSC cultures, zebrafish, and Drosophila. The software appears to be flexible, in that users are able to hone and adapt the software to new preparations and events. Importantly, miniML is open-source software that is free for researchers to use and enables users to adapt new features using Python.

      Overall this new software has the potential to become widely used in the field and an asset to researchers. However, the authors fail to discuss or even cite a similar analysis tool recently developed (SimplyFire), and determine how miniML performs relative to this platform. There are a handful of additional suggestions to make miniML more user-friendly, and of broad utility to a variety of researchers, as well as some suggestions to further validate and strengthen areas of the manuscript:

      (1) miniML relative to existing analysis methods: There is a major omission in this study, in that a similar open source, Python-based software package for event detection of synaptic events appears to be completely ignored. Earlier this year, another group published SimplyFire in eNeuro (Mori et al., 2024; doi: 10.1523/eneuro.0326-23.2023). Obviously, this previous study needs to be discussed and ideally compared to miniML to determine if SimplyFire is superior or similar in utility, and to underscore differences in approach and accuracy.

      (2) The manuscript should comment on whether miniML works equally well to quantify current clamp events (voltage; e.g. EPSP/mEPSPs) compared to voltage clamp (currents, EPSC/mEPSCs), which the manuscript highlights. Are rise and decay time constants calculated for each event similarly?

      (3) The interface and capabilities of miniML appear quite similar to Mini Analysis, the free software that many in the field currently use. While the ability and flexibility for users to adapt and adjust miniML for their own uses/needs using Python programming is a clear potential advantage, can the authors comment, or better yet, demonstrate, whether there is any advantage for researchers to use miniML over Mini Analysis or SimplyFire if they just need the standard analyses?

      (4) Additional utilities for miniML: The authors show miniML can quantify miniature electrophysiological events in both current and voltage clamp, as well as optical glutamate transients using iGluSnFR. As the authors mention in the discussion, the same approach could, in principle, be used to quantify evoked (EPSC/EPSP) events using electrophysiology, Ca2+ events (using GCaMP), and AP waveforms using voltage indicators like ASAP4. While I don't think it is reasonable to ask the authors to generate any new experimental data, it would be great to see how miniML performs when analysing data from these approaches, particularly to quantify evoked synaptic events and/or Ca2+ (ideally postsynaptic Ca2+ signals from miniature events, for which nice approaches have been developed at the Drosophila NMJ).

    3. Reviewer #2 (Public Review):

      Summary:

      This paper presents miniML as a supervised method for the detection of spontaneous synaptic events. Recordings of such events are typically of low SNR, where state-of-the-art methods are prone to high false positive rates. Unlike current methods, training miniML requires neither prior knowledge of the kinetics of events nor the tuning of parameters/thresholds.

      The proposed method comprises four convolutional networks, followed by a bi-directional LSTM and a final fully connected layer which outputs a decision event/no event per time window. A sliding window is used when applying miniML to a temporal signal, followed by an additional estimation of events' time stamps. miniML outperforms current methods for simulated events superimposed on real data (with no events) and presents compelling results for real data across experimental paradigms and species.
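
      The sliding-window step described above can be sketched in a few lines. This is not miniML's code or API: `score_window` is a hypothetical placeholder for the trained network, the trace is synthetic, and the window size, stride, and timestamp rule are assumptions chosen only to make the example run.

```python
import math


def score_window(window):
    """Placeholder for a trained classifier; returns a confidence in [0, 1].

    Hypothetical stand-in: a logistic function of the largest downward
    deflection relative to the first sample of the window.
    """
    depth = window[0] - min(window)
    return 1.0 / (1.0 + math.exp(-(depth - 5.0)))


def detect_events(trace, win=6, stride=1, threshold=0.5):
    """Slide a window over the trace and return estimated event indices."""
    conf = [score_window(trace[i:i + win])
            for i in range(0, len(trace) - win + 1, stride)]
    events = []
    i = 0
    while i < len(conf):
        if conf[i] > threshold:
            # Extend the run of consecutive windows above threshold.
            j = i
            while j + 1 < len(conf) and conf[j + 1] > threshold:
                j += 1
            # Separate step: map the most confident window to a timestamp,
            # here the most negative sample inside that window.
            best = max(range(i, j + 1), key=lambda k: conf[k])
            start = best * stride
            segment = trace[start:start + win]
            events.append(start + segment.index(min(segment)))
            i = j + 1
        else:
            i += 1
    return events


# Synthetic trace: flat baseline with two downward events at samples 10 and 25.
event = [-10, -8, -6, -4, -2]
trace = [0] * 10 + event + [0] * 10 + event + [0] * 10

print(detect_events(trace))  # [10, 25]
```

      Because the placeholder confidences in this toy trace are either close to 0 or close to 1, thresholds of 0.2, 0.5, and 0.8 return the same events; whether a real model behaves this way depends on the signal-to-noise ratio of its output trace.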

      Strengths:

      The authors present a pipeline for benchmarking based on simulated events superimposed on real data (with no events). Compared to five other state-of-the-art methods, miniML leads to the highest detection rates and is most robust to specific choices of threshold values for fast or slow kinetics. A major strength of miniML is the ability to use it for different datasets. For this purpose, the CNN part of the model is held fixed and the subsequent networks are trained to adapt to the new data. This Transfer Learning (TL) strategy reduces computation time significantly and more importantly, it allows for using a substantially smaller data set (compared to training a full model) which is crucial as training is supervised (i.e. uses labeled examples).

      Weaknesses:

      The authors do not indicate how the specific configuration of miniML was set, i.e. number of CNNs, units, LSTM, etc. Please provide further information regarding these design choices, whether they were based on similar models or if chosen based on performance.

      The data for the benchmark system was augmented with equal amounts of segments with/without events. Data augmentation was undoubtedly crucial for successful training.

      (1) Does a balanced dataset reflect the natural occurrence of events in real data? Could the authors provide more information regarding this matter?

      (2) Please provide a more detailed description of this process as it would serve users aiming to use this method for other sub-fields.

      The benchmarking pipeline is indeed valuable and the results are compelling. However, the authors do not provide comparative results for miniML for real data (Figures 4-8). TL does not apply to the other methods. In my opinion, presenting the performance of other methods, trained using the smaller dataset would be convincing of the modularity and applicability of the proposed approach.

      Impact:

      Accurate detection of synaptic events is crucial for the study of neural function. miniML has a great potential to become a valuable tool for this purpose as it yields highly accurate detection rates, it is robust, and is relatively easily adaptable to different experimental setups.

      Additional comments:

      Line 73: the authors describe miniML as "parameter-free". Indeed, miniML does not require the selection of pulse shape, rise/fall time, or tuning of a threshold value. Still, I would not call it "parameter-free" as there are many parameters to tune, starting with the number of CNNs, and number of units through the parameters of the NNs. A more accurate description would be that as an AI-based method, the parameters of miniML are learned via training rather than tuned by the user.

      Line 302: the authors describe miniML as "threshold-independent". The output trace of the model has an extremely high SNR so a threshold of 0.5 typically works. Since a threshold is needed to determine the time stamps of events, I think a better description would be "robust to threshold choice".

    4. Reviewer #3 (Public Review):

      The authors present miniML as a novel supervised deep learning-based method for detecting and analyzing spontaneous synaptic events. The authors demonstrate the advantages of using their method in comparison with previous approaches. The possibility to train the architecture on different tasks using transfer learning approaches is also an added value of the work. There are some technical aspects that would be worth clarifying in the manuscript:

      (1) LSTM Layer Justification: Please provide a detailed explanation for the inclusion of the LSTM layer in the miniML architecture. What specific benefits does the LSTM layer offer in the context of synaptic event detection?

      (2) Temporal Resolution: Can you elaborate on the reasons behind the lower temporal resolution of the output? Understanding whether this is due to specific design choices in the model, data preprocessing, or post-processing will clarify the nature of this limitation and its impact on the analysis.

      (3) Architecture optimization: How was the CNN+LSTM architecture optimized in terms of the number and size of the CNN layers?

    1. eLife assessment

      This study provides a novel and promising NPRL2 gene therapy for enhanced immunotherapy response in a KRAS/STK11 mutant anti-PD1 resistant metastatic NSCLC humanized mouse model. Overall, the authors presented a large amount of convincing in vivo data to demonstrate that NPRL2 gene therapy induces antitumor activity through DC-mediated antigen presentation and cytotoxic immune cell activation. This work will be of interest and useful to medical biologists and oncologists in the research field of KRAS-mutant NSCLC.

    2. Reviewer #1 (Public Review):

      This study excellently complements the previous one by unveiling the properties of NPRL2 in augmenting the effect of immune checkpoint inhibitors such as pembrolizumab in KRAS mutant lung cancer models.

      The following points should be clarified:

      (1) In KRAS mutant cell lines with LKB1 co-mutations or deletions, such as A549 cells, does treatment with NPRL2 not increase the efficacy of immunotherapy? Is this correct? Similarly, does the delivery of NPRL2 only potentiate the effect of immunotherapy in KRAS mutant cell lines without associated LKB1 mutations?

      (2) Do the authors analyze by western blot if NPRL2 influences or restores STING and LKB1 in the A549 cell line that lacks LKB1 and STING?

      (3) Mechanistically, is there any explanation as to why NPRL2 delivery increases the efficacy of immunotherapy? Is there any effect on FUS or MYC?

      (4) Is there any way to carry out a clinical study of systematically delivering NPRL2 in KRAS lung cancer patients?

    3. Reviewer #2 (Public Review):

      Summary:

      The study "NPRL2 gene therapy induces effective antitumor immunity in KRAS/STK11 mutant anti-PD1 resistant metastatic non-small cell lung cancer (NSCLC) in a humanized mouse model" by Meraz et al. investigated the antitumor immune responses to NPRL2 gene therapy in aPD1R / KRAS/STK11mt NSCLC in a humanized mouse model, and found that NPRL2 gene therapy induces antitumor activity on KRAS/STK11mt/aPD1R tumors through DC-mediated antigen presentation and cytotoxic immune cell activation.

      Strengths:

      The novelty of the study.

      Weaknesses:

      (1) The effect of NPRL2 combined with pembrolizumab is inconsistent. Figure 2I-K showed a similar tumor intensity in the NPRL2 group and the combination group. However, NPRL2 combined with pembrolizumab was synergistic in the KRASwt/aPD1S H1299 tumors in Figure 4.

      (2) The authors stated that NPRL2 combined with pembrolizumab was not synergistic in the KRAS/STK11mt/aPD1R tumors but was synergistic in the KRASwt/aPD1S H1299 tumors. How was the synergistic effect defined in the study? More details need to be provided here.

      (3) Nearly all of the work was performed pre-clinically. Validation in the clinical setting would provide stronger evidence for the conclusion.

      (4) Figure 5 and Figure 6 have the same legend. These two figures could be merged into one.

      (5) Figure 5B & C: n=9 in Figure 5B. However, the number of data points shown in Figure 5C was fewer than 9.

    4. Reviewer #3 (Public Review):

      Summary:

      NPRL2/TUSC4 is a tumor suppressor gene whose expression is reduced in many cancers including NSCLC. This study presents a novel finding on NPRL2 gene therapy, which induces antitumor activity on aPD1-resistant tumors. Since KRAS/STK11 mutant tumors were reported to benefit less from ICIs, this study has potential clinical application value.

      Strengths:

      This work uncovers the advantage of NPRL2 gene therapy by using humanized models and multiple cell lines. Moreover, via immune cell depletion studies, the mechanism of NPRL2 gene therapy has focused on dendritic cells and CD8+T cells.

      Weaknesses:

      A major concern would be the lack of systematic and logical rigor. This work did not present a link between apoptosis and the antigen presentation induced by NPRL2 restoration. There is no evidence proving that the PI3K/AKT/mTOR signaling pathway is related to antigen presentation, which is the major reason for the NPRL2-induced antitumor response. Therefore, the two parts may not support each other logically.

    1. eLife assessment

      This work proposes that positive biodiversity-ecosystem functioning relationships found in experiments have been exaggerated because commonly used statistical analyses are flawed. As an alternative, the authors suggest a new analysis based on species competitive responses. Unfortunately, the presented methods are not reproducibly described, not yet complete, and inadequate for hypothesis testing. The reviewers agreed that the authors have either misinterpreted or chosen not to take into account much of the current research literature in the field of plant competition and biodiversity research.

    2. Reviewer #1 (Public Review):

      [Editors' note: this is an overall synthesis from the Reviewing Editor in consultation with the reviewers.]

      The three reviews expand our critique of this manuscript in some depth and complementary directions. These can be synthesized in the following main points (we point out that there is quite a bit more that could be written about the flaws with this study; however, time constraints prevented us from further elaborating on the issues we see):

      (1) It is unclear what the authors want to do. It seems their main point is that the large BEF literature and especially biodiversity experiments overstate the occurrence of positive biodiversity effects because some of these can result from competition. Because reduced interspecific relative to intraspecific competition in mixture is sufficient to produce positive effects in mixtures (if interspecific competition = 0 then RYT = S, where S is species richness in mixture -- this according to the reciprocal yield law = law of constant final yield), they have a problem accepting NE > 0 as a true biodiversity effect (see additive partitioning method of Loreau & Hector 2001 cited in manuscript).

      (2) The authors next claim, without justification, that additive partitioning of NE is flawed and theoretically and biologically meaningless. They misinterpret the CE component as biological niche partitioning and the SE component as biological dominance. They do not seem to accept that the additive partitioning is a logically and mathematically sound derivation from basic principles that cannot be contested.

      (3) The authors go on to introduce a method to calculate species-level overyielding (RY > 1/S in replacement series experiments) as a competitive growth response and multiply this with the species monoculture biomass relative to the maximum to obtain competitive expectation. This method is based on resource competition and the idea that resource uptake is fully converted into biomass (instead of e.g. investing it in allelopathic chemical production).

      (4) It is unclear which experiments should be done, i.e. are partial-density monocultures planted or simply calculated from full-density monocultures? At what time are monocultures evaluated? The framework suggests that monocultures must have the full potential to develop, but in experiments, they are often performing very poorly, at least after some time. I assume in such cases the monocultures could not be used.

      (5) There are many reasons why the ideal case of only resource competition playing a role is unrealistic. This excludes enemies but also differential conversion factors of resources into biomass and antagonistic or facilitative effects. Because there are so many potential reasons for deviations from the null model of only resource competition, a deviation from the null model does not allow conclusions about underlying mechanisms.

      Furthermore, this is not a systematically developed partitioning, but some rather empirical ad hoc formulation of a first term that is thought to approximate competitive effects as understood by the authors (but again, there already are problems here). The second residual term is not investigated. For a proper partitioning approach, one would have to decompose overyielding into two (or more) terms and demonstrate (algebraically) that under some reasonable definitions of competitive and non-competitive interactions, these end up driving the respective terms.

      (6) Using a simplistic simulation to test the method is insufficient. For example, I do not see how the simulation includes a mechanism that could create CE in additive partitioning if all species would have the same monoculture yield. Similarly, they do not include mechanisms of enemies or antagonistic interactions (e.g. allelopathy).

      (7) The authors do not cite relevant literature regarding density x biodiversity experiments, competition experiments, replacement-series experiments, density-yield experiments, additive partitioning, facilitation, and so on.

      Overall, this manuscript does not lead further from what we have already elaborated in the broad field of BEF and competition studies and rather blurs our understanding of the topic.
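
      The limiting cases mentioned in point (1) can be checked numerically. The following is a minimal sketch, assuming a reciprocal yield model in which per-plant biomass declines hyperbolically with neighbour density; the function name, the parameter `inter_scale`, and the coefficient values are illustrative assumptions and do not come from the manuscript or the reviews.

```python
def mixture_relative_yields(b_intra, inter_scale, total_density, a=1.0):
    """Relative yields (RY) in an equal-proportion replacement-series mixture.

    Assumed model (reciprocal yield law), per-plant biomass w_i:
        1 / w_i = a + b_ii * N_i + inter_scale * b_ii * N_others
    inter_scale = 0 means no interspecific competition;
    inter_scale = 1 means interspecific = intraspecific competition.
    """
    s = len(b_intra)
    n_each = total_density / s            # density of each species in mixture
    n_others = total_density - n_each     # density of heterospecific neighbours
    relative_yields = []
    for b_ii in b_intra:
        # Full-density monoculture yield per unit area
        mono = total_density / (a + b_ii * total_density)
        # Yield of the same species in the mixture
        mix = n_each / (a + b_ii * n_each + inter_scale * b_ii * n_others)
        relative_yields.append(mix / mono)
    return relative_yields


b = [0.01, 0.02, 0.05]  # hypothetical intraspecific coefficients, S = 3
ryt_no_interspecific = sum(mixture_relative_yields(b, 0.0, 1e6))
ryt_equivalent = sum(mixture_relative_yields(b, 1.0, 1e6))
print(ryt_no_interspecific)  # approaches S at high density
print(ryt_equivalent)        # equals 1: the usual null expectation
```

      Under these assumptions, RYT moves from 1 towards S purely as interspecific competition weakens relative to intraspecific competition, which is the situation point (1) describes.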

    3. Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectation that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE measures whether the "relative advantage" species have in the mixture is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.
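
      The "statistical structure" referred to above can be made concrete with a short numerical sketch of the additive partition, NE = N * mean(dRY) * mean(M) + N * cov(dRY, M), following Loreau & Hector (2001). The function name and the biomass values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def additive_partition(mono, mix, expected_ry):
    """Return (net effect, complementarity effect, selection effect).

    mono        : monoculture yields M_i
    mix         : observed yields of each species in the mixture
    expected_ry : expected relative yields (e.g. planted proportions)
    """
    mono = np.asarray(mono, dtype=float)
    mix = np.asarray(mix, dtype=float)
    expected_ry = np.asarray(expected_ry, dtype=float)
    n = mono.size
    # Deviation of observed from expected relative yield, per species
    delta_ry = mix / mono - expected_ry
    # Net effect: observed mixture yield minus expected mixture yield
    net = mix.sum() - (expected_ry * mono).sum()
    # CE: N times mean RY deviation times mean monoculture yield
    ce = n * delta_ry.mean() * mono.mean()
    # SE: N times the (population) covariance of RY deviation and monoculture yield
    se = n * np.mean((delta_ry - delta_ry.mean()) * (mono - mono.mean()))
    return net, ce, se


# Two hypothetical mixtures with the same net effect but a different split
print(additive_partition([10, 20], [6, 12], [0.5, 0.5]))
print(additive_partition([10, 20], [4, 14], [0.5, 0.5]))
```

      Both hypothetical mixtures overyield by the same amount, yet CE and SE differ. The partition describes how the RY deviations are distributed across species; it does not by itself say which biological mechanism produced them.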

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.
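
      To illustrate the second concern, here is a minimal sketch of MG and RC as I read the verbal descriptions above. The exact formulas in the manuscript may differ, so the function names, argument names, and numbers are assumptions made only for this illustration.

```python
import numpy as np


def competitive_growth_response(partial_mono, full_mono, planting_proportion):
    """MG (assumed form): partial- over full-density monoculture yield,
    divided by the species' planting proportion in the mixture."""
    ratio = np.asarray(partial_mono, dtype=float) / np.asarray(full_mono, dtype=float)
    return ratio / planting_proportion


def relative_competitive_ability(full_mono):
    """RC (assumed form): monoculture biomass minus the mean of the other
    species, divided by the maximum monoculture biomass of all species."""
    m = np.asarray(full_mono, dtype=float)
    others_mean = (m.sum() - m) / (m.size - 1)
    return (m - others_mean) / m.max()


# Hypothetical monoculture biomasses; only the most productive species changes
rc_before = relative_competitive_ability([100.0, 200.0, 400.0])
rc_after = relative_competitive_ability([100.0, 200.0, 800.0])
print(rc_before)
print(rc_after)
```

      In this sketch the second species has the same monoculture biomass in both cases, yet its RC shifts when only the most productive species changes. That is the sensitivity to a single value raised above.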

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather "ad hoc". It also is much less a partitioning method than the AP method because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties: when does complementarity occur? Would it capture complementarity and facilitation?

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but it is clear that an interaction can have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning" and "facilitation", and "species interference" as the authors do. Also, these categories of mechanisms are still simple. For example, how do biotic interactions with enemies classify, see above?

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      Other points:

      - line 66: community productivity, not ecosystem productivity.
      - line 68: community average responses are with respect to relative yields - this is important!
      - line 64: what are "species effects of species interactions"?
      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.
      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable, I am unsure of the added value of this specific approach for several reasons.

      Strengths:

      I can find a lot of value in endeavouring to improve our understanding of how biodiversity-ecosystem functioning relationships arise. I agree with the authors that competition is not well integrated into the complementarity and selection effect and interrogating this is important.

      Weaknesses:

      (1) The authors start the introduction very narrowly and do not make clear why it is so important to understand the underlying mechanisms driving biodiversity-ecosystem functioning relationships until the end of the discussion.

      (2) The authors criticize the existing framework for only incorporating positive interactions but this is an oversimplification of the existing framework in several ways:
      a. The existing partitioning scheme incorporates resource partitioning, which is an effect of competition.
      b. The authors neglect the potential that negative feedback from species-specific pests and pathogens can also drive positive BEF and complementarity effects but is not necessarily a positive interaction. This is discussed in Schnitzer et al. 2011, Maron et al. 2011, Hendriks et al. 2013, Barry et al. 2019, etc.
      c. Hector and Loreau (and many of the other citations listed) do not limit competition to SE because resource partitioning is a byproduct of competition.

      (3) It is unclear how this new measure relates to the selection effect, in particular. I would suggest that the authors add a conceptual figure that shows some scenarios in which this metric would give a different answer than the traditional additive partition. The example that the authors use, where a dominant species increases in biomass and the amount that it increases in biomass is greater than the amount of loss from it outcompeting a subdominant species, is a general example often used for a selection effect. When exactly would you see a difference between the two?
      a. Just a note: I do think you should see a difference between the two if the species suffers from strong intraspecific competition and therefore has low monoculture biomass, but this would tend to also be a very low-density monoculture in practice, so there would potentially be little difference between a low-density and high-density monoculture because the individuals in a high-density monoculture would die anyway. So I am not sure that in practice you would really see this difference even if partial density plots were incorporated.

      (4) One of the tricky things about these endeavors is that they often pull on theory from two different subfields and use similar terminology to refer to different things. For example - in competition theory, facilitation often refers to a positive relative interaction index (this seems to be how the authors are interpreting this) while in the BEF world facilitation often refers to a set of concrete physical mechanisms like microclimate amelioration. The truth is that both of these subfields use net effects. The relative interaction index is also a net outcome as is the complementarity effect even if it is only a piece of the net biodiversity effect. Trying to combine these two subfields to come up with a new partitioning mechanism requires interrogating the underlying assumptions of both subfields which I do not see in this paper.

      (5) The partial density treatment does not isolate competition in the way that the authors indicate. All of the interactions that the authors discuss are density-dependent, including the mechanism that is not discussed (negative feedback from species-specific pests and pathogens). These partial density treatment effects therefore cannot simply be equated to competition as the authors indicate.
      a. Additionally, the authors use mixture biomass as a stand-in for competitive ability in some cases, but mixture biomass could also be determined by the degree to which a plant is facilitated in the mixture (for example).

      (6) I found the literature citation to be a bit loose. For example, the authors state that the additive partition is used to separate positive interactions from competition (lines 70-76) and cite many papers but several of these (e.g. Barry et al. 2019) explicitly do not say this.

      (7) The natural take-home message from this study is that it would be valuable for biodiversity experiments to include partial density treatments but I have a hard time seeing this as a valuable addition to the field for two reasons:
      a. In practice - adding in partial density treatments would not be feasible for the vast majority of experiments which are already often unfeasibly large to maintain.
      b. The density effect would likely only be valuable during the establishment phase of the experiment because species that are strongly limited by intraspecific competition will die in the full-density plots resulting in low-density monocultures. You can see this in many biodiversity experiments after the first years. Even though they are seeded (or rarely planted) at a certain density, the density after several years in many monocultures is quite low.

    5. Reviewer #4 (Public Review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript's null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning. The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them. Finally, it is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. I will elaborate on each of these points below.

      The critiques of biodiversity experiments and existing additive partitioning methods are overstated, as is the extent to which this new approach addresses its limitations. For example, the critique that current biodiversity experiments cannot reveal the effects of species interactions (e.g., lines 37-39) isn't generally true, but it could be true if stated more specifically. That is, this statement is incorrect as written because comparisons of mixtures, where there are interspecific and intraspecific interactions, with monocultures, where there are only intraspecific interactions, certainly provide information about the effects of species interactions (interspecific interactions). These biodiversity experiments and existing additive partitioning approaches have limits, of course, for identifying the specific types of interactions (e.g., whether mediated by exploitative resource competition, apparent competition, or other types of interactions). However, the approach proposed in this manuscript gets no closer to identifying these specific mechanisms of species interactions. It has no ability to distinguish between resource and apparent competition, for example. Thus, the motivation and framing of the manuscript do not match what it provides. I believe the entire Introduction would need to be rewritten to clarify what gap in knowledge this proposed approach is addressing and what would be gained by filling this knowledge gap.

      I recommend that the Introduction instead clarify how this study builds on and goes beyond many decades of literature considering how competition and biodiversity effects depend on density. This large literature is insufficiently addressed in this manuscript. This fails to give credit to previous studies considering these ideas and makes it unclear how this manuscript goes beyond the many previous related studies. For example, see papers and books written by de Wit, Harper, Vandermeer, Connolly, Schmid, and many others. Also, note that many biodiversity experiments have crossed diversity treatments with a density treatment and found no significant effects of density or interactions between density and diversity (e.g., Finn et al. 2013 Journal of Applied Ecology). Thus, claiming that these considerations of density are novel, without giving credit to the enormous number of previous studies considering this, is insufficient.

      Replacement series designs emerged as a consensus for biodiversity experiments because they directly test a relevant null hypothesis. This is not to say that there are no other interesting null hypotheses or study designs, but one must acknowledge that many designs and analyses of biodiversity experiments have already been considered. For example, Schmid et al. reviewed these designs and analyses two decades ago (2002, chapter 6 in Loreau et al. 2002 OUP book) and the overwhelming consensus in recent decades has been to use a replacement series and test the corresponding null hypothesis.

      It is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. Most biodiversity experiments and additive partitions have tested and quantified diversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions. If there was no less competition and no more facilitation in mixtures than in monocultures, then there would be no positive diversity effects. Rejecting this null hypothesis is relevant when considering coexistence in ecology, overyielding in agronomy, and the consequences of biodiversity loss in conservation (e.g., Vandermeer 1981 Bioscience, Loreau 2010 Princeton Monograph). This manuscript proposes a different null hypothesis and it is not yet clear to me how it would be relevant to any of these ongoing discussions of changes in biodiversity.

      The claim that all previous methods 'are not capable of quantifying changes in ecosystem productivity by species interactions and species or community level' is incorrect. As noted above, all approaches that compare mixtures, where there are interspecific interactions, to monocultures, where there are only intraspecific interactions, do this to some extent. By overstating the limitations of previous approaches, the manuscript fails to clearly identify what unique contribution it is offering, and how this builds on and goes beyond previous work.

      The manuscript relies on simulations because it claims that current experiments are unable to test this, given that they have replacement series designs (lines 128-131). There are, however, dozens of experiments where the replacement series was repeated at multiple densities, which would allow a direct test of these ideas. In fact, these ideas have already been tested in these experiments and density effects were found to be nonsignificant (e.g., Finn et al. 2013).

      It seems that the authors are primarily interested in trees planted at a fixed density, with no opportunity for changes in density, and thus only changes in the size of individuals (e.g., Fig. 1). In natural and experimental systems, realized density differs from the initial planted density, and survivorship of seedlings can depend on both intraspecific and interspecific interactions. Thus, the constrained conditions under which these ideas are explored in this manuscript seem narrow and far from the more complex reality where density is not fixed.

      Additional detailed comments:

      It is unclear to me which 'effects' are referred to on line 36. For example, are these diversity effects or just effects of competition? What is the response variable?

      The usefulness of the approach is overstated on line 52. All partitioning approaches, including the new one proposed here, give the net result of many types of species interactions and thus cannot 'disentangle underlying mechanisms of species interactions.'

      The weaknesses of previous approaches are overstated throughout the manuscript, including in lines 60-61. All approaches provide some, but not all, insights. Sweeping statements that previous approaches are not effective, without clarifying what they can and can't do, are unhelpful and incorrect. Also, these statements imply that the approach proposed here addresses the limitations of these previous approaches. I don't yet see how it does so.

      The definitions given for the CE and SE on line 71 are incorrect. Competition affects both terms and CE can be negative or have nothing to do with positive interactions, as noted in many of the papers cited.

      The proposed approach does not address the limitations noted on lines 73 and 74.

      The definition of positive interactions in lines 77 and 78 seems inconsistent with much of the literature, which instead focuses on facilitation or mutualism, rather than competition when describing positive interactions.

      Throughout the manuscript, competition is often used interchangeably with resource competition (e.g., line 82) and complementarity is often attributed to resource partitioning (e.g., line 77). This ignores apparent competition and partitioning enemy-free niche space, which has been found to contribute to biodiversity effects in many studies.

      In what sense are competitive interactions positive for competitive species (lines 82-83)? By definition, competition is an interaction that has a negative effect. Do you mean that interspecific competition is less than intraspecific competition? I am having a very difficult time following the logic.

      Results are asserted on lines 93-95, but I cannot find the methods that produced these results. I am unable to evaluate the work without a repeatable description of the methods.

      The description of the null hypothesis in the common additive partitioning approach on lines 145-146 is incorrect. In the null case, it does not assume that there are no interspecific interactions, but rather that interspecific and intraspecific interactions are equivalent.

    1. eLife assessment

      Here the authors present a useful extension of their previous method to cluster neuronal activity into cell assemblies (groups of neurons with correlated activity). The authors provide solid evidence that this method can identify temporal dynamics of neuronal clusters in sample simulated data, and they show how this method can be applied to whole-brain zebrafish data.

    2. Reviewer #1 (Public Review):

      Summary:

      Understanding large-scale neural activity remains a formidable challenge in neuroscience. While several methods have been proposed to discover the assemblies from such large-scale recordings, most previous studies do not explicitly model the temporal dynamics. This study is an attempt to uncover the temporal dynamics of assemblies using a tool that has been established in other domains.

      The authors previously introduced the compositional Restricted Boltzmann Machine (cRBM) to identify neuron assemblies in zebrafish brain activity. Building upon this, they now employ the Recurrent Temporal Restricted Boltzmann Machine (RTRBM) to elucidate the temporal dynamics within these assemblies. By introducing recurrent connections between hidden units, RTRBM could retrieve neural assemblies and their temporal dynamics from simulated and zebrafish brain data.
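
      For readers unfamiliar with the model class, the role of the recurrent hidden-to-hidden connections can be sketched with the expected hidden-state recursion of a generic RTRBM, r_t = sigmoid(W v_t + U r_{t-1} + b_h). This is a minimal sketch of the standard formulation, not the authors' implementation; the array shapes, the names `W`, `U`, `b_h`, `b_init`, and the random data are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def expected_hidden_states(v, W, U, b_h, b_init):
    """Expected hidden (assembly) activations of a generic RTRBM.

    v      : (T, n_visible) binary visible activity over time
    W      : (n_hidden, n_visible) visible-to-hidden weights
    U      : (n_hidden, n_hidden) recurrent hidden-to-hidden weights
    b_h    : (n_hidden,) hidden bias for t > 0
    b_init : (n_hidden,) hidden bias for the first time step
    """
    T = v.shape[0]
    r = np.zeros((T, W.shape[0]))
    # First time step has no history, so it uses its own bias
    r[0] = sigmoid(W @ v[0] + b_init)
    for t in range(1, T):
        # The previous expected hidden state shifts the current hidden bias
        r[t] = sigmoid(W @ v[t] + U @ r[t - 1] + b_h)
    return r


rng = np.random.default_rng(0)
v = (rng.random((5, 8)) < 0.3).astype(float)   # hypothetical spike raster
W = rng.normal(scale=0.5, size=(3, 8))
U = rng.normal(scale=0.5, size=(3, 3))
b = np.zeros(3)
print(expected_hidden_states(v, W, U, b, b).shape)
```

      With `U` set to zero the recursion reduces to the hidden-unit activation of a static RBM applied independently to each time step, so any temporal structure the model captures is carried by `U`.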

      Strengths:

      The RTRBM has previously been used in other domains, and training of the model is already established. This study applies such a model to neuroscience. Overall, the paper is well-structured, the methodology is robust, and the analysis is solid enough to support the authors' claim.

      Weaknesses:

      The overall degree of advance is very limited. The performance improvement by RTRBM compared to their cRBM is marginal, and insights into assembly dynamics are limited.

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, the authors propose an extension to some of the last author's previous work, where a compositional restricted Boltzmann machine was considered as a generative model of neuron-assembly interaction. They augment this model by recurrent connections between the Boltzmann machine's hidden units, which allow them to explicitly account for temporal dynamics of the assembly activity. Since their model formulation does not allow the training towards a compositional phase (as in the previous model), they employ a transfer learning approach according to which they initialise their model with a weight matrix that was pre-trained using the earlier model so as to essentially start the actual training in a compositional phase. Finally, they test this model on synthetic and actual data of whole-brain light-sheet-microscopy recordings of spontaneous activity from the brain of larval zebrafish.

      Strengths:

      This work introduces a new model for neural assembly activity. Importantly, being able to capture temporal assembly dynamics is an interesting feature that goes beyond many existing models. While this work clearly focuses on the method (or the model) itself, it opens up an avenue for experimental research where it will be interesting to see if one can obtain any biologically meaningful insights considering these temporal dynamics when one is able to, for instance, relate them to development or behaviour.

      Weaknesses:

      For most of the work, the authors present their RTRBM model as an improvement over the earlier cRBM model. Yet, when considering synthetic data, they actually seem to compare with a "standard" RBM model. This seems odd considering the overall narrative, and it is not clear why they chose to do that. Also, in that case, was the RTRBM model initialised with the cRBM weight matrix?

      A few claims made throughout the work are slightly too enthusiastic and not really supported by the data shown. For instance, when the authors refer to the clusters shown in Figure 3D as "spatially localized", this seems like a stretch, specifically in view of clusters 1, 3, and 4. Moreover, when they describe the predictive performance of their model as "close to optimal" when the down-sampling factor coincided with the interaction time scale, it seems a bit exaggerated given that it was more or less as close to the upper bound as it was to the lower bound.

      When discussing the data statistics, the authors quote correlation values in the main text. However, these do not match the correlation values in the figure to which they seem to belong. Now, it seems that in the main text, they consider the Pearson correlation, whereas in the corresponding figure, it is the Spearman correlation. This is very confusing, and it is not really clear as to why the authors chose to do so.
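      The mismatch described above is easy to reproduce: Pearson and Spearman coefficients computed on the same data can differ substantially whenever the relationship is monotonic but not linear. The following is a minimal sketch on made-up data (the arrays `x` and `y` and the helper names are illustrative assumptions, not the authors' data or code); Spearman is computed as the Pearson correlation of the ranks, which is valid here because there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation as Pearson on ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

# Hypothetical data: a perfectly monotonic but strongly nonlinear relationship.
x = np.arange(1.0, 11.0)
y = np.exp(x)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

# Spearman equals 1 because the ordering is preserved exactly;
# Pearson is lower because the relationship is far from linear.
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

      Reporting one coefficient in the text and the other in the figure therefore yields numbers that cannot be compared directly, which is why stating the measure alongside each value matters.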

      Finally, when discussing the fact that the RTRBM model outperforms the cRBM model, the authors state it does so for different moments and in different numbers of cases (fish). It would be very interesting to know whether these are the same fish or always different fish.

    4. Reviewer #3 (Public Review):

      With ever-growing datasets, it becomes more challenging to extract useful information from such a large amount of data. Developing better dimensionality reduction and clustering methods can therefore be very important for making sense of the data being analyzed. This is especially true for neuroscience, where new experimental advances allow the recording of an unprecedented number of neurons. Here the authors take a step towards helping with neuronal analyses by proposing a new method to identify groups of neurons with similar activity dynamics. I did not notice any obvious problems with the data analyses; however, the manuscript has a few weaknesses:

      (1) This manuscript is written as an extension of previous work by the same authors (van der Plas et al., eLife, 2023), and the authors often refer to that work for details. As a result, fully understanding this paper requires first reading the previous one. Similarly, understanding the functional significance of the neuronal assemblies identified here requires consulting the previous paper.

      (2) The problem of discovering clusters in data with temporal dynamics is not unique to neuroscience. Therefore, the authors should also discuss other previously proposed methods and how they compare to the RTRBM method presented here. Similarly, there are other methods using neural networks for discovering clusters (assemblies) (e.g. t-SNE: van der Maaten & Hinton 2008, Hippocluster: Chalmers et al. 2023, etc.), which should be discussed to give readers better background information.

      (3) Better describing other methods, as requested above, is especially important because the performance of the method presented here is not much better than that of previous work. For example, RTRBM outperforms the cRBM on only ~4 out of 8 fish datasets. Moreover, as the authors nicely describe in the Limitations section, this method can currently work only on a single time scale, and clusters have to be estimated first with the previous cRBM method. Thus, having an overview of other methods that could be used for similar analyses would be helpful.

    1. Reviewer #1 (Public Review):

      Summary

      A novel statistical model of neural population activity called the Random Projection model has been recently proposed. Not only is this model accurate, efficient, and scalable, but it is also naturally implemented as a shallow neural network. This work proposes a new class of RP model called the reshaped RP model. Inheriting the virtues of the original RP model, the proposed model is more accurate and efficient than the original, as well as compatible with various biological constraints. In particular, the authors have demonstrated that normalizing the total synaptic input in the reshaped model has a homeostatic effect on the firing rates of the neurons, resulting in even more efficient representations with equivalent computational accuracy. These results suggest that synaptic normalization contributes to synaptic homeostasis as well as efficiency in neural encoding.

      Strengths

      This paper demonstrates that the accuracy and efficiency of the random projection models can be improved by extending the model with reshaped projections. Furthermore, it broadens the applicability of the model under biological constraints of synaptic regularization. It also suggests the advantage of the sparse connectivity structure over the fully connected model for modeling spiking statistics. In summary, this work successfully integrates two different elements, statistical modeling of the spikes and synaptic homeostasis in a single biologically plausible neural network model. The authors logically demonstrate their arguments with clear visual presentations and well-structured text, facilitating an unambiguous understanding for readers.

      Weaknesses

      It would be helpful if the following issues about the major claims of the manuscript could be expanded and/or clarified:

      (1) We find it interesting that the reshaped model showed decreased firing rates of the projection neurons. We note that maximizing the entropy <-ln p(x)> with a regularizing term -\lambda <\sum _i f(x_i)>, which reflects the mean firing rate, results in \lambda _i = \lambda for all i in the Boltzmann distribution. In other words, in addition to the homeostatic effect of synaptic normalization which is shown in Figures 3B-D, setting all \lambda_i = 1 itself might have a homeostatic effect on the firing rates. It would be better if the contributions of these two homeostatic effects were separated. One suggestion is to verify the homeostatic effect of synaptic normalization by changing the value of \lambda.

      (2) As far as we understand, \theta_i (thresholds of the neurons) are fixed to 1 in the article. Optimizing the neural thresholds as well as the synaptic weights is a natural procedure (both biologically and from an engineering standpoint), and the update can easily be computed by an expression similar to that of a_ij (equation 3). Do the results still hold when \theta _i is also allowed to change? For example,

      a. If \theta _i becomes larger, the mean firing rates will decrease. Does the backprop model still have higher firing rates than the reshaped model when \theta _i are also optimized?

      b. Changing \theta _i affects the dynamic range of the projection neurons, thus could modify the effect of synaptic constraints. In particular, does it affect the performance of the bounded model (relative to the homeostatic input models)?

      (3) In Figure 1, the authors claim that the reshaped RP model outperforms the RP model. This improved performance might be partly because the reshaped RP model has more parameters to be optimized than the RP model. Indeed, let N be the number of projections and K the in-degree of the projections; then the RP model and the reshaped RP model have N and KN parameters, respectively. Does the reshaped model still outperform the original one when only (randomly chosen) N weights (out of a_ij) are allowed to be optimized and the rest are fixed? (or, does it still outperform the original model with the same number of optimized parameters (i.e. N/K neurons)?)

      (4) In Figure 2, the authors have demonstrated that the homeostatic synaptic normalization outperforms the bounded model when the allowed synaptic cost is small. One possible hypothesis for explaining this fact is that the optimal solution lies in the region where only a small number of the |a_ij| are large and the rest are near 0. If it is possible to verify this idea by, for example, showing the distribution of a_ij after optimization, it would help the readers to better understand the mechanism behind the superiority of the homeostatic input model.

      (5) In Figures 5D and 5E, the authors present how different reshaping constraints result in different learning processes ("rotation"). We find these results quite intriguing, but more explanation or interpretation would help the readers understand them. For example,

      a. In the "Reshape - Hom. circuit 4.0" plot (Fig 5D, upper-left), the rotation angle between the two models is almost always the same. This is reasonable since the Homeostatic Circuit model is the least constrained model and could be almost irrelevant to the optimization process. Is there any similar interpretation to the other 3 plots of Figure 5D?

      b. In Figure 5E, is there any intuitive explanation for why the three models take minimum rotation angle at similar global synaptic cost (~0.3)?
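      The claim in point (1) above, that an entropy objective with a single regularizing term forces all \lambda_i to be equal, can be made explicit with a short derivation. This is a sketch in the reviewers' own notation; the Lagrange multiplier \mu for normalization and the sign convention of the exponent are assumptions introduced here for illustration and may differ from the convention used in the manuscript.

```latex
\begin{aligned}
\mathcal{L}[p] &= -\sum_{x} p(x)\ln p(x)
  \;-\; \lambda \sum_{x} p(x) \sum_i f(x_i)
  \;+\; \mu\Big(\sum_{x} p(x) - 1\Big), \\
\frac{\partial \mathcal{L}}{\partial p(x)}
  &= -\ln p(x) - 1 - \lambda \sum_i f(x_i) + \mu = 0, \\
p(x) &= \frac{1}{Z}\exp\Big(-\lambda \sum_i f(x_i)\Big),
  \qquad Z = \sum_{x}\exp\Big(-\lambda \sum_i f(x_i)\Big).
\end{aligned}
```

      Comparing this with the general Boltzmann form p(x) = (1/Z) exp(-\sum_i \lambda_i f(x_i)) shows that every coefficient takes the same value, \lambda_i = \lambda. Fixing all \lambda_i to a common constant is therefore equivalent to penalizing the summed activity with one global weight, which is why varying \lambda would help separate this effect from that of synaptic normalization.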

    1. eLife assessment

      The paper characterized a specific defect in the spatial working memory of mice with a deficit in a protein called Rac1. Rac1 inhibition was limited to the presynaptic compartment of neurons, which is significant because past work has inhibited both pre- and postsynaptic compartments. The study also identified potential effectors of Rac1. The work is important for these reasons, and the strength of the evidence is exceptional.

    2. Reviewer #1 (Public Review):

      - A summary of what the authors were trying to achieve:

      The authors focused on Rac1, one of the most extensively studied members of the Ras superfamily of small GTPases, an intracellular signal transducer that remodels actin and phosphorylation signaling networks. They performed an extensive series of behavioral tests and found a striking result of selectively inhibiting presynaptic Rac1. Previous studies have made the claim that Rac1-mediated signaling is associated with hippocampal-dependent working memory and longer-term forms of learning and memory. Rac1 was known to modulate both pre- and postsynaptic plasticity. What was missing was selective manipulation of Rac1 function at either pre- or postsynaptic loci. Kim, Soderling, and colleagues showed that following the expression of a genetically encoded Rac1-inhibitor at presynaptic terminals, spatial working memory is selectively impaired. In contrast, Rac1 inhibition at postsynaptic sites spared the spatial working memory but affected longer-term cognitive processes.

      - An account of the major strengths and weaknesses of the methods and results:

      This paper is part of an ambitious research trajectory, presented in multiple rigorous studies, that combines hypothesis-free fishing for candidate signal transduction elements with precise testing of physiological and behavioral outcomes. Each of these arenas has challenges and pitfalls. This paper contains punchlines in both behavioral and cell biological areas. The effect of presynaptic Rac1 inhibition on short-term behavioral memory was convincingly demonstrated with three different behavioral tests, including a quite striking result on delayed non-matching to place task. I found the claim of a specific effect on working memory more convincing here than in previous work. On the other hand, the authors sought to clarify the presynaptic regulatory mechanisms, leveraging new advances in mass spectrometry to identify the proteomic and post-translational landscape of presynaptic Rac1 signaling. They identified particular serine/threonine kinases and phosphorylated cytoskeletal signaling and synaptic vesicle proteins that became enriched with active Rac1. They argued that phosphorylated sites in these proteins are at positions likely to have regulatory effects on synaptic vesicles. They found changes in the distribution and morphology of synaptic vesicles following presynaptic Rac1 inhibition. They also report a postsynaptic consequence, a slightly increased spine cross-sectional area.

      - An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The selective agent is the Rac1-inhibiting polypeptide W56; W56 is fused to a protein with specific subcellular localizations in neurons. Hedrick, Yasuda, et al., 2016 showed that this kind of strategy enabled a spatially targeted inhibitory effect. Collaborating with Yasuda, O'Neil in Soderling's group previously reported that Rac1 negatively regulates synaptic vesicle replenishment at both excitatory and inhibitory synapses.

      In the current study by Kim et al., the goal is to interfere with Rac1 function in vivo. Once again, as in O'Neil, the functional intervention was to virally express a W56 peptide, fused to synapsin, a protein with specific subcellular localization, in this case presynaptic. The key control was to compare the effect of W56 with a scrambled sequence (Scr) in the negative control group. As verification of presynaptic efficacy, Kim found that W56-pre makes vesicles larger and further from the active zone without changing overall bouton morphology. Fresh fishing with MassSpec suggests that presynaptic vesicle proteins are affected.

      I am convinced that the presynaptic Rac1 function was successfully tweaked and that this had an effect on working memory tested with 5 s intertrial intervals, in a time range where the field is hard-pressed to find robust cell biological mechanisms for memory storage. (Ion channel dynamics are an alternative, but the focus here was on cytoskeletal, not plasma membrane proteins). What was missing was a direct index of vesicle dynamics or an explanation of why a hypothetical alteration in vesicle dynamics shows up as a change in vesicle size or location. The summarizing scheme is necessarily vague; it lacks specific details about how the effect on working memory occurs, or whether it involves excitatory as opposed to inhibitory nerve terminals.

      - A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      This study reveals a previously unrecognized presynaptic role of Rac1 signaling in cognitive processes and provides insights into its potential regulatory mechanisms.

      An outside observer might appreciate evidence that clearly shows that pivotal cytoskeletal cell biology is not the exclusive monopoly of either side of the synaptic cleft.

      - Any additional context you think would help readers interpret or understand the significance of the work:

      --Overall, it shows off the art of combining fishing with causal experiments, parallel to Steve Marx's work on L-type calcium channel modulation (Nature).

      --Multiple mutations associated with human neurodevelopmental and psychiatric disorders involve genes that encode regulators of the synaptic cytoskeleton. A major, unresolved question is how the disruption of specific actin filament structures leads to the onset and progression of complex synaptic and behavioral phenotypes.

      --The formation of long actin filaments along the axon's longitudinal axis is relevant to the sharing of synaptic vesicles amongst multiple boutons in so-called vesicle superpools (Chenouard & Tsien, NatComm)

    3. Reviewer #2 (Public Review):

      Summary:

      The paper described a behavioural characterisation of mice with presynaptically-inhibited Rac1 in the hippocampus. This is followed by a BioID and phosphoproteomic analysis of Rac1, highlighting potential downstream effectors of active or non-active Rac1 and potential downstream phosphorylated targets.

      Strengths:

      An original molecular approach that has been established in a previous paper by the authors (PMID 34269176) to block Rac1 function exclusively at the presynapse is now utilised to characterise a link between presynaptic dysfunction and mouse behavior. The experiments and the data provide good support for the conclusion that the function of Rac1 has distinct outcomes on mouse behavior, depending on its site of action.

      Weaknesses:

      A main limitation of the study is that it lacks physiological and biochemical analysis to follow up on hits identified in a BioID and phosphoproteomic analysis of presynaptic active and non-active Rac1 variants.

    1. eLife assessment

      This study provides important information about the formation of ribbon synapses in mouse cochlear hair cells, which facilitate the temporally-precise transmission of acoustic information to the auditory nerve. Live-cell imaging provides compelling evidence that ribbon precursor volume is dynamically modified by fission and fusion events on microtubules, but some of the other evidence included, particularly in relation to the directed transport of these precursors to the hair cell active zone is incomplete. These findings will be of interest to neuroscientists studying synapse formation and function and should inspire further research into the molecular basis for synaptic ribbon maturation.

    2. Reviewer #1 (Public Review):

      Summary

      The manuscript by Voorn and collaborators aims at deciphering microtubule-dependent ribbon formation in mouse hair cells. Using STED/confocal imaging, pharmacological tools, and a mouse mutant, the group of Christian Vogl convincingly demonstrated that the ribbon, the organelle that tethers vesicles at the hair cell synapse, results from the fusion and fission of ribbon precursors moving along the microtubule network. This study goes hand in hand with a complementary paper (Hussain et al.) showing similar findings in zebrafish hair cells.

      Strengths

      This study demonstrated i) the motion of ribbon precursors along the microtubules, ii) that ribbon precursors undergo multiple cycles of fusion-fission events, and iii) that the kinesin Kif1a is critical for synaptic maturation. The results are solid and the images are mesmeric.

      Weaknesses

      As stated by the authors in the discussion, the mechanism underlying the threshold shift in the Kif1a mutant is unclear and may not be solely attributed to the reduction of the ribbon volume.

      Impact

      Synaptogenesis in the auditory sensory cell remains elusive. This study shows a high degree of plasticity in synaptogenesis. Indeed, the formation of the synaptic organelle is a dynamic process consisting of several rounds of fusion-fission of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion-fission process.

    3. Reviewer #2 (Public Review):

      Summary

      This manuscript makes use of live cell imaging to look at aggregates of the synaptic ribbon protein ribeye to explore synapse formation in an organotypic culture system. The authors find that microtubule disruption influences the motion of a subset of ribeye spots and changes to ribbon volume. Disruption of the microtubule motor is also found to change ribeye motion and ribbon volume, albeit in the opposite direction. Together these results support a role for microtubule-based transport in synapse assembly.

      Strengths

      (1) The use of the in vitro imaging approach provides a method for high-quality live cell imaging in a mammalian preparation.

      (2) The data characterizing the movement of Ribeye in the cochlea is new and exciting.

      (3) The role of motors in the delivery of Ribeye to the synapse had never been established. The effects of nocodazole on directional asymmetry for the subset of slow-moving particles are convincing, though it is unclear to this reviewer how frequently these objects undergo directed motion.

      (4) The effect of Kif1a on ribbon size is an interesting finding that doesn't rely on overexpression and supports the importance of motors on the delivery of ribeye to the synapse.

      Weaknesses

      (1) The analysis leaves unclear what fraction of ribeye spots make use of active transport mechanisms. The authors make the claim that 54% underwent targeted transport because their MSD vs time curves were best fit by an exponent >1. This overstates the reliability of this approach. Purely diffusive motion will not always fit perfectly with an exponent of exactly 1, and one would expect roughly half of the spots to have an exponent greater than 1 and half less than 1, which is what they observe. In point of fact, truly directed transport should have an exponent near 2 (Figure 2F), which only a handful of spots seem to exhibit. I should also note that none of the examples look like those that are typically associated with directed motion.

      (2) The imaging approach makes use of viral expression using a non-Ribeye promoter. This overexpression approach will likely exaggerate the number of ribeye spots and could saturate binding to other proteins or other factors. Also, the promoters aren't under the control of feedback mechanisms that would typically turn off expression at the appropriate time.

      (3) The effect of Kif1A removal on the ABR threshold is very unlikely to be due to ribbon size. Complete removal of the ribbon only has a modest effect on the ABR threshold, so these modest reductions in size are unlikely to contribute much.

      (4) Fusion and fission of small aggregates are difficult to resolve with light microscopy and the examples provided in Figure 3 are indistinguishable from two spots that happen to be too close to each other to resolve.

      (5) The "slight left shift" in the velocity distribution in Figure 5C does not look significant. Is it?

      (6) Nocodazole and elimination of Kif1a have opposite effects on ribbon volume, which might point to alternative roles for the microtubules.
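      Weakness (1) above rests on how reliably a fitted MSD exponent separates diffusive from directed motion. The following is a minimal simulation sketch of that argument, not the authors' analysis: the track length, number of tracks, lag range, noise levels, and the helper names `msd` and `fit_alpha` are all illustrative assumptions. It fits the exponent of MSD(lag) ~ lag^alpha to short, purely diffusive tracks and to tracks with constant drift.

```python
import numpy as np

def msd(track, max_lag):
    """Time-averaged mean squared displacement of one 2-D track."""
    lags = np.arange(1, max_lag + 1)
    values = np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, values

def fit_alpha(track, max_lag=10):
    """Slope of log(MSD) against log(lag), i.e. the anomalous exponent."""
    lags, values = msd(track, max_lag)
    slope, _intercept = np.polyfit(np.log(lags), np.log(values), 1)
    return float(slope)

rng = np.random.default_rng(0)
n_tracks, n_steps = 500, 60  # assumed values, chosen to mimic short tracks

# Purely diffusive tracks: cumulative sums of independent Gaussian steps.
diffusive = np.cumsum(rng.normal(size=(n_tracks, n_steps, 2)), axis=1)
alphas_diffusive = np.array([fit_alpha(t) for t in diffusive])
frac_above_one = float(np.mean(alphas_diffusive > 1.0))

# Directed tracks: constant drift along x plus small positional jitter.
drift = np.array([1.0, 0.0])
steps = drift + 0.1 * rng.normal(size=(n_tracks, n_steps, 2))
directed = np.cumsum(steps, axis=1)
alphas_directed = np.array([fit_alpha(t) for t in directed])

print(f"diffusive tracks with alpha > 1: {frac_above_one:.2f}")
print(f"median alpha, directed tracks:   {np.median(alphas_directed):.2f}")
```

      In a simulation of this kind, a sizeable fraction of purely diffusive tracks yields a fitted exponent above 1 through sampling noise alone, whereas drifting tracks cluster near 2. A threshold at alpha > 1 therefore cannot by itself identify directed transport in short tracks.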

    4. Reviewer #3 (Public Review):

      Summary

      In this study, the authors addressed the question of how synaptic ribbons (specialized, electron-dense presynaptic structures) are formed from ribbon precursors in sensory hair cells. Specifically, the authors evaluated whether molecular motor-driven, microtubule-based transport plays a role in the directed transport of ribbon precursors to the active zone of cochlear hair cells and assessed whether there was a specific role for the microtubule motor Kinesin Family Member 1A (Kif1a). Using live imaging of cochlear explants and fixed images of both mature and developing cochlea, they provide evidence that ribbon precursors are actively transported on microtubules, that ribbon precursor volume is dynamically modified by fission and fusion events on microtubules, and that Kif1a plays a role in synaptic ribbon maturation.

      Strengths

      Overall, the data presented in this study support that the fission and fusion of ribbon precursors are dependent on microtubule-based translocation, and this dynamic assembly of precursors may involve Kif1a. Live-imaging data and analysis provide strong evidence for microtubule-based transport contributing to dynamic fission-fusion events of ribbon precursors. Further, fixed image analysis of Kif1a mutants supports that it plays a key role in synaptic ribbon maturation.

      Weaknesses

      While the authors clearly established the polarity and stability of microtubules in hair cells, they did not assess the net direction of putative slow microtubule-based movement (i.e. the ratios of plus to minus end-directed travel) in their analysis of ribbon precursor displacement. This information is critical in establishing a role for microtubule-based transport in localizing ribbon precursors to the active zones in the basolateral region of hair cells to form presynaptic ribbons. In addition, the discussion section did not elaborate on what is known about the coordination of molecular motor proteins during microtubule-based transport nor did it effectively incorporate the interpretation of the results with what has been described in previous studies on intracellular transport and the roles of Kif1a in synaptic vesicle precursor trafficking.

    1. eLife assessment

      This valuable study investigates the brain representations of Braille letters in blind participants and provides convincing evidence using EEG and fMRI that the decoding of letter identity across the reading hand takes place in the visual cortex. The evidence supporting the claims of the authors is solid, although the inclusion of a sighted control group and additional analyses would have strengthened the study. The work will be of interest to neuroscientists working on brain plasticity.

    2. Reviewer #1 (Public Review):

      Summary:

      The researchers examined how individuals who were born blind or lost their vision early in life process information, specifically focusing on the decoding of Braille characters. They explored the transition of Braille character information from tactile sensory inputs, based on which hand was used for reading, to perceptual representations that are not dependent on the reading hand.

      They identified tactile sensory representations in areas responsible for touch processing and perceptual representations in brain regions typically involved in visual reading, with the lateral occipital complex serving as a pivotal "hinge" region between them.

      In terms of temporal information processing, they discovered that tactile sensory representations occur prior to cognitive-perceptual representations. The researchers suggest that this pattern indicates that even in situations of significant brain adaptability, there is a consistent chronological progression from sensory to cognitive processing.

      Strengths:

      By combining fMRI and EEG, and focusing on the diagnostic case of Braille reading, the paper provides an integrated view of the transformation processing from sensation to perception in the visually deprived brain. Such a multimodal approach is still rare in the study of human brain plasticity and allows us to discern the nature of information processing in blind people's early visual cortex, as well as the time course of information processing in a situation of significant brain adaptability.

      Weaknesses:

      The lack of a sighted control group limits the interpretations of the results in terms of profound cortical reorganization, or simple unmasking of the architectural potentials already present in the normally developing brain. Moreover, the conclusions regarding the behavioral relevance of the sensory and perceptual representations in the putatively reorganized brain are limited due to the behavioral measurements adopted.

    3. Reviewer #2 (Public Review):

      Summary:

      Haupt and colleagues performed a well-designed study to test the spatial and temporal gradient of perceiving braille letters in blind individuals. Using cross-hand decoding of the read letters, and comparing it to the decoding of the read letter for each hand, they defined perceptual and sensory responses. Then they compared where (using fMRI) and when (using EEG) these were decodable. Using fMRI, they showed that low-level tactile responses specific to each hand are decodable from the primary and secondary somatosensory cortex as well as from IPS subregions, the insula, and LOC. In contrast, more abstract representations of the braille letter independent from the reading hand were decodable from several visual ROIs, LOC, VWFA, and surprisingly also EVC. Using a parallel EEG design, they showed that sensory hand-specific responses emerge in time before perceptual braille letter representations. Last, they used RSA to show that the behavioral similarity of the letter pairs correlates to the neural signal of both fMRI (for the perceptual decoding, in visual and ventral ROIs) and EEG (for both sensory and perceptual decoding).
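      The logic of cross-hand versus within-hand decoding summarized above can be illustrated with a toy simulation. This is a hedged sketch on synthetic patterns, not the authors' pipeline: the number of letters, trials, and features, the noise level, the nearest-centroid classifier, and all names are assumptions made for illustration. Each simulated pattern combines a hand-independent letter code with a hand-specific one, so within-hand decoding can use both while cross-hand decoding can rely only on the shared part.

```python
import numpy as np

rng = np.random.default_rng(1)
n_letters, n_trials, n_features = 8, 40, 50  # assumed, illustrative sizes
chance = 1.0 / n_letters

# Letter codes: one shared across hands, one specific to each hand.
shared = rng.normal(size=(n_letters, n_features))
hand_specific = rng.normal(size=(2, n_letters, n_features))

def simulate(hand):
    """Noisy response patterns and letter labels for one reading hand."""
    labels = np.repeat(np.arange(n_letters), n_trials)
    signal = shared[labels] + hand_specific[hand][labels]
    noise = rng.normal(scale=2.0, size=signal.shape)
    return signal + noise, labels

def centroid_accuracy(train_x, train_y, test_x, test_y):
    """Nearest-centroid classification accuracy."""
    centroids = np.stack(
        [train_x[train_y == k].mean(axis=0) for k in range(n_letters)]
    )
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(dists.argmin(axis=1) == test_y))

left_train_x, left_train_y = simulate(hand=0)
left_test_x, left_test_y = simulate(hand=0)
right_test_x, right_test_y = simulate(hand=1)

# Within-hand: train and test on independent trials from the same hand.
within_acc = centroid_accuracy(left_train_x, left_train_y, left_test_x, left_test_y)
# Cross-hand: train on one hand, test on the other.
cross_acc = centroid_accuracy(left_train_x, left_train_y, right_test_x, right_test_y)

print(f"chance: {chance:.3f}  within-hand: {within_acc:.3f}  cross-hand: {cross_acc:.3f}")
```

      Above-chance cross-hand accuracy in such a setup reflects only the hand-independent component, which is the rationale for treating cross-hand decoding as a marker of perceptual rather than sensory representations.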

      Strengths:

      This is a very well-designed study and it is analyzed well. The writing clearly describes the analyses and results. Overall, the study provides convincing evidence from EEG and fMRI that the decoding of letter identity across the reading hand occurs in the visual cortex in blindness. Further, it addresses important questions about the visual cortex hierarchy in blindness (whether it parallels that of the sighted brain or is inverted) and its link to braille reading.

      Weaknesses:

      Although I have some comments and requests for clarification about the details of the methods, my main comment is that the manuscript could benefit from expanding its discussion. Specifically, I'd appreciate the authors drawing clearer theoretical conclusions about what these data suggest about the direction of information flow in the reorganized visual system in blindness, the role VWFA plays in blindness (revised from the original sighted role or similar to it?), how information arrives at the visual cortex, and what the authors' predictions would be if a parallel experiment were carried out in sighted people (is this a multisensory recruitment or reorganization?). The data have the potential to speak to a lot of questions about the scope of brain plasticity, and that would interest broad audiences.

      To aid in drawing even more concrete conclusions about the flow of information, I suggest that the authors also add at least another early visual ROI to plot more clearly whether EVC's response to braille letters arrives there through an inverted cortical hierarchy, intermediate stages from VWFA, or directly, as found in the sighted brain for spoken language.

      Similarly, it may be informative to look specifically at the occipital electrodes' time differences between decoding for the different parameters and their correlation to behavior.

      Regarding the methods, further detail on the ability to read with both hands equally and any residual vision of the participants would be helpful.

    1. eLife assessment

      This valuable study uses recently developed EEG analysis methods to investigate spatial distractor suppression in a combined visual search/working memory task. While the reported results are convincing, the combined task design leaves open alternative interpretations than those currently discussed in the manuscript, potentially limiting the generalisability of the findings to other task settings. The study will be of interest to cognitive neuroscientists and psychologists working on visual attention and memory.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors tested whether learning to suppress (ignore) salient distractors (e.g., a lone colored nontarget item) via statistical regularities (e.g., the distractor is more likely to appear in one location than any other) was proactive (prior to paying attention to the distractor) or reactive (only after first attending the distractor) in nature. To test between proactive and reactive suppression the authors relied on a recently developed and novel technique designed to "ping" the brain's hidden priority map using EEG inverted encoding models. Essentially, a neutral stimulus is presented to stimulate the brain, resulting in activity on a priority map which can be decoded and used to argue when this stimulation occurred (prior to or after attending to a distracting item). The authors found evidence that despite learning to suppress the high probability distractor location, the suppression was reactive, not proactive in nature.

      Overall, the manuscript is well-written, tests a timely question, and provides novel insight into a long-standing debate concerning distractor suppression.

      Strengths (in no particular order):

      (1) The manuscript is well-written, clear, and concise (especially given the complexities of the method and analyses).

      (2) The presentation of the logic and results is mostly clear and relatively easy to digest.

      (3) This question concerning whether location-based distractor suppression is proactive or reactive in nature is a timely question.

      (4) The use of the novel "pinging" technique is interesting and provides new insight into this particularly thorny debate over the mechanisms of distractor suppression.

      Weaknesses (in no particular order):

      (1) The authors tend to make overly bold claims without either A) mentioning the opposing claim(s) or B) citing the opposing theoretical positions. Further, the authors have neglected relevant findings regarding this specific debate between proactive and reactive suppression.

      (2) The authors should be more careful in setting up the debate by clearly defining the terms, especially proactive and reactive suppression which have recently been defined and were more ambiguously defined here.

      (3) There were some methodological choices that should be further justified, such as the choice of stimuli (e.g., sizes, colors, etc.).

      (4) The figures are often difficult to process. For example, the time courses are so far zoomed out (i.e., 0, 500, 100 ms with no other tick marks) that it makes it difficult to assess the timing of many of the patterns of data. Also, there is a lot of baseline period noise which complicates the interpretations of the data of interest.

      (5) Sometimes the authors fail to connect to the extant literature (e.g., by connecting to the ERP components, such as the N2pc and PD components, used to argue for or against proactive suppression) or when they do, overreach with claims (e.g., arguing suppression is reactive or feature-blind more generally).

    3. Reviewer #2 (Public Review):

      Summary:

      The authors investigate the mechanisms supporting learning to suppress distractors at predictable locations, focusing on proactive suppression mechanisms manifesting before the onset of a distractor. They used EEG and inverted encoding models (IEM). The experimental paradigm alternates between a visual search task and a spatial memory task, followed by a placeholder screen acting as a 'ping' stimulus, i.e., a stimulus to reveal how learned distractor suppression affects hidden priority maps. Behaviorally, their results align with the effects of statistical learning on distractor suppression. Contrary to the proactive suppression hypothesis, which predicts reduced memory-specific tuning of neural representations at the expected distractor location, their IEM results indicate increased tuning at the high-probability distractor location following the placeholder and prior to the onset of the search display.

      Strengths:

      Overall, the manuscript is well-written and clear, and the research question is relevant and timely, given the ongoing debate on the roles of proactive and reactive components in distractor processing. The use of a secondary task and EEG/IEM to provide a direct assessment of hidden priority maps in anticipation of a distractor is, in principle, a clever approach. The study also provides behavioral results supporting prior literature on distractor suppression at high-probability locations.

      Weaknesses:

      (1) At a conceptual level, I understand the debate and opposing views, but I wonder whether it might be more comprehensive to present also the possibility that both proactive and reactive stages contribute to distractor suppression. For instance, anticipatory mechanisms (proactive) may involve expectations and signals that anticipate the expected distractor features, whereas reactive mechanisms contribute to the suppression and disengagement of attention.

      (2) The authors focus on hidden priority maps in pre-distractor time windows, arguing that the results challenge a simple proactive view of distractor suppression. However, they do not provide evidence that reactive mechanisms are at play or related to the pinging effects found in the present paradigm. Is there a relationship between the tuning strength of CTF at the high-probability distractor location and the actual ability to suppress the distractor (e.g., behavioral performance)? Is there a relationship between CTF tuning and post-distractor ERP measures of distractor processing? While these may not be the original research questions, they emerge naturally and I believe should be discussed or noted as limitations.

      (3) How do the authors ensure that the increased tuning (which appears more as a half-split or hemifield effect rather than gradual fine-grained tuning, as shown in Figure 5) is not a byproduct of the dual-task paradigm used, rather than a general characteristic of learned attentional suppression? For example, the additional memory task and the repeated experience with the high-probability distractor at the specific location might have led to longer-lasting and more finely-tuned traces for memory items at that location compared to others.

      (4) It is unclear how IEM was performed on total vs. evoked power, compared to typical approaches of running it on single trials or pseudo-trials.

      (5) Following on point 1. What is the rationale for relating decreased (but not increased) tuning of CTF to proactive suppression? Could it be that proactive suppression requires anticipatory tuning towards the expected feature to implement suppression? In other terms, better 'tuning' does not necessarily imply a higher signal amplitude and could be observable even under signal suppression. The authors should comment on this and clarify.
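
      Point (4) above turns on the difference between total and evoked power. A minimal sketch of one common way to define the two quantities follows, assuming a single electrode, a single frequency, and synthetic data; the function name `total_and_evoked_power` and all parameter values are hypothetical and are not taken from the manuscript under review. Total power averages single-trial power, so it retains activity that is not phase-locked to the event; evoked power takes the power of the trial-averaged complex coefficient, so non-phase-locked activity cancels out.

```python
import numpy as np

def total_and_evoked_power(trials, sfreq, freq):
    """Return (total, evoked) power at `freq` for trials of shape (n_trials, n_samples)."""
    n_samples = trials.shape[1]
    spectrum = np.fft.rfft(trials, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    k = np.argmin(np.abs(freqs - freq))        # nearest FFT bin to the target frequency
    coeffs = spectrum[:, k]                    # one complex coefficient per trial
    total = np.mean(np.abs(coeffs) ** 2)       # square first, then average over trials
    evoked = np.abs(np.mean(coeffs)) ** 2      # average over trials first, then square
    return total, evoked

# Synthetic example (assumed values): 200 one-second trials sampled at 250 Hz.
rng = np.random.default_rng(0)
sfreq, n_trials = 250.0, 200
t = np.arange(250) / sfreq

# 10 Hz oscillation with a random phase on every trial (not phase-locked).
random_phase = rng.uniform(0, 2 * np.pi, size=(n_trials, 1))
induced = np.cos(2 * np.pi * 10 * t + random_phase)

# 10 Hz oscillation with the same phase on every trial (phase-locked).
locked = np.tile(np.cos(2 * np.pi * 10 * t), (n_trials, 1))

total_ind, evoked_ind = total_and_evoked_power(induced, sfreq, 10.0)
total_lock, evoked_lock = total_and_evoked_power(locked, sfreq, 10.0)
```

      In this toy case the random-phase signal has substantial total power but almost no evoked power, whereas the phase-locked signal has equal total and evoked power. How the authors actually fed these quantities into the IEM is the open question the reviewer raises.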

      Minor:

      (1) In the Word file I reviewed, there are minor formatting issues, such as missing spaces, which should be double-checked.

      (2) Would the authors predict that proactive mechanisms are not involved in other forms of attention learning involving distractor suppression, such as habituation?

      (3) A clear description in the Methods section of how individual CTFs for each location were derived would help in understanding the procedure.

      (4) Why specifically 1024 resampling iterations?

    4. Reviewer #3 (Public Review):

      Summary:

      In this experiment, the authors use a probe method along with time-frequency analyses to ascertain the attentional priority map prior to a visual search display in which one location is more likely to contain a salient distractor. The main finding is that neural responses to the probe indicate that the high probability location is attended, rather than suppressed, prior to the search display onset. The authors conclude that suppression of distractors at high-probability locations is a result of reactive, rather than proactive, suppression.

      Strengths:

      This was a creative approach to a difficult and important question about attention. The use of this "pinging" method to assess the attentional priority map has a lot of potential value for a number of questions related to attention and visual search. Here as well, the authors have used it to address a question about distractor suppression that has been the subject of competing theories for many years in the field. The paper is well-written, and the authors have done a good job placing their data in the larger context of recent findings in the field.

      Weaknesses:

      The link between the memory task and the search task could be explored in greater detail. For example, how might attentional priority maps change because of the need to hold a location in working memory? This might limit the generalizability of these findings. There could be more analysis of behavioral data to address this question. In addition, the authors could explore the role that intertrial repetition plays in the attentional priority map as these factors necessarily differ between conditions in the current design. Finally, the explanation of the CTF analyses in the results could be written more clearly for readers who are less familiar with this specific approach (which has not been used in this field much previously).

    1. eLife assessment

      This important work uses in vivo foveal cone-resolved imaging and simultaneous microscopic photostimulation to investigate the relationship between ocular drift - eye movements long thought to be random - and visual acuity. The surprising result is that ocular drift is systematic - causing the object to move to the center of the cone mosaic over the course of each perceptual trial. The tools used to reach this conclusion are state-of-the-art and the evidence presented is convincing. This work advances our understanding of the visuomotor system and the interplay of anatomy, oculomotor behavior, and visual acuity.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper investigates the relationship between ocular drift - eye movements long thought to be random - and visual acuity. This is a fundamental issue for how vision works. The work uses adaptive optics retinal imaging to monitor eye movements and where a target object is in the cone photoreceptor array. The surprising result is that ocular drift is systematic - causing the object to move to the center of the cone mosaic over the course of each perceptual trial. The tools used to reach this conclusion are state-of-the-art and the evidence presented is convincing.

      Strengths

      The central question of the paper is interesting and, as far as I know, has not been answered in past work. The approaches employed in this work are appropriate and provide clear answers.

      The central finding - that ocular drift is not a completely random process - is important and has a broad impact on how we think about the relationship between eye movements and visual perception.

      The presentation is quite nice: the figures clearly illustrate key points and have a nice mix of primary and analyzed data, and the writing (with one important exception) is generally clear.

      Weaknesses

      The handling of the Nyquist limit is confusing throughout the paper and could be improved. It is not clear (at least to me) how the Nyquist limit applies to the specific task considered. I think of the Nyquist limit as saying that spatial frequencies above a certain cutoff set by the cone spacing are being aliased and cannot be disambiguated from the structure at a lower spatial frequency. In other words, there is a limit to the spatial frequency content that can be uniquely represented by discrete cone sampling locations. Acuity beyond that limit is certainly possible with a stationary image - e.g. a line will set up a distribution of responses in the cones that it covers, and without noise, an arbitrarily small displacement of the line would change the distribution of cone responses in a way that could be resolved. This is an important point because it relates to whether some kind of active sampling or movement of the detectors is needed to explain the spatial resolution results in the paper. This issue comes up in the introduction, results, and discussion. It arises in particular in the two Discussion paragraphs starting on line 343.
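
      The distinction drawn in the paragraph above can be illustrated with a one-dimensional toy model. The sketch below is noise-free and uses assumed values throughout (the sample spacing, the test frequencies, and the Gaussian blur width are all hypothetical, in arbitrary units). It shows two things: a grating above the Nyquist frequency produces exactly the same samples as a lower-frequency grating, and a displacement much smaller than the sample spacing still changes the sampled response to a blurred line.

```python
import numpy as np

spacing = 0.5                      # assumed sample (cone) spacing, arbitrary units
f_sampling = 1.0 / spacing         # samples per unit length
f_nyquist = f_sampling / 2.0       # highest frequency represented without aliasing
positions = np.arange(32) * spacing

# 1) Aliasing: a grating above the Nyquist limit mimics a lower-frequency one.
f_low = 0.6                        # below the Nyquist limit
f_high = f_sampling - f_low        # above the Nyquist limit
samples_low = np.cos(2 * np.pi * f_low * positions)
samples_high = np.cos(2 * np.pi * f_high * positions)
aliased = np.allclose(samples_low, samples_high)

# 2) Sub-spacing displacement: a blurred line shifted by 5% of the spacing
#    still yields a different pattern of sampled responses (no noise assumed).
def line_response(center, width=0.6):
    """Gaussian-blurred line profile evaluated at the sample positions."""
    return np.exp(-0.5 * ((positions - center) / width) ** 2)

shift_detectable = not np.allclose(
    line_response(8.0), line_response(8.0 + 0.05 * spacing)
)
```

      Under these assumptions both `aliased` and `shift_detectable` come out true. The Nyquist limit constrains which gratings can be uniquely represented, but it does not by itself forbid localizing a feature more finely than the sampling interval.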

      One question that came up as I read the paper was whether the eye movement parameters depend on the size of the E. In other words, to what extent is ocular drift tuned to specific behavioral tasks?

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Witten et al. assess visual acuity, cone density, and fixational behavior in the central foveal region in a large number of subjects.

      This work elegantly presents a number of important findings, and I can see this becoming a landmark work in the field. First, it shows that acuity is determined by the cone mosaic, hence, subjects characterized by higher cone densities show higher acuity in diffraction-limited settings. Second, it shows that humans can achieve higher visual resolution than what is dictated by cone sampling, suggesting that this is likely the result of fixational drift, which constantly moves the stimuli over the cone mosaic. Third, the study reports a correlation between the amplitude of fixational motion and acuity, namely, subjects with smaller drifts have higher acuities and higher cone density. Fourth, it is shown that humans tend to move the fixated object toward the region of higher cone density in the retina, lending further support to the idea that drift is not a random process, but is likely controlled. This is a beautiful and unique work that furthers our understanding of the visuomotor system and the interplay of anatomy, oculomotor behavior, and visual acuity.

      Strengths:

      The work is rigorously conducted, it uses state-of-the-art technology to record fixational eye movements while imaging the central fovea at high resolution and examines exactly where the viewed stimulus falls on individuals' foveal cone mosaic with respect to different anatomical landmarks in this region. The figures are clear and nicely packaged. It is important to emphasize that this study is a real tour-de-force in which the authors collected a massive amount of data on 20 subjects. This is particularly remarkable considering how challenging it is to run psychophysics experiments using this sophisticated technology. Most of the studies using psychophysics with AO are, indeed, limited to a few subjects. Therefore, this work shows a unique set of data, filling a gap in the literature.

      Weaknesses:

      No major weakness was noted, but data analysis could be further improved by examining drift instantaneous direction rather than start-point-end-point direction, and by adding a statistical quantification of the difference in direction tuning between the three anatomical landmarks considered.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript by Witten et al., titled "Sub-cone visual resolution by active, adaptive sampling in the human foveola," aims to investigate the link between acuity thresholds (and hyperacuity) and retinal sampling. Specifically, using in vivo foveal cone-resolved imaging and simultaneous microscopic photostimulation, the researchers examined visual acuity thresholds in 16 volunteers and correlated them with each individual's retinal sampling capacity and the characteristics of ocular drift.

      First, the authors found that although visual acuity was highly correlated with the individual spatial arrangement of cones, for all participants, visual resolution exceeded the Nyquist sampling limit - a well-known phenomenon in the literature called hyperacuity.

      Thus, the researchers hypothesized that this increase in acuity, which could not be explained in terms of spatial encoding mechanisms, might result from exploiting the spatiotemporal characteristics of visual input, which is continuously modulated over time by eye movements even during so-called fixations (e.g., ocular drift).

      The authors reported an across-subject correlation between acuity threshold and drift amplitude, suggesting that the visual system benefits from transforming spatial input into a spatiotemporal flow. Finally, they showed that drift, contrary to the traditional view of it as a random involuntary movement, appears to exhibit directionality: drift tends to move stimuli to higher cone density areas, therefore enhancing visual resolution.

      Strengths:

      The work is of broad interest, the methods are clear, and the results are solid.

      Weaknesses:

      Literature (1/2): The authors do not appear to be aware of an important paper published in 2023 by Lin et al. (https://doi.org/10.1016/j.cub.2023.03.026), which nicely demonstrates that (i) ocular drifts are under cognitive influence, and (ii) specific task knowledge influences the dominant orientation of these ocular drifts even in the absence of visual information. The results of this article are particularly relevant and should be discussed in light of the findings of the current experiment.

      Literature (2/2): The hypothesis that hyperacuity is attributable to ocular movements has been proposed by other authors and should be cited and discussed (e.g., https://doi.org/10.3389/fncom.2012.00089, https://doi.org/10.1016/s0896-6273(01)00466-4).

      Drift Dynamic Characterization: The drift is primarily characterized as the "concatenated vector sum of all frame-wise motion vectors within the 500 ms stimulus duration." To better compare with other studies investigating the link between drift dynamics and visual acuity (e.g., Clark et al., 2022), it would be interesting to analyze the drift-diffusion constant, which might be the parameter most capable of describing the dynamic characteristics of drift.
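
      As a rough illustration of the suggested analysis, a diffusion constant can be estimated from the slope of the mean squared displacement (MSD) of a gaze trajectory. The sketch below assumes that drift behaves as a two-dimensional Brownian motion, for which MSD(τ) = 4Dτ, and it uses a simulated trajectory; the function name `diffusion_constant`, the sampling interval, and the value of D are all hypothetical. It is not the procedure of Clark et al. (2022) or of the authors.

```python
import numpy as np

def diffusion_constant(xy, dt, max_lag=20):
    """Estimate D from a 2D trajectory `xy` of shape (n_samples, 2) sampled every `dt`."""
    lags = np.arange(1, max_lag + 1)
    # Mean squared displacement at each lag, averaged over all start times.
    msd = np.array(
        [np.mean(np.sum((xy[k:] - xy[:-k]) ** 2, axis=1)) for k in lags]
    )
    tau = lags * dt
    # Least-squares line through the origin: MSD = slope * tau, with slope = 4 D in 2D.
    slope = np.sum(tau * msd) / np.sum(tau ** 2)
    return slope / 4.0

# Simulated Brownian trajectory with a known diffusion constant (assumed units).
rng = np.random.default_rng(1)
d_true = 10.0                      # e.g. arcmin^2 / s
dt = 0.001                         # sampling interval in seconds
steps = rng.normal(scale=np.sqrt(2 * d_true * dt), size=(200_000, 2))
trajectory = np.cumsum(steps, axis=0)

d_est = diffusion_constant(trajectory, dt)
```

      On real recordings the MSD is often not linear in τ, so the range of lags used for the fit is itself an analysis choice that would need to be reported.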

      Possible inconsistencies: Binocular differences are not expected based on the hypothesis; the authors may speculate a bit more about this. Additionally, the fact that hyperacuity does not occur with longer infrared wavelengths but the drift dynamics do not vary between the two conditions is interesting and should be discussed more thoroughly.

      As a Suggestion: can the authors predict the accuracy of individual participants in single trials just by looking at the drift dynamics?

    1. eLife assessment

      This manuscript aims to unravel the contribution of cholesterol to aquaporin-0 (AQP0) tetramer array formation within lens membranes. Compelling electron crystallography data are combined with solid molecular dynamics experiments to identify a specific cholesterol binding site of significance to protein clustering within lipid rafts. The important work advances our understanding of membrane biology and will be of broad interest to membrane transport biologists, biochemists, and structural biologists.

    2. Reviewer #1 (Public Review):

      Aquaporin-0 forms 2D crystals in the lens of the eye. This propensity to form 2D crystals was originally exploited to solve the structure of aquaporin-0 reconstituted in membranes. Existing structures do not explain why the proteins spontaneously form these arrays, however. In this work the authors investigate the hypothesis that the main lipids in the native membranes, sphingomyelin and cholesterol, contribute to lattice formation. By titrating the cholesterol:sphingomyelin ratio, the authors identify cholesterol binding sites of increasing stability. The authors identify a cholesterol that interacts with adjacent tetramers and is bound at an unusual membrane depth. Computational simulations suggest that this cholesterol is only stable in the context of adjacent tetramers (i.e., lattice formation) and that the presence of the cholesterol increases the stability of that interface. The exact mechanism is not clear, but the authors propose that the so-called "deep cholesterol" improves shape complementarity between adjacent tetramers and modulates the kinetics of protein-protein interactions. Finally, the authors provide a reasonable model for the role of cholesterol in

      Strengths of this manuscript include the analysis of multiple structures determined with different lipid compositions and lipid:cholesterol ratios. For each of these, multiple lipids can be modelled, giving a good sense of the lipid specificity at various favorable lipid binding positions. In addition, multiple hypotheses are tested in a very thorough computational analysis that provides the framework for interpreting the structural observations. The authors also provide a thorough scholarly discussion that connects their work with other studies of membrane protein-cholesterol interactions.

      The model presented by the authors is consistent with the data described.

    3. Reviewer #2 (Public Review):

      Summary:

      In the manuscript by Chiu et al., "Structure and dynamics of cholesterol-mediated aquaporin-0 arrays and implications for lipid rafts," the authors address the effect of cholesterol on array formation by AQP0. Using a combination of electron crystallography and molecular dynamics simulations, the authors show binding of a "deep" cholesterol molecule between AQP0 tetramers. Each AQP0 tetramer binds four deep cholesterols to form a crystallographic array of AQP0.

      Strengths:

      The combined approaches of electron crystallography and MD simulations under different lipid conditions (different sphingomyelin and cholesterol concentrations) are a strength of the study. The authors provide a thorough and convincing assessment of cholesterol binding, protein-protein interactions, and array formation by AQP0. The MD simulations allow the authors to consider the propensity of cholesterol to occupy the observed binding sites in the absence of crystal contacts. The combined methods and the breadth of analyses set a high standard in the field of membrane protein structural biology.

      The findings of the authors fit nicely into a growing body of literature on cholesterol binding sites that mediate membrane protein-protein interactions. Cholesterol interacts with a variety of membrane proteins via its smooth alpha face or rough beta face. AQP0 is somewhat unique in that it binds the rough face of cholesterol in a "deep" binding site that places cholesterol in the middle of the membrane bilayer. So-called "deep" cholesterol binding sites have been described for GPCRs and docking studies suggest they may exist on other ion channels and transporters. In the case of AQP0, the deep cholesterol acts as a glue that holds two tetramers together. Since each tetramer has four binding sites for deep cholesterol, the assembly and mechanical stability of an extended two-dimensional array of AQP0 tetramers is a natural consequence in lens membranes.

      Weaknesses:

      The authors report that the findings generally apply to raft formation in membranes. However, this point is less clear as the lens membrane in which AQP0 resides is rather unique in lipid and protein content and density. Nonetheless, the authors achieve the overall goal of evaluating cholesterol binding to AQP0, and there are many valuable and informative figures in the main manuscript and supplement that provide convincing results and interpretations.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to unravel the mechanisms behind Aquaporin-0 (AQP0) tetramer array formation within lens membranes. The authors utilized electron crystallography and molecular dynamics (MD) simulations to shed light on the role of cholesterol in shaping the structural organization of AQP0. The evidence suggests that cholesterol not only defines the positions and orientations of associated molecules but also plays a crucial role in stabilizing AQP0 tetramer arrays. This study provides valuable insights into the potential principles driving protein clustering within lipid rafts, advancing our understanding of membrane biology.

      In this review, I will focus on the MD simulations part, since this is my area of expertise. The authors conducted an impressive set of MD simulations aiming at understanding the role of cholesterol in structural organization of AQP0 arrays. These simulations clearly demonstrate the well-defined localization of cholesterol molecules around a single AQP0 tetramer, aligning with previous computational studies and the crystallographic structures presented in this manuscript. Interestingly, the authors identified an unusual position for one cholesterol molecule, located near the center of the lipid bilayer, which was stabilized by the adjacent AQP0 tetramers. The authors showed that these adjacent tetramers can withstand a larger lateral detachment force when deep cholesterol molecules are present at the interface compared to scenarios with sphingomyelin (SM) molecules at the interface between two AQP0 tetramers. Authors interpret that result as evidence that deep cholesterol molecules mechanically stabilize the interface of the AQP0 tetramers.

      Simple steered MD simulations are typically employed either to identify pathways for subsequent free energy calculations, such as umbrella sampling, or to perform numerous non-equilibrium simulations from which the free energy is extracted using the Jarzynski equation. In this paper, the authors conducted steered MD simulations to examine the maximum force required to separate tetramers, and they did not carry out the more rigorous but challenging free energy calculations. The observation that a larger maximum force is needed to separate tetramers in the presence of cholesterol (compared to the SM case) is an encouraging result for the authors' work; however, free energy calculations would be needed to fully support the cholesterol stabilization effect.
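
      For readers unfamiliar with the Jarzynski relation mentioned above, it states that ΔF = -kT ln⟨exp(-W/kT)⟩, where W is the work performed in each of many repeated non-equilibrium pulling runs. The sketch below applies this estimator to synthetic, Gaussian-distributed work values; the function name `jarzynski_free_energy` and all numbers are hypothetical, and no data from the manuscript are used. For Gaussian work the exact answer is ⟨W⟩ - σ²/(2kT), which gives a reference value to compare against.

```python
import numpy as np

def jarzynski_free_energy(work, kT=1.0):
    """Estimate dF = -kT * ln< exp(-W / kT) > from an array of work values."""
    w = np.asarray(work, dtype=float) / kT
    w_min = w.min()
    # Shift by the smallest work value so the exponentials cannot overflow.
    log_mean_exp = -w_min + np.log(np.mean(np.exp(-(w - w_min))))
    return -kT * log_mean_exp

# Synthetic work values in units of kT (assumed, for illustration only).
rng = np.random.default_rng(2)
mean_w, sigma_w = 10.0, 1.0
work = rng.normal(mean_w, sigma_w, size=100_000)

dF_est = jarzynski_free_energy(work)
dF_gauss = mean_w - sigma_w ** 2 / 2.0   # exact result for a Gaussian work distribution
```

      The estimate always lies below the mean work, and it is dominated by rare low-work trajectories. This is one reason why many more pulling repeats are needed for free energies than for comparing peak detachment forces.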

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The model presented by the authors is consistent with the data described. Further testing of this model, for example by mutating the deep cholesterol binding site, would strengthen the model. However, such experiments might be challenging due to the relatively non-specific/hydrophobic nature of the deep cholesterol binding site.

      We completely agree that testing of the deep cholesterol-binding site by mutagenesis would be ideal. However, as the reviewer points out, such experiments would be challenging, not only because of the non-specific/hydrophobic nature of the deep cholesterol-binding site but also because we have been purifying AQP0 from natural sources (sheep eyes) and because it would be very difficult to secure the substantial amount of cryo-EM time needed to generate an electron crystallographic structure.

      Reviewer #2 (Public Review):

      The authors report that the findings generally apply to raft formation in membranes. However, this point is less clear as the lens membrane in which AQP0 resides is rather unique in lipid and protein content and density.

      We agree that the lens membrane is quite unique in its lipid and protein content and density, but rafts are also characterized by the same lipids and high protein density. Nonetheless, we do agree that our suggested implications for lipid rafts are speculative and so we emphasize this more in the revised version of the manuscript by writing: “This model is specific for the formation of AQP0 arrays in lens membranes, but we speculate that similar principles may underlie the organization of lipid rafts”.

      Reviewer #3 (Public Review):

      The authors showed that these adjacent tetramers can withstand a larger lateral detachment force when deep cholesterol molecules are present at the interface compared to scenarios with sphingomyelin (SM) molecules at the interface between two AQP0 tetramers. Authors interpret that result as evidence that deep cholesterol molecules mechanically stabilize the interface of the AQP0 tetramers. This conclusion has minor weaknesses, and the rigor of the lateral detachment simulations could be increased by establishing a reference point for the detachment force needed to separate AQP0 tetramers in a scenario without lipids at the interface between tetramers, and by increasing the number of repeats for the non-equilibrium steered MD simulations. Thermodynamic integration might be a better approach to compute the stabilization energy in the presence of cholesterol compared to the SM case.

      In all electron crystallographic structures of AQP0 determined to date, lipids have always been observed sandwiched in between the AQP0 tetramers (see, for example, Gonen et al., Nature, 2005 and Hite et al., EMBO J., 2010). Therefore, considering a scenario without lipids at the interface would be unnatural and the AQP0 array would likely not be stable. Such a scenario would thus not be the most appropriate reference point for the lateral detachment simulations. In our view, comparison of a scenario with the deep cholesterol at the interface versus a scenario without it appeared a more realistic setup to investigate the stabilizing role the deep cholesterol has on the association of AQP0 tetramers. In the Results subsection regarding these simulations, we added the following sentence to further stress the rationale of our experimental setup: “Comparison of these two cases should allow us to assess the effect of the deep-binding Chol3 molecules on the mechanical stability of the associated AQP0 tetramers.”

      Concerning the second suggestion of the reviewer of increasing the number of repeats, we doubled the number of simulation replicas: now it is n=20 for each pulling velocity and lipid interface. The trend of higher detachment forces for the interface containing cholesterol prevailed in a statistically significant, robust fashion (see Figure 7 of the revised manuscript and the main text referring to it). In consequence, as the reviewer suggested, extension of the dataset increased the rigor of the lateral detachment simulations. In addition to Figure 7 and the Results section, the Methods section and Table 4 have been updated to reflect the expanded dataset. 

      Finally, concerning the usage of thermodynamic integration to compute the stabilization energy, we agree with the reviewer that calculation of the free energy would be better to determine the thermodynamic stabilization imparted by the cholesterol molecules. At an earlier stage of the project, we did indeed consider carrying out this type of simulation, but we decided against it because of the complexity and poor convergence of such calculations. Our choice is also based on a previous attempt in which it proved very challenging to use free energy calculations to assess the binding of lipids to a flippase (see Wang et al. BioRxiv, https://doi.org/10.1101/2020.06.24.169771, 2021). We now included this consideration in the revised manuscript by adding the following sentence in the Discussion: “Although we provide solid evidence here that deep cholesterol impart mechanical stabilization, free energy calculations would be required to obtain the full picture of thermodynamic stabilization. Such free energy calculations are challenging for lipids, due to the chemical complexity and poor convergence involved (Wang et al., 2021), and are thus beyond the scope of the current work.”

      Reviewer #1 (Recommendations For The Authors):

      Reorganizing a few concepts would make the story easier to follow. For example, the analysis of the bilayer thickness seems disjointed. Although Figure 4 shows measurements, it is not clear that the measurements represent bilayer thickness until the last paragraph of page 21 in the discussion, where "Hydrophobic thickness" is first introduced. Moving that first paragraph of page 22 that refers to Fig. 4A to the results would be helpful to understand the figure, and would prepare the reader for this part of the discussion.

      In response to the reviewer, we moved the description of the measurements of the hydrophobic thickness to the Results section (Page 12) and adjusted the Discussion to minimize repetition (page 22).

      Likewise, Figure 4E shows measurements of something, but it is not clear that these are the dimensions of a protein pocket until well into the discussion.

      In response to the reviewer’s comment, we added a sentence both in the Results section [It sits in a pocket between the two adjacent AQP0 tetramers that is wider in the extracellular leaflet than the cytoplasmic leaflet (Figure 4E)] as well as to the caption of Figure 4E [The dotted lines indicate the distance between the two adjacent AQP0 tetramers at the positions of the ring system (~8.5 Å) and the acyl chain (~2.5 Å)].

      Figure 2 - a comment for the non-specialists on what this region of the protein is would be helpful context. Is this the pore with part of the NPA motif?

      We agree with the referee and added the following sentence to the caption of Figure 2: “A region of the water-conducting pathway close to the NPA (asparagine-proline-alanine), the AQP signature motif, is shown”.

      Reviewer #2 (Recommendations For The Authors):

      There is only one recommendation: In the results section entitled "Cholesterol positions observed in the electron crystallographic structures are representative of those around single AQP0 tetramers" the authors do not describe their approach. They refer to a reference (Aponte-Santamaria et al., 2012). The authors state the problem (investigate cholesterol positions), but it would be helpful to the readers if they also described the experimental approach.

      We agree with the reviewer and made the following addition to the sentence “we performed MD simulations and calculated time-averaged densities to investigate ...”

      Reviewer #3 (Recommendations For The Authors):

      Technical comments:

      (1) Authors stated: "Equilibration simulations were then performed until bulk membrane properties, such as thickness and deuterium order parameters, became stable and congruent with previous reports such as those by (Doktorova et al., 2020) and others (Figure 5-figure supplement 2 and Figure 5-figure supplement 3)." However, bilayer thickness is not represented in these figures. Additionally, I observed that the area per lipid (APL) appeared to be somewhat variable. This variation was particularly noticeable in systems where SM:CHOL=2:1, which seem to be not fully equilibrated. Is the figure displaying APL data for only one repetition? Could you please include plots for the other repetitions?

      We thank the reviewer for pointing this out. We would like to clarify that we used CHARMM-GUI to generate one lipid bilayer configuration for each mixture and system size. These configurations (one per system) were extensively simulated to generate stable initial configurations of the lipid bilayers. Figure 5 – figure supplements 2 and 3 refer to this pre-equilibration step. The final pre-equilibrated configurations were then used in the subsequent multiple equilibrium MD runs that we performed, either with a single cholesterol molecule or with the AQP0 tetramer(s) inserted. We have clarified this procedure in the revised manuscript (see changes in the Methods section for the MD equilibrium simulations).

      Concerning this pre-equilibration step, we have chosen the area per lipid, not thickness, to characterize the equilibration of the pure lipid bilayers. Accordingly, the area per lipid is the quantity shown in Figure 5 – figure supplement 3. We no longer refer to the membrane thickness in the revised manuscript.

      Concerning the variability in the area per lipid, we note that the large changes occur within the first few tens of nanoseconds of the pre-equilibration step, after which the area per lipid stabilizes. We would like to also point out that in Figure 5 – figure supplement 3, we chose a logarithmic scale for the time axis to make it possible for the reader to see the major changes that occur at the beginning of the pre-equilibration step (which would otherwise be difficult to see). In the particular case of the SM:CHOL=2:1 mixture, the 64 lipids/leaflet system converged to a stable area per lipid value in the last 70 ns and the 244 lipids/leaflet system approached the same value in approximately the last 30 ns. This was a good indication that the large system had also converged. After equilibration of the membranes, a single cholesterol or AQP0 tetramer(s) were inserted and equilibrium simulations were initiated. However, the first 100 ns (or 300 ns in the case of the double tetramer system) were considered as a further equilibration time and were not included in the analysis. This is now explicitly stated in the revised manuscript: “The first 100 ns of each simulation replica (the first 300 ns for the two tetramer simulations) were considered as additional equilibration time and were not included in further analysis.”

      (2) Could you clarify the reasoning behind conducting the simulations at 323 K?

      We conducted the simulations at 323 K to ensure that the lipid bilayers were in the liquid phase.

      SM:CHOL mixtures have been reported to be in the liquid phase above 314 K (Keyvanloo et al. Biophys. J. 114: 1344, 2018). 323 K was thus chosen to be well above this value. Note that this temperature was also chosen in a previous MD simulation study of pure sphingomyelin bilayers (Niemelä et al. Biophys. J. 87: 2976, 2004). This reasoning, as well as the two references, has been added to the Methods section in the revised manuscript.

      (3) There appears to be a discrepancy in Figure 7. Panel F does not align with the provided caption. 

      We apologize for this mistake. The captions for panels E and F were switched. We corrected this mistake.

      (4) Likewise, in Figure 8, there is a mismatch between the caption and the figures. Furthermore, in the text, the authors assert, "In the absence of cholesterol, the AQP0 surface is completely covered by sphingomyelin in the hydrophobic region of the membrane and by water outside this region (Figure 8A, left column). As noted before, there are essentially no direct protein-protein interactions between the adjacent tetramers. When cholesterol was present at the interface, it interacted with AQP0 at the center of the membrane and remained mostly in place (Figure 8A, right column)." However, the left column shows cholesterol density. Could you please clarify this inconsistency, especially regarding the absence of cholesterol?

      We apologize for this mistake. The panels in Figure 8A showing the AQP0 surfaces in the absence and presence of cholesterol were switched. We corrected this mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Estevam et al. reports new insights into the regulation of the receptor tyrosine kinase MET gained from two deep mutational scanning (DMS) datasets. In this paper, the authors use a classic selection system for oncogenic kinase signaling, the murine Ba/F3 cell line, to assess the functional effects of thousands of mutations in the kinase domains of MET in two contexts: (1) fusion of the whole MET intracellular region to the dimerization domain TPR, and (2) the same fusion protein, but with exon 14, which encodes part of the juxtamembrane region of MET, skipped. Critically, exon 14 skipping yields a version of MET that is found in many cancers and has higher signaling activity than the canonical MET isoform. The authors extensively analyze their DMS data to very convincingly show that their selection assay reports on kinase activity, by illustrating that many functionally important structural components of the kinase domain are not tolerant of mutation. Then, they turn their attention to a helical region of the juxtamembrane region (αJM), immediately after exon 14, which is posited to play a regulatory role in MET. Their DMS data illustrate that the strength and mutational tolerance of interactions between αJM and the key αC helix in the kinase domain depends on the presence or absence of exon 14. They also identify residues in the N-lobe of the kinase, such as P1153, which are not conserved across tyrosine kinases but appear to be essential for MET and MET-like kinases. Finally, the authors analyze their DMS data in the context of clinically-observed mutations and drug-resistance mutations.

      Overall, this manuscript is exciting because it provides new insights into MET regulation in general, as well as the role of exon 14. It also reveals ways in which the JM region of MET is different from that of many other receptor tyrosine kinases. The exon 14-skipped fusion protein DMS data is somewhat underexplored and could be discussed in greater detail, which would elevate excitement about the work. Furthermore, some of the cell biological validation experiments and the juxtaposition with clinical data are perhaps not assessed/interpreted as clearly as they could be. Some constructive suggestions are given below to enhance the impact of the manuscript.

      Strengths:

      The main strengths of this paper, also summarized above in the summary, are as follows:

      (1) The authors very convincingly show that Ba/F3 cells can be coupled with deep mutational scanning to examine MET mutational effects. This is most clearly shown by highlighting how all of the known kinase structure and regulatory elements are highly sensitive to mutations, in accordance with a few other DMS datasets on other kinases.

      (2) A highlight of this paper is the juxtaposition of two DMS datasets for two different isoforms of the MET receptor. Very few comparisons like this exist in the literature, and they show how small changes to the overall architecture of a protein can impact its regulation and mutational sensitivity.

      (3) Another exciting advance in this manuscript is the deep structural analysis of the MET juxtamembrane region with respect to that of other tyrosine kinases - guided by the striking effect of mutations in the juxtamembrane helical region. The authors illustrate how the JM region of MET differs from that of other tyrosine kinases.

      (4) Overall, this manuscript will provide a resource for interpreting clinically relevant MET mutations.

      Weaknesses:

      (1) The manuscript is front-loaded with extensive analysis of the first DMS dataset, in which exon 14 is present, however, the discussion and analysis of the exon 14-skipped dataset is somewhat limited. In particular, a deeper discussion of the differences between the two datasets is warranted, to lay out the full landscape of mutations that have different functional consequences in the two isoforms. Rather, the authors only focus on differences in the JM region. What are the broader structural effects of exon 14 skipping across the whole kinase domain?

      Thank you for your feedback on our manuscript and our analysis of the exon 14-skipped mutational scanning data. The lack of a robust growth differential between the wild type MET intracellular domain and the exon 14-skipped isoform within the Ba/F3 system suggests that exon 14 skipping does not confer a significant growth advantage, likely because the TPR domain constitutively activates both constructs. This also suggests that the assay is potentially less sensitive to nuanced JM-driven effects between these two isoforms, aside from the highly sensitive ⍺JM-helix. In the cytoplasmically expressed context, we also lose insight into membrane-related interactions imposed on the juxtamembrane that may be important to fully understand the differences between these two isoforms. Therefore, we can at most speculate about exon 14 skipping-related differences between these two datasets.

      With these caveats in mind, to further address exon 14 and juxtamembrane-driven differences between these two mutational landscapes, we calculated the absolute score difference between TPR-METΔEx14 and TPR-MET (|METΔEx14 - MET|) and plotted the |ΔScore| in a heatmap. Overall, the two landscapes, as noted in the text, are largely similar with differences emerging mostly for specific mutations. Where we see the largest secondary structural difference continues to be the ⍺JM-helix, where MET is more sensitive to helix-breaking mutations such as proline. Again L1062 has the greatest difference in sensitivity between these two datasets for the ⍺JM-helix, with the introduction of negative charge resulting in loss-of-function for the TPR-MET kinase domain but having a null effect in the TPR-METΔEx14 kinase domain. Other positions with strong differences include the ⍺G and APE motif.
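
      As a rough illustration of the |ΔScore| comparison described above, the following minimal Python sketch computes the absolute score difference for variants present in both libraries and ranks the most divergent ones. The function names, the dictionary layout keyed by (position, amino acid), and the example numbers are hypothetical and only serve to show the arithmetic; they are not taken from the actual analysis pipeline or data.

```python
def abs_score_difference(met_scores, ex14_scores):
    """Return |METΔEx14 - MET| for every variant scored in both libraries.

    Both arguments are assumed to be dicts keyed by (position, amino_acid).
    """
    shared = met_scores.keys() & ex14_scores.keys()
    return {key: abs(ex14_scores[key] - met_scores[key]) for key in shared}


def top_divergent(delta, n=3):
    """Return the n variants with the largest absolute score difference."""
    return sorted(delta, key=delta.get, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    met = {("L1062", "D"): -2.0, ("L1062", "A"): -0.5}
    ex14 = {("L1062", "D"): 0.1, ("L1062", "A"): -0.4}
    delta = abs_score_difference(met, ex14)
    print(top_divergent(delta, n=1))
```

      Restricting the comparison to variants scored in both libraries avoids reporting a difference for a variant that is missing from one dataset.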

      We have incorporated more detailed discussion in text. 

      (2) It is unclear if gain-of-function mutations can actually be detected robustly in this specific system. This isn't a problem at face value, as different selection assays have different dynamic ranges. However, the authors don't discuss the statistical significance and reproducibility of gain- vs loss-of-function mutations, and none of the gain-of-function mutations are experimentally validated (some appear to show loss-of-function in their cellular validation assay with full-length MET). The manuscript would benefit from deeper statistical analysis (and discussion in the text) of gain-of-function mutations, as well as further validation of a broad range of activity scores in a functional assay. For the latter point, one option would be to express individual clones from their library in Ba/F3 cells and blot for MET activation loop phosphorylation (which is probably a reasonable proxy for activity/activation).

      Thank you for your comment on the statistical interpretations of gain-of-function (GOF) and loss-of-function (LOF) mutations. In this study we classify GOF and LOF based on the following metrics:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Given that our assay was conducted in a constitutively active kinase in the TPR-fusion context, gain-of-function mutations are expected to not only be rare, but also supersede baseline fitness. Within the IL-3 conditions, we expect that cells are not reliant or “addicted” to MET for growth proliferation. Nevertheless, due to the parallel nature of the screen, we can compare scores for variants in the IL-3 control and IL-3 withdrawal conditions to filter mutations that are solely exhibiting high fitness under selective pressure.

      To identify these mutations, we (1) calculated the propagation of error between IL-3 and IL-3 withdrawal scores for the same variant; (2) calculated the absolute difference between IL-3 and IL-3 withdrawal scores for the same variant; and (3) kept only variants for which the IL-3 withdrawal score was ≥ +2 SDs, the IL-3 score was ≤ 0, and the absolute score difference between IL-3 and withdrawal conditions was larger than the propagated error.

      In analyzing mutations within the IL-3 withdrawal conditions, applying our statistical metrics, we find 33 mutations within the MET library, and 30 in the METΔEx14 library, that have a score of ≥ +2 SD and low propagated error. By increasing our boundary to ≥+2.5 SD, we can classify mutations with even higher confidence, identifying 10 mutations within the MET library, and 9 in the METΔEx14 library (Supplemental Data Figure 7).
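
      The three-step filter described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the propagated error is assumed to be the two standard errors combined in quadrature, the +2 SD threshold is assumed to be measured from the mean of the wild type synonymous distribution, and the `Variant` class and function names are hypothetical rather than part of the actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    score_withdrawal: float  # score under IL-3 withdrawal
    se_withdrawal: float     # standard error of that score
    score_il3: float         # score in the IL-3 control condition
    se_il3: float            # standard error of that score


def propagated_error(se_a, se_b):
    # Assumption: independent errors combined in quadrature.
    return math.sqrt(se_a ** 2 + se_b ** 2)


def selective_gof(variants, syn_mean, syn_sd, n_sd=2.0):
    """Names of variants with high fitness only under IL-3 withdrawal."""
    threshold = syn_mean + n_sd * syn_sd
    hits = []
    for v in variants:
        diff = abs(v.score_withdrawal - v.score_il3)
        err = propagated_error(v.se_withdrawal, v.se_il3)
        if v.score_withdrawal >= threshold and v.score_il3 <= 0 and diff > err:
            hits.append(v.name)
    return hits
```

      Raising `n_sd` from 2.0 to 2.5 corresponds to the stricter boundary mentioned above.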

      (3) In light of point 2, above, much of the discussion about clinically-relevant gain-of-function mutations feels a bit stretched - although this section is definitely very interesting in premise. A clearer delineation of gain-of-function, with further statistical support and ideally also some validation, would greatly strengthen the claims in this section.

      To address this concern, we have provided additional analysis and details on gain-of-function (GOF) classification in Supplemental Data Figure 5 and the overlap between GOF and clinically associated mutations in Supplemental Data Figure 8. Within our gain-of-function classifications, we pick up on several mutations at positions that have been clinically detected and experimentally validated in previous studies in both libraries (D1228, G1163, L1195), and show that GOF mutations also have low variance.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a deep mutational scanning (DMS) study of the kinase domain of the c-MET receptor tyrosine kinase. The screen is conducted with a highly activated fusion oncoprotein - Tpr-MET - in which the MET kinase domain is fused to the Tpr dimerization element. The mutagenized region includes the entire kinase domain and an alpha-helix in the juxtamembrane region that is essentially part of the MET kinase domain. The DMS screen is carried out in two contexts, one containing the entire cytoplasmic region of MET, and the other with an "exon 14 deletion" which removes a large portion of the juxtamembrane region (but retains the aforementioned alpha-helix). The work provides a robust and essentially exhaustive catalog of the effect of mutations (within the kinase domain) on the ability of the Tpr-MET fusion oncoproteins to drive IL3-independent growth of Ba/F3 cells. Every residue in the kinase is mutated to every natural amino acid. Given the design of the screen, one would expect it to be a powerful tool for identifying mutations that impair catalytic activity and therefore impair IL3-independent proliferation, but not the right tool for identifying gain-of-function mutations that operate by shifting the kinase from an inactive to active state (because the Tpr-Met fusion construct is already very highly activated). This is borne out by the data, which reveal many many deleterious mutations and few "gain-of-function" mutations (which are of uncertain significance, as discussed below).

      Strengths:

      The authors take a very scholarly and thorough approach to interpreting the effect of mutations in light of available information for the structure and regulation of MET and other kinases. They examine the effect of mutations in the so-called catalytic (C) and regulatory (R) spines, the interface between the JM alpha-helix and the C-helix, the glycine-rich loop, and other key elements of the kinase, providing a structural rationale for the deleterious effect of mutations. Comparison of the panoply of deleterious mutations in the TPR-MET versus TPR-exon14del-MET DMS screens reveals an interesting difference - the exon14 deletion MET is much more tolerant of mutations in the JM alpha-helix/C-helix interface. The reason for this is unclear, however.

      Weaknesses:

      Because the screens were conducted with highly active Tpr-MET fusions, they have limited power to reveal gain-of-function mutations. Indeed, to the extent that Tpr-MET is as active or even more active than ligand-activated WT MET, one could argue that it is "fully" activated and that any additional gain of fitness would be "super-physiologic". I would expect such mutations to be rare (assuming that they could be detected at all in the Ba/F3 proliferation assay). Consistent with this, the authors note that gain-of-function mutations are rare in their screen (as judged by being more fit than the average of synonymous mutations). In their discussion of cancer-associated mutations, they highlight several "strong GOF variants in the DMS". It is unclear what the authors mean by "strong GOF", indeed it is unclear to this reviewer whether the screen has revealed any true gain of function mutations at all. A few points in this regard:

      (1) More active than the average of synonymous mutations (nucleotide changes that have no effect on the sequence of the expressed protein) seems to be an awfully low bar for GOF - by that measure, several synonymous mutations would presumably be classified as GOF.

      We completely agree that classifying any mutation above the synonymous average as GOF would not be a robust assessment, which is why we statistically filtered mutations throughout our analysis. To this point, and that of Reviewer 1, we have further outlined our statistical definitions. In classifying mutations as GOF or LOF, the following parameters were used:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Therefore, only variants at the tail-ends of the mutational distribution were assessed, and further filtered based on propagation of error. For this reason, a “strong GOF” mutation as noted in this study is one that improves the fitness of an already active kinase. As pointed out, within our analysis, these are very rare occurrences, and in focusing on cancer-associated mutations we find that the variants that meet these statistical parameters require a larger genetic “leap” in the codon space. Overall, we have also changed our language in reference to GOF mutations in text.

      We hope this concern has been addressed in the new Supplemental Data Figures.

      (2) In the +IL3 heatmap in supplemental Figure 1A, there is as much or more "blue" indicating GOF as in the -IL3 heatmap, which could suggest that the observed level of gain in fitness is noise, not signal.

      We hope this concern has been addressed in the previous responses and new Supplemental Data Figures.

      (3) And finally, consistent with this interpretation, in Supplemental Figure 1C, comparing the synonymous and missense panels in the IL3 withdrawal condition suggests that the most active missense mutations (characterized here as strong GOF) are no more active than the most active synonymous mutations.

      We hope this concern has been addressed in the previous responses and figures above.

      My other major concern with the work as presented is that the authors conflate "activity" and "activation" in discussing the effects of mutations. "Activation" implies a role in regulation - affecting a switch between inactive and active conformations or states - at least in this reviewer's mind. As discussed above, the screen per se does not probe activation, only activity. To the extent that the residues discussed are important for activation/regulation of the kinase, that information is coming from prior structural/functional studies of MET and other kinases, not from the DMS screen conducted here. Of course, it is appropriate and interesting for the authors to consider residues that are known to form important structural/regulatory elements, but they should be careful with the use of activity vs. activation and make it clear to the reader that the screen probes the former. One example - in the abstract, the authors rightly note that their approach has revealed a critical hydrophobic interaction between the JM segment and the C-helix, but then they go on to assert that this points to differences in the regulation of MET and other RTKs. There is no evidence that this is a regulatory interaction, as opposed to simply a structural element present in MET (and indeed the authors' examination of prior crystal structures shows that the interaction is present in both active and inactive states).

      Thank you, and we completely agree that the distinction between “activity” and “activation” is important and that we can at most speculate and propose models for effects related to activation from this screen. We have edited the text to reflect these distinctions. With respect to activation and the second point, we believe the screen highlights the ⍺JM-C interface as a critical structural region, which may have a role in regulation based on the paradigm of juxtamembrane regulation in RTKs, the presence of a similar interface in TAM family kinases, the co-movement of the ⍺JM-helix and ⍺C-helix between active and inactive conformations in the structural ensemble, and the observation that within the TPR-METΔEx14 library there is a greater tolerance for mutations at interface positions than in TPR-MET. We hope that there will be follow-up studies that directly probe the ⍺JM-C interface with respect to the entire juxtamembrane to establish whether, and in what way, this conserved motif contributes to MET function. We have changed the language of the text to reflect how these differences contribute to our proposed model, rather than any unintended assertion on direct regulatory effects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggested major points to address:

      (1) Although the authors show that several key functional residues in the kinase domain are highly sensitive to mutation, it would be nice if the authors further established a clear connection between kinase activity and enrichment in the Ba/F3 assay. Specifically, it is unclear to what extent there is a correlation between the extent of enrichment/depletion and kinase activity - is a larger activity score necessarily indicative of higher kinase activity? This is partly validated by the P1153L mutation autophosphorylation western blots in Figure 4B, but this correlation is somewhat undermined by the data in 5F. Autophosphorylation data (or phosphorylation data on a direct downstream substrate) for a few mutants would really solidify what the activity score is truly reporting. This might also clarify the extent to which the difference between the two screens can be interpreted, and the extent to which gain-of-function can be interpreted.

      The Ba/F3 assay was carefully chosen for its addiction to exogenous IL-3, which serves as a permissive signaling switch. Any mutation that prevents TPR-MET/ΔEx14 from properly functioning is therefore dampening its signaling ability. Nevertheless, it is possible that some mutations with high scores are truly improving activity and others are sustaining activity through more stable interactions than the wild type kinase domain or with downstream signaling partners, which would require careful biochemical dissection outside the scope of this study. To address these points, we now refer to the mutation score simply as “score” rather than “activity score” and further discuss these caveats in text.

      (2) Overall, the exon 14-skipped dataset is under-discussed in the paper. The comparison of the two datasets is where most deep insights are likely to be found, and so a more thorough analysis/discussion of this dataset would really elevate the significance of the paper. For example, there appear to be a very large number of mutations that have divergent effects in the two screens (everything along the dashed lines in Figure 5D), but it's unclear where most of these mutations lie on the structure. It would be helpful if the residues with divergent mutational effects between the two screens (Supplementary Figure 5E) were mapped onto a structure of the JM-KD construct.

      To address this concern, new analysis has been added to the supplement, showing the score differences between MET and METΔEx14 mutations as a heatmap (Supplemental Data Figure 7A). Within this analysis we further applied our statistical filtering methods and structurally mapped positions with the greatest differential scores to show where divergent effects cluster (Supplemental Data Figure 7D). Consistent with our previous reports, the ⍺JM-helix and ⍺C-helix show the largest cluster of divergent effects, in addition to sites such as the ⍺G and APE motif. Further discussion of these points has been added to the text.

      (3) Based on the observations that αJM-αC interactions seem to be less strictly required in the exon 14 mutant, the hypothesis that exon 14 skipping merely removes a Cbl docking site seems largely unsatisfactory. There seems to be more direct structural alterations that could explain this change, but these are not really discussed or speculated on. Related to this, while L1062 mutations are more tolerated, as the authors showed in both the mutational heatmap and the cellular experiments, its binding counterpart L1125 still seems to be somewhat immutable based on the heatmaps. So, more hypothesis/exploration of how exon 14 skipping affects MET KD structure would be a nice addition to the paper.

      We agree that loss of the Cbl docking site is an insufficient model to capture the full nature of JM regulation and exon 14 skipping effects, which was a major incentive for this study. The outstanding ⍺JM-⍺C-helix sensitivity also excites us because it points to a region of the JM that is potentially involved in kinase activity through ⍺C-helix interactions, much like the CDK models and other RTK-JM interactions. We observed that the ⍺JM-helix and ⍺C-helix retain contact, and propose that they move in unison between active and inactive conformations. However, it is possible that a more complicated mechanism might also exist, where there is a larger degree of maintenance of these contacts in a homodimer. For instance, in Figure 3G, if you compare the ⍺JM-helix conformations, in both RON and AXL there is more distance and a pivot away from the ⍺C-helix. It is possible that there are shared mechanisms between the MET and TAM families that could further elucidate exactly how these ⍺JM-helices interact with the kinase domain during the activity transitions and what biophysical role JM truncations play.

      (4) The discussion about mutations S1122Q and L1062D is a bit confusing and incomplete. From the DMS data, it appears that L1062D should be mildly gain-of-function for the exon 14 deletion variant and very loss of function for wild-type MET. In the validation HeLa cell experiments L1062D is loss-of-function in both contexts, but a mention of this discrepancy is omitted. Then, when the discordance between DMS and HeLa cell experiments is observed again for S1122Q, it is explicitly called out for activation-loop phosphorylation, but then there is no mention of the fact that HGF stimulation leads to greater pERK levels for S1122Q in the exon 14 deletion context (the opposite of the DMS result). The Erk phosphorylation discrepancy should be mentioned. It is entirely reasonable, as the authors suggest, that there are differences between full-length MET and the TPR fusions, but the enhanced Erk phosphorylation by the S1122Q mutation is surprising (and intriguing!). This section could use some re-analysis/re-writing and further discussion.

      Thank you for this comment. As noted, L1062D shows slight GOF in METΔEx14 but LOF in MET. The blots show expression of L1062D and S1122Q in the full-length receptor in the absence and presence of HGF stimulation. L1062D is loss of function for both contexts only in -HGF conditions, but shows expression in phosphorylated METΔEx14, but not MET. For S1122Q, indeed there is a stronger pERK signal in the METΔEx14 context, which highlights how probing all regions of phosphorylation (A-loop and C-tail) and many MET-associated pathways (ERK, AKT) may be important to understand how these mutations affect MET phosphorylation and proliferation. We have included this point in the text.

      (5) Related to the previous point, one other thing to consider here is that perhaps gain-of-function mutations are simply not detectable in this particular DMS assay. The authors state that GOF and LOF are defined as 2 standard deviations from the mean of the WT-synonymous distribution. How many mutations are actually designated to be GOF based on this criterion? Are those GOF mutations as reproducible as the LOF mutations? It would be worthwhile to separately analyze the variance in activity scores for every loss-of-function mutation and gain-of-function mutation. It seems likely that loss-of-function scores are a lot more reproducible than gain-of-function ones, suggesting that the most apparent gain-of-function signal is just noise in the assay. The few outliers to this point (true gain-of-function mutations) may be some of the ones discussed in Figure 6. If this is true, it would lend confidence to the claims associated with Figure 6.

      In analyzing and classifying both GOF and LOF mutations, error was a main filtering parameter. Each fitness score, calculated by Enrich2, is representative of the slope across time points and biological replicates for the read frequency of the mutation. The associated standard error (SE) reflects the variance for each mutation within the scoring framework (Rubin et al., 2017). Mutations were then further filtered based on low propagated error, calculated by comparing the standard error (SE) of each missense mutation to the SE of the respective wild type synonymous mutation. Therefore, mutations were only classified as GOF or LOF if there was low error, in addition to the other score filters previously described. We have plotted the classified GOF mutations with their respective SE in the newly incorporated Supplemental Data Figure 8C.

      (6) In the discussion of panels 6C and 6D, the assertion is that the "clinical, not validated" category has more mutations that are low-fitness outliers than the "clinical, validated" category. From the graphs, it's actually hard to tell if this is the case for two reasons: (1) because the graphs are normalized to the largest value in each histogram, you cannot compare bar heights (and thus number of mutations) between two histograms on the same graph. (2) Just looking at the shapes of the distributions, or considering maybe the mean or median values, it's unclear whether the "validated" and "not validated" populations are actually different from one another.

      This is an important indication, and we have added analysis showing the distribution and number of clinically-associated mutations within our libraries without normalization in the main text and in Supplemental Data Figure 8A-B.

      (7) This sentence in the last results section is somewhat unclear: "GOF resistance mutations may indicate an effect on the equilibrium of kinase activation, whereas LOF resistance mutations likely affect inhibitor-protein interactions directly." The first part makes sense, but it is not totally obvious how one can infer anything about inhibitor-protein interactions from mutations that are LOF with respect to kinase activity. Related to this, how are LOF mutations selected in the presence of an inhibitor? Is the assumption here that the mutation might totally abrogate inhibitor binding but only slightly impair the kinase? Perhaps this could be explained a bit more.

      Here, the idea we wanted to convey is that two models can explain how a mutation contributes to resistance: it can shift the activity equilibrium at baseline, or it can directly impair drug effects and restore baseline activity. Mutations that are labeled resistant and GOF favor the first model. Mutations that are labeled resistant and LOF favor the second model. In the presence of an inhibitor, which is outside the scope of this study, LOF mutations would be sensitive to the inhibitor (i.e., WT-like and sensitive).

      (8) Some additional details of the library preparation and sequencing should be given in the methods section. It appears that the variable region of the library is roughly 275 amino acid residues long, which means >800 bases. How was this sequenced? From the methods, it sounds like all of the variants were pooled into a single library, but then sequencing was done using a 300x300 paired-end Illumina kit, which would not cover the length of the whole variable region. Was the library actually screened in segments as sub-libraries and then separately sequenced? Alternatively, was the whole library screened at once, and then different segments were amplified out for sequencing? If the latter approach is used, this could yield confounding results for counting wild-type variants that have the parent wild-type coding sequence. For example, if you amplify your kinase library in three segments after a single selection on the whole library, and you sequence those three segments separately, you might find a read that appears as wild-type in the part you amplified/sequenced but has a mutation in a region that you did not sequence. If this approach is taken, the counts for the wild-type sequence would be inaccurate, in which case, how is the data normalized with WT as a reference? Regardless of the method used, some more details should be provided in the methods section.

      In this study, we used the Nextera XT DNA Library Preparation Kit (Illumina), which uses a tagmentation approach in which a transposase randomly fragments our 861 bp amplicon into ~300 bp fragments, resulting in a Poisson distribution of fragment sizes. This allows for direct sequencing of all amplicons and libraries with an SP300 paired-end run, which we ran on two lanes of a NovaSeq6000. Samples are demultiplexed and processed by our analysis pipeline with a lookup table that associates each unique dual index with a specific sample (library, time point, biological replicate, IL-3 condition).

      The TPR-MET and TPR-METΔEx14 libraries were prepared in parallel throughout the entire experiment, from cloning to virus generation to transductions, screening, cell harvesting, sequencing prep, and sequencing. In other words, the TPR-MET and TPR-METΔEx14 libraries were each transduced into their own batch of cells for each biological replicate, then selected and screened on the same day for each replicate and time point. Each library and condition (time point, biological replicate, IL-3 condition) was prepared in parallel but remained an independent sample. At the stage of tagmentation, samples were arrayed so that each well corresponds to a library, biological replicate, and time point. At the stage of sequencing, samples across the two libraries (library, biological replicate, time point, IL-3 condition) were normalized to 10mM, then pooled together and all run on two lanes of the same NovaSeq6000 flow cell.

      PCR and sequencing bias was one of the most important parameters for us, which is why we performed tagmentation in parallel and sequenced everything on the same run. We have added extra details to the methods and hope that we have clarified your questions on this matter.
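
      The lookup-table step mentioned in this response could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the index sequences, field names, and function name are all hypothetical, and only the idea of mapping a unique dual index to one sample (library, time point, biological replicate, IL-3 condition) comes from the response.

```python
from typing import NamedTuple, Optional

class Sample(NamedTuple):
    library: str
    time_point: str
    replicate: int
    il3: bool

# Hypothetical lookup table: (i7 index, i5 index) -> sample metadata
LOOKUP = {
    ("ATCACG", "TTAGGC"): Sample("TPR-MET", "T0", 1, False),
    ("CGATGT", "TGACCA"): Sample("TPR-METΔEx14", "T0", 1, False),
}

def assign_sample(i7: str, i5: str) -> Optional[Sample]:
    """Return the sample for a dual index pair, or None if the pair is unknown."""
    return LOOKUP.get((i7, i5))

# A valid pair resolves to exactly one sample
hit = assign_sample("ATCACG", "TTAGGC")
# A mismatched pair (i7 from one sample, i5 from another) is not assigned
miss = assign_sample("ATCACG", "TGACCA")
```

      Requiring both indices to match is what lets samples from both libraries share a flow cell while remaining independent in the analysis.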

      Suggested minor points to address:

      (1) TPR (as in TPR-MET fusion) is not defined in the text when it is first mentioned. And it wasn't immediately clear that this is not a membrane-associated domain (Figure 5E makes this way more obvious than Figure 1B does). Perhaps this could be made more explicit in the text or in Figure 1.

      We have incorporated a new schematic in Figure 1B to better illustrate the TPR-fusion constructs used within this study. The usage of the TPR-fusion is first mentioned in the introduction, paragraph 4, and we have revised the main text to delineate the usage of the TPR-fusion more clearly.

      (2) In Figure 2G, it would be helpful if the wild-type amino acid residue was listed underneath the position number in the two graphs (even though those residues are also highlighted in 2H).

      Thank you for this recommendation; we have added the wild-type amino acid next to the position number in the x-axis label.

      (3) For Supplementary Data Figure 2, is it possible to calculate conservation scores at each position using some kind of evolutionary model, rather than relying on visual inspection of the sequence logo? Can one quantitatively assert that the C-spine is less conserved than the R-spine overall, or can this only be said for certain positions? Related to this, in comparing Figure 2G to Supplementary Data Figure 2, it is interesting that there isn't any obvious correspondence between mutational tolerance and conservation within the C-spine. For example, 1165 seems to be the most conserved position in the C-spine, but several substitutions are tolerated at this position, just like 1210, which is one of the least conserved positions in the C-spine. Finally, it's very likely that positions 1165, 1210, 1272, and 1276 co-vary, given that they all pack into the same hydrophobic cluster. This might be why they appear less conserved. These last few points might be worth discussing briefly if the authors want to relate mutational tolerance to evolutionary conservation.

      Thank you for this recommendation. To determine C-spine versus R-spine conservation more quantitatively, we performed a multiple sequence alignment of all RTK kinase domain sequences to properly identify corresponding R- and C-spine locations, as previously done in generating the spine logos, then used the Bio3D structural bioinformatics package in R to calculate the conservation score of each residue position by amino acid “similarity” with a BLOSUM62 matrix (Supplemental Data Figure 2B). In concordance with the logos, we find that C-spine positions 1092, 1108, and 1165 have the highest conservation scores, even compared to some R-spine positions. We also see across the alignment that C-spine positions 1165, 1210, 1211, 1212, 1272, and 1276 indeed co-vary within RTK families. We have revised the text to reflect these points, and more specifically discuss position-level conservation rather than generalizing conservation for the C- and R-spines.
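
      To make the idea of a similarity-based conservation score concrete, the sketch below scores each alignment column as the mean substitution score over all residue pairs in that column. It is written in Python rather than R and is not a reimplementation of the Bio3D routine, whose exact normalization may differ. The alignment is a toy example, and only a four-residue subset of BLOSUM62 is included to keep the block self-contained.

```python
from itertools import combinations

# Subset of BLOSUM62 covering four hydrophobic residues (keys in sorted order)
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("L", "L"): 4, ("M", "M"): 5, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "M"): 1, ("I", "V"): 3,
    ("L", "M"): 2, ("L", "V"): 1, ("M", "V"): 1,
}

def pair_score(a: str, b: str) -> int:
    """Look up the substitution score for an unordered residue pair."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]

def column_conservation(column: str) -> float:
    """Mean pairwise substitution score across all residues in one column."""
    pairs = list(combinations(column, 2))
    return sum(pair_score(a, b) for a, b in pairs) / len(pairs)

# Toy alignment: each string is one sequence, each index one position
alignment = ["LIV", "LIM", "LVL"]
columns = ["".join(col) for col in zip(*alignment)]
conservation = [column_conservation(c) for c in columns]
```

      Under this toy scheme an invariant column scores highest, a column of similar residues scores lower, and a column of mixed residues scores lowest, which is the ordering a position-level comparison between spines relies on.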

      (4) On Page 7 of the merged document, there appear to be some figure labeling errors. In the first and second paragraphs of the "Critical contacts between..." section, Figure 3B is referenced multiple times as a structural alignment/ensemble, but this is a heatmap.

      Thank you for catching this! The correct figure panels are now referenced.

      (5) In the text describing Figure 3A, it is stated that the structures were aligned to the N-lobe, but the figure legend says that all structures were aligned to alpha-C and alpha-JM.

      Thank you - a local alignment to the ⍺JM-helix and ⍺C-helix is correct. The idea here was that if the ⍺JM-helix and ⍺C-helix are linked to an active/inactive conformation, as in the case of the insulin receptor, the two clusters could be revealed through the structural ensemble. However, we discovered this was not the case. This finding, combined with the DMS sensitivity to mutations at the packing interface, leads us to believe that the MET JM has a distinctive regulatory mechanism that relies on this ⍺C-helix interface. We have made this correction to the text.

      (6) It would be helpful if the alpha-C and alpha-JM helices in Figure 3D were labeled on the MET structures.

      The ⍺C-helix and ⍺JM-helix are now labeled in Figure 3D.

      (7) It appears that Figure 4E is never explicitly referenced in the text.

      Thank you, Figure 4E is now appropriately referenced in the text.

      (8) Throughout the Figure 6 legend, for the histograms, it is stated that "Counts are normalized to the total mutations in each screen dataset." This might not be the correct description of normalization, as this would mean that the sum of all of the bins should equal 1. Rather, the normalization appears to be to the bin with the largest number of mutants in it, which is given a value of 1. This difference is really critical to how one visually inspects the overlaid histograms.

      Thank you for this comment. Here, the intention was to aid in the visualization of the distribution of cancer-associated and resistance-associated mutations, which form a much smaller population than the whole library and are easily masked. We originally applied a “stat(ncount)” function in R, which, as noted, scales the data and sets the peak to 1, and which was only applied to the clinical and cancer-associated mutations plotted. Now, to better compare distributions, normalization has been removed; instead, we overlay the distributions of all missense mutations and the subset of clinical mutations directly, each with its own y-axis scale. This modification has been made throughout the Figure 6 panels, hopefully improving interpretability.
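
      The difference between the two normalizations discussed in this exchange can be shown with a short sketch. The bin counts below are hypothetical; the point is only that peak scaling sets the tallest bin of every histogram to 1, whereas dividing by the total makes the bins sum to 1.

```python
def normalize_to_peak(counts):
    """Scale bins so the tallest bin equals 1 (the behavior of an ncount-style scaling)."""
    peak = max(counts)
    return [c / peak for c in counts]

def normalize_to_total(counts):
    """Scale bins so that they sum to 1 (fraction of all mutations per bin)."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical bin counts for two overlaid histograms
all_missense = [10, 40, 30, 20]
clinical = [1, 2, 1, 0]

peak_all = normalize_to_peak(all_missense)
peak_clinical = normalize_to_peak(clinical)
frac_all = normalize_to_total(all_missense)
```

      After peak scaling, both histograms top out at 1 even though one contains 100 mutations and the other 4, so bar heights cannot be compared between them.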

      Reviewer #2 (Recommendations For The Authors):

      A few thoughts/suggestions:

      (1) Regarding kinase regulation, the "closing of N- and C-lobe" upon activation is an often mentioned component of activation, and I'm sure is true in many cases, but it is not a general feature of kinase activation.

      The text has been updated - we removed the description of N- and C-lobe closure. 

      (2) With respect to the inactive state of MET, the DFG-flipped structure discussed here is almost certainly an inhibitor-induced conformation. Again, DFG-flip is often discussed as a mechanism of kinase regulation, and while in some kinases this might be the case, more often it is a drug-induced or drug-stabilized inactive conformation. The SRC/CDK-like inactive conformation in 2G15 is more likely a physiologically relevant inactive state (or even better, the ATP-bound inactive state structure 3DKC, which exhibits a somewhat different SRC/CDK-like inactive conformation).

      The PDB 3R7O structure was chosen as the main representation because it was the clearest representation of a wild-type structure with aligned R- and C-spines and a solvent-exposed, phosphorylated activation loop. Although 3DKC is bound to ATP, this structure is still in an inactive conformation and has stabilizing mutations (Y1234F, Y1235D) and an atypical alpha helix structure in the activation loop. However, we agree the SRC/CDK-like inactive conformation is an important representation, and we have incorporated our structural mapping on 2G15 in the new supplemental figures with further details on statistical analysis and comparison of libraries.

      (3) Following the comments above, I would describe the process of activation in a simpler way (in any case, it is peripheral to the work described here). Something along the lines of "phosphorylation on tyrosines XX and XX induces rearrangement of the activation segment and promotes and stabilizes the inward active position of the C-helix." Can go on to mention that this forms the E1127/K1110 salt bridge. (The DFG is already "in" in the SRC/CDK-like inactive state).

      We have changed the language to more simply describe activation. Thank you!

      (4) Would be great to see DMS with the intact receptor done in a way that could identify mutations that lead to activation in a ligand-independent manner. (but obviously beyond the scope of this paper).

      Agreed! This would be an excellent follow up for the future, especially to elucidate juxtamembrane regulation, as the membrane context is likely required.

      A typo or two:

      Boarded instead of bordered/outlined in legend to Fig. 1.

      P11553L in the 2nd line of the 2nd paragraph in that section.

      Thank you, we have addressed these typos!

    2. eLife assessment

      This manuscript describes a deep mutational scanning study of the kinase domain of the MET receptor tyrosine kinase. The study yields an important catalog of essentially all possible deleterious mutations in this portion of the receptor, with convincing evidence. The manuscript will be of interest to researchers working in the field of receptor tyrosine kinases.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors describe a deep mutational scanning (DMS) study of the kinase domain of the c-MET receptor tyrosine kinase. The screen is conducted with a highly activated fusion oncoprotein - Tpr-MET - in which the MET kinase domain is fused to the Tpr dimerization element. The mutagenized region includes the entire kinase domain and an alpha-helix in the juxtamembrane region that is essentially part of the MET kinase domain. The DMS screen is carried out in two contexts, one containing the entire cytoplasmic region of MET, and the other with an "exon 14 deletion" which removes a large portion of the juxtamembrane region (but retains the aforementioned alpha-helix). The work provides a robust and essentially exhaustive catalog of the effect of mutations (within the kinase domain) on the ability of the Tpr-MET fusion oncoproteins to drive IL3-independent growth of Ba/F3 cells. Every residue in the kinase is mutated to every natural amino acid. Given the design of the screen, one would expect it to be a powerful tool for identifying mutations that impair catalytic activity and therefore impair IL3-independent proliferation. This is borne out by the data, which reveal a great many deleterious mutations. The study reveals relatively few "gain-of-fitness" mutations, but this is not unexpected because it is carried out with an already-activated form of the MET kinase (the oncogenic Tpr-MET fusion).

      Strengths:

      The authors take a very scholarly and thorough approach in interpreting the effect of mutations in light of available information for the structure and regulation of MET and other kinases. They examine the effect of mutations in the so-called catalytic (C) and regulatory (R) spines, the interface between the JM alpha-helix and the C-helix, the glycine-rich loop and other key elements of the kinase, providing a structural rationale for the deleterious effect of mutations. Comparison of the panoply of deleterious mutations in the TPR-MET versus TPR-exon14del-MET DMS screens reveals an interesting difference - the exon14 deletion MET is much more tolerant of mutations in the JM alpha-helix/C-helix interface. The reason for this is unclear, however.

      An important qualification of the study is that it was carried out with the already highly activated Tpr-Met fusion. As a consequence, it is not expected to reveal mutations that activate the kinase -- activate in the sense of promoting a switch between physiologically-relevant inactive and active states. Consistent with this, the authors note that gain-of-fitness mutations are rare in their screen, and those that are identified induce modest but significant increases in fitness.

    1. Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The authors analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research provides a conceptually novel view of the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from scRNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

    2. eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim require additional quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

    3. Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed. The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris. Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

    5. Author response:

      eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim are incomplete and lack quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      Cluster 13 (Obp2a+) cells identified in our study have gene expression markers similar to those of the “putative secretory” cells in the Hills et al. manuscript. By the time that manuscript became publicly available, our manuscript was already finalized and communicated. We welcome the suggestion to integrate the data, which we will attempt and address in our revision.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

      We will revise the presentation of the UMAPs to make the developmental trajectory clearer. How neuron clusters are affected by the presence or exclusion of receptors is an interesting question that we will address in our revision, along with showing markers of each neuronal cluster, as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The authors analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research provides a conceptually novel view of the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from scRNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

      We would like to point out that the connection between scRNA-seq and EM was made in our experiments that investigated the localization of ER proteins via IHC (Figure 5). The intriguing observation that the levels of a number of ER luminal and membrane proteins were higher in Gnao1 than in Gnai2 neurons led us to hypothesize a differential ER content or ultrastructure, which was verified by EM. Quantification of the ER phenotype would definitely strengthen our observations, and we will add it in our revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed.

      The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris.

      Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      scRNA-seq data analysis methods: We agree with the reviewer and will elaborate on the various criteria, parameters and methods in our revision. As described above, our revised manuscript will include analysis of how inclusion / exclusion of VRs affects cell clusters, as well as quantification of the ER phenotype. We will address the reviewer’s concern of over-clustering.

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      a) Cells expressing no V1Rs are not necessarily debris because they express other neuronal markers at the same level as cells that express one or two V1Rs. Higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. We will modify figure 7-supplement 3c to add another group showing Gnai2 level in cells expressing zero V1Rs.

      b) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary table - 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets generally are a random combination of two cells. Here, each specific co-expression combination represents multiple cells and is highly unlikely to arise by random chance. Some of the co-expression combinations were identified earlier and verified experimentally in Lee et al., 2019 and Hills et al. Furthermore, Figure 7-supplement 3c shows that the level of Gnai2 expression is comparable across cells expressing one or two V1Rs. If the cells expressing two V1Rs were doublets, we would expect the level of Gnai2 to be higher than in cells expressing a single V1R. We will elaborate on this in our revised manuscript.
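
      The random-doublet argument in (b) can be illustrated with a rough calculation. The sketch below is not the authors' analysis: it assumes, purely for illustration, a hypothetical pool of 100 equally abundant V1R genes, so that a random doublet of two single-receptor cells yields one specific unordered pair with probability 2/100². Only the counts 44 and 134 come from the response above; real receptor frequencies are not uniform, so the number it produces is an order-of-magnitude intuition only.

```python
import math

def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

# HYPOTHETICAL assumption: 100 equally abundant V1R genes, one per cell.
N_RECEPTORS = 100
# Chance that a random doublet carries one specific unordered pair {a, b}.
p_pair = 2 / N_RECEPTORS**2

# Counts quoted in the response: 44 of the 134 two-receptor cells share one pair.
tail = binomial_tail(134, 44, p_pair)
print(f"P(at least 44 of 134 random doublets share one pair) = {tail:.3e}")
```

      Under these assumptions the tail probability is vanishingly small, which is the intuition behind treating recurrent, specific pairs as biological co-expression rather than doublets.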

    1. eLife assessment

      This study compiles a wide range of results on the connectivity, stimulus selectivity, and potential role of the claustrum in sensory behavior. While most of the connectivity results confirm earlier studies, this valuable work provides incomplete evidence that the claustrum responds to multimodal stimuli and that local connectivity is reduced across cells that have similar long-range connectivity. The conclusions drawn from the behavioral results are weakened by the animals' poor performance on the designed task. This study has the potential to be of interest to neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      The paper by Shelton et al investigates some of the anatomical and physiological properties of the mouse claustrum. First, they characterize the intrinsic properties of claustrum excitatory and inhibitory neurons and determine how these different claustrum neurons receive input from different cortical regions. Next, they perform in vitro patch clamp recordings to determine the extent of intraclaustrum connectivity between excitatory neurons. Following these experiments, in vivo axon imaging was performed to determine how claustrum-retrosplenial cortex neurons are modulated by different combinations of auditory, visual, and somatosensory input. Finally, the authors perform claustrum lesions to determine if claustrum neurons are required for performance on a multisensory discrimination task.

      Strengths:

      An important potential contribution the authors provide is the demonstration of intra-claustrum excitation. In addition, this paper provides the first experimental data where two cortical inputs are independently stimulated in the same experiment (using 2 different opsins). Overall, the in vitro patch clamp experiments and anatomical data provide confirmation that claustrum neurons receive convergent inputs from areas of the frontal cortex. These experiments were conducted with rigor and are of high quality.

      Weaknesses:

      The title of the paper states that claustrum neurons integrate information from different cortical sources. However, the authors did not actually test or measure integration in the manuscript. They do show physiological convergence of inputs on claustrum neurons in the slice work. Testing integration through simultaneous activation of inputs was not performed. The convergence of cortical input has been recently shown by several other papers (Chia et al), and the current paper largely supports these previous conclusions. The in vivo work did test for integration because simultaneous sensory stimulations were performed. However, integration was not measured at the single cell (axon) level because it was unclear how activity in a single claustrum ROI changes in response to (for example) visual, tactile, and visual-tactile stimulations. Reading the discussion, I also see the authors speculate that the sensory responses in the claustrum could arise from attentional or salience-related inputs from an upstream source such as the PFC. In this case, claustrum cells would not integrate anything (but instead respond to PFC inputs).

      The different experiments in different figures often do not inform each other. For example, the authors show in Figure 3 that claustrum-RSP cells (CTB cells) do not receive input from the auditory cortex. But then, in Figure 6 auditory stimuli are used. Not surprisingly, claustrum ROIs respond very little to auditory stimuli (the weakest of all sensory modalities). Then, in Figure 7 the authors use auditory stimuli in the multisensory task. It seems that these experiments were done independently and were not used to inform each other.

      One novel aspect of the manuscript is the focus on intraclaustrum connectivity between excitatory cells (Figure 2). The authors used wide-field optogenetics to investigate connectivity. However, the use of paired patch-clamp recordings remains the ground truth technique for determining the rate of connectivity between cell types, and paired recordings were not performed here. It is difficult to understand and gain appreciation for intraclaustrum connectivity when only wide-field optogenetics is used.

      In Figure 2, CLA-rsp cells express Chrimson, and the authors removed cells from the analysis with short latency responses (which reflect opsin expression). But wouldn't this also remove cells that express opsin and receive monosynaptic inputs from other opsin-expressing cells, therefore underestimating the connectivity between these CLA-rsp neurons? I think this needs to be addressed.

      In Figure 5J the lack of difference in the EPSC-IPSC timing in the RSP is likely due to 1 outlier EPSC at 30ms which is most likely reflecting polysynaptic communication. Therefore, I do not feel the argument being made here with differences in physiology is particularly striking.

      In the text describing Figure 5, the authors state "These experiments point to a complex interaction ....likely influenced by cell type of CLA projection and intraclaustral modules in which they participate". How does this slice experiment stimulating axons from one input relate to different CLA cell types or intra-claustrum circuits? I don't follow this argument.

      In Figure 6G and H, the blank condition yields a result similar to many of the sensory stimulus conditions. This blank condition (when no stimulus was presented) serves as a nice reference to compare the rest of the conditions. However, the remainder of the stimulation conditions were not adjusted relative to what would be expected by chance. For example, the response of each cell could be compared to a distribution of shuffled data, where time-series data are shuffled in time by randomly assigned intervals and a surrogate distribution of responses generated. This procedure is repeated 200-1000x to generate a distribution of shuffled responses. Then the original stimulus-triggered response (1s post) could be compared to shuffled data. Currently, the authors just compare pre/post-mean data using a Mann-Whitney test from the mean overall response, which could be biased by a small number of trials. Therefore, I think a more conservative and statistically rigorous approach is warranted here, before making the claim of a 20% response probability or 50% overall response rate.
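
      The shuffle procedure suggested above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it implements "shuffled in time by randomly assigned intervals" as a circular shift of the trace, and the synthetic trace, onset times, window length and response amplitude are all hypothetical.

```python
import random

def triggered_response(trace, onsets, window):
    """Mean across trials of (post-stimulus mean minus pre-stimulus mean)."""
    diffs = []
    for t in onsets:
        pre = trace[t - window:t]
        post = trace[t:t + window]
        diffs.append(sum(post) / window - sum(pre) / window)
    return sum(diffs) / len(diffs)

def shuffle_test(trace, onsets, window, n_shuffles=1000, seed=0):
    """Compare the observed response with responses from circularly shifted traces."""
    rng = random.Random(seed)
    observed = triggered_response(trace, onsets, window)
    n_exceed = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, len(trace))     # random interval, never zero
        shifted = trace[shift:] + trace[:shift]  # circular shift keeps autocorrelation
        if triggered_response(shifted, onsets, window) >= observed:
            n_exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (n_exceed + 1) / (n_shuffles + 1)

# Synthetic demonstration only: Gaussian noise plus a response after each onset.
rng = random.Random(1)
WINDOW = 10                                             # samples per side (hypothetical)
ONSETS = [100, 317, 540, 811, 1033, 1290, 1502, 1777]   # irregular onsets (hypothetical)
trace = [rng.gauss(0.0, 0.5) for _ in range(2000)]
for t in ONSETS:
    for i in range(t, t + WINDOW):
        trace[i] += 2.0
observed, p_value = shuffle_test(trace, ONSETS, WINDOW)
print(observed, p_value)
```

      Irregular onsets are used in the demonstration because, with strictly periodic stimuli, a circular shift by a multiple of the period would realign the trace with the stimuli and contaminate the null distribution.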

      Regarding Figure 6, a more conventional way to show sensory responses is to display a heatmap of the z-scored responses across all ROIs, sorted by their post-stimulus response. This enables the reader to better visualize and understand the claims being made here, rather than relying on the overall mean which could be influenced by a few highly responsive ROIs.
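
      As a sketch of the display suggested above, the following builds the matrix that such a heatmap would show: each ROI is z-scored against its own pre-stimulus baseline and the rows are sorted by mean post-stimulus response. The function name, the tiny example traces and the choice of four baseline samples are hypothetical; the plotting itself is left to whichever library is in use.

```python
import statistics

def zscore_and_sort(traces, n_baseline):
    """Z-score each ROI against its pre-stimulus baseline, then rank ROIs."""
    z_rows = []
    for trace in traces:
        baseline = trace[:n_baseline]
        mu = statistics.mean(baseline)
        sd = statistics.pstdev(baseline)
        if sd == 0:
            sd = 1.0  # flat baseline: leave values merely mean-centred
        z_rows.append([(x - mu) / sd for x in trace])
    post_mean = [statistics.mean(row[n_baseline:]) for row in z_rows]
    # Most responsive ROI first, as is conventional for response heatmaps.
    order = sorted(range(len(z_rows)), key=lambda i: post_mean[i], reverse=True)
    return [z_rows[i] for i in order], order

# Hypothetical trial-averaged traces: 4 baseline samples, then 4 post-stimulus samples.
traces = [
    [0, 1, 0, 1, 1, 1, 0, 0],   # no net response
    [0, 2, 0, 2, 5, 5, 5, 5],   # strong positive response
    [1, 3, 1, 3, 0, 0, 0, 0],   # suppressed
]
sorted_z, order = zscore_and_sort(traces, n_baseline=4)
print(order)
```

      Sorting by the post-stimulus mean makes it immediately visible whether a population average is carried by a few highly responsive ROIs or by a broad shift across the population.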

      For Figure 6, it would also help to display some raw data showing responses at the single ROI level and the population level. If these sensory stimulations are modulating claustrum neurons, then this will be observable on the mean population vector (averaged df/f across all ROIs as a function of time) within a given experiment and would add support to the conclusions being made.

      As noted by the authors, there is substantial evidence in the literature showing that motor activity arises in mice during these types of sensory stimulation experiments. It is foreseeable that at least some of the responses measured here arise from motor activity. It would be important to identify to what extent this is the case.

      All claims in the results for Figure 6 such as "the proportion of responsive axons tended to be highest when stimuli were combined" should be supported by statistics.

      In Figure 7, the authors state that mice learned the structure of the task. How is this the case, when the number of misses is 5-6x greater than the number of hits on audiovisual trials (S Figure 19)? I don't get the impression that mice perform this task correctly. As shown in Figure 7I, the hit rate is exceptionally low on the audiovisual port in controls. I just can't see how control and lesion mice can have the same hit rate and false alarm rate yet have different d'. Indeed, I might be missing something in the analysis. However, given that both groups of mice are not performing the task as designed, I fail to see how the authors' claim regarding multisensory integration by the claustrum is supported. Even if there is some difference in the d' measure, what does that matter when the hits are the least likely trial outcome here for both groups?
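
      For reference, the standard sensitivity index is d' = z(hit rate) − z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution. The sketch below uses hypothetical per-animal rates (not data from the manuscript) to show two things: identical rates necessarily give identical d', and because z is nonlinear, averaging d' across animals need not match the d' computed from group-mean rates. The latter is only one possible source of such a discrepancy and is not a statement about how the authors computed d'.

```python
from statistics import NormalDist, mean

def d_prime(hit_rate, fa_rate):
    """Sensitivity index; both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

def summarize(group):
    """Return (d' of group-mean rates, mean of per-animal d')."""
    mean_hit = mean(h for h, _ in group)
    mean_fa = mean(f for _, f in group)
    pooled = d_prime(mean_hit, mean_fa)
    per_animal = mean(d_prime(h, f) for h, f in group)
    return pooled, per_animal

# HYPOTHETICAL (hit rate, false-alarm rate) pairs, one per animal.
group_a = [(0.20, 0.10), (0.20, 0.10)]   # homogeneous animals
group_b = [(0.05, 0.10), (0.35, 0.10)]   # same mean rates, more spread

pooled_a, per_animal_a = summarize(group_a)
pooled_b, per_animal_b = summarize(group_b)
print(pooled_a, pooled_b, per_animal_a, per_animal_b)
```

      In this toy case both groups share the same mean hit and false-alarm rates, and therefore the same pooled d', yet the per-animal average differs between them.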

      In the discussion, it is stated that "While axons responded inconsistently to individual stimulus presentations, their responsivity remained consistent between stimuli and through time on average...". I do not understand this part of the sentence. Does this mean axons are consistently inconsistent?

      In the discussion, the authors state their axon imaging results contrast with recent studies in mice. Why not actually do the same analysis that Ollerenshaw did, so this statement is supported by fact? As pointed out above, the criteria used to classify an axon as responsive to stimuli were very liberal in this current manuscript.

      I find the discussion wildly speculative and broad. For example, "the integrative properties of the CLA could act as a substrate for transforming the information content of its inputs (e.g. reducing trial-to-trial variability of responses to conjunctive stimuli...)". How would a claustrum neuron responding with 10% reliability to a stimulus (or set of stimuli) provide any role in reducing trial-to-trial variability of sensory activity in the cortex?

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shelton et al. explore the organization of the Claustrum. To do so, they focus on a specific claustrum population, the one projecting to the retrosplenial cortex (CLA-RSP neurons). Using an elegant technical approach, they first described electrophysiological properties of claustrum neurons, including the CLA-RSP ones. Further, they showed that CLA-RSP neurons (1) directly excite other CLA neurons, in a 'projection-specific' pattern, i.e. CLA-RSP neurons mainly excite claustrum neurons not projecting to the RSP and (2) received excitatory inputs from multiple cortical territories (mainly frontal ones). To confirm the 'integrative' property of claustrum networks, they then imaged claustrum axons in the cortex during single- or multi-sensory stimulations. Finally, they investigated the effect of CLA-RSP lesion on performance in a sensory detection task.

      Strengths:

      Overall, this is a really good study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. The in-vitro part is impressive, and the results are compelling.

      Weaknesses:

      One noteworthy concern arises from the terminology used throughout the study. The authors claimed that the claustrum is an integrative structure. Yet, integration has a specific meaning, i.e. the production of a specific response by a single neuron (or network) in response to a specific combination of several input signals. In this study, the authors showed compelling results in favor of convergence rather than integration. On a lighter note, the in-vivo data are less convincing, and do not entirely support the claim of "integration" made by the authors.

    4. Reviewer #3 (Public Review):

      The claustrum is one of the most enigmatic regions of the cerebral cortex, with a potential role in consciousness and integrating multisensory information. Despite extensive connections with almost all cortical areas, its functions and mechanisms are not well understood. In an attempt to unravel these complexities, Shelton et al. employed advanced circuit mapping technologies to examine specific neurons within the claustrum. They focused on how these neurons integrate incoming information and manage the output. Their findings suggest that claustrum neurons selectively communicate based on cortical projection targets and that their responsiveness to cortical inputs varies by cell type.

      Imaging studies demonstrated that claustrum axons respond to both single and multiple sensory stimuli. Extended inhibition of the claustrum significantly reduced animals' responsiveness to multisensory stimuli, highlighting its critical role as an integrative hub in the cortex.

      However, the study's conclusions at times rely on assumptions that may undermine their validity. For instance, the comparison between RSC-projecting and non-RSC-projecting neurons is problematic due to potential false negatives in the cell labeling process, which might not capture the entire neuron population projecting to a brain area. This issue casts doubt on the findings related to neuron interconnectivity and projections, suggesting that the results should be interpreted with caution. The study's approach to defining neuron types based on projection could benefit from a more critical evaluation or a broader methodological perspective.

      Nevertheless, the study sets the stage for many promising future research directions. Future work could particularly focus on exploring the functional and molecular differences between E1 and E2 neurons and further assess the implications of the distinct responses of excitatory and inhibitory claustrum neurons for internal computations. Additionally, adopting a different behavioral paradigm that more directly tests the integration of sensory information for purposeful behavior could also prove valuable.

    1. eLife assessment

      This valuable study uses dynamic metabolic models to compare perturbation responses in a bacterial system, analyzing whether they return to their steady state or amplify beyond the initial perturbation. The evidence supporting the emergent properties of perturbed metabolic systems to network topology and sensitivity to specific metabolites is compelling. However, the mathematical explanation of the perturbation response is incomplete, and a more comprehensive metabolic and biosynthesis model would be beneficial.

    2. Reviewer #1 (Public Review):

      Summary

      The author studied metabolic networks for central metabolism, focusing on how system trajectories returned to their steady state. To quantify the response, systematic perturbation was performed in simulation and the maximal destabilization away from the steady state (compared with the initial perturbation distance) was characterized. The author analyzed the perturbation response and found that sparse networks and networks with more cofactors are more "stable", in the sense that the perturbed trajectories have smaller deviations along the path back to the steady state.

      Strengths and major contributions

      The author compared three metabolic models and performed systematic perturbation analysis in simulation. This is the first work to characterize how perturbed trajectories deviate from equilibrium in large biochemical systems and illustrated interesting findings about the difference between sparse biological systems and randomly simulated reaction networks.

      Weaknesses

      There are two main weaknesses in this study:

      First, the metabolic network in this study is incomplete. For example, amino acid synthesis and lipid synthesis are important for biomass and growth, but they are not included in the three models used in this study. NADH and NADPH are as important as ATP/ADP/AMP, but they are not included in the models. In the future, a more comprehensive metabolic and biosynthesis model is required.

      Second, this work does not provide a mathematical explanation of the perturbation response χ. Since the perturbation analysis is performed close to the steady state (or at least belongs to the attractor of single-steady-state), local linear analysis would provide useful information. By complementing with other analysis in dynamical systems (described below) we can gain more logical insights about perturbation response.
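
      One insight that local linear analysis can offer here is that a steady state whose Jacobian has only negative eigenvalues can still amplify a perturbation transiently when the Jacobian is non-normal. The sketch below assumes, as a working definition taken from the summary above, that χ is the maximal distance from the steady state along the trajectory divided by the initial perturbation distance. The two 2×2 Jacobians are hypothetical toy matrices, not derived from any of the three metabolic models, and integration is by simple forward Euler.

```python
import math

def response_coefficient(jacobian, x0, dt=1e-3, t_end=10.0):
    """Max over time of |x(t)| / |x(0)| for dx/dt = J x (x = deviation from steady state)."""
    x = list(x0)
    norm0 = math.hypot(*x)
    peak = norm0
    for _ in range(int(t_end / dt)):
        dx = [sum(j * xi for j, xi in zip(row, x)) for row in jacobian]
        x = [xi + dt * di for xi, di in zip(x, dx)]   # forward Euler step
        peak = max(peak, math.hypot(*x))
    return peak / norm0

# Both toy Jacobians have eigenvalues -1 and -2, so both steady states are stable.
J_NORMAL = [[-1.0, 0.0], [0.0, -2.0]]      # decoupled: monotone decay
J_NONNORMAL = [[-1.0, 10.0], [0.0, -2.0]]  # strong one-way coupling

x0 = [0.0, 1.0]                            # hypothetical initial perturbation
chi_normal = response_coefficient(J_NORMAL, x0)
chi_nonnormal = response_coefficient(J_NONNORMAL, x0)
print(chi_normal, chi_nonnormal)
```

      With identical eigenvalues, the decoupled system never exceeds its initial deviation, whereas the coupled one overshoots to roughly 2.5 times the initial perturbation before relaxing, so eigenvalues alone do not determine χ.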

      Discussion and impact for the field

      Metabolic perturbation is an important topic in cell biology and has important clinical implications in pharmacodynamics. The computational analysis in this study provides an initiative for future quantitative analysis on metabolism and homeostasis.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors have conducted a valuable comparative analysis of perturbation responses in three nonlinear kinetic models of E. coli central carbon metabolism found in the literature. They aimed to uncover commonalities and emergent properties in the perturbation responses of bacterial metabolism. They discovered that perturbations in the initial concentrations of specific metabolites, such as adenylate cofactors and pyruvate, significantly affect the maximal deviation of the responses from steady-state values. Furthermore, they explored whether the network connectivity (sparse versus dense connections) influences these perturbation responses. The manuscript is reasonably well written.

      Strengths:

      Well-defined and valuable research questions.

      Weaknesses:

      (1) In the study on determining key metabolites affecting responses to perturbations (starting from line 171), the authors fix the values of individual concentrations to their steady-state values and observe the responses. Such a procedure adds artificial constraints to the network because, in the natural responses of cells (and models) to perturbations, it is highly unlikely that metabolites will not evolve in time. By fixing the values of specific metabolites, the authors prohibit the metabolic network from evolving in the most optimal way to compensate for the perturbation. Instead of this procedure, have the authors considered for this task applying techniques from variance-based sensitivity analysis (Sobol, global sensitivity analysis), where they can calculate the first-order sensitivity index and total effect index? Using this technique, the authors would be able to determine the key metabolites while allowing for metabolic responses to perturbations without unnatural constraints.
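
      For readers unfamiliar with the variance-based approach mentioned in (1), the sketch below estimates first-order and total-effect Sobol indices by Monte Carlo, using the pick-freeze estimator for the first-order index and the Jansen form for the total effect. The toy model Y = X1 + 2·X2 with independent uniform inputs is a hypothetical stand-in; in the setting of the manuscript the inputs would be perturbed metabolite concentrations and the output the response coefficient, which is an assumption about how the method would be applied rather than something taken from the paper.

```python
import random

def sobol_indices(model, n_inputs, n_samples=200_000, seed=0):
    """Monte Carlo estimates of first-order and total-effect Sobol indices."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    all_f = fA + fB
    mu = sum(all_f) / len(all_f)
    var = sum((y - mu) ** 2 for y in all_f) / len(all_f)
    first, total = [], []
    for i in range(n_inputs):
        s_sum = t_sum = 0.0
        for a, b, ya, yb in zip(A, B, fA, fB):
            mixed = a[:i] + [b[i]] + a[i + 1:]   # row of A with input i taken from B
            y_mix = model(mixed)
            s_sum += yb * (y_mix - ya)           # first-order (pick-freeze) term
            t_sum += (ya - y_mix) ** 2           # total-effect (Jansen) term
        first.append(s_sum / n_samples / var)
        total.append(t_sum / (2 * n_samples) / var)
    return first, total

def toy_model(x):
    """Hypothetical additive model; analytic indices are 0.2 and 0.8."""
    return x[0] + 2.0 * x[1]

first_order, total_effect = sobol_indices(toy_model, n_inputs=2)
print(first_order, total_effect)
```

      Because the toy model is additive, the first-order and total-effect indices coincide; a gap between them in a real model would indicate that a metabolite acts through interactions with others.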

      (2) To follow up on the previous remark, the authors state that the metabolites that augment the response coefficient when their concentration is fixed tend to be allosteric regulators. The authors should report which allosteric regulations are implemented in each of the models so that one can compare against Figure 2. Again, the effect of allosteric regulation by a specific metabolite that is quantified the way the authors did is biased by fixing the concentration value - it is true that negative feedback is broken when the metabolite concentration is fixed, however, in the rate law, there is still the fixed inhibition term with its value corresponding to the inhibition at the steady state. To see the effect of allosteric regulation by a metabolite, one can change the inhibition constants instead of constraining the responses with fixed concentrations.

      (3) Given the role of ATP in metabolic processes, the authors' finding of the sensitivity of the three networks' responses to perturbations in the AXP concentrations seems reasonable. However, drawing such firm conclusions from only three models, with each of them built around one steady state and having one kinetic parameter set despite that they were built for different physiologies, raises some questions. It is well-known in studies related to basins of attraction of the steady states that the nonlinear responses also depend on the actual steady states, the values of kinetic parameters, and implemented kinetic rate law, i.e., not only on the topology of the underlying systems. In the population of only three models, we cannot exclude the possibility of overlaps and strong similarities in the values of kinetic parameters, steady states, and enzyme saturations that all affect and might bias the observed responses. Ideally, to eliminate the possibility of such biases, one should simulate responses of a large population of models for multiple physiologies (and the corresponding steady states) and multiple parameter sets per physiology. This can be a difficult task, but having more kinetic models in this work would go a long way toward more convincing results. Recently, E. coli nonlinear kinetic models from several groups appeared that might help in this task, e.g., Haiman et al., PLoS Comput Biol, 17(1): e1008208, (2021), Choudhury et al., Nat Mach Intell, 4, 710-719, (2022); Hu et al., Metab Eng, 82, 123-133 (2024), Narayanan et al., Nat Commun, 15:723, (2024).

      (4) Can the authors share their insights on what could be the underlying reasons for the bimodal distribution in Figure 1E? Even after adding random reactions, the distribution still has two modes - why is that?

      (5) Considering the effects of the sparsity of the networks on the perturbation responses (from line 223 onwards), when we compare the three analyzed models, it is clear that the Khodayari et al. model is a superset of the other two models. Therefore, this model can be considered as, e.g., Chassagnole model with Nadd reactions (though not randomly added). Based on Figures 1b and S2, one can observe that the Khodayari model has stronger responses, which is exactly opposite to the authors' conclusion that adding the reactions weakens the responses. The authors should comment on this.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. The authors conclude, using a new set of reporters and bioinformatics analyses, that the "translation ramp" model lacks support. The strength of the bioinformatic evidence and the experimental analyses (even if very limited) of rare codon insertion in the reporter make a compelling case for the authors claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      We are glad the reviewer thinks our evidence makes “a compelling case for the authors claims”. This was our main aim, and we are satisfied with this.

      The reviewer believes the major weakness of the manuscript is that we do not take into account other models and do not (see below) cite numerous other relevant papers. The reviewer made essentially the same criticism at the first review, at which time we looked quite hard for papers generally meeting the reviewer’s description. We found a few, which we incorporated here. Still, we did not find the body of evidence whose existence the reviewer implies. We are citing every study we know to be relevant, though of course we will have inadvertently missed some, given the huge body of literature. After the first round of review, we wrote “the reviewer did not give specific references, and, though we looked, we weren’t always sure which papers the reviewer had in mind.” We hoped the reviewer would provide citations. But only two citations are provided here, both to A. Kochetov, and these don’t seem central to the reviewer’s points.

      The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data. Moreover several studies have used bioinformatical analyses to point out the evolution of N-terminal sequences in multiple model organisms including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention multiple of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies.

      Mostly, we do not know to what papers the reviewer is referring. This may be our failing, but it would have helped if the reviewer had cited one of them. There are papers discussing the evolution of N-terminal sequences, but as far as we know, these do not discuss translation speed or codon usage. Of course, we may have missed some papers.

      As such the authors approach to data presentation, writing and data discussion makes the manuscript rather biased, focused on criticizing Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of the protein synthesis. This all together makes the manuscript at the end very limited.

      We think the reviewer may be considering our paper as being generally about translation speeds, whereas in our minds, it is not. This difference in views as to what the paper is “about” is perhaps causing friction. To us, it is indeed a limited paper. We are narrowly focused on the finding of Tuller that there is an enrichment of rare, slow codons at the 5’ end of genes, and we have sought an explanation of this particular fact. This is not a paper about rates of translation generally—it is a limited paper about the reason for the 5’ enrichment of rare, slow codons.

      To expand on this, the encoded slow 5’ translation due to rare, slow codons (of Tuller et al.) is a small effect (1% to 3%). The possible unencoded slow 5’ translation of unknown mechanism discussed by some other papers (e.g., Weinberg et al. 2016, Shah et al. 2013) is a much larger effect (50% or more). Just from the different magnitudes, it seems likely these are different phenomena. And yet, despite the small size of the encoded effect, it is for some reason this paper by Tuller et al. that has captured the attention of the literature: as we point out below, Tuller et al. has been cited over 900 times. Partly because of the wide and continuing influence of this paper, it is worth specifically and narrowly addressing its findings.

      Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

      We thank the reviewer for his/her attention to the manuscript, and for his/her comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review major weakness of the manuscript is the lack of analyses for confounding effects, overstatements of the results (using single amino acid sequence reporter) and the lack of discussion of previous work that argues against Tuller et al model. In my previous review I mentioned multiple other studies that addressed "slow codons" model in more detail.

      No, the reviewer did not cite any specific studies.

      While some of these studies are mentioned in the revised manuscript, authors are still rather biased and selective in their discussions. I should also point out that previous studies, that authors fail again to mention, were focused on either translation initiation, initiation to elongation transition or early elongation effects in relation to mRNA sequence, structure, codons as well as amino acid sequence. Also additional studies with bioinformatic analyses of N-terminal conservation and existence of start sites at the beginning of the protein sequences in multiple model organisms were also omitted.

      Again, we do not know to what papers the reviewer is referring. But this sounds like a lot. Our paper is aimed at a specific, narrow topic: Why is there an excess of rare, slow codons in the 5’ region of genes? We are not trying to make general statements about all things affecting and affected by translation speed, we are just trying to explain the excess of rare, slow codons.

      In general manuscript seems to be too much focused-on discussion of Tuller's paper . . .

      Yes, we are focused on the Tuller findings, the excess of rare slow codons in 5’ regions.

      . . . and arguing with the model that was already shown by multiple other studies to be limited and not correct.

      We find it unsatisfactory that the reviewer states in a public review that there are multiple other studies showing that the Tuller model is not correct, and yet does not cite any of them. Furthermore, for the reviewer to say that Tuller et al. is “not correct” is too sweeping. The core finding of Tuller et al. was the excess of rare, slow codons in the 5’ regions of genes. We confirm this; we believe it is correct; we are not aware of any literature disputing this. Then, Tuller interpreted this as an adaptation to promote translational efficiency. On the interpretation, we disagree with Tuller. But if one is to disagree with this interpretation, one needs an alternative explanation of the fact of the excess rare, slow codons. Providing such an alternative explanation, and doing an experiment to distinguish the explanations, is our contribution. We are not aware of any other paper making our interpretation.

      There are of course many papers that discuss various aspects of translation at the 5’ ends of genes, and we do cite quite a few such papers in our manuscript, though certainly not all. But papers of this general kind do not, and cannot, show that Tuller et al. is “not correct”. As far as we know, no paper provides an alternative explanation for the rare slow codons, and no paper does an experiment to modulate translation speed and look at the effect on gene expression. Notably, the slow translation phenomenon associated with the rare codons found by Tuller et al. is a very small effect—a change of about 1% to 3% of translation speed. Some other papers on translation speed are dealing with possible changes in the range of 50% or more. These are presumably some other phenomenon (if indeed they are even real changes in translation speed), and, whether they are true or not, the results and interpretations of Tuller et al. could still be true or not. Of course, if we knew of some previous paper showing the Tuller paper is not correct, we should and would cite it.

      To expand on the current view of Tuller in the literature, Tuller et al. has been cited 956 times according to Google Scholar. This makes it an extremely influential paper. After finding Tuller et al. in Entrez Pubmed, one can look under “Cited by” and see the five most recent papers that cite Tuller et al. The five papers given on May 23, 2024 were Bharti . . . Ignatova 2024; Uddin 2024; Khandia . . . Choudhary 2024; Love and Nair 2024; and Oelschlaeger 2024. We went through these five most recent papers that cite Tuller et al., and asked, did these authors cite the Tuller results as fully correct, or did they mention any doubts about the results? All five of the papers cited the Tuller results as fully correct, with no mention of any kind of doubt. For instance, Khandia et al. 2024 state “The slow “ramp” present at 5’ end of mRNA forms an optimal and robust means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression40.”, while Oelschlaeger (2024) states “Slow translation ramps have also been described elsewhere and proposed to prevent traffic jams along the mRNA [51,52,53].” Although Uddin (2024) cited Tuller as fully correct, Uddin seemed to think (it is a little unclear) that Tuller found an enrichment of highly-used codons, opposite to the actual finding. The multiple contrary studies mentioned by the reviewer do not seem to have been very influential.

      There are papers containing skepticism about the Tuller interpretation, and also papers with results that are difficult to reconcile in a common-sense way with the Tuller interpretation. But skepticism, and a difficulty to reconcile with common sense, are far from a demonstration that a paper is incorrect. Indeed, Tuller et al. may have been published in Cell, and may be so highly cited, exactly because the findings are counter-intuitive, colliding with common sense. Our contribution is to find a common-sense interpretation of the surprising but correct underlying fact of the 5’ enrichment of rare, slow codons.

      Having wrote that in the previous review, I have to admit that Sejour et al manuscript in the main text has a minimal amount of novelty with experimental evidence, the conclusions are based on three reporters with and without stalling/collision sequence with the same amino acid sequence and varying codons. Some more novelty is seen in bioinformatic analyses of multiple yeast sequences and sequence conservation at the N-termini of proteins. However, even this part of the manuscript is not discussed fully and with correct comparison to previous studies. Authors, based on my previous comments discuss further experimental shortcomings in their new and "expanded" discussion but the use of a single reporter in this case cannot relate to all differences that may be coming from ORFs seen in complete yeast transcriptome. There are multiple studies that used more reporters with more than one amino-acid and mRNA sequence as well as with similar variation of the rare or common codons. The handwaving argument about the influence of all other mechanisms that can arise from different start sites, RNA structure, peptide interaction with exit channel, peptidyl-tRNA drop-off, eIF3 complex initiation-elongation association, and etc, is just pointing up to a manuscript that is more about bashing up Tuller's model and old paper than trying to make a concise story about their own results and discuss their study in plethora of studies that indicated multiple other models for slow early elongation.

      We don’t understand why the reviewer is so grudging.

      Discussion of the ribosome's collisions and potential impact of such scenario in the author's manuscript is left completely without citation, even though such work has relevant results to the author's conclusions and Tuller's model.

      This is not true. We cite Dao Duc and Song (2018) “The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation.” PLoS Genet 14, and Tesina, . . . and Green (2020) “Molecular mechanism of translational stalling by inhibitory codon combinations and Poly(A) tracts.” EMBO J., which are two excellent papers on this subject. We also cite Gamble et al. (2016), who found the underlying result, but at that time did not attribute it to ribosome collisions.

      Previous studies (not cited) for example clearly indicate how the length from stalling sequence to start codon is related to ribosome collisions. Moreover such studies are pointing out differences in initiation vs elongation rates that may impact ribosome collisions and protein expression. Both of these topics would be very valuable in discussions of evolutionary changes in the current yeast ORFs. Not to mention that authors do not really discuss also possibilities for differences in 5'UTRs and uORFs in relation to downstream ORFs sequence and codon composition.

      It is not clear to us that such papers are highly relevant to the issue on which we are working.

      The argument about whether cycloheximide or not is doing 5' ribosome slowdown (lines 425-443) is just rambling about Weinberg's paper from 2016 without any real conclusion. In this section authors are just throwing down hypothesis that were more clearly explained in Weinberg's manuscript or shown experimentally in studies done after the Weinberg et al. paper was published.

      Earlier, the reviewer had the criticism that “The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data.” The main study we know of dealing with issues like these is that of Weinberg et al. 2016. In our opinion, this is a thoughtful paper on these issues. But now, at this point, the reviewer seems to criticize the fact that we do extensively cite results from Weinberg et al. It is true that there is no ultimate conclusion, but why there is no conclusion is a little bit interesting. Weinberg et al. show that even in studies that do not use cycloheximide as the first step in ribosome profiling, there is some left-over high density of ribosomes near 5’ ends. But all these ribosome profiling experiments do use cycloheximide at a later step in the procedure. Until someone does a ribosome profiling experiment without the use of any cycloheximide at any step, there will be no firm conclusion. This is not our fault, and it is also not the issue we are writing about. And the reason this paragraph is in the manuscript at all is that the reviewer (we thought) had asked for something like this in the first review.

      At the end, even in the limited novelty of evolutionary arguments about non-existing N-terminal conservation of codons or amino acids they fail to cite and discuss previous work by Kochetov (BioEssays, 2008 and NAR, 2011) which have additional explanation on evolution of N-terminal sequences in yeast, human or Drosophila.

      These two papers of Dr. Kochetov’s have some relevance and we now cite them. These are the only papers cited by the reviewer in his/her two reviews.

      Probably the reviewer would have preferred a paper on a different subject.


      The following is the authors’ response to the original reviews.

      Response to Reviewers:

      We thank the reviewers for their comments, and their evident close reading of the manuscript. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. Our revised manuscript has a more extensive discussion of alternative explanations for initial high ribosome density as seen by ribosome profiling, and which more specifically points out the limitations of our work.

      As a preface to specific responses to the reviewers, we will say that we could divide observations of slow initial translation into two categories, which we will call “encoded slow codons”, and “increased ribosome density”. With respect to the first category, Tuller et al. documented initial “encoded slow codons”, that is, there is a statistical excess of rare, slowly-translated codons at the 5’ ends of genes. Although the size of this effect is small, statistical significance is extremely high, and the existence of this enrichment is not in any doubt. At first sight, this appears to be a strong indication of a preference for slow initial translation. In our opinion, our main contribution is to show that there is an alternative explanation for this initial enrichment of rare, slow codons—that they are a spandrel, a consequence of sequence plasticity at the 5’ (and 3’) ends of genes. The reviewers seem to generally agree with this, and we are not aware that any other work has provided an explanation for the 5’ enrichment of rare codons.

      The second category of observations pertaining to slow initial translation is “increased ribosome density”. Early ribosome profiling studies used cycloheximide to arrest cell growth, and these studies showed a higher density of ribosomes near the 5’ end of genes than elsewhere. This high initial ribosome density helped motivate the paper of Tuller et al., though their finding of “encoded slow codons” could explain only a very small part of the increased ribosome density. More modern ribosome profiling studies do not use cycloheximide as the first step in arresting translation, and in these studies, the density of ribosomes near the 5’ end of genes is greatly reduced. And yet, there remains, even in the absence of cycloheximide at the first step, a significantly increased density of ribosomes near the 5’ end (e.g., Weinberg et al., 2016). (However, most or all of these studies do use cycloheximide at a later step in the protocol, and the possibility of a cycloheximide artefact is difficult to exclude.) Some of the reviewer’s concerns are that we do not explain the increased 5’ ribosome density seen by ribosome profiling. We agree; but we feel it is not the main point of our manuscript. In revision, we more extensively discuss other work on increased ribosome density, and more explicitly point out the limitations of our manuscript in this regard. We also note, though, that increased ribosome density is not a direct measure of translation speed—it can have other causes.

      Specific Responses.

      Reviewer 1 was concerned that we did not more fully discuss other work on possible reasons for slow initial translation. We discuss such work more extensively in our revision. However, as far as we know, none of this work proposes a reason for the 5’ enrichment of rare, slow codons, and this is the main point of our paper. Furthermore, it is not completely clear that there is any slow initial translation. The increase in ribosome density seen in flash-freeze ribosome profiling could be an artefact of the use of cycloheximide at the thaw step of the protocols; or it could be a real measure of high ribosome density that occurs for some other reason than slow translation (e.g., ribosomes might have low processivity at the 5’ end).

      Reviewer 1 was also concerned about confounding effects in our reporter gene analysis of the effects of different codons on efficiency of translation. We have two comments. First, it is important to remember that although we changed codons in our reporters, we did not change any amino acids. We changed codons only to synonymous codons. Thus at least one of the reviewer’s possible confounding effects—interactions of the nascent peptide chain with the exit channel of the ribosome—does not apply. However, of course, the mRNA nucleotide sequence is altered, and this would cause a change in mRNA structure or abundance, which could matter. We agree this is a limitation to our approach. However, to fully address it, we feel it would be necessary to examine a really large number of quite different sequences, which is beyond the scope of this work. Furthermore, mRNAs with low secondary structure at the 5’ end probably have relatively high rates of initiation, and also relatively high rates of elongation, and it might be quite difficult to disentangle these. But in neither case is there an argument that slow initial translation is efficient. Accurate measurement of mRNA levels would be helpful, but would not disentangle rates of initiation from rates of elongation as causes of changes in expression.

      Reviewer 2 was concerned that the conservation scores for the 5’ 40 amino acids, and the 3’ 40 amino acids were similar, but slow translation was only statistically significant for the 5’ 40 amino acids. As we say in the manuscript, we are also puzzled by this. We note that 3’ translation is statistically slow, if one looks over the last 100 amino acids. Our best effort at an explanation is a sort of reverse-Tuller explanation: that in the last 40 amino acids, the new slow codons created by genome plasticity are fairly quickly removed by purifying selection, but that in the first 40 amino acids, for genes that need to be expressed at low levels, purifying selection against slow codons is reduced, because poor translation is actually advantageous for these genes. To expand on this a bit, we feel that the 5000 or so proteins of the proteome have to be expressed in the correct stoichiometric ratios, and that poor translation can be a useful tool to help achieve this. In this explanation, slow translation at the 5’ end is bad for translation (in agreement with our reporter experiments), but can be good for the organism, when it occurs in front of a gene that needs to be expressed poorly. Whereas, in Tuller, slow translation at the 5’ end is good for translation.

      Reviewer 2 wondered whether the N-terminal fusion peptide affects GFP fluorescence in our reporter. This specific reporter, with this N-terminus, has been characterized by Dean and Grayhack (2012), and by Gamble et al. (2016), and the idea that a super-folder GFP reporter is not greatly affected by N-terminal fusions is based on the work of Pedelacq (2006). None of these papers show whether this N-terminal fusion might have some effect, but together, they provide good reason to think that any effect would be small. These citations have been added.

    1. Author response:

      Reviewer #1 (Public Review):

      Abbasi et al. assess in this MEG study the directed connectivity of both cortical and subcortical regions during continuous speech production and perception. The authors observed bidirectional connectivity patterns between speech-related cortical areas as well as subcortical areas in production and perception. Interestingly, they found in speaking low-frequency connectivity from subcortical (the right cerebellum) to cortical (left superior temporal) areas, while connectivity from the cortical to subcortical areas was in the high frequencies. In listening a similar cortico-subcortical connectivity pattern was observed for the low frequencies, but the reversed connectivity in the higher frequencies was absent.

      The work by Abbasi and colleagues addresses a relevant, novel topic, namely understanding the brain dynamics between speaking and listening. This is important because traditionally production and perception of speech and language are investigated in a modality-specific manner. To have a more complete understanding of the neurobiology underlying these different speech behaviors, it is key to also understand their similarities and differences. Furthermore, to do so, the authors utilize state-of-the-art directed connectivity analyses on MEG measurements, providing a quite detailed profile of cortical and subcortical interactions for the production and perception of speech. Importantly, and perhaps most interesting in my opinion, is that the authors find evidence for frequency-specific directed connectivity, which is (partially) different between speaking and listening. This could suggest that both speech behaviors rely (to some extent) on similar cortico-cortical and cortico-subcortical networks, but different frequency-specific dynamics.

      These elements mentioned above (investigation of both production and perception, both cortico-cortical and cortico-subcortical connectivity is considered, and observing frequency-specific connectivity profiles within and between speech behaviors), make for important novel contributions to the field. Notwithstanding these strengths, I find that they are especially centered on methodology and functional anatomical description, but that precise theoretical contributions for neurobiological and cognitive models of speech are less transparent. This is in part because the study compares speech production and perception in general, but no psychophysical or psycholinguistic manipulations are considered. I also have some critical questions about the design which may pose some confounds in interpreting the data, especially with regard to comparing production and perception.

      (1) While the cortico-cortical and cortico-subcortical connectivity profiles highlighted in this study and the depth of the analyses are impressive, what these data mean for models of speech processing remains on the surface. This is in part due, I believe, to the fact that the authors have decided to explore speaking and listening in general, without targeting specific manipulations that help elucidate which aspects of speech processing are relevant for the particular connectivity profiles they have uncovered. For example, the frequency-specific directed connectivity is it driven by low-level psychophysical attributes of the speech or by more cognitive linguistic properties? Does it relate to the monitoring of speech, timing information, and updating of sensory predictions? Without manipulations trying to target one or several of these components, as some of the referenced work has done (e.g., Floegel et al., 2020; Stockert et al., 2021; Todorović et al., 2023), it is difficult to draw concrete conclusions as to which representations and/or processes of speech are reflected by the connectivity profiles. An additional disadvantage of not having manipulations within each speech behavior is that it makes the comparison between listening and speaking harder. That is, speaking and listening have marked input-output differences which likely will dominate any comparison between them. These physically driven differences (or similarities for that matter; see below) can be strongly reduced by instead exploring the same manipulations/variables between speaking and listening. If possible (if not to consider for future work), it may be interesting to score psychophysical (e.g., acoustic properties) or psycholinguistic (e.g., lexical frequency) information of the speech and see whether and how the frequency-specific connectivity profiles are affected by it.

      We thank the reviewer for pointing this out. The current study is indeed part of a larger project investigating the role of the internal forward model in speech perception and production. In the original, more comprehensive study, we also included a masked condition where participants produced speech as usual, but their auditory perception was masked. This allowed us to examine how the internal forward model behaves when it doesn't receive the expected sensory consequences of generated speech. However, for the current study, we focused solely on data from the speaking and listening conditions due to its specific research question. We agree that further manipulations would be interesting. However, for this study our focus was on natural speech and we avoided other manipulations (beyond masked speech) so that we can have sufficiently long recording time for the main speaking and listening conditions.

      (2) Recent studies comparing the production and perception of language may be relevant to the current study and add some theoretical weight since their data and interpretations for the comparisons between production and perception fit quite well with the observations in the current work. These studies highlight that language processes between production and perception, specifically lexical and phonetic processing (Fairs et al., 2021), and syntactic processing (Giglio et al., 2024), may rely on the same neural representations, but are differentiated in their (temporal) dynamics upon those shared representations. This is relevant because it dispenses with the classical notion in neurobiological models of language where production and perception rely on (partially) dissociable networks (e.g., Price, 2010). Rather those data suggest shared networks where different language behaviors are dissociated in their dynamics. The speech results in this study nicely fit and extend those studies and their theoretical implications.

      We thank the reviewer for the suggestion and we will include these references and the points made by the reviewer in our revised manuscript.

      (3) The authors align the frequency-selective connectivity between the right cerebellum and left temporal speech areas with recent studies demonstrating a role for the right cerebellum for the internal modelling in speech production and monitoring (e.g., Stockert et al., 2021; Todorović et al., 2023). This link is indeed interesting, but it does seem relevant to point out that at a more specific scale, it does not concern the exact same regions between those studies and the current study. That is, in the current study the frequency-specific connectivity with temporal regions concerns lobule VI in the right cerebellum, while in the referenced work it concerns Crus I/II. The distinction seems relevant since Crus I/II has been linked to the internal modelling of more cognitive behavior, while lobule VI seems more motor-related and/or contextual-related (e.g., D'Mello et al., 2020; Runnqvist et al., 2021; Runnqvist, 2023).

      We thank the reviewer for their insightful comment. The reference was intended to provide evidence for the role of the cerebellum in internal modelling in speech. We do not claim that we have the spatial resolution with MEG to reliably spatially resolve specific parts of the cerebellum.

      (4) On the methodological side, my main concern is that for the listening condition, the authors have chosen to play back the speech produced by the participants in the production condition. Both the fixed order as well as hearing one's own speech as listening condition may produce confounds in data interpretation, especially with regard to the comparison between speech production and perception. Could order effects impact the observed connectivity profiles, and how would this impact the comparison between speaking and listening? In particular, I am thinking of repetition effects present in the listening condition as well as prediction, which will be much more elevated for the listening condition than the speaking condition. The fact that it also concerns their own voice furthermore adds to the possible predictability confound (e.g., Heinks-Maldonado et al., 2005). In addition, listening to one's speech which just before has been articulated may, potentially strategically even, enhance inner speech and "mouthing" in the participants, hereby thus engaging the production mechanism. Similarly, during production, the participants already hear their own voice (which serves as input in the subsequent listening condition). Taken together, both similarities or differences between speaking and listening connectivity may have been due to or influenced by these order effects, and the fact that the different speech behaviors are to some extent present in both conditions.

      This is a valid point raised by the reviewer. By listening to their own previously produced speech, our participants might have anticipated and predicted the sentences more easily. However, when designing our experiment, we took several steps to lower the chance of this anticipation. First, participants were measured in separate sessions for the speech production and perception tasks, and there was always an interval of several days between the two conditions. Second, our questions were mainly about common, general topics. Consequently, participants may not have remembered their answers completely.

      Importantly, using the same stimulus material for speaking and listening guaranteed that there was no difference in the low-level features of the material for both conditions that could have affected the results of our statistical comparison.

      Due to bone conduction, hearing one’s own unaltered speech from a recording may seem foreign and could lead to unwanted emotional reactions, e.g. embarrassment, so participants were asked whether they had already heard their own voice in a recording (e.g. from a self-recorded voice message in WhatsApp), which most of them confirmed. Participants were also informed that they were going to hear themselves during the measurement, to further reduce unwanted psychophysiological responses.

      (5) The ability of the authors to analyze the spatiotemporal dynamics during continuous speech is a potentially important feat of this study, given that one of the reasons that speech production is much less investigated compared to perception concerns motor and movement artifacts due to articulation (e.g., Strijkers et al., 2010). Two questions did spring to mind when reading the authors' articulation artifact correction procedure: If I understood correctly, the approach comes from Abbasi et al. (2021) and is based on signal space projection (SSP) as used for eye movement corrections, which the authors successfully applied to speech production. However, in that study, it concerned the repeated production of three syllables, while here it concerns continuous speech of full words embedded in discourse. The articulation and muscular variance will be much higher in the current study compared to three syllables (or compared to eye movements which produce much more stable movement potentials compared to an entire discourse). Given this, I can imagine that corrections of the signal in the speaking condition were likely substantial and one may wonder (1) how much signal relevant to speech production behavior is lost?; (2) similar corrections are not necessary for perception, so how would this marked difference in signal processing affect the comparability between the modalities?

      One of the results of our previous study (Abbasi et al., 2021) was that the artefact correction was not specific to individual syllables but generalised across syllables. Also, the repeated production of syllables was associated with substantial movements of the articulators, mimicking those observed during naturalistic speaking. We therefore believe that the artefact correction is effective during speaking. We also checked this by investigating speech-related coherence in brain parcels in spatial proximity to the articulators. In our previous study we also showed that the correction method retains neural activity to a very large degree. We are therefore confident that speaking and listening conditions can be compared and that the loss of true signals from correcting the speaking data will be minor.

      References:

      • Abbasi, O., Steingräber, N., & Gross, J. (2021). Correcting MEG artifacts caused by overt speech. Frontiers in Neuroscience, 15, 682419.

      • D'Mello, A. M., Gabrieli, J. D., & Nee, D. E. (2020). Evidence for hierarchical cognitive control in the human cerebellum. Current Biology, 30(10), 1881-1892.

      • Fairs, A., Michelas, A., Dufour, S., & Strijkers, K. (2021). The same ultra-rapid parallel brain dynamics underpin the production and perception of speech. Cerebral Cortex Communications, 2(3), tgab040.

      • Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      • Giglio, L., Ostarek, M., Sharoh, D., & Hagoort, P. (2024). Diverging neural dynamics for syntactic structure building in naturalistic speaking and listening. Proceedings of the National Academy of Sciences, 121(11), e2310766121.

      • Heinks‐Maldonado, T. H., Mathalon, D. H., Gray, M., & Ford, J. M. (2005). Fine‐tuning of auditory cortex during speech production. Psychophysiology, 42(2), 180-190.

      • Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191(1), 62-88.

      • Runnqvist, E., Chanoine, V., Strijkers, K., Pattamadilok, C., Bonnard, M., Nazarian, B., ... & Alario, F. X. (2021). Cerebellar and cortical correlates of internal and external speech error monitoring. Cerebral Cortex Communications, 2(2), tgab038.

      • Runnqvist, E. (2023). Self-monitoring: The neurocognitive basis of error monitoring in language production. In Language production (pp. 168-190). Routledge.

      • Stockert, A., Schwartze, M., Poeppel, D., Anwander, A., & Kotz, S. A. (2021). Temporo-cerebellar connectivity underlies timing constraints in audition. eLife, 10, e67303.

      • Strijkers, K., Costa, A., & Thierry, G. (2010). Tracking lexical access in speech production: electrophysiological correlates of word frequency and cognate effects. Cerebral Cortex, 20(4), 912-928.

      • Todorović, S., Anton, J. L., Sein, J., Nazarian, B., Chanoine, V., Rauchbauer, B., ... & Runnqvist, E. (2023). Cortico-cerebellar monitoring of speech sequence production. Neurobiology of Language, 1-21.

      Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suitable to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness. Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. I found the reference to Schaum et al., 2021 not sufficient to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I'm not sure this is what was done here?

      If it was not: how can we account for spurious results based on indirect effects?

      Yes, multivariate refers to the three components.

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      We performed statistics on DAI and believe that this is a valid approach. We argue that random GC effects would not survive our cluster-corrected statistics.
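      To illustrate why the bias level matters for the asymmetry index, the following sketch uses the commonly used definition DAI = (GC(A→B) − GC(B→A)) / (GC(A→B) + GC(B→A)). The manuscript's exact formula is not quoted in this exchange, so this definition, the function name, the example GC values, and the `bias_level` threshold are all assumptions made for illustration.

```python
import numpy as np

def directed_asymmetry_index(gc_ab, gc_ba, bias_level=None):
    """DAI per frequency bin; bins where neither direction exceeds bias_level become NaN."""
    gc_ab = np.asarray(gc_ab, dtype=float)
    gc_ba = np.asarray(gc_ba, dtype=float)
    total = gc_ab + gc_ba
    dai = np.full(total.shape, np.nan)
    valid = total > 0
    if bias_level is not None:
        # Hypothetical rule: keep a bin only if at least one direction is above bias.
        valid &= (gc_ab > bias_level) | (gc_ba > bias_level)
    dai[valid] = (gc_ab[valid] - gc_ba[valid]) / total[valid]
    return dai

# Hypothetical GC spectra at three frequencies (Hz).
freqs = np.array([5.0, 20.0, 70.0])
gc_a_to_b = np.array([0.30, 0.10, 0.01])
gc_b_to_a = np.array([0.10, 0.10, 0.02])

dai_raw = directed_asymmetry_index(gc_a_to_b, gc_b_to_a)
dai_masked = directed_asymmetry_index(gc_a_to_b, gc_b_to_a, bias_level=0.05)
print(dai_raw)
print(dai_masked)
```

      Because the index is normalised by the summed GC, two very small GC values can still yield a sizeable asymmetry (the 70 Hz bin in this example), which is the situation the reviewer's question about the bias level targets.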

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      Thanks, we will add this reference in the revised version.

      (2) GC has been discouraged for filtered data as it gives rise to false positives due to phase distortions and the ineffectiveness of filtering in the information-theoretic setting as reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al. 2017; Pinzuti et al., 2020 - who also suggest an approach that would circumvent those filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      This must be a misunderstanding. We are aware of the problem with GC on filtered data. But GC was here computed on broadband data and not in individual frequency bands.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g. L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5 as well as R-STS both to R-Crus2, L-CB6, L-Th) contradict the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting comparisons between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry). The spectrally-resolved connectivity in the two conditions actually look remarkably similar and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      We appreciate the reviewer's thoughtful comments. We acknowledge that not all connectivity patterns strictly adhere to the initial observation regarding feedback and feedforward communication. It's true that our primary focus was on interactions between brain regions known to be crucial for speech prediction, including auditory, somatosensory, and cerebellar areas. However, we also presented connectivity patterns across other regions to provide a more comprehensive picture of the speech network. We believe this broader perspective can be valuable for future research directions.

      Regarding the reviewer's observation about the alpha band peak in Figure 4, we agree that a closer examination reveals the connectivity from right cerebellum to the left parietal is in a wider low frequency range. We will refrain from solely emphasizing the alpha band and acknowledge the potential contribution of lower frequencies to cerebellar-parietal communication.

      We also appreciate the reviewer highlighting the need for a more nuanced interpretation of the listening condition connectivity compared to the speaking condition. The reviewer is correct in pointing out that while Figure 4 suggests a high-frequency connectivity from L-A5 to R-CB only in the speaking condition, a direct statistical comparison between conditions might not reveal a significant difference. We will revise the manuscript to clarify this point.

      Finally, a closer examination of Figure 3 revealed that the light purple and dark green edges in the speaking condition for R-CB6 and L-3b suggest outgoing connections at low frequencies, while other colored edges indicate information reception at high frequencies. We acknowledge that exceptions to this directional pattern might exist and warrant further investigation in future studies.

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful if it were spelled out to the reader why this is the case so that they would be better able to grasp the main concerns here. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

      This is indeed an important point. That is why we are discussing our results with great caution and specifically point the reader to the relevant literature. We are indeed thinking about a future study where we investigate this connectivity using other connectivity metrics and a detailed consideration of power.

      Reviewer #3 (Public Review):

      In the current paper, Abbasi et al. aimed to characterize and compare the patterns of functional connectivity across frequency bands (1 Hz - 90 Hz) between regions of a speech network derived from an online meta-analysis tool (Neurosynth.org) during speech production and perception. The authors present evidence for complex neural dynamics from which they highlight directional connectivity from the right cerebellum to left superior temporal areas in lower frequency bands (up to beta) and between the same regions in the opposite direction in the (lower) high gamma range (60-90 Hz). Abbasi et al. interpret their findings within the predictive coding framework, with the cerebellum and other "higher-order" (motor) regions transmitting top-down sensory predictions to "lower-order" (sensory) regions in the lower frequencies and prediction errors flowing in the opposite direction (i.e., bottom-up) from those sensory regions in the gamma band. They also report a negative correlation between the strength of this top-down functional connectivity and the alignment of superior temporal regions to the syllable rate of one's speech.

      Strengths:

      (1) The comprehensive characterization of functional connectivity during speaking and listening to speech may be valuable as a first step toward understanding the neural dynamics involved.

      (2) The inclusion of subcortical regions and connectivity profiles up to 90Hz using MEG is interesting and relatively novel.

      (3) The analysis pipeline is generally adequate for the exploratory nature of the work.

      Weaknesses:

      (1) The work is framed as a test of the predictive coding theory as it applies to speech production and perception, but the methodological approach is not suited to this endeavor.

      We agree that we cannot provide definite evidence for predictive coding in speech production and perception and we believe that we do not make that claim in the manuscript. However, our results are largely consistent with what can be expected based on predictive coding theory.

      (2) Because of their theoretical framework, the authors readily attribute roles or hierarchy to brain regions (e.g., higher- vs lower-order) and cognitive functions to observed connectivity patterns (e.g., feedforward vs feedback, predictions vs prediction errors) that cannot be determined from the data. Thus, many of the authors' claims are unsupported.

      We will revise the manuscript to more clearly differentiate our results (e.g. directed Granger-Causality from A to B) from their interpretation (potentially indicating feedforward or feedback signals).

      (3) The authors' theoretical stance seems to influence the presentation of the results, which may inadvertently misrepresent the (otherwise perfectly valid; cf. Abbasi et al., 2023) exploratory nature of the study. Thus, results about specific regions are often highlighted in figures (e.g., Figure 2 top row) and text without clear reasons.

      Our connectograms reveal a multitude of results that we hope are interesting to the community. At the same time, the wealth of findings poses a problem for describing them. We did not see a better way than to highlight specific connections of interest.

      (4) Some of the key findings (e.g., connectivity in opposite directions in distinct frequency bands) feature in a previous publication and are, therefore, interesting but not novel.

      We actually see this as a strength of the current manuscript. The computation of connectivity is here extended to a much larger sample of brain areas. It is reassuring to see that the previously reported results generalise to other brain areas.

      (5) The quantitative comparison between speech production and perception is interesting but insufficiently motivated.

      We thank the reviewer for this comment. We have addressed this in detail in our response to points (1) and (4) of Reviewer 1.

      (6) Details about the Neurosynth meta-analysis and subsequent selection of brain regions for the functional connectivity analyses are incomplete. Moreover, the use of the term 'Speech' in Neurosynth seems inappropriate (i.e., includes irrelevant works, yielding questionable results). The approach of using separate meta-analyses for 'Speech production' and 'Speech perception' taken by Abbasi et al. (2023) seems more principled. This approach would result, for example, in the inclusion of brain areas such as M1 and the BG that are relevant for speech production.

      We agree that there are inherent limitations in automated meta-analysis tools such as Neurosynth. Papers are used in the meta-analysis that might not be directly relevant. However, Neurosynth has proven its usefulness over many years and has been used in many studies. We also agree that our selection of brain areas is not complete. But Granger Causality analysis of every pair of ROIs leads to complex results and we had to limit our selection of areas.

      (7) The results involving subcortical regions are central to the paper, but no steps are taken to address the challenges involved in the analysis of subcortical activity using MEG. Additional methodological detail and analyses would be required to make these results more compelling. For example, it would be important to know what the coverage of the MEG system is, what head model was used for the source localization of cerebellar activity, and if specific preprocessing or additional analyses were performed to ensure that the localized subcortical activity (in particular) is valid.

      There is a large body of evidence demonstrating that MEG can record signals from deep brain areas such as thalamus and cerebellum (Attal & Schwarz, 2013; Andersen et al., NeuroImage, 2020; Piastra et al., 2020; Schnitzler et al., 2009). These and other studies provide evidence that state-of-the-art recording (with multichannel SQUID systems) and analysis are sufficient to allow reconstruction of subcortical areas. However, spatial resolution is clearly reduced for these deep areas. We will add a statement in the revised manuscript to acknowledge this limitation.

      (8) The results and methods are often detailed with important omissions (a speech-brain coupling analysis section is missing) and imprecisions (e.g., re: Figure 5; the Connectivity Analysis section is copy-pasted from their previous work), which makes it difficult to understand what is being examined and how. (It is also not good practice to refer the reader to previous publications for basic methodological details, for example, about the experimental paradigm and key analyses.) Conversely, some methodological details are given, e.g., the acquisition of EMG data, without further explanation of how those data were used in the current paper.

      We will revise the relevant sections of the manuscript.

      (9) The examination of gamma functional connectivity in the 60 - 90 Hz range could be better motivated. Although some citations involving short-range connectivity in these frequencies are given (e.g., within the visual system), a more compelling argument for looking at this frequency range for longer-range connectivity may be required.

      Given previous evidence of connectivity in the gamma band we think that it would be a weakness to exclude this frequency band from analysis.

      (10) The choice of source localization method (linearly constrained minimum variance) could be explained, particularly given that other methods (e.g. dynamic imaging of coherent sources) were specifically designed and might potentially be a better alternative for the types of analyses performed in the study.

      Both LCMV and DICS are beamforming methods. We used LCMV because we wanted to use Granger causality, which requires broadband signals. DICS would only provide frequency-specific, band-limited signals.

      (11) The mGC analysis needs to be more comprehensively detailed for the reader to be able to assess what is being reported and the strength of the evidence. Relatedly, first-level statistics (e.g., via estimation of the noise level) would make the mGC and DAI results more compelling.

      We perform group-level cluster-based statistics on mGC while correcting for multiple comparisons across frequency bands and brain parcels and report only significant results. This is an established approach that is routinely used in this type of study.
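      For readers unfamiliar with the group-level cluster-based statistics mentioned here, the sketch below shows a one-sample, sign-flip cluster permutation test along a single frequency axis. It is a simplified illustration on simulated data, not the authors' procedure: the cluster-forming threshold, the number of permutations, the simulated effect, and all function names are assumptions.

```python
import numpy as np

def find_runs(mask):
    """Return (start, stop) index pairs of contiguous True runs in a 1-D mask."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(mask)))
    return runs

def t_values(data):
    """One-sample t statistic per column of data shaped (subjects, frequencies)."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))

def cluster_masses(t, threshold):
    """Clusters of same-signed supra-threshold bins, with summed |t| as mass."""
    out = []
    for mask in (t > threshold, t < -threshold):
        for start, stop in find_runs(mask):
            out.append((start, stop, float(np.abs(t[start:stop]).sum())))
    return out

def cluster_permutation_test(data, threshold=2.0, n_permutations=1000, seed=0):
    """Return (start, stop, p_value) for each observed cluster."""
    rng = np.random.default_rng(seed)
    observed = cluster_masses(t_values(data), threshold)
    null_max = np.zeros(n_permutations)
    for p in range(n_permutations):
        # Under the null of zero mean, each subject's sign is exchangeable.
        signs = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        masses = [m for _, _, m in cluster_masses(t_values(data * signs), threshold)]
        null_max[p] = max(masses, default=0.0)
    return [(start, stop, (np.sum(null_max >= mass) + 1) / (n_permutations + 1))
            for start, stop, mass in observed]

# Simulated DAI-like values: 20 subjects, 40 frequency bins, effect in bins 10-19.
rng = np.random.default_rng(1)
data = 0.2 * rng.standard_normal((20, 40))
data[:, 10:20] += 0.3
results = cluster_permutation_test(data)
for start, stop, p_value in results:
    print(f"bins {start}-{stop - 1}: p = {p_value:.3f}")
```

      Comparing each observed cluster with the distribution of the maximum cluster mass across permutations is what controls the family-wise error rate over frequency bins. It does not by itself establish that the underlying GC values exceed a first-level noise estimate, which is the reviewer's separate request.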

      (12) Considering the exploratory nature of the study, it is essential for other researchers to continue investigating and validating the results presented in the current manuscript. Thus, it is concerning that data and scripts are not fully and openly available. Data need not be in its raw state to be shared and useful, which circumvents the stated data privacy concerns.

      We acknowledge the reviewer's concern regarding the full availability of the dataset. Due to privacy limitations on the collected data, we are unable to share it publicly at this time. However, to promote transparency and enable further exploration, we have provided the script used for data analysis and an example dataset. This example dataset should provide a clear understanding of the data structure and variables used in the analysis. Additionally, we are happy to share the complete dataset upon request from research teams interested in performing in-depth secondary analyses.

    1. eLife assessment

      This important work offers a thorough exploration of the molecular features of different cell types within the mouse vomeronasal organ, including the expression of chemosensory receptors, using single-cell transcriptomics. The data are thoughtfully analyzed and presented, although the evidence is incomplete and only partially supports some of the claims made by the authors.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors comprehensively present data from single-cell RNA sequencing and spatial transcriptomics experiments of the juvenile male and female mouse vomeronasal organ, with a particular emphasis on the neuronal populations found in this sensory tissue. The use of these two methods effectively maps the locations of relevant cell types in the vomeronasal organ at a level of depth beyond what is currently known. Targeted analysis of the neurons in the vomeronasal organ produced several important findings, notably the common co-expression of multiple vomeronasal type 1 receptors (V1Rs), vomeronasal type 2 receptors (V2Rs), and both V1R+V2Rs by individual neurons, as well as the presence of a small but noteworthy population of neurons expressing olfactory receptors (ORs) and associated signal transduction molecules. Additionally, the authors identify transcriptional patterns associated with neuronal development/maturation, producing lists of genes that can be used and/or further investigated by the field. Finally, the authors report the presence of coordinated combinatorial expression of transcription factors and axon guidance molecules associated with multiple neuronal types, providing the framework for future studies aimed at understanding how these patterns relate to the complex glomerular organization in the accessory olfactory bulb. Several of these conclusions have been reached by previous studies, partially limiting the overall impact of the current work. However, when combined, these results provide important insights into the cellular diversity in the vomeronasal organ that are likely to support multiple future studies of the vomeronasal system.

      Strengths:

      The comprehensive analysis of the data provides a wealth of information for future research into vomeronasal organ function. The targeted analysis of neuronal gene transcription demonstrates the co-expression of multiple receptors by individual neurons and confirms the presence of a population of OR-expressing neurons in the vomeronasal organ. Although many of these findings have been noted by others, the depth of analysis here validates and extends prior findings in an effective manner. The use of spatial transcriptomics to identify the locations of specific cell types is especially useful and produces a template for the field's continued research into the various cell types present in this complex sensory tissue. Overall, the manuscript's biggest strength is found in the richness of the data presented, which will not only support future work in the broader field of vomeronasal system function but also provide insights into others studying complex sensory tissues.

      Weaknesses:

      As noted above, several previous studies have identified co-expression of vomeronasal receptors by vomeronasal sensory neurons, and the expression of non-vomeronasal receptors, and this was not adequately addressed in the manuscript as presented. The inherent weaknesses of single-cell RNA sequencing studies based on the 10x Genomics platforms (need to dissociate tissues, limited depth of sequencing, etc.) are acknowledged. However, the authors document their extensive attempts to avoid making false positive conclusions through the use of software tools designed for this purpose. Because of its complexity, there are some portions of the manuscript where the data are difficult to interpret as presented, but this is a relatively minor weakness. The data resulting from the use of the Resolve Biosciences spatial transcriptomics platform are somewhat difficult to interpret, and the methods are somewhat opaque. That said, the resulting data provide useful links between transcriptional identities and cellular locations, which is not possible without the use of such tools.

    3. Reviewer #2 (Public Review):

      In their paper entitled "Molecular, Cellular, and Developmental Organization of the Mouse Vomeronasal Organ at Single Cell Resolution" Hills Jr. et al. perform single-cell transcriptomic profiling and analyze tissue distribution of a large number of transcripts in the mouse vomeronasal organ (VNO). The use of these complementary tools provides a robust approach to investigating many aspects of vomeronasal sensory neuron (VSN) biology based on transcriptomics. Harnessing the power of these techniques, the authors present the discovery of previously unidentified sensory neuron types in the mouse VNO. Furthermore, they report co-expression of chemosensory receptors from different clades on individual neurons, including the co-expression of VR and OR. Finally, they evaluated the correlation between transcription factor expression and putative surface axon guidance molecules during the development of different neuronal lineages. Based on such correlation analysis, authors further propose a putative cascade of events that could give rise to different neuronal lineages and morphological organization.

      Taken together, Hills Jr. et al. present findings on (a) cell types in the VNO, (b) novel classes of sensory neurons, (c) developmental trajectories of the neuronal lineage, (d) receptor expression in VSNs, (e) co-expression of chemosensory receptors, (f) a surface molecule code for individual receptor types, and (g) transcriptional regulation of receptor and axon guidance cues. Before outlining the major strengths and weaknesses of the manuscript, we need to disclose that, while we are comfortable reviewing aspects (a) to (e) of their work, we lack the expertise to provide constructive criticism on the two last points (f) and (g). Thus, we will not comment on these.

      In general, interpretations/claims put forward by Hills Jr. et al. appear striking at first glance. Upon careful review of the manuscript, however, it becomes apparent that many of the groundbreaking discoveries lack compelling support. Several (not all) of the results presented in this work lack novelty, accurate interpretability, and corroboration. A recurrent theme throughout the manuscript is an incomplete, and somewhat misleading account of the current knowledge in the field. This is perhaps most apparent in the introductory paragraphs, where the authors present a biased report of previously published work, largely including only those results that do not overlap with their own findings, but ignoring results that would question the novelty of the data presented here. For example: "...In contrast, transcriptomic information of the VNO is rather limited (Ref 24,25)...". Indeed, transcriptomic information of the mouse VNO is limited. Here, however, the authors ignore recent reports of robust single-cell transcriptomic analysis from adult and juvenile mice. These papers are, in part, cited later in this manuscript (ref 88, 89, 90, 91), or are completely missing (doi.org/10.7554/eLife.77259). Regardless, previously published results on the same topics have to be included in the Introduction to put the background and novelty of the findings into perspective.

      General comments on (a) cell types in the VNO

      The authors performed single-cell transcriptomic analysis of a large number of cells from both adult and juvenile VNO, creating the largest dataset of its kind to date. This dataset contains a wealth of information and, once made public, could be a valuable resource to the community. However, the analysis implemented in this paper raises several questions:

      Did the authors perform any cell selectivity, or any directed dissection, to obtain mainly neuronal cells? Previous studies reported a greater proportion of non-neuronal cells. For example, while Katreddi and co-workers (ref 89) found that the most populated clusters are identified as basal cells, macrophages, pericytes, and vascular smooth muscle, Hills Jr. et al. in this work did not report such types of cells. Did the authors check for the expression of marker genes listed in Ref 89 for such cell types?

      The authors should report the marker genes used for cell annotation. This is important for data validation, comparison with other publicly available datasets, as well as future use of this dataset.

      The authors reported no differences between juvenile and adult samples, and between male and female samples. It is not clear how they evaluate statistically significant differences, which statistical test was used, or what parameters were evaluated.

      "Based on our transcriptomic analysis, we conclude that neurogenic activity is restricted to the marginal zone." This conclusion is quite a strong statement, given that this study was not directed to carefully study neurogenesis distribution, and when neurogenesis in the basal zone has been proposed by other works, as stated by the authors.

      General comments on (b) novel classes of sensory neurons

      The authors report at least two new types of sensory neurons in the mouse VNO, a finding of huge importance that could have a substantial impact on the field of sensory physiology. However, the evidence for such new cell types is based solely on this transcriptomic dataset and, as such, is quite weak, since many crucial morphological and physiological aspects would be missing to clearly identify them as novel cell types. As stated before, many control and confirmatory experiments, and a careful evaluation of the results presented in this work must be performed to confirm such a novel and interesting discovery. The reported "novel classes of sensory neurons" in this work could represent previously undescribed types of sensory neurons, but also previously reported cells (see below) or simply possible single-cell sequencing artefacts.

      The authors report the co-expression of V2R and Gnai2 transcripts based on sequencing data. That could dramatically change classical classifications of basal and apical VSNs. However, did the authors find support for this co-expression in spatial molecular imaging experiments?

      Canonical OSNs: The authors report a cluster of cells expressing neuronal markers and ORs and call them canonical OSN. However, VSNs expressing ORs have already been reported in a detailed study showing their morphology and location inside the sensory epithelium (References 82, 83). Such cells are not canonical OSNs since they do not show ciliary processes, they express TRPC2 channels and do not express Golf. Are the "canonical OSNs" reported in this study and the OR-expressing VSNs (ref 82, 83) different? Which parameters, other than Gnal and Cnga2 expression, support the authors' bold claim that these are "canonical OSNs"? What is the morphology of these neurons? In addition, the mapping of these "canonical OSNs" shown in Figure 2D paints a picture of the negligible expression/role of these cells (see their prediction confidence).

      Secretory VSN: The authors report another novel type of sensory neurons in the VNO and call them "secretory VSNs". Here, the authors performed an analysis of differentially expressed genes for neuronal cells (dataset 2) and found several differentially expressed genes in the sVSN cluster. However, it would be interesting to perform a gene expression analysis using the whole dataset including neuronal and non-neuronal cells. Could the authors find any marker gene that unequivocally identifies this new cell type?

      When the authors evaluated the distribution of sVSN using the Molecular Cartography technique, they found expression of sVSN in both sensory and non-sensory epithelia. How do the authors explain such unexpected expression of sensory neurons in the non-sensory epithelium?

      The low total genes count and low total reads count, combined with an "expression of marker genes for several cell types" could indicate low-quality beads (contamination) that were not excluded with the initial parameter setting. It looks like cells in this cluster express a bit of everything V1R, V2R, OR, secretory proteins...

      General comments on (c) developmental trajectories of the neuronal lineage

      The authors evaluated a possible cascade of events leading to the development of different lineages of mature sensory neurons using GBCs as a starting point. They found the differential expression of several transcription factors at different stages of development. This analysis was performed correctly, and its interpretation is coherent. However, it is mysterious why the authors included only classical V1R and V2R-expressing neurons, while the novel sensory neurons, cOSN and sVSN, were not included. Furthermore, it is important to notice again the misreport of previously published works.

      The authors wrote "...the transcriptomic landscape that specifies the lineages is not known...". This statement is not completely true, or is at least misleading. There are still many undiscovered aspects of the transcriptomics landscape and lineage determination in VSNs. However, authors cannot ignore previously reported data showing the landscape of neuronal lineages in VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). Expression of most of the transcription factors reported by this study (Ascl1, Sox2, Neurog1, Neurod1...) was already reported, and for some of them, their role was investigated, during early developmental stages of VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). In summary, the authors should fully include the findings from previous works (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259), clearly state what has been already reported, what is contradictory and what is new when compared with the results from this work.

      General comments on (d) receptor expression in VSNs

      The authors evaluated the expression of chemosensory receptors in the VNO and correlated receptor expression with the expression of transcription factors. The analysis of such correlation showed that, while the expression of V1Rs is mainly correlated with the expression of the transcription factor Meis2, the expression of V2Rs is correlated with the combination of many transcription factors. These results are interesting; however, the co-expression of specific V2Rs with specific transcription factors does not imply a direct implication in receptor selection. Directed experiments to evaluate the VR expression dependent on a specific transcription factor must be performed.

      This study reports that transcription factors, such as Pou2f1, Atf5, Egr1, or c-Fos could be associated with receptor choice in VSNs. However, no further evidence is shown to support this interaction. Based on these purely correlative data, it is rather bold to propose cascade model(s) of lineage consolidation.

      General comments on (e) co-expression of chemosensory receptors

      The authors use spatial molecular imaging to evaluate the co-expression of many chemosensory receptors in single VNO cells. Molecular Cartography is a powerful tool and the reported data in this work is truly interesting. The authors show some clear confirmation of previously reported V2R co-expression (Figure 5H), and new co-expression of chemosensory receptors including V1R, V2R, and Fpr (Figure 5G-K).

      However, it is difficult to evaluate and interpret the results due to the lack of cell borders in spatial molecular imaging. The inclusion of cell border delimitation in the reported images (membrane-stained or computer-based) could be tremendously beneficial for the interpretation of the results.

      It is surprising that the authors reported a new cell type expressing OR, however, they did not report the expression of ORs in Molecular Cartography technique. Did the authors evaluate the expression of OR using the cartography technique?

    4. Reviewer #3 (Public Review):

      This study presents a detailed examination of the molecular and cellular organization of the mouse VNO, unveiling new cell types, receptor co-expression patterns, lineage specification regulation, and potential associations between transcription factors, guidance molecules, and receptor types crucial for vomeronasal circuitry wiring specificity. The study identifies a novel type of VSN molecularly different from classic VSNs, which may serve as an accessory to other VSNs by secreting olfactory binding proteins and mucins in response to VNO activation. They also describe a previously undetected co-expression of multiple VRs in individual VSNs, providing an interesting view of the ongoing discussion on how receptor choice occurs in VSNs, either stochastic or deterministic. Finally, the study correlates the expression of axon guidance molecules associated with individual VRs, providing a putative molecular mechanism that specifies VSN axon projections and their connection with postsynaptic cells in the accessory olfactory bulb.

      The conclusions of this paper are well supported by data, but some aspects of data analysis and acquisition need to be clarified and extended.

      (1) The authors claim that they have identified two new classes of sensory neurons, one being a class of canonical olfactory sensory neurons (OSNs) within the VNO. This classification as canonical OSNs is based on expression data of neurons lacking the V1R or V2R markers but instead expressing ORs and signal transduction molecules, such as Gnal and Cnga2. Since OR-expressing neurons in the VNO have been previously described in many studies, it remains unclear to me why these OR-expressing cells are considered here a "new class of OSNs." Moreover, morphological features, including the presence of cilia, and functional data demonstrating the recognition of chemosignals by these neurons, are still lacking to classify these cells as OSNs akin to those present in the MOE. While these cells do express canonical markers of OSNs, they also appear to express other VSN-typical markers, such as Gnao1 and Gnai2 (Figure 2B), which are less commonly expressed by OSNs in the MOE. Therefore, it would be more precise to characterize this population as atypical VSNs that express ORs, rather than canonical OSNs.

      (2) The second new class of sensory neurons identified corresponds to a group of VSNs expressing prototypical VSN markers (including V1Rs, V2Rs, and ORs), but exhibiting lower ribosomal gene expression. Clustering analysis reveals that this cell group is relatively isolated from V1R- and V2R-expressing clusters, particularly those comprising immature VSNs. The question then arises: where do these cells originate? Considering their fewer overall genes and lower total counts compared to mature VSNs, I wonder if these cells might represent regular VSNs in a later developmental stage, i.e., senescent VSNs. While the secretory cell hypothesis is compelling and supported by solid data, it could also align with a late developmental stage scenario. Further data supporting or excluding these hypotheses would aid in understanding the nature of this new cell cluster, with a comparison between juvenile and adult subjects appearing particularly relevant in this context.

      (3) The authors' decision not to segregate the samples according to sex is understandable, especially considering previous bulk transcriptomic and functional studies supporting this approach. However, many of the highly expressed VR genes identified have been implicated in detecting sex-specific pheromones and triggering dimorphic behavior. It would be intriguing to investigate whether this lack of sex differences in VR expression persists at the single-cell level. Regardless of the outcome, understanding the presence or absence of major dimorphic changes would hold broad interest in the chemosensory field, offering insights into the regulation of dimorphic pheromone-induced behavior. Additionally, it could provide further support for proposed mechanisms of VR receptor choice in VSNs.

      (4) The expression analysis of VRs and ORs seems to have been restricted to the cell clusters associated with the neuronal lineage. Are VRs/ORs expressed in other cell types, i.e. sustentacular, HBC, or other cells?

    5. Author response:

      We would like to thank all reviewers for their time, critical evaluation, recognition, and constructive comments on the manuscript. We will revise the manuscript accordingly. Below is our point-by-point response to the comments.

      From Reviewer #1:

      “…several previous studies have identified co-expression of vomeronasal receptors by vomeronasal sensory neurons, and the expression of non-vomeronasal receptors, and this was not adequately addressed in the manuscript as presented.”

      We plan to add context and citations to the Introduction and Results sections relating to recent studies on the co-expression of vomeronasal receptors and the expression of non-vomeronasal receptors in VSNs.

      “The data resulting from the use of the Resolve Biosciences spatial transcriptomics platform are somewhat difficult to interpret, and the methods are somewhat opaque.”

      Unfortunately, detailed Molecular Cartography protocols remain proprietary at Resolve Biosciences and were not disclosed. We acknowledge this limitation. Our role in the acquisition and processing of data for this experiment is included in the current Methods section. We will clarify this in the revised manuscript. Additional figures produced by the Molecular Cartography analysis will also be added (See response to Reviewer #2, below) to the supplemental materials to help clarify interpretation of the results.

      From Reviewer #2:

      “…the authors present a biased report of previously published work, largely including only those results that do not overlap with their own findings, but ignoring results that would question the novelty of the data presented here.”

      We had no intention of misleading the readers. In fact, we have discussed discrepancies between our results and those of other studies. However, we inadvertently left out a critical publication in preparing the manuscript. We plan to add context and citations (where missing) relating to recent studies that use single cell RNA sequencing in the vomeronasal organ, studies relating to the co-expression of vomeronasal receptors, and studies discussing V1R/V2R lineage determination.

      “Did the authors perform any cell selectivity, or any directed dissection, to obtain mainly neuronal cells? Previous studies reported a greater proportion of non-neuronal cells. For example, while Katreddi and co-workers (ref 89) found that the most populated clusters are identified as basal cells, macrophages, pericytes, and vascular smooth muscle, Hills Jr. et al. in this work did not report such types of cells. Did the authors check for the expression of marker genes listed in Ref 89 for such cell types?”

      For VNO dissections, we removed bones and blood vessels from VNO tissue and only kept the sensory epithelium. This procedure removed vascular smooth muscle cells, pericytes, and other non-neuronal cell types, which explains differences in cell proportions between our study and previous studies. We used a DAPI/Draq5 assay to sort live/nucleated cells for sequencing and no specific markers were used for cell selection. All cells in the experiment were successfully annotated using the cell-type markers shown in Fig. 1B, save for cells from the sVSN cluster, which were novel and required further analysis to characterize.

      “The authors should report the marker genes used for cell annotation.”

      Marker genes used for cell annotation are shown in figure 1B. A full list of all marker genes used in the cell annotation process will be provided.

      “The authors reported no differences between juvenile and adult samples, and between male and female samples. It is not clear how they evaluate statistically significant differences, which statistical test was used, or what parameters were evaluated.”

      The claims made about male/female mice and P14/P56 mice directly pertain to the distribution of clusters and cells in UMAP space as seen in Figure 1 C & D. We have indeed performed differential gene expression analysis for male/female and P14/P56 comparisons using the FindMarkers function from the Seurat R package. Although we have found significant differential expression between male and female, and between P14 and P56 animals, the genes in this list do not appear to be influential for the neuronal lineage and cell type specification or related to cell adhesion molecules, which are the main focuses of this study. Nevertheless, we plan to add these results to the supplemental materials in a revised manuscript.
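
      For readers unfamiliar with the comparison described above: Seurat's FindMarkers uses a Wilcoxon rank-sum test by default, although the response does not state whether the default was kept. The sketch below is a minimal pure-Python version of that test for one gene across two groups of cells, using the normal approximation without continuity correction. The function name and the toy expression values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for group_a, two-sided p-value). Tied values receive
    average ranks, and the variance includes the standard tie correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    # Pool the values, remembering which group each one came from (0 = a, 1 = b).
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i  # size of this tie block
        tie_term += t**3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    var_u = n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_a, 1.0  # all values identical: no evidence of a difference
    z = (u_a - mean_u) / var_u**0.5
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_a, p_value


# Toy normalized expression values for one gene (hypothetical numbers).
male_cells = [0.0, 0.4, 1.1, 0.9, 0.0, 1.3]
female_cells = [1.8, 2.2, 0.7, 2.5, 1.9, 2.1]
u_stat, p = rank_sum_test(male_cells, female_cells)
print(f"U = {u_stat:.1f}, p = {p:.4f}")
```

      In a genome-wide comparison, a test like this is run once per gene, so the resulting p-values need a multiple-testing correction before any gene is called significant.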

      “‘Based on our transcriptomic analysis, we conclude that neurogenic activity is restricted to the marginal zone.’ This conclusion is quite a strong statement, given that this study was not directed to carefully study neurogenesis distribution, and when neurogenesis in the basal zone has been proposed by other works, as stated by the authors.”

      Eighteen slides from whole VNO sections were used in Molecular Cartography analysis, while one representative slide was used to present findings. Across all slides, GBCs, INPs, and iVSNs show a pattern of proximity to the marginal zone (MZ), with GBCs presenting nearest to the MZ and iVSNs presenting furthest. We believe that the full scope of our results justifies our claim that neurogenesis is restricted to the MZ. This claim is also supported by the 2021 study by Katreddi & Forni. We will provide additional figures to further support this claim.

      “The authors report at least two new types of sensory neurons in the mouse VNO, a finding of huge importance that could have a substantial impact on the field of sensory physiology. However, the evidence for such new cell types is based solely on this transcriptomic dataset and, as such, is quite weak, since many crucial morphological and physiological aspects would be missing to clearly identify them as novel cell types. As stated before, many control and confirmatory experiments, and a careful evaluation of the results presented in this work must be performed to confirm such a novel and interesting discovery. The reported "novel classes of sensory neurons" in this work could represent previously undescribed types of sensory neurons, but also previously reported cells (see below) or simply possible single-cell sequencing artefacts.”

      The reviewer is correct that detailed morphological and physiological studies are needed to further understand these cells. This is an opinion we share. Our paper is primarily intended as a resource paper to provide access to a large-scale single-cell RNA-sequenced dataset and discoveries based on the transcriptomic data that can support and inspire ongoing and future experiments in the field. Nonetheless, we are confident that neither of the novel cell clusters is the result of sequencing artefacts. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet cell detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being the result of sequencing artefacts, as they are physically connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent expressions of genes (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Additional figures for mOSN differential gene expression and gene ontology analysis results will be added to the supplemental figures.
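
      To make the last quality-control step concrete, the following is a minimal sketch of a 5% mitochondrial cut-off. It assumes that mitochondrial genes are recognized by the mouse "mt-" symbol prefix and that the cut-off is inclusive; neither detail is stated in the response. The function names and the toy counts are hypothetical, and the SoupX and Scrublet steps are not reproduced here.

```python
def mito_fraction(counts, gene_names, prefix="mt-"):
    """Fraction of a cell's total counts that come from mitochondrial genes."""
    total = sum(counts)
    if total == 0:
        return 0.0
    mito = sum(
        c for c, g in zip(counts, gene_names) if g.lower().startswith(prefix)
    )
    return mito / total


def filter_cells(cells, gene_names, max_mito=0.05):
    """Keep cells whose mitochondrial fraction is at or below max_mito."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if mito_fraction(counts, gene_names) <= max_mito
    }


# Toy count matrix (hypothetical values): one row of counts per cell.
genes = ["mt-Co1", "mt-Nd1", "Actb", "Omp"]
cells = {
    "cell_a": [1, 1, 50, 48],   # 2% mitochondrial counts
    "cell_b": [10, 10, 40, 40],  # 20% mitochondrial counts
}
kept = filter_cells(cells, genes)
print(sorted(kept))
```

      A filter of this kind removes cells with a high mitochondrial fraction; it does not by itself address the reviewer's concern about cells with low total counts, which is a separate threshold.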

      “The authors report the co-expression of V2R and Gnai2 transcripts based on sequencing data. That could dramatically change classical classifications of basal and apical VSNs. However, did the authors find support for this co-expression in spatial molecular imaging experiments?” 

      Genes with extremely high expression levels overwhelm signals from other genes, and therefore had to be removed from the experiment. This is a limitation of the Molecular Cartography platform. Unfortunately, Gnai2 was determined to be one of these genes and was not evaluated for this purpose.

      “Canonical OSNs: The authors report a cluster of cells expressing neuronal markers and ORs and call them canonical OSN. However, VSNs expressing ORs have already been reported in a detailed study showing their morphology and location inside the sensory epithelium (References 82, 83). Such cells are not canonical OSNs since they do not show ciliary processes, they express TRPC2 channels and do not express Golf. Are the "canonical OSNs" reported in this study and the OR-expressing VSNs (ref 82, 83) different? Which parameters, other than Gnal and Cnga2 expression, support the authors' bold claim that these are "canonical OSNs"? What is the morphology of these neurons? In addition, the mapping of these "canonical OSNs" shown in Figure 2D paints a picture of the negligible expression/role of these cells (see their prediction confidence).” 

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell-type in the data. After performing differential gene expression on the putative mOSN cluster, comparing with V1R and V2R VSNs, independently, GO analysis returned the top significantly enriched GO molecular function, ‘olfactory receptor activity’, and the top significantly enriched cellular component, ‘cilium’. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we have prioritized the detection of canonical VNO cell-types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs.

      “Secretory VSN: The authors report another novel type of sensory neurons in the VNO and call them "secretory VSNs". Here, the authors performed an analysis of differentially expressed genes for neuronal cells (dataset 2) and found several differentially expressed genes in the sVSN cluster. However, it would be interesting to perform a gene expression analysis using the whole dataset including neuronal and non-neuronal cells. Could the authors find any marker gene that unequivocally identifies this new cell type?”

      We did not find unequivocal marker genes for sVSNs. We did perform differential analysis of the sVSN cluster with whole VNO data and with the neuronal subset, as well as against specific cell-types. We could not find a single gene that was perfectly exclusive to sVSNs. We used a combinatorial marker-gene approach to predicting sVSNs in the Molecular Cartography data. This required a larger subset of our 100 gene panel to be dedicated to genes for detecting sVSNs.

      “When the authors evaluated the distribution of sVSN using the Molecular Cartography technique, they found expression of sVSN in both sensory and non-sensory epithelia. How do the authors explain such unexpected expression of sensory neurons in the non-sensory epithelium?” 

      In our scRNA-Seq experiment, blood vessels were removed, limiting the power to distinguish between certain cell types. Because of the limited number of genes that we can probe using Molecular Cartography, a number of the genes associated with sVSNs may also be present in the non-sensory epithelium. This could lead to the identification of cells in the non-neuronal epithelium that may or may not be identical to the sVSNs. Indeed, further studies will need to be conducted to determine the specificity of these cells.

      “The low total genes count and low total reads count, combined with an "expression of marker genes for several cell types" could indicate low-quality beads (contamination) that were not excluded with the initial parameter setting. It looks like cells in this cluster express a bit of everything V1R, V2R, OR, secretory proteins...”

      We are confident that the putative sVSN cell cluster is not the result of low-quality cells. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet cell detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being the result of sequencing artefacts, as they are connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent expressions of genes (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Moreover, while some genes were expressed at a lower level when compared to the canonical VSNs, others were expressed at higher levels, which argues against an overall loss of gene counts as the cause of the discrepancy.

      “The authors wrote ‘...the transcriptomic landscape that specifies the lineages is not known...’. This statement is not completely true, or is at least misleading. There are still many undiscovered aspects of the transcriptomics landscape and lineage determination in VSNs. However, authors cannot ignore previously reported data showing the landscape of neuronal lineages in VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). Expression of most of the transcription factors reported by this study (Ascl1, Sox2, Neurog1, Neurod1...) was already reported, and for some of them, their role was investigated, during early developmental stages of VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). In summary, the authors should fully include the findings from previous works (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259), clearly state what has been already reported, what is contradictory and what is new when compared with the results from this work.”

      This is a difference in opinion about the terminology. Transcriptomic landscape in our paper refers to the genome-wide expression by individual cells, not just individual genes. The reviewer is correct that many of the genetic specifiers have been identified, which we cited and discussed. We consider these studies as providing a “genetic” underpinning, rather than the “transcriptomic landscape” in lineage progression. We will clarify this point in the revised manuscript. 

      “…the co-expression of specific V2Rs with specific transcription factors does not imply a direct implication in receptor selection. Directed experiments to evaluate the VR expression dependent on a specific transcription factor must be performed.” 

      The reviewer is correct, and we did not claim that the co-expression of specific transcription factors indicate a direct relationship with receptor selection. We agree that further directed experiments are required to investigate this question.

      “This study reports that transcription factors, such as Pou2f1, Atf5, Egr1, or c-Fos could be associated with receptor choice in VSNs. However, no further evidence is shown to support this interaction. Based on these purely correlative data, it is rather bold to propose cascade model(s) of lineage consolidation.”

      The reviewer is correct. As any transcriptomic study will only be correlative, additional studies will be needed to unequivocally determine the mechanistic link between the transcription factors and receptor choice. Our model provides a base for these studies.

      “The authors use spatial molecular imaging to evaluate the co-expression of many chemosensory receptors in single VNO cells. […] However, it is difficult to evaluate and interpret the results due to the lack of cell borders in spatial molecular imaging. The inclusion of cell border delimitation in the reported images (membrane-stained or computer-based) could be tremendously beneficial for the interpretation of the results.”

      The most common practice for cell segmentation of spatial transcriptomics data is to determine cell borders based on nuclear staining with expansion. We have tested multiple algorithms based on recent studies, but each has its own caveat. We will clarify this point in the revised manuscript.
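
      As a rough illustration of what segmentation by nuclear staining with expansion involves, the sketch below assigns each detected transcript to the nearest nucleus within a fixed radius. It is a deliberate simplification: it uses nucleus centroids rather than dilated nuclear masks, and the function name, coordinates, radius, and gene labels are all hypothetical. It does not represent the algorithms the authors tested.

```python
import math


def assign_transcripts(nuclei, transcripts, expansion_radius):
    """Assign each transcript to the nearest nucleus centroid within a radius.

    nuclei: dict mapping nucleus id -> (x, y) centroid.
    transcripts: list of (gene, x, y) detections.
    Returns a dict mapping nucleus id -> list of genes. Transcripts farther
    than expansion_radius from every nucleus are collected under the key None.
    """
    assigned = {nucleus_id: [] for nucleus_id in nuclei}
    assigned[None] = []
    for gene, x, y in transcripts:
        best_id, best_dist = None, expansion_radius
        for nucleus_id, (nx, ny) in nuclei.items():
            dist = math.hypot(x - nx, y - ny)
            if dist <= best_dist:
                best_id, best_dist = nucleus_id, dist
        assigned[best_id].append(gene)
    return assigned


# Toy layout: two nuclei 10 units apart and three transcript detections.
nuclei = {"n1": (0.0, 0.0), "n2": (10.0, 0.0)}
transcripts = [("geneA", 1.0, 0.0), ("geneB", 9.0, 1.0), ("geneA", 5.0, 8.0)]
print(assign_transcripts(nuclei, transcripts, expansion_radius=3.0))
```

      The choice of radius is the caveat in miniature: a small radius leaves many transcripts unassigned, while a large one can attribute transcripts from a neighbouring cell to the wrong nucleus and so create apparent co-expression.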

      “It is surprising that the authors reported a new cell type expressing OR, however, they did not report the expression of ORs in Molecular Cartography technique. Did the authors evaluate the expression of OR using the cartography technique?” 

      We were limited to a 100-gene probe panel and only included one OR; its expression was not high enough for us to substantiate any claims.

      From Reviewer #3:

      “(1) The authors claim that they have identified two new classes of sensory neurons, one being a class of canonical olfactory sensory neurons (OSNs) within the VNO. This classification as canonical OSNs is based on expression data of neurons lacking the V1R or V2R markers but instead expressing ORs and signal transduction molecules, such as Gnal and Cnga2. Since OR-expressing neurons in the VNO have been previously described in many studies, it remains unclear to me why these OR-expressing cells are considered here a "new class of OSNs." Moreover, morphological features, including the presence of cilia, and functional data demonstrating the recognition of chemosignals by these neurons, are still lacking to classify these cells as OSNs akin to those present in the MOE. While these cells do express canonical markers of OSNs, they also appear to express other VSN-typical markers, such as Gnao1 and Gnai2 (Figure 2B), which are less commonly expressed by OSNs in the MOE. Therefore, it would be more precise to characterize this population as atypical VSNs that express ORs, rather than canonical OSNs.”

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell-type in the data. We have performed differential gene expression analysis on the putative mOSN cluster to compare with V1R and V2R VSNs. In the GO analysis, the top significantly enriched GO terms include “olfactory receptor activity” and “cilium”, further supporting that these are OSNs. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we have prioritized the detection of canonical VNO cell-types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs. With regard to Gnai2 and Go expression, we have examined our data from the OSNs dissociated from the olfactory epithelium and detected substantial expression of both. This new analysis provides additional support for our claim. We will update the information in a revised manuscript.

      “(2) The second new class of sensory neurons identified corresponds to a group of VSNs expressing prototypical VSN markers (including V1Rs, V2Rs, and ORs), but exhibiting lower ribosomal gene expression. Clustering analysis reveals that this cell group is relatively isolated from V1R- and V2R-expressing clusters, particularly those comprising immature VSNs. The question then arises: where do these cells originate? Considering their fewer overall genes and lower total counts compared to mature VSNs, I wonder if these cells might represent regular VSNs in a later developmental stage, i.e., senescent VSNs. While the secretory cell hypothesis is compelling and supported by solid data, it could also align with a late developmental stage scenario. Further data supporting or excluding these hypotheses would aid in understanding the nature of this new cell cluster, with a comparison between juvenile and adult subjects appearing particularly relevant in this context.” 

      We wholeheartedly agree with this assessment. Our initial thought was that these were senescent VSNs, but the trajectory analysis did not support this scenario, leading us to propose that these are putative secretory cells. Our analysis also shows that overall, 46% of the putative sVSNs were from the P14 sample and 54% from P56. These cells comprise roughly 6.4% of all P14 cells and 8.5% of P56 cells. In comparison, 28.4% of all cells are mature V1R VSNs at P14, but the percentage rises to 46.7% at P56. The significant presence of sVSNs at P14, and the disproportionate increase when compared with mature VSNs, indicate that these are unlikely to be late developmental stage or senescent cells, although we cannot exclude these possibilities. We plan to clarify these points in the revised manuscript.
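
      The percentages quoted above can be compared directly. The short calculation below uses only the figures stated in this response and assumes that P14 and P56 are the only two samples contributing cells; the variable names are illustrative. It computes the fold change in each population's share between P14 and P56 and the ratio of total P14 to P56 cells implied by the sVSN figures.

```python
# Percentages as stated in the response, expressed as fractions.
svsn_share_from_p14, svsn_share_from_p56 = 0.46, 0.54  # origin of all sVSNs
svsn_frac_p14, svsn_frac_p56 = 0.064, 0.085  # sVSNs as a fraction of each sample
v1r_frac_p14, v1r_frac_p56 = 0.284, 0.467  # mature V1R VSNs per sample

# Fold change in each population's share of the sample from P14 to P56.
svsn_fold = svsn_frac_p56 / svsn_frac_p14
v1r_fold = v1r_frac_p56 / v1r_frac_p14

# If sVSN counts are 0.064 * N_P14 and 0.085 * N_P56 and split 46:54,
# then N_P14 / N_P56 = (0.46 / 0.54) * (0.085 / 0.064).
implied_sample_ratio = (svsn_share_from_p14 / svsn_share_from_p56) * (
    svsn_frac_p56 / svsn_frac_p14
)

print(f"sVSN share fold change (P56 / P14): {svsn_fold:.2f}")
print(f"V1R share fold change (P56 / P14):  {v1r_fold:.2f}")
print(f"Implied P14 / P56 total cell ratio: {implied_sample_ratio:.2f}")
```

      On these rounded figures, the share of mature V1R VSNs grows more steeply between P14 and P56 than the share of sVSNs does, which is the comparison the argument above relies on.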

      We did not include sVSNs in the trajectory inference analysis because of inherent uncertainty about their developmental origins. However, PCA embeddings were the basis of the pseudotime analysis, and those embeddings that do include the sVSN cluster show that it is distributed evenly between the mature V1R and V2R clusters, with all mature clusters equidistant from GBC and INP clusters, indicating that they may indeed originate from the same stem cell populations. We plan to include trajectory analysis based on this assumption in the revised manuscript.

      (3) The authors' decision not to segregate the samples according to sex is understandable, especially considering previous bulk transcriptomic and functional studies supporting this approach. However, many of the highly expressed VR genes identified have been implicated in detecting sex-specific pheromones and triggering dimorphic behavior. It would be intriguing to investigate whether this lack of sex differences in VR expression persists at the single-cell level. Regardless of the outcome, understanding the presence or absence of major dimorphic changes would hold broad interest in the chemosensory field, offering insights into the regulation of dimorphic pheromone-induced behavior. Additionally, it could provide further support for proposed mechanisms of VR receptor choice in VSNs. 

      The reviewer raised a good point. We did not observe differences between male and female mice, or between P14 and P56 mice, in the distribution of clusters and cells in UMAP space. However, our differential expression analysis revealed significantly differentially expressed genes in both comparisons. These genes have not been implicated in lineage or cell type determination, and we decided not to include the analysis in the current version. In the revised manuscript, we plan to include the results.

      “(4) The expression analysis of VRs and ORs seems to have been restricted to the cell clusters associated with the neuronal lineage. Are VRs/ORs expressed in other cell types, i.e. sustentacular, HBC, or other cells?” 

      Sparsely expressed low counts of VR and OR genes were observed in non-neuronal cell-types. When their expression as a percentage of cell-level gene counts is considered, however, the expression is negligible when compared to the neurons. The observed expression may be explained by stochastic base-level expression, or it may be the result of remnant ambient RNA that passed filtering. We will clarify this point in the revision.

    1. eLife assessment

      This important study demonstrates a novel method for imaging glutamate receptors in situ via cryo-ET. The use of cutting-edge methods is well-described and is convincing, but there are minor concerns as to how generally this approach can be used in imaging cell surface receptors. This paper is broadly relevant to biophysicists and neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      Matsui et al. present an experimental pipeline for visualizing the molecular machinery of synapses in the brain, which includes numerous techniques, starting with generating labeled antibodies and recombinant mice, continuing with HPF and FIB milling, and finishing with tilt series collection and 3D image processing. This pipeline represents a breakthrough in the preparation of brain tissue for high-resolution imaging and can be used in future tomographic research to reconstruct molecular details of synaptic complexes as well as pre- and post-synaptic assemblies. This methodology can also be adapted for a broader range of tissue preparations and signifies the next step towards a better structural understanding of how molecular machineries operate in natural conditions.

      Strengths:

      The manuscript is very well written, contains a detailed description of methodology, provides nice illustrations, and will be an outstanding guide for future research.

      Weaknesses:

      None noted.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors present a method that allows for the identification and localization of molecular machinery at chemical synapses in unstained, unfixed native brain tissue slices. They believe that this approach will provide a 3D structural basis for understanding different mechanisms of synaptic transmission, plasticity, and development. To achieve this, the group used genetically engineered mouse lines and generated thin brain slices that underwent high-pressure freezing (HPF) and focused ion beam (FIB) milling. Utilizing cryo-electron tomography (cryo-ET) and integrating it with cryo-fluorescence microscopy, they achieved micrometer resolution in identifying the glutamatergic synapses along with nanometer resolution in locating AMPA receptor GluA2 subunits using Fab-AuNP conjugates. The findings are summarized with detailed examples of successfully prepared substrates for cryo-ET, specific morphological identification and localization, and the detailed structural organization of excitatory synapses, including synaptic vesicle clusters close to the postsynaptic density and in the cleft.

      Strengths:

      The study advances previous work that used cultured neurons or synaptosomes. Combining cryo-electron tomography (cryo-ET) with fluorescence-guided targeting and labeling with Fab-AuNP conjugates enabled the study of synapses and molecular structures in their native environment without chemical fixation or staining. This preserves their near-native state, offering high specificity and resolution. The methods developed are generalizable, allowing adaptation for identifying and localizing other key molecules at glutamatergic synapses and potentially useful for studying a variety of synapses and cellular structures beyond the scope of this research.

      Weaknesses:

      The preparation and imaging techniques are complex and require highly specialized equipment and expertise, potentially limiting their accessibility and widespread adoption.

      Additionally, the methods might need further modifications/tweaks to study other types of synapses or molecular structures effectively.

      The reliance on genetically engineered mouse lines may again impact the generalizability of the findings.

      Similarly, the requirement of monoclonal, high-affinity antibodies/Fab fragments to specifically label receptors/proteins would limit the wider employment of these methods.

    1. eLife assessment

      This study provides the first analysis of vascular stabilization on the critical and evolutionarily conserved structure around the Circle of Willis in the brain, strengthened by using parallel in vivo and in vitro experimental approaches. The evidence supporting the claims is solid and the work will be valuable for scientists studying developmental and disease-related vascular stabilization.

    2. Reviewer #2 (Public Review):

      Summary:

      Cheng et al. explore the development of the arteries that form the circle of Willis and investigate how blood flow pulsatility influences vascular smooth muscle cell (VSMC) differentiation. Using live confocal imaging of the developing zebrafish, the authors show that endothelial cells in circle of Willis arteries transition from venous to arterial identity between 54 hours post-fertilization (hpf) and 3 days post-fertilization (dpf), and that this coincides with pdgfrb+ mural cell progenitor differentiation into acta2+ arterial VSMCs. They find that the anterior portions of the circle of Willis, including the internal carotid arteries (CaDI), establish acta2 expression earlier than posterior aspects, likely due to faster flow rate and increased pulsatility through the CaDI. Then, using computational fluid dynamics, an in vitro co-culture assay, and genetic and drug manipulations of blood flow, the authors provide evidence that pdgfrb+ differentiation is dependent upon pulsatile blood flow and klf2a activation. The results add to our understanding of vascular development and suggest that deficits in pulsatile flow could be potential drivers of arteriopathies.

      Strengths:

      (1) Longitudinal confocal imaging of live developing zebrafish makes the timeline of arterial development in the circle of Willis easy to understand. This is a strong approach to studying how vascular networks are altered with genetic and pharmacological manipulations.

      (2) Rigorous use of multiple techniques to test the hypothesis that pulsatile blood flow is required for smooth muscle cell differentiation. The microangiography experiment, in vitro co-culture assay, and genetic and drug manipulations of heart rate at various developmental timepoints yield outcomes that are consistent with the hypothesis.

      Weaknesses:

      (1) The authors should provide more information on how blood flow velocity and wall shear stress are calculated from circle of Willis vascular structure. It is presumed that these values are dependent upon the 3-D morphology of the vessel network, as labeled by intravenous dextran dye, but this is not clear. Small local differences in vessel diameter and shape will influence blood flow velocity, but these morphological changes are not clearly articulated. Further, it is unclear how flow input levels to the CaDI and basilar arteries are decided across time-points. In general, descriptions of the blood flow modeling are very sparse.

      (2) Is it possible to measure the blood flow speed empirically with line-scanning or high-speed tracking of labeled blood cells? This would provide some validation of the modeling results.

      (3) Does the cardiac injection of dextran itself affect the diameter or flow of the arteries, given the invasiveness of the procedure? This could be examined in fish with a transgenic endothelial label and with vs. without dextran.

      (4) The data from the microangiography experiment in Figure 3 does not fully support the stated results. The authors report that the CaDI had the highest blood flow speed starting from 54 hpf, but it does not appear to be higher than the other arteries at this time point. Additionally, there is not sufficient evidence that wall shear stress coincides with smooth muscle cell differentiation in the CaDI. Wall shear stress appears to be similar between 54 hpf and 3 dpf in the CaDI, only increasing between 3 dpf and 4 dpf, while differentiation is shown to begin at 3 dpf.

      (5) The genetic and drug manipulations of heart rate are important experiments, but more detail is required to understand the effects of the manipulations. At least, a discussion on the limitations of these manipulations is needed. For example, how does one separate the pulsatile versus nutritive effects of blood flow/heart rate reduction? It is possible that off-target or indirect effects of Nifedipine decrease smooth muscle cell proliferation, or that altered cardiac contractility fundamentally alters many aspects of vascular development other than blood flow. Nifedipine is also likely to act upon VSMC calcium handling in the circle of Willis, which may in turn affect cell maturation.

      (6) It is unclear if acta2 expression is conferring vascular tone, as would be expected if the cells are behaving as mature VSMCs. Does arterial diameter decrease with an increase in acta2 expression? Are acta2 positive mural cells associated with more dynamic changes in arteriole diameter under basal or stimulated conditions?

    3. Reviewer #3 (Public Review):

      Summary:

      Cheng et al. studied if and how blood flow regulates differentiation of vascular smooth muscle cells (VSMC) in the Circle of Willis (CW) in zebrafish embryos. They show that CW vessels gradually acquire arterial identity. VSMCs also undergo gradual differentiation, which correlates with blood flow velocity. Using cell culture they show that pulsatile blood flow promotes pericyte differentiation into smooth muscle cells. They further identify transcription factor klf2a as differentially regulated by blood flow, and show that klf2a inhibition impairs VSMC differentiation. The authors conclude that pulsatile flow promotes VSMC differentiation through klf2a activation.

      Strengths:

      Overall this is an important study, because VSMC differentiation in CW has not been previously studied, although analogous observations regarding the role of blood flow and klf2 involvement have been previously made in other systems and other vascular beds, for example, mouse klf2 mutants, which have deficient VSMC coverage of the dorsal aorta (Wu et al., 2008, JBC 283: 3942-50). The results convincingly show that VSMC differentiation in CW depends on the blood flow, and that klf2a flow dependent function regulates VSMC differentiation.

      Weaknesses:

      (1) The provided data do not support correlation between wall shear stress (WSS) and acta2+ cell number. The number of acta2+ cells in CaDI increases dramatically between 54 hpf and 3 dpf (Fig. 2F). However, the graph provided in the response to reviewers shows that WSS in CaDI is actually lower at 3 dpf compared to 54 hpf. Authors argue that Pearson correlation analysis shows that both variables increase together, but this is calculated over the stage between 54 hpf and 4 dpf. acta2+ cells appear by 3 dpf, and at this stage WSS in CaDI is not increased (or even lower), which argues against WSS being the cause of acta2+ cell differentiation. Furthermore, data in Fig. 3I-K show that WSS actually decreases in BCA and PCS between 54 hpf and 4 dpf, while the number of acta2+ cells increases in BCA and PCS by 4 dpf. This also argues against the argument that WSS affects differentiation of acta2+ cells.

      (2) In multiple instances, results are based on a single independent experiment (Fig. 3, Fig. 4H, I, Fig. S2 and Fig. S3) with only a few embryos analyzed in many cases. This falls short of expected standards in the field, and it is unclear if these results are reproducible.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses to be addressed: 

      (1) More detail is required to understand the effects of genetic and drug manipulations on heart rate as these are important experiments. At the very least, a discussion on the limitations of these manipulations is needed. 

      - For example, how does one separate the pulsatile versus nutritive effects of blood flow/heartrate reduction? 

      - The conclusion that arterial SMC differentiation is driven by pulsatile blood flow needs to be toned down. Indeed, this conclusion is mainly supported by in vitro cell co-cultures exposed to laminar versus pulsatile flow. In vivo, reducing Tnnt2a expression affects cardiac contractility and blood flow as a whole; it does not selectively affect pulsatility. To make this conclusion, the authors would need an experimental means to selectively dampen the pulsatility of blood flow.

      We understand this concern and we toned down the statements related to pulsatile flow in our conclusion by using 'flow' instead of 'pulsatile flow' throughout the text, except for the in vitro co-culture part. We also added a paragraph to discuss the limited capability to qualitatively reduce blood flow in vivo, and acknowledge that the effects of nutrients and flow reduction could not be uncoupled in live zebrafish embryos. We proposed that in the future, in vitro 3D vascular culture models may be combined with microfluidics to precisely calibrate nutrient composition in culture media, flow velocity and pulse; these methods would help address these questions more thoroughly. See page 11-12 line 312-322.

      (2) Since mural cells are sensitive to transmural pressure, could the authors elaborate on the potential role of raised intravascular pressure in SMC differentiation? This would better parallel rodents and humans. 

      We thank you for this suggestion. We added a paragraph to discuss the potential role of raised intravascular pressure in VSMC differentiation in the discussion section (see page 11 line 296-311).

      (3) The authors use nifedipine to reduce blood flow. Nifedipine is a specific and potent inhibitor of voltage-dependent calcium channels (VDCC) which are expressed in SMCs. Prior studies (PMID: 35588738) showed that VDCC blockers increased rather than inhibited SMC differentiation. Nifedipine is also likely to act upon VSMC calcium handling in the circle of Willis, which may in turn affect cell maturation. Could the authors comment on this seeming discrepancy?

      It is possible that off-target or indirect effects of Nifedipine decrease smooth muscle cell proliferation, or that altered cardiac contractility fundamentally alters aspects of vascular development other than blood flow. 

      - Additionally, it would be helpful to report the quantitative heart rate reduction achieved with Nifedipine. This would clear up concerns that the heart rate reduction is too large for normal vascular development to occur, and thus decrease proliferation rate independent of changes in blood flow pulsatility. 

      We concur with these comments, which is why our experimentation with Nifedipine is reinforced by an alternative, non-pharmacological strategy to inhibit blood flow: the use of a morpholino against the tnnt2a gene. The results with either Nifedipine or the tnnt2a morpholino support the lack of VSMC maturation. In addition, we provided the quantitative heart rate reduction achieved with Nifedipine in new Figure S2A-S2C, suggesting that the drug does not completely halt the heartbeat but slows it. Moreover, we note that zebrafish embryos can survive and develop a normal blood vascular system without any heartbeat. Hence, we exclude that the effect on VSMC maturation is linked to non-specific effects caused by the loss of heartbeat. Nevertheless, we now acknowledge in our discussion the limitation of nifedipine, as it may affect VSMCs through VDCCs (page 12, line 323-334).

      We also added a paragraph in the discussion section to compare nifedipine, an L-type VDCC blocker, and ML218, a T-type VDCC selective inhibitor from the previous study (Ando et al., 2022). We noted that in this previous study, the increase in VSMC differentiation only occurs on anterior metencephalic central arteries (AMCtAs) that are more than 40 mm away from the BCA; these AMCtAs are much smaller than CoW arteries and have a different geometry, hence possibly different kinetics of VSMC maturation (Ando et al., 2022), as our findings would suggest.

      (4) The authors should provide more information on how blood flow velocity and wall shear stress are calculated from the Circle of Willis vascular structure. It is presumed that these values are dependent upon the 3-D morphology of the vessel network, as labeled by intravenous dextran dye, but this is not clear. (a second reviewer similarly comments: I was unclear how flow velocity values were obtained in Fig. 3E. Are they based on computational simulation, or are they experimentally calculated following the dextran injection?) Small local differences in vessel diameter and shape will influence blood flow velocity, but these morphological changes are not clearly articulated. Further, it is unclear how flow input levels to the CaDI and basilar arteries are decided across time points. For instance, is it possible to measure the blood flow speed empirically with line-scanning or high-speed tracking of labeled blood cells or particles? This would provide validation of the modeling results. 

      The computational fluid dynamic simulation was performed according to a previous study from our lab (Barak et al., 2021). Blood flow velocity and wall shear stress are dependent upon the 3D morphology of the vessel network labeled by intravascular dextran. Details on how the computational fluid dynamic simulation was performed were added to the Methods section, page 17 line 433-449.

      Moreover, to address this reviewer's concern, we have now provided new experimental measurements of blood flow using red blood cell (RBC) velocity obtained via axial line scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos at 54 hpf, 3 dpf, and 4 dpf. Using the experimental RBC velocity, we re-ran the computational fluid dynamic simulation. The new findings align with our conclusion and are further elaborated upon in the response to this reviewer's comment listed as point 6. Details on how RBC velocity was calculated were added to the Methods section, page 16 line 414-431.
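
      For readers unfamiliar with line-scan velocimetry, the general principle can be sketched as follows. This is a generic illustration and not the authors' analysis pipeline: it assumes that a moving cell leaves a diagonal streak in a space-versus-time line-scan image, and the function name, parameters, and calibration values are all hypothetical.

```python
def rbc_velocity_um_per_s(dx_pixels, dt_lines, um_per_pixel, s_per_line):
    """Estimate cell velocity from one streak in a line-scan image.

    dx_pixels    -- distance covered by the streak along the scan line (pixels)
    dt_lines     -- number of successive scan lines spanned by the streak
    um_per_pixel -- spatial calibration (micrometers per pixel), assumed known
    s_per_line   -- time taken to acquire one scan line (seconds), assumed known
    """
    if dt_lines <= 0 or um_per_pixel <= 0 or s_per_line <= 0:
        raise ValueError("dt_lines, um_per_pixel and s_per_line must be positive")
    distance_um = dx_pixels * um_per_pixel
    elapsed_s = dt_lines * s_per_line
    return distance_um / elapsed_s


# Hypothetical streak and calibration values, for illustration only.
example_velocity = rbc_velocity_um_per_s(
    dx_pixels=100, dt_lines=50, um_per_pixel=0.5, s_per_line=0.001
)
print(f"{example_velocity:.1f} um/s")
```

      In practice many streaks per vessel would be averaged, and the sign of the slope gives the flow direction.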

      (5) Does the cardiac injection of dextran itself affect the diameter of the arteries, given the invasiveness of the procedure? This could be examined in fish with a transgenic endothelial label with and without dextran. 

      Here, we performed an experiment on wildtype zebrafish at 5 days post-fertilization (dpf) with and without Dextran injection, examining the effects of Dextran injection on vessel diameters. As shown in the representative image below, the XZ panel clearly illustrates a Dextran-filled PCS vessel with no alteration in vessel size. Dextran microangiography, a technique employed to obtain vessel geometry with fluorescent microspheres, has been well established in zebrafish (Kamei et al., 2010). Our findings, demonstrating that Dextran does not affect vessel size, are consistent with previous studies utilizing Dextran microangiography.

      Author response image 1.

      (6) The data from the microangiography experiment in Figure 3 does not fully support the stated results. The authors report that the CaDI had the highest blood flow speed starting from 54 hpf, but it does not appear to be higher than the other arteries at this time point. Additionally, there is not sufficient evidence that wall shear stress coincides with smooth muscle cell differentiation in the CaDI. Wall shear stress appears to be similar between 54 hpf and 3 dpf in the CaDI, only increasing between 3 dpf and 4 dpf, while differentiation is shown to begin at 3 dpf. The authors need to address this and/or soften conclusions. 

      First, in response to this specific reviewer concern, we measured red blood cell (RBC) velocity by using axial line scanning microscopy to analyze Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos (the detailed method was added to the Methods section of the manuscript). We replaced the computationally simulated blood flow velocity with RBC velocity in new Figure 3E-3G, and re-ran the computational simulation of wall shear stress (WSS) using the RBC velocity in new Figure 3I-3K. We compared RBC velocity and WSS among different vessels at each time point. We confirmed that CaDI has the highest RBC velocity from 54 hpf to 4 dpf (new Figure 3A-3C, and 3E-3G) and found an overall increase in average WSS from 54 hpf to 4 dpf (new Figure 3A-3C, and 3H). Further, WSS in CaDI was significantly higher than in BCA and PCS at 54 hpf, 3 dpf, and 4 dpf (new Figure 3A-3C, 3I-3K). Altogether, the CFD simulation suggests that CoW arteries experience different hemodynamic WSS that is associated with the spatiotemporal pattern of VSMC differentiation on CoW arteries. (Page 6, line 153-162)

      Second, to identify the correlation between WSS and VSMC differentiation in CaDI, we performed Pearson correlation analysis. In the image provided here, we plotted a linear regression of the normalized number of acta2+ cells in CaDI against WSS across developmental stages (54 hpf, 3 and 4 dpf), and performed Pearson correlation coefficient analysis using GraphPad Prism 10.0.3. The correlation coefficient is r = 0.595, suggesting that the two variables (acta2+ cells and WSS) tend to increase together with developmental stage (54 hpf, 3 and 4 dpf).

      Author response image 2.
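
      For reference, the Pearson coefficient reported above is the covariance of the two variables divided by the product of their standard deviations. The sketch below shows that calculation in plain Python. It is not the authors' GraphPad Prism analysis, and the paired values are hypothetical placeholders rather than data from the study, so the result is not expected to equal r = 0.595.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical placeholder measurements, NOT values from the study.
wss_values = [1.0, 0.9, 1.6, 1.1, 1.8, 1.4]
acta2_cells_normalized = [0.0, 0.1, 0.5, 0.6, 1.0, 0.9]

r = pearson_r(wss_values, acta2_cells_normalized)
print(f"r = {r:.3f}")
```

      A positive r indicates only that the two variables tend to rise together over the pooled stages; it does not establish which change comes first at any single stage.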

      Third, we softened our conclusion as the RBC velocity across CoW arteries was differentially distributed while VSMC differentiation occurred in these vessels.

      (7) It is unclear if acta2 expression is conferring vascular tone, as would be expected if the cells are behaving as mature VSMCs. Does arterial diameter decrease with an increase in acta2 expression? Are acta2-positive mural cells associated with more dynamic changes in arteriole diameter under basal or stimulated conditions? 

      Thanks for this interesting question. VSMC maturation and its vasoactivity could be further investigated in the future. Our study focused on the early stage of VSMC differentiation, in which pdgfrb+ progenitors started to express the VSMC marker acta2. We discussed the onset of transgelin expression and loss of abcc9 expression as markers of VSMC maturation. In addition, a previous study found that VSMC-covered vessels in the zebrafish brain dilate as early as 4 dpf and constrict at 6 dpf (Bahrami & Childs, 2020). Future studies may focus on the association between expression of different VSMC markers and VSMC functional maturation. (page 10, line 272-279)

      (8) The authors argue that CoW vessels transition from venous to arterial identity (Fig. 1). However, kdrl is not an ideal arterial marker for this experiment as it is expressed in both arteries and veins. While it is true that many arterial beds have stronger kdrl expression than the veins, its expression in both arteries and veins changes with developmental stage, and its expression level may vary depending on the type of vessel. Therefore, showing that kdrl increases from 32 hpf - 4 dpf in CoW vessels is not convincing because its expression may increase in both venous or arterial vasculature as the vessels mature. In addition, flt4 expression is not exclusively venous; for example, it has noticeable expression in the dorsal aorta at 24-32 hpf stages. It would be helpful to confirm this transition by analyzing additional arterial and venous markers. 

      We acknowledge this and we added a paragraph to discuss the limitation. We combined loss of flt4 and increase in kdrl to establish the temporal sequence of circle of Willis morphogenesis, arterial specification, and VSMC differentiation. We acknowledge that additional arterial and venous markers need to be analyzed for a more thorough characterization of arterial specification in vertebrate brain vascular development. See page 12 line 335-341.

      (9) The authors show that acta2+ VSMCs are absent in tnnt2a MO embryos, concluding that blood flow is required for their differentiation from pericytes. However, there is no data showing that pericytes are still present in tnnt2a MO embryos. Although this has been previously shown by Ando et al 2016, it would be beneficial to confirm in the current study as this is a critical piece of evidence needed for this conclusion. 

      To determine if blood flow is dispensable for pdgfrb+ progenitor recruitment, we performed tnnt2a MO (0.35 ng/embryo) injection in Tg(pdgfrb:egfp, kdrl:ras-mcherry) ncv22/s896. Loss of blood flow did not affect pdgfrb+ progenitor emergence around the CoW (new Figure S2G-S2H) at 3 days post fertilization (dpf). This is consistent with the previous observation in Ando et al 2016 Figure S2C (Ando et al., 2016).

      (10) The authors show that klf2a MO injected embryos have a reduced number of VSMCs at 3 dpf but a normal number at 4 dpf (Fig. 6), concluding that klf2a is only important to initiate CaDI muscularization. If this is true, it would raise important questions about how VSMCs differentiate at a later stage in the absence of klf2a. For instance, is blood flow not required to differentiate at a later stage, or is there another factor that compensates in the absence of klf2a? The alternative explanation/ caveat is that klf2a MO loses efficacy with development, leading to the recovery of VSMCs at this stage. Therefore, it would be important to confirm this result using a genetic klf2a mutant. 

      Thank you for pointing this out. We note that based on the klf2a reporter line, klf2a activity in CoW arterial endothelial cells is highly correlated with the number of acta2+ VSMCs in CaDI, BCA and PCS at 3 dpf (r = 0.974, new Figure S5J). Interestingly, however, klf2a activity remained stable from 3 dpf to 4 dpf, well beyond initiation of VSMC differentiation. Thus, we speculate that sustained klf2a expression may support further maturation of VSMCs, as acta2+ VSMCs showed distinct morphology at 4 dpf compared with 3 dpf. (Page 10, line 268-272). As for the observation that klf2a morphants have a normal number of VSMCs at 4 dpf, we think that in addition to the temporary effect of the morpholino, a proximal explanation is compensation by the paralogous klf2b in zebrafish. We acknowledge that further characterization of CoW VSMC development in klf2a and klf2b double genetic mutants (Rasouli et al., 2018; Steed et al., 2016) may help determine whether klf2b compensates for klf2a in CoW VSMC differentiation beyond 4 dpf. See page 10-11 line 292-295.

      (11) A large part of the discussion focuses on Notch and Wnt signaling, as downstream Klf2 effectors. While these are reasonable hypotheses to propose, there is no data on the involvement of these pathways in the current study. It seems excessive to speculate on detailed mechanisms of how Klf2 activates Notch and Wnt signaling in the absence of data showing that these pathways are affected in CoW vessels. Therefore, the discussion could be shortened here unless additional data can be obtained to demonstrate the involvement of these pathways in VSMCs in CoW.

      We concur and have condensed the discussion on Notch and Wnt signaling as downstream klf2 effectors.

      Minor comments: 

      (1) Line 138 "CaDI is the only vessels in the CoW receiving pulsatile arterial blood low ... ". Adding a reference to support this statement would be useful. 

      We agree and revised this sentence into ‘CaDI receive proximal arterial feed through lateral dorsal aorta from cardiac outflow tract (Isogai et al., 2001)’. It was also based on our general observation of zebrafish vascular anatomy and blood flow under a confocal microscope.

      (2) The image insets in Figs. 1A, 2A, 4E-L, 5A, 6A are quite small. Please make them larger to help the reader interpret the findings. 

      We agree. We maximized the image size to help the reader interpret the finding, and to visualize confocal images and schematics side-by-side.

      (3) The schematics in Figs. 1-2, and 4-6 are helpful, but the different cell types are difficult to see because they are small and their colors/shapes are not very distinct. 

      We agree. We increased the size and color contrast to provide better visualization of the schematics in new Figures 1-2 and 4-6.

      (4) It is stated that there are no diameter differences between different arteries, but statistics are not reported. 

      The statistics in Figure 3D were performed by ordinary two-way ANOVA followed by Tukey’s multiple comparisons test, with a single pooled variance. Here we added pairwise comparisons among vessels in the CoW. Hence, where not indicated, the differences are non-significant.

      (5) Figure 3F would be better visualized on a log scale, as it is difficult to see the differences between each post-fertilization timepoint. 

      We agree. In the new Figure 3H, the average wall shear stress (WSS) in CoW arteries is presented on log scale in y axis to see the differences between each post-fertilization timepoint.

      (6) Please provide more background and validation on the pericyte cell line, and their use for the questions in this study. 

      Thank you for the question. TgBAC(pdgfrb:egfp)ncv22 was generated and described by Ando et al 2016 to clarify mural cell coverage of vascular endothelium in zebrafish (Ando et al., 2016). We added a description in the Methods section to provide background and validation on this pericyte line (see page 13 line 368-372).

      (7) Flow velocity and WSS changes are shown in each vessel in Figs. 3E,G. However, the comparison should be made between different types of vessels to see if there is a statistical difference between CaDI and PCS, for example, which would explain differences in VSMC coverage. 

      We agree. We compared the differences among arteries in the CoW at each developmental timepoint and performed ordinary one-way ANOVA with Tukey’s multiple comparisons test. Figure 3E is replaced by new Figure 3E-G and Figure 3G is replaced by new Figure 3I-K.
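
      As a reminder of what the omnibus test in such a between-vessel comparison computes, the sketch below calculates the one-way ANOVA F statistic from scratch. It is an illustration only: it omits the Tukey post hoc step and the p-value, it is not the authors' statistical workflow, and the group values are hypothetical placeholders rather than measurements from the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder values for three vessels, NOT data from the study.
vessel_a = [1.0, 2.0, 3.0]
vessel_b = [2.0, 3.0, 4.0]
vessel_c = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f([vessel_a, vessel_b, vessel_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A large F relative to the F distribution with the stated degrees of freedom indicates that at least one vessel differs; pairwise tests such as Tukey's are then needed to say which.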

      (8) Similarly, the number of klf2a cells in Fig. 5B should be compared between different vessels, not between different stages of the same vessel. 

      We agree. In new Figure 5B-E, the number of klf2a+ cells per 100 μm vessel length are compared among different vessels at each developmental stage and analyzed by ordinary one-way ANOVA with Tukey’s multiple comparisons test.

      (9) When quantifying klf2+ cells in Fig. 5, it would be helpful to quantify klf2 expression level between cells in different vessels. This could be done by quantifying GFP expression in existing images. The difference in expression level may explain the variation between CaDI and PCS more accurately than just the difference in cell number. 

      GFP expression reflects the stability of the GFP protein and labels discrete nuclei with active klf2a expression. Hence, quantifying the GFP level might not give an accurate readout of klf2a expression per se but rather of its activity. For this reason, we do not think that this experiment would add an accurate measurement of klf2a expression.

      (10) Do data points in Figure 4D correspond to different cells in the same chamber experiment? If so, they cannot be treated as independent replicates. Each data point should correspond to an independent replicate experiment. 

      We agree. Now in the figure legend, we report the number of cells analyzed.

      (11) Graph placement is confusing in Figs. 4I, M. An adjacent Fig. 4G shows nifedipine-treated embryos, while the graph next to it (Fig. 4I) shows acta+ cell number from the tnnt2a 4 dpf experiment. Similarly, the bottom Fig. 4K tnnt2a 4 dpf MO experiment has an adjacent graph, Fig. 4M, which shows nifedipine treatment quantification, which makes it very confusing. 

      We agree. We rearranged Figure 4E (representative images of control embryos at 3 dpf and 4 dpf), Figure 4F (tnnt2a MO embryos at 3 dpf and 4 dpf), and Figure 4G (nifedipine-treated embryos at 3 dpf and 4 dpf).
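
      The concern raised in comment (10), that cells measured in the same chamber are not independent replicates, can be illustrated with a minimal sketch. It assumes each measurement is tagged with the experiment it came from; the function name, the identifiers, and the numbers are hypothetical and are not taken from the manuscript. Cells are first averaged within each experiment, so that the unit of replication becomes the experiment rather than the cell.

```python
from collections import defaultdict
from statistics import mean


def per_experiment_means(measurements):
    """Collapse per-cell values into one mean per independent experiment.

    measurements: iterable of (experiment_id, value) pairs, one per cell.
    Returns a dict mapping experiment_id to the mean of its cells.
    """
    by_experiment = defaultdict(list)
    for experiment_id, value in measurements:
        by_experiment[experiment_id].append(value)
    return {exp: mean(values) for exp, values in by_experiment.items()}


# Hypothetical values: five cells drawn from only two experiments.
cells = [("exp1", 1.0), ("exp1", 3.0), ("exp1", 2.0), ("exp2", 10.0), ("exp2", 12.0)]
replicates = per_experiment_means(cells)

n_cells = len(cells)             # number of cells measured
n_replicates = len(replicates)   # number of independent experiments
```

      Statistical tests would then be run on the per-experiment means, with the number of cells reported separately.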

      Reference:

      Ando, K., Fukuhara, S., Izumi, N., Nakajima, H., Fukui, H., Kelsh, R. N., & Mochizuki, N. (2016). Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development, 143(8), 1328-1339. https://doi.org/10.1242/dev.132654

      Ando, K., Tong, L., Peng, D., Vazquez-Liebanas, E., Chiyoda, H., He, L., Liu, J., Kawakami, K., Mochizuki, N., Fukuhara, S., Grutzendler, J., & Betsholtz, C. (2022). KCNJ8/ABCC9-containing K-ATP channel modulates brain vascular smooth muscle development and neurovascular coupling. Dev Cell, 57(11), 1383-1399 e1387. https://doi.org/10.1016/j.devcel.2022.04.019

      Bahrami, N., & Childs, S. J. (2020). Development of vascular regulation in the zebrafish embryo. Development, 147(10). https://doi.org/10.1242/dev.183061

      Barak, T., Ristori, E., Ercan-Sencicek, A. G., Miyagishima, D. F., Nelson-Williams, C., Dong, W., Jin, S. C., Prendergast, A., Armero, W., Henegariu, O., Erson-Omay, E. Z., Harmanci, A. S., Guy, M., Gultekin, B., Kilic, D., Rai, D. K., Goc, N., Aguilera, S. M., Gulez, B., . . . Gunel, M. (2021). PPIL4 is essential for brain angiogenesis and implicated in intracranial aneurysms in humans. Nat Med, 27(12), 2165-2175. https://doi.org/10.1038/s41591-021-01572-7

      Isogai, S., Horiguchi, M., & Weinstein, B. M. (2001). The vascular anatomy of the developing zebrafish: an atlas of embryonic and early larval development. Dev Biol, 230(2), 278-301. https://doi.org/10.1006/dbio.2000.9995

      Kamei, M., Isogai, S., Pan, W., & Weinstein, B. M. (2010). Imaging blood vessels in the zebrafish. In Methods in cell biology (Vol. 100, pp. 27-54). Elsevier.

      Rasouli, S. J., El-Brolosy, M., Tsedeke, A. T., Bensimon-Brito, A., Ghanbari, P., Maischein, H. M., Kuenne, C., & Stainier, D. Y. (2018). The flow responsive transcription factor Klf2 is required for myocardial wall integrity by modulating Fgf signaling. Elife, 7. https://doi.org/10.7554/eLife.38889

      Steed, E., Faggianelli, N., Roth, S., Ramspacher, C., Concordet, J. P., & Vermot, J. (2016). klf2a couples mechanotransduction and zebrafish valve morphogenesis through fibronectin synthesis. Nat Commun, 7, 11646. https://doi.org/10.1038/ncomms11646

    1. eLife assessment

      This important study demonstrates that combining AlphaFold2 with the authors' sampling method AF2-RAVE improves protein-ligand docking for three protein kinases and their inhibitors. The evidence is compelling but would benefit from a more complete description of the methodology and a clear assessment of the method's range of applicability. The work will be of interest to researchers who work on computer-aided drug design.

    2. Reviewer #1 (Public Review):

      The development of effective computational methods for protein-ligand binding remains an outstanding challenge to the field of drug design. This impressive computational study combines a variety of structure prediction (AlphaFold2) and sampling (RAVE) tools to generate holo-like protein structures of three kinases (DDR1, Abl1, and Src kinases) for binding to type I and type II inhibitors. Of central importance to the work is the conformational state of the Asp-Phe-Gly "DFG motif", where the Asp points inward (DFG-in) in the active state and outward (DFG-out) in the inactive state. The kinases bind to type I or type II inhibitors when in the DFG-in or DFG-out states, respectively.

      It is noted that while AlphaFold2 can be effective in generating ligand-free apo protein structures, it is ineffective at generating holo-structures appropriate for ligand binding. Starting from the native apo structure, structural fluctuations are necessary to access holo-like structures appropriate for ligand binding. A variety of methods, including reduced multiple sequence alignment (rMSA), AF2-cluster, and AlphaFlow may be used to create decoy structures. However, those methods can be limited in the diversity of structures generated and lack a physics-based analysis of Boltzmann weight critical to their relative evaluation.

      To address this need, the authors combine AlphaFold2 with the Reweighted Autoencoded Variational Bayes for Enhanced Sampling (RAVE) method, to explore metastable states and create a Boltzmann ranking. With that variety of structures in hand, grid-based docking methods Glide and Induced-Fit Docking (IFD) were used to generate protein-ligand (kinase-inhibitor) complexes.
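
      For readers unfamiliar with the term, the Boltzmann ranking mentioned above can be illustrated with a minimal sketch. It assumes that each metastable state has already been assigned a free energy; the state labels and energy values below are hypothetical and do not come from the manuscript, and the function names are illustrative rather than part of AF2RAVE. Each state i receives the weight exp(-F_i / RT) divided by the sum of those factors over all states.

```python
import math

# Molar gas constant in kcal/(mol*K); free energies are taken per mole.
R_KCAL = 0.0019872041


def boltzmann_weights(free_energies_kcal, temperature_k=300.0):
    """Return normalized Boltzmann weights for a list of free energies."""
    rt = R_KCAL * temperature_k
    f_min = min(free_energies_kcal)  # shift energies for numerical stability
    factors = [math.exp(-(f - f_min) / rt) for f in free_energies_kcal]
    partition = sum(factors)
    return [x / partition for x in factors]


def rank_states(free_energy_by_state, temperature_k=300.0):
    """Return (state, weight) pairs sorted from most to least populated."""
    names = list(free_energy_by_state)
    weights = boltzmann_weights(
        [free_energy_by_state[name] for name in names], temperature_k
    )
    return sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical free energies in kcal/mol, for illustration only.
example = {"DFG-out": 1.5, "DFG-in": 0.0, "DFG-inter": 3.0}
ranking = rank_states(example)
```

      In this toy example the lowest free-energy state carries the largest weight; how the free energies themselves are obtained (here, from enhanced sampling) is outside the scope of the sketch.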

      The authors demonstrate that using AlphaFold2 alone, there is a failure to generate DFG-out structures needed for binding to type II inhibitors. By applying the AlphaFold2 with rMSA followed by RAVE (using short MD trajectories, SPIB-based collective variable analysis, and enhanced sampling using umbrella sampling), metastable DFG-out structures with Boltzmann weighting are generated enabling protein-ligand binding. Moreover, the authors found that the successful sampling of DFG-out states for one kinase (DDR1) could be used to model similar states for other proteins (Abl1 and Src kinase). The AF2RAVE approach is shown to result in a set of holo-like protein structures with a 50% rate of docking type II inhibitors.

      Overall, this is excellent work and a valuable contribution to the field that demonstrates the strengths and weaknesses of state-of-the-art computational methods for protein-ligand binding. The authors also suggest promising directions for future study, noting that potential enhancements in the workflow may result from the use of binding site prediction models and free energy perturbation calculations.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript explores the utility of AlphaFold2 (AF2) and the authors' own AF2-RAVE method for drug discovery. As has been observed elsewhere, the predictive power of docking against AF2 structures is quite limited, particularly for proteins like kinases that have non-trivial conformational dynamics. However, using enhanced sampling methods like RAVE to explore beyond AF2 starting structures leads to a significant improvement.

      Strengths:

      This is a nice demonstration of the utility of the authors' previously published RAVE method.

      Weaknesses:

      My only concern is the authors' discussion of induced fit. I'm quite confident the structures discussed are present in the absence of ligand binding, consistent with conformational selection. It seems the authors' own data also argue for an important role of conformational selection. It would be nice to acknowledge this instead of going along with the common practice in drug discovery of attributing any conformational changes to induced fit without thoughtful consideration of conformational selection.

    4. Reviewer #3 (Public Review):

      In this manuscript, the authors aim to enhance AlphaFold2 for protein conformation-selective drug discovery through the integration of AlphaFold2 and physics-based methods, focusing on improving the accuracy of predicting protein structure ensembles and small-molecule binding to metastable protein conformations to facilitate targeted drug design.

      The major strength of the paper lies in the methodology, which includes the innovative integration of AlphaFold2 with all-atom enhanced sampling molecular dynamics and induced fit docking to produce protein ensembles with structural diversity. Moreover, the generated structures can be used as reliable crystal-like decoys to enrich metastable conformations of holo-like structures. The authors demonstrate the effectiveness of the proposed approach in producing metastable structures of three different protein kinases and perform docking with their type I and II inhibitors. The paper provides strong evidence supporting the potential impact of this technology in drug discovery. However, limitations may exist in the generalizability of the approach across other structures, especially complex structures such as protein-protein or DNA-protein complexes.

      The authors largely achieved their aims by demonstrating that the AF2RAVE-Glide workflow can generate holo-like structure candidates with a 50% successful docking rate for known type II inhibitors. This work is likely to have a significant impact on the field by offering a more precise and efficient method for predicting protein structure ensembles, which is essential for designing targeted drugs. The utility of the integrated AF2RAVE-Glide approach may streamline the drug discovery process, potentially leading to the development of more effective and specific medications for various diseases.

    1. eLife assessment

      The mechanisms that ensure accurate chromosome segregation are key for genome integrity and defects therein can cause human disease. Although the involvement of MAP kinases in modulating mitosis is known, this manuscript makes a valuable contribution by going to some lengths to reveal links between Spindle Assembly Checkpoint dynamics and stress-responsive MAP-kinase pathways. The strength of the evidence is solid but there are minor weaknesses, which need to be addressed.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript addresses two main issues:
      (i) Do MAPKs play an important role in SAC regulation in a single-cell organism such as S. pombe?
      (ii) What is the nature of their involvement and what are their molecular targets?

      The authors have extensively used the cold-sensitive β-tubulin mutant to activate or inactivate SAC employing an arrest-release protocol. Localization of Cdc13 (cyclin B) to the SPBs is used as a readout for the SAC activation or inactivation. The roles of two major MAPK pathways i.e. stress-activated pathway (SAP) and cell integrity pathway (CIP), have been explored in this context (with CIP more extensively than SAP). Sty1Δ or pmk1Δ mutants were used to inactivate the SAP or CIP pathways and wis1DD or pek1DD expression was utilized to constitutively activate these pathways, respectively. Lowering of Slp1Cdc20 abundance (by phosphorylation of Slp1-Thr 480) is revealed as the main function of MAPK to augment the robustness of the spindle assembly checkpoint.

      Strengths:

      The experiments are generally well-conducted, and the results support the interpretations in various sections. The experimental data clearly supports some of the key conclusions:

      (1) While inactivation of SAP and CIP compromises SAC-imposed arrest, their constitutive activation delays the release from the SAC-imposed arrest.
      (2) CIP signaling, but not SAP signaling, attenuates Slp1Cdc20 levels.
      (3) Pmk1 and Cdc20 physically interact and Pmk1-docking sequences in Slp1 (PDSS) are identified and confirmed by mutational/substitution experiments.
      (4) Thr480 (and also S76) is identified as the residue phosphorylated by Pmk1. S28 and T31 are identified as Cdk1 phosphorylation sites. These are confirmed by mutational and other related analyses.
      (5) Functional aspects of the phosphorylation sites have been elucidated to some extent: (a) phosphorylation of Slp1-T480 by Pmk1 reduces its abundance, thereby augmenting the SAC-induced arrest; (b) S28 and T31 (also S59) are phosphorylated by Cdk1; (c) the K472 and K479 residues are involved in ubiquitylation of Slp1.

      Weaknesses:

      (1) Cdc13 localization to SPBs has been used as a readout for SAC activation/inactivation throughout the manuscript. However, the only image showing such localization (Figure 1C) is of poor quality where the Cdc13 localization to SPBs is barely visible. This should be replaced by a better image.

      (2) The overlapping error bars in Cdc13-localization data in some figures (for instance Figure 3E and 4H) make the effect of various mutations on SAC activation/inactivation rather marginal. In some of these cases, Western-blotting data support the authors' conclusions better.

      (3) This specific point is not really a weakness but rather a loose end:
      One of the conclusions of this study is that MAPK (Pmk1) contributes to the robustness of SAC-induced arrest by lowering the abundance of Slp1Cdc20. The authors have used pmk1Δ or constitutive activation of the MAPK pathways (Pek1DD) and documented their effect on SAC activation/inactivation dynamics. It is not clear if SAC activation also leads to activation of MAPK pathways for them to contribute to the SAC robustness. To tie this loose end, the authors could have checked if the MAPK pathway is also activated under the conditions when SAC is activated. Unless this is shown, one must assume that the authors are attributing the effect they observe to the basal activity of MAPKs.

      (4) This is also a loose end:
      The authors show that activation of stress pathways (by addition of KCl for instance) causes phosphorylation-dependent Slp1Cdc20 downregulation (Figure 6) under the SAC-activating condition. Does activation of the stress pathway cause phosphorylation-dependent Slp1Cdc20 downregulation under the non-SAC-activation condition or does it occur only under the SAC-activating condition?

      (5) Although the authors have gone to some length to identify S28 and T31 (also S59) as phosphorylation sites for Cdk1, their functional significance in the context of MAPK involvement is not yet clear. Perhaps it is outside the scope of this study to dig deeper into this aspect more than the authors have.

      (6) In its current state, the Discussion section is quite disjointed. The first section "Involvement of MAPKs in cell cycle regulation" should be in the Introduction section (very briefly, if at all). It certainly does not belong to the Discussion section. In any case, the Discussion section should be more organized with a better flow of arguments/interpretations.

    3. Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. presents a role for the S. pombe MAP kinase Pmk1 in the activation of the Spindle Assembly Checkpoint (SAC) via controlling the protein levels of APC/C activator Cdc20 (Slp1 in S. pombe). The data presented in the manuscript is thorough and convincing. The authors have shown that Pmk1 binds and phosphorylates Slp1, promoting its ubiquitination and subsequent degradation. Since Cdc20 is an activator of APC/C, which promotes anaphase entry, constitutive Pmk1 activation leads to an increased percentage of metaphase-arrested cells. The authors have used genetic and environmental stress conditions to modulate MAP kinase signalling and demonstrate their effect on APC/C activation. This work provides evidence for the role of MAP kinases in cell cycle regulation in S. pombe and opens avenues for exploration of similar regulation in other eukaryotes.

      Strengths:

      The authors have done a very comprehensive experimental analysis to support their hypothesis. The data is well represented, and including a model in every figure summarizes the data well.

      Weaknesses:

      As mentioned in the comments, the manuscript does not establish that MAP kinase activity leads to genome stability when cells are subjected to genotoxic stressors. That would establish the importance of this pathway for checkpoint activation.

    1. eLife assessment

      The study is noteworthy for its effort to achieve a deeper understanding of PTH-1 Receptor signaling. This molecular pathway, which underpins the control of calcium and phosphate metabolism throughout life in land-dwelling animals, can be targeted to the therapeutic benefit of people with osteoporosis. We consider the significance of the findings in this paper to be valuable to the community of investigators working on PTH receptor and PTH ligand signaling. The strength of the evidence is solid and it could become even stronger by addressing a few shortcomings.

    2. Reviewer #1 (Public Review):

      Summary:

      In this work, the authors investigate the functional difference between the most commonly expressed form of PTH, and a novel point mutation in PTH identified in a patient with chronic hypocalcemia and hyperphosphatemia. The value of this mutant form of PTH as a potential anabolic agent for bone is investigated alongside PTH(1-84), which is a previously used anabolic therapy. The authors have achieved the aims of the study. Their conclusion, however, that this suggests a "new path of therapeutic PTH analog development" seems unfounded; the benefit of this PTH variant is not clear, but the work is still interesting.

      The work does not identify why the patient with this mutation has hypocalcemia and hyperphosphatemia; this was not the goal of the study, but the data are useful for helping to understand that.

      Strengths:

      The work is novel, as it describes the function of a novel, naturally occurring, variant of PTH in terms of its ability to dimerise, to lead to cAMP activation, to increase serum calcium, and its pharmacological action compared to normal PTH.

      Weaknesses:

      (1) The use of very young, 8-10 week old, mice as a model of postmenopausal osteoporosis is a major limitation of this study. At 8 weeks, the effect of ovariectomy leads to lack of new trabecular bone formation, rather than trabecular bone loss due to a defect in bone remodelling. Although the findings here provide a comparison between two forms of PTH, it is unlikely to be of direct relevance to the patient population. For example, the authors find an inhibitory effect of PTH on osteoclast surface, which is very unusual. Adding to this concern is that the authors have not described the regions used for histomorphometry, and from their figures (particularly the TRAP stain), it seems that the primary spongiosa (which is a region of growth) has been used for histomorphometry, rather than the secondary spongiosa (which more accurately reflects bone remodelling). Much further detail is needed to justify the use of this very young model, and a section on the limitations of this model is needed. Please provide that section in the revised manuscript.

      (2) It is also somewhat concerning that the age range is from 8-10 weeks, increasing the variability within the model. Did the age of mice differ between the groups analysed?

      (3) Methods are not sufficiently detailed. For example, the regions used for histomorphometry are not described, there is no information on micro-CT thresholds, no detail on the force used for mechanical testing. Please address this request.

      (4) There are three things unclear about the calvarial injection mouse model. Firstly, were the mice injected over the calvariae or with a standard subcutaneous injection (e.g. at the back of the neck)? If they were injected over the calvaria, why were both surfaces measured? Secondly, why was the dose of the R25C-PTH double that of PTH(1-34)? Thirdly, there is no justification for the use of "more intense coloration" as a marker of new bone; this requires calcein labelling to prove it is new bone. It would be more reliable to measure and report the thickness of the calvaria. Please address these technical questions.

      (5) The presentation of mechanical testing data is not sufficient. Example curves should be shown, and data corrected for bone size need to be shown. The difference in mechanical behaviour is interesting, but does it stem from a difference in the amount of bone, or to a difference in the quality of the bone? Please explain this matter better in the manuscript.

      (6) The micro-CT analysis of the cortical bone in the OVX model is insufficient. Please indicate whether cross-sectional area has increased. Is there an increase in the size of the bones, or is the increase in cortical thickness due to a narrowing of the marrow space? This may help resolve the apparent contradiction between the cortical thickness data (where there is no difference between the two PTH formulations) and the mechanical testing data (where there is a difference). Please explain this matter better in the manuscript.

      (7) The evidence that dimeric PTH has a different effect to monomeric PTH is very slim; I am not sure this is a real effect. Such differences take a long time to sort out (e.g. the field is still trying to determine whether teriparatide and abaloparatide are different). I think the authors need to look more carefully at their data - almost all effects are the same. Ultimately, the statement that dimeric PTH may be a more effective anabolic therapy than monomeric PTH is not supported by the data, and this should be removed. There is little to no difference found between normal PTH and the variant in their effects on calcium and phosphate homeostasis or on bone mass. However, the analysis has been somewhat cursory, with insufficient mechanical testing or cortical data presented. Many of the effects seem to be the same (e.g. cortical thickness, P1NP, ALP, vertebral BV/TV and MAR), but the way it is written it sounds like there is a difference. Please remove some of the unfounded claims that you have made in this manuscript.

      (8) Statistical analysis used multiple t-tests. ANOVA would be more appropriate.
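
      The reasoning behind point (8) can be made concrete with a short sketch. The first two functions show how the chance of at least one false positive grows with the number of pairwise t-tests, under the simplifying assumption that the tests are independent; the third computes the one-way ANOVA F statistic, which tests all groups at once. The function names and the example data are hypothetical and are not taken from the manuscript.

```python
from itertools import combinations
from statistics import mean


def n_pairwise_comparisons(n_groups):
    """Number of distinct pairs among n_groups groups."""
    return len(list(combinations(range(n_groups), 2)))


def familywise_error_rate(n_comparisons, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical data for three treatment groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
comparisons = n_pairwise_comparisons(len(groups))
fwer = familywise_error_rate(comparisons)
f_stat = one_way_anova_f(groups)
```

      With three groups there are three pairwise comparisons, and at a per-test threshold of 0.05 the familywise error rate under independence is about 0.14, which is why a single ANOVA followed by a post hoc test is usually preferred.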

    3. Reviewer #2 (Public Review):

      Summary:

      The study conducted by Noh et al. investigated the effects of parathyroid hormone (PTH) and a dimeric PTH peptide on bone formation and serum biochemistry in ovariectomized mice as a model for postmenopausal osteoporosis. The authors claimed that the dimeric PTH peptide has pharmacological benefits over PTH in promoting bone formation, despite both molecules having similar effects on bone formation and serum Ca2+. However, after careful evaluation, I am not convinced that this manuscript adds a significant contribution to the literature on bone and mineral research.

      Strengths:

      Experiments are well performed, but strengths are limited to the methodology used to evaluate bone formation and serum biochemical analysis.

      Weaknesses:

      (1) Limited significance of this study:
      • This study follows a previous study (not cited) reporting the effect of the dimeric R25CPTH(1-34) on bone regeneration in an osteoporotic dog (Beagle) model (Jeong-Oh Shin et al., eLife 13:RP93830, 2024). It is unclear why the authors tested the dimeric R25C-PTH peptide in a rodent model, which has limitations because the healing mechanism of human bone is more similar to that of dogs than to that of mice.
      • The authors should clarify why they tested the effects of dimeric R25CPTH(1-34) and not dimeric R25CPTH(1-84).
      • The study is descriptive with no mechanism.

      (2) Statistics are inadequately described or performed for the experimental design:
      • The statistical analysis in Figure 5 needs to be written in a way that makes it clearer how statistics were done; t-test or one-way ANOVA?
      • Statistics in Figures 6 and 7 should be performed by one-way ANOVA to compare the mean values of one variable among three or more groups, and not t-test.

      (3) Misleading and confused discussion:
      • The first paragraph lacks clarity in the PTH nomenclature and the authors should provide a clear statement that the PTH mutant found in patients is likely a monomeric R25CPTH(1-84), considering that there has been no proof of a dimeric form.
      • Moreover, the authors should discuss the study by White et al. (PNAS 2019), which shows that there are defective PTH1R signaling responses to monomeric R25CPTH(1-34). This results in faster ligand dissociation, rapid receptor recycling, a short cAMP time course, and a loss of calcium ion allosteric effect.
      • The authors should also clarify what they mean by "the dimeric form of R25CPTH can serve as a new peptide ...(lines 328-329)" The dimeric R25CPTH(1-34) induces similar bone anabolic effects and calcemic responses to PTH(1-34), so it is unclear what the new benefit of the dimeric PTH is.

      Please address these concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the handling of our manuscript. The following is a summary of our response and what we have done:

      (1) We are most thankful for the very thorough evaluation of our manuscript.

      (2) We were a bit shocked by the very negative commentary of referee 2.

      (3) We think what put referee 2 off so much is that we were overconfident in the strength of our conclusions. We consider such overconfidence a big mistake. We have revised the manuscript to fix this problem.

      (4) We respond in great depth to all criticism and also go into technicalities.

      (5) We consider the possibility of a mistake. Yet, we carefully weighed the evidence advanced by referee 2 and by us and found that a systematic review supports our conclusions. Hence, we also resist the various attempts to crush our paper.

      (6) We added evidence (peripherin-antibody staining; our novel Figure 2) that suggests we correctly identified the inferior olive.

      (7) The eLife format – in which critical commentary is published along with the paper – is a fantastic venue to publish what appears to be a surprisingly controversial issue.

      eLife assessment

      This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.

      Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording and electrolytic lesions, viral tracing) would be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the brain before. (iv) We show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published alongside it.

      Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in our novel Figure 2, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibers in what the referee thinks is the inferior olive.

      Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.

      Reviewer #1:

      Summary:

      This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an iconic sensory specialist. 

      Comment: We are incredibly grateful for this positive assessment.

      Changes: None.

      Strengths: 

      - The authors made excellent use of the precious sample materials from 9 captive elephants. 

      - The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".

      - Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers. 

      Comment: The referee provides a concise summary of our findings.

      Changes: None.

      Weaknesses: 

      - As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques). 

      - The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species? 

      Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 5 (our former Figure 4) for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous, trunk folds than African elephants. As illustrated in Figure 7, Asian elephants also have more, and less conspicuous, myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.

      Changes: We clarify the relation of myelin stripe and trunk fold patterns in our description of Figure 7.

      Reviewer #2 (Public Review): 

      The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.

      Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.

      Changes: None.

      The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported. 

      Comment: We agree.

      Changes: None.

      The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.

      Comment & Change: We were not aware of the papers of Verhaart and have included them in the revised manuscript.

      Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.

      Comment: We have three comments here:

      (1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.

      (2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 7A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.

      (3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Figure 2). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.

      Changes: None.

      Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2). 

      Comment: We have two comments here:

      (1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is out of the question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.

      (2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely to be correct is the one that requires the least amount of rearrangements to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Referee Table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.

      Changes: Inclusion of Referee Table 1, which we discuss in depth below.

      The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159. Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007. The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".

      Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.

      Changes: We prepared Author response image 1 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk-innervating cells) in perspective.

      Author response image 1.

      The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.

      But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem. 

      (1) Intense cytochrome oxidase reactivity.

      (2) Large size of the putative trunk module.

      (3) Elongation of the putative trunk module.

      (4) The arrangement of these putative modules corresponds to elephant head anatomy.

      (5) Myelin stripes within the putative trunk module that apparently match trunk folds.

      (6) Location apparently matches other mammals.

      (7) Repetitive modular organization apparently similar to other mammals.

      (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.

      Comment: We agree those are key issues.

      Changes: None.

      Let's examine these justifications more closely.

      (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported. 

      Comment: The referee correctly notes that the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.

      Changes: (1) We added details on our cytochrome-oxidase reactivity staining protocol and the cytochrome-oxidase reactivity in the elephant brain in the manuscript and in our response to the general recommendations.

      (2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.

      (3) We include a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.

      Author response image 2.

      Cytochrome-oxidase staining overview. Coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen, because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e. a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C).

      The same structures can be recognized in Author response image 2 and Supplementary figure 36 of Maseko et al. (2013). The section is taken at an anterior-posterior level, where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as inferior olive. Myelin stripes can be recognized here as white omissions.

      At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.

      Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions. 

      Comment: These are key points of our paper that the referee does not discuss.

      Changes: None.

      (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequela of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.

      Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.

      Changes: None.

      (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported. 

      Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see below Referee Table 1). Here we acknowledge the referee’s argument and we also changed the manuscript accordingly.

      (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.

      Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We think such a comparison of our scheme is needed, indeed.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see below Referee Table 1).

      (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported. 

      Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found contrary to the referee’s claims that dogs, polar bears, and manatees have a perfectly serrated (a cellular arrangement in curved bands) appearance of the inferior olive. Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.

      Changes: None.

      Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.

      Comment: We disagree. To summarize:

      (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.

      (2)–(5) The referee does not really discuss our evidence on these points.

      (6) We were wrong and have now fixed this mistake.

      (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Referee Table 1).

      (8) The referee bends the comparative evidence against us.

      Changes: None.

      A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al 2013 and by Reveyaz et al.

      To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.

      We studied mainly two large nuclei in six (now seven) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).

      Author response image 3 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).

      Author response image 3.

      Overview of the brainstem organization in rodents & elephants

      The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 3). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.

      Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an anterior indistinct part and a super-distinct serrated posterior part) to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resembles the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 3) and the posterior part of the trigeminal nuclei and do not match the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Author response image 2). Our anti-peripherin staining (the novel Figure 2 of our manuscript) indicates that there are probably no climbing fibers in what Maseko et al. think is the inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of the Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.

      The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of the trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.

      The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (former Figure 4, now Figure 5) we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted.

      A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk and the inferior olive looks like the inferior olive of other mammals.

      We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.

      Author response table 1

      Qualitative evaluation of elephant brainstem partitioning schemes

      ++ = Very attractive; + = attractive; - = unattractive; -- = very unattractive

      We scored features that are clear and shared by all mammals – as far as we know them – as very attractive.

      We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive.

      Attractive features are either less clear or less well-shared features.

      Unattractive features are either less clear or less clearly not shared features.

      Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.

      What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors. 

      Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing that amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e. tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. (2013) and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.

      Changes: Ad hoc tract tracing see above.

      So what are these "bumps" in the elephant brainstem? 

      Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?

      The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.

      Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.

      Changes: Please see our discussion above.

      Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10^9 neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals? 

      Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brain stem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific and indeed both nuclei in discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, calbindin and calretinin antibody reactivity cannot distinguish the two hypotheses debated.

      Changes: Please see our discussion above.

      What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are, quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature. 

      Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (our novel Figure 2). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.

      Changes: Please see our discussion above.

      What do the authors actually have? 

      The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.

      Comment: The referee reiterates their views.

      Changes: None.

      Reviewer #3 (Public Review):

      Summary: 

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.

      Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.

      Strengths: 

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.

      Changes: None.

      Weaknesses: 

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituent axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.

      Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

      Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.

      Changes: We hope to publish comparative data on elephant brain size and shape later this year.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I realize that elephant brains are a limiting resource in this project, along with the ability to perform functional investigations. However, I believe that Prof. Jon Kaas (Vanderbilt University) has one or more series of Nissl-stained brainstems from elephants. These might be of potential interest, as they were previously used to explore general patterns of trigeminal brainstem organization in a comparative manner (see Sawyer and Sarko, 2017, "Comparative Anatomy and Evolution of the Somatosensory Brain Stem" in the Evolution of Nervous System series) and might shed light on the positioning of the trigeminal complex and IO, with parts of the trigeminal nerve itself still attached to these sections.

      Comment: The referee suggests adding data from more elephants and we think this is a great suggestion because our ns are small. We followed this advice. We agree we need more comparative neuroanatomy of elephants and the urgency of this matter is palpable in the heated debate we have with Referee 2. Specifically, we need more long-range and short-range analysis of elephant brains.

      Changes: We plan to include data in the revised manuscript about cytoarchitectonics (Nissl), cytochrome-oxidase reactivity, and possibly also antibody reactivity from an additional animal, i.e., from the African elephant cow Bibi. The quality of this specimen is excellent and the post-mortem time to brain extraction was very short.

      We also have further plans for connectivity analysis (see our response above), but such data will not become available fast enough for the revision.

      Other recommendations: 

      - A general schematic showing input from trunk to PrV to the trigeminal subnuclei (as well as possibly ascending connections) might be informative to the reader, in terms of showing which neural relay is being examined.

      Comment: We think this is a very good suggestion in principle, but we were not satisfied with the schematics we came up with.

      Changes: None.

      - Perhaps a few more sentences describing the significance of synchrotron tomography for those who may be unfamiliar.

      Comment & Change: We agree and implement this suggestion.

      - "Belly-shaped" trunk module description is unclear on page 9. 

      Comment & Change: We clarified this matter.

      - Typo on the last sentence of page 9. 

      Comment & Change: We fixed this mistake.

      Reviewer #2 (Recommendations For The Authors): 

      The data is only appropriate for a specialized journal and is limited to the Golgi analysis of neurons within the inferior olivary complex of the elephant. This reviewer considers that the remainder of the work is speculation and that the paper in its current version is not salvageable.

      Comment: Rather than suggesting changes, the referee makes it clear that the referee does not want to see our paper published. We think this desire to reject is not rooted in a lack of quality of our work. In fact, we did an immense amount of work: detailed cytoarchitectonic analysis of six (now seven) elephant brainstems (rather than one, as in the case of our predecessors), cell counts, and X-ray tomography. Instead, we think the problem is rooted in the fact that we contradict the referee. To us, such suppression of diverging opinions – provided they are backed up with data – is a scientifically deeply unhealthy attitude. Science thrives on debate and this is why we did not exclude any referees even though we knew that our results do not align with the views of all of the few actors in the field.

      Changes: We think the novel eLife publishing scheme was developed to prevent such abuse. We look forward to having our data published along with the harsh comments of the referee. The readers and subsequent scientific work will determine who’s right and who’s wrong.

      In order to convince readers of the grand changes to the organization of the brainstem in a species suggested by the authors the data presented needs to be supported. It is not. 

      Comment: Again, this looks to us like more of the ‘total-rejection-commentary’ than like an actual recommendation.

      Changes: None.

      The protocol for the cytochrome oxidase histochemistry is not available in the locations indicated by the authors, and it is very necessary to provide this, as I fully believe that the staining obtained is not real, given the state of the tissue used. 

      Comment: We apologize again for not including the necessary details on our cytochrome-oxidase staining.

      From these comments (and the initial comments above) it appears that the referee is uncertain about the validity of cytochrome-oxidase staining. We (M.B., the senior author) have been doing this particular stain for approximately three decades. The referee being unfamiliar with cytochrome-oxidase staining is fine, but we can’t comprehend how the referee then comes to the ‘full belief’ that our staining patterns are ‘not real’ when the visual evidence indicates the opposite. We feel the referee does not want to believe our data.

      From hundreds of permutations, we can assure the referee that cytochrome-oxidase staining can go wrong in many ways. The most common failure outcome in elephants is a uniform light brown stain after hours or days of the cytochrome-oxidase reaction. This outcome is closely associated with long (≥2 days) post-mortem/fixation times and reflects the quenching of cytochrome-oxidases by fixation. Interestingly, cytochrome-oxidase staining in elephant brains is distinctly more sensitive to quenching by fixation than cytochrome-oxidase staining in rodent brains. Another, rarer failure of cytochrome-oxidase staining comes as entirely white or barely colored sections; this outcome is usually associated with a bad reagent (most commonly old DAB, but occasionally also old or bad catalase, in case you are using a staining protocol with catalase). Another nasty cytochrome-oxidase staining outcome is smeary all-black sections. In this case, a black precipitate sticks to sections and screws up the staining (filtering and more gradual heating of the staining solution usually solve this problem). Thus, you can get uniformly white, uniformly light brown, and smeary black sections as cytochrome-oxidase staining failures. What you never get from cytochrome-oxidase staining as an artifact are sections with a strong brown to lighter brown differential contrast. All sections with strong brown to lighter brown differential contrast (staining successes) show one and the same staining pattern in a given brain area, i.e., brownish barrels in the rodent cortex, brownish barrelettes (trigeminal nuclei) in the rodent brainstem, brownish putative trunk modules/inferior olives (if we believe the referee) in the elephant brainstem. Cytochrome-oxidase reactivity is in this regard remarkably different from antibody staining. In antibody staining you can get all kinds of interesting differential contrast staining patterns, which mean nothing. Such differential contrast artifacts in antibody staining arise as a result of insufficient primary antibody specificity, the secondary antibody binding non-specifically, and various other reasons. The reason that the brown differential contrast of the cytochrome-oxidase reaction is pretty much fool-proof relates to the histochemical staining mechanism, which is based on the supply of specific substrates to a universal mitochondrial enzyme. The ability to reveal mitochondrial metabolism and the universal and ‘fool-proof’ staining qualities make cytochrome-oxidase reactivity a fantastic tool for comparative neuroscience, where you always struggle with insufficient information about antigen reactivity.

      We also note that the contrast of cytochrome-oxidase reactivity seen in the elephant brainstem is spectacular. As the referee can see in our Figure 1C, we observe a dark brown color in the putative trunk module, with the rest of the brain being close to white. Such striking cytochrome-oxidase reactivity contrast has been observed only very rarely in neuroanatomy: (i) In the rest of the elephant brain (brainstem, thalamus, cortex) we did not observe as striking a contrast as in the putative trunk module (the inferior olive according to the referee). (ii) In decades of work with rodents, we have rarely seen such differential activity. For example, cortical whisker-barrels (a classic CO-staining target) in rodents usually come out as dark brown against a light brown background.

      What all of this commentary means is that patterns revealed by differential cytochrome-oxidase staining in the elephant brain stem are real.

      Changes: We added details on our cytochrome-oxidase reactivity staining protocol and commented on cytochrome-oxidase reactivity in the elephant brain in general.

      The authors need to recognize that the work done in Africa on elephant brains is of high quality and should not be blithely dismissed by the authors - this stinks of past colonial "glory", especially as the primary author on these papers is an African female.

      Comment: The referee notes that we unfairly dismiss the work of African scientists and that our paper reflects a continuation of our horrific colonial past because we contradict the work of an African woman. We think such commentary is meant to be insulting and prefer to return to the scientific discourse. We are staunch supporters of diversity in science. It is simply untrue that we do not acknowledge African scientists or the excellent work done in Africa on elephant brains. For example, we cite no less than four papers from the Manger group. We refer countless times in the manuscript to these papers, because these papers are highly relevant to our work. We indeed disagree with two anatomical assignments made by Maseko et al., 2013. Such differences should not be overrated, however. As we noted before, such differences relate to only 2 out of 147 anatomical assignments made by these authors. More generally, discussing and even contradicting papers is the appropriate way to acknowledge scientists. We already expressed that we greatly admire the pioneering work of the Manger group. In our view, the perfusion of elephants in the field is a landmark experiment in comparative neuroanatomy. We closely work with colleagues in Africa and find them fantastic collaborators. When the referee is accusing us of contradicting the work of an African woman, the referee is unfairly and wrongly accusing us of attacking a scientist’s identity. More generally, we feel the discussion should focus on the data presented.

      Changes: None.

      In addition, perfusing elephants in the field with paraformaldehyde shortly after death is not a problem "partially solved" when it comes to collecting elephant tissue (n.b., with the right tools the brain of the elephant can be removed in under 2 hours). It means the problem IS solved. This is evidenced by the quality of the basic anatomical, immuno-, and Golgi-staining of the elephant tissue collected in Africa.

      Comment: This is not a recommendation. We repeat: In our view, the perfusion of elephants in the field by the Manger group is a landmark experiment in comparative neuroanatomy. Apart from that, we think the referee got our ‘partially solved’ comment the wrong way. It is perhaps worthwhile to recall the context of this quote. We first describe the numerous limitations of our elephant material; admitting these limitations is about honesty. Then, we wanted to acknowledge previous authors who either paved the way for elephant neuroanatomy (Shoshani) or did a better job than we did (Manger; see the above landmark experiment). These citations were meant as an appreciation of our predecessors’ work and by no means meant to diminish their work. Why did we say that the problems of dealing with elephant material are only partially solved? Because elephant neuroanatomy is hard and the problems associated with it are by no means solved. Many previous studies rely on a single specimen and our possibilities of accessing, removing, processing, and preserving elephant brains are limited and inferior to the conditions elsewhere. Doing a mouse brain is orders of magnitude easier than doing an elephant brain (because the problems of doing mouse anatomy are largely solved), yet it is hard to publish a paper with six elephant brains because the referees expect evidence at least half as good as what you get in mice.

      Changes: We replaced the ‘partially solved’ sentence.

      The authors need to give credit where credit is due - the elephant cerebellum is clearly at the core of controlling trunk movement, and as much as primary sensory and final stage motor processing is important, the complexity required for the neural programs needed to move the trunk either voluntarily or in response to stimuli, is being achieved by the cerebellum. The inferior olive is part of this circuit and is accordingly larger than one would expect.

      Comment: We think it is very much possible that the elephant cerebellum is important in trunk control.

      Changes: We added a reference to the elephant cerebellum in the introduction of our manuscript.

    2. eLife assessment

      This valuable study uses neuroanatomical techniques to investigate somatosensory projections from the elephant trunk to the brainstem. Given its unique specializations, understanding how the elephant trunk is represented within the brain is of general interest to evolutionary and comparative neuroscientists. The authors present solid evidence for the existence of a novel isomorphism in which the folds of the trunk are mapped onto the trigeminal nucleus; however, due to their unusual structure, some uncertainty remains about the identification and anatomical organization of nuclei within the elephant brainstem.

    3. Reviewer #1 (Public Review):

      This manuscript remains an intriguing investigation of the elephant brainstem, with particular attention drawn to possible sensory and motor representation of the renowned trunk of African and Asian elephants. As the authors note, this area has traditionally been identified as part of the inferior olivary complex and associated with the fine motor control of the trunk; however, notable patterns within myelin stripes suggest that its parcellation may relate to specific regions/folds found along the long axis of the trunk, including elaborated regions for the trunk "finger" distal end.

      In this iteration of the manuscript, the researchers have provided peripherin antibody staining within the regions they have identified as the trigeminal nucleus and the inferior olive. These data, with abundant peripherin expression within climbing fibers of the presumed inferior olive and relatively lower expression within the trigeminal nucleus, bolster their interpretation of having comprehensively identified the trigeminal nucleus and trunk representation via a battery of neuroanatomical methods.

      All other conclusions remain the same, and these data have provoked intriguing and animated discussion on classification of neuroanatomical structure, particularly in species with relatively limited access to specimens. Most significantly, these discussions have underscored the fundamental nature of comparative methods (from protein to cellular to anatomical levels), including interpreting homologous structures among species of varying levels of relatedness.

    4. Reviewer #3 (Public Review):

      Summary:

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Strengths:

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Weaknesses:

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

    5. Reviewer #4 (Public Review):

      Summary:

      The authors report a novel isomorphism in which the folds of the elephant trunk are recognizably mapped onto the principal sensory trigeminal nucleus in the brainstem. Further, they identify the enlarged nucleus as being situated in this species in an unusual ventral midline position.

      Strengths:

      The identity of the purported trigeminal nucleus and the isomorphic mapping with the trunk folds is supported by multiple lines of evidence: enhanced staining for cytochrome oxidase, an enzyme associated with high metabolic activity; dense vascularization, consistent with high metabolic activity; prominent myelinated bundles that partition the nucleus in a 1:1 mapping of the cutaneous folds in the trunk periphery; near absence of labeling for the anti-peripherin antibody, specific for climbing fibers, which can be seen as expected in the inferior olive; and a high density of glia.

      Weaknesses:

      Despite the supporting evidence listed above, the identification of the gross anatomical bumps, conspicuous in the ventral midline, is problematic. This would be the standard location of the inferior olive, with the principal trigeminal nucleus occupying a more dorsal position. This presents an apparent contradiction which at a minimum needs further discussion. Major species-specific specializations and positional shifts are well-documented for cortical areas, but nuclear layouts in the brainstem have been considered as less malleable.

    6. Reviewer #5 (Public Review):

      After reading the manuscript and the concerns raised by reviewer 2 I see both sides of the argument - the relative location of trigeminal nucleus versus the inferior olive is quite different in elephants (and different from previous studies in elephants), but when there is a large disproportionate magnification of a behaviorally relevant body part at most levels of the nervous system (certainly in the cortex and thalamus), you can get major shifting in location of different structures. In the case of the elephant, it looks like there may be a lot of shifting. Something that is compelling is that the number of modules separated by the myelin bands corresponds to the number of trunk folds, which is different in the different elephants. This sort of modular division based on body parts is a general principle of mammalian brain organization (demonstrated beautifully for the cuneate and gracile nuclei in primates, VP in most species, S1 in a variety of mammals such as the star-nosed mole and duck-billed platypus). I don't think these relative changes in the brainstem would require major genetic programming - although some surely exists. Rodents and elephants have been independently evolving for over 60 million years so there is a substantial amount of time for changes in each lineage to occur.

      I agree that the authors have identified the trigeminal nucleus correctly, although comparisons with more out groups would be needed to confirm this (although I'm not suggesting that the authors do this). I also think the new figure (which shows previous divisions of the brainstem versus their own) allows the reader to consider these issues for themselves. When reviewing this paper, I actually took the time to go through atlases of other species and even look at some of my own data from highly derived species. Establishing homology across groups based only on relative location is tough especially when there appears to be large shifts in relative location of structures. My thoughts are that the authors did an extraordinary amount of work on obtaining, processing and analyzing this extremely valuable tissue. They document their work with images of the tissue and their arguments for their divisions are solid. I feel that they have earned the right to speculate - with qualifications - which they provide.

    1. Author response:

      Thank you for organising the review and providing us with the reviewers' feedback. These comments are very useful, and we would like to express our gratitude to the reviewers for their efforts.

      The reviewers all point out a number of related improvements, relating to: 1) describing various processing steps more clearly, in the online documentation but also in the manuscript itself (e.g. for particle picking), 2) describing more clearly what features Ais offers, how these compare to those of other programmes, and how they might be interfaced with in third-party programmes (e.g. the expected format of models), and 3) a degree of subjectivity in discussion of the results presented in the manuscript (e.g. our statement that Pix2pix performed better in some cases than did other architectures).

      We will address these points, as well as the various other suggestions, in the upcoming revised manuscript and updates to Ais.

    1. eLife assessment

      The authors further corroborated their model that Netrin signaling promotes survival and dissemination of non-proliferating ovarian cancer cells. These valuable results were found to be of significant potential interest to cancer biologists in as much as they address gaps in knowledge pertinent to the mechanisms underpinning ovarian cancer spread. In general, it was thought that solid experimental evidence was provided to support the role of Netrin signaling in fueling ovarian cancer progression.

    2. Joint Public Review:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromises survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells, and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      In this valuable study Perampalam et al. developed a CRISPR-based screening approach to identify key genes that are enriched in high grade serous ovarian cancer spheroids. This led to a discovery that Netrin signaling plays a prominent role in survival of ovarian cancer cells. During revision, the authors provide additional evidence to support their central claims and to this end, it was found that they now provide solid evidence to substantiate the proposed model. This work is anticipated to be of interest to cancer biologists specializing in ovarian cancer biology.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Perampalam et al. describe novel methods for genome-wide CRISPR screening to identify and validate genes essential for HGSOC spheroid viability. In this study, they report that Netrin signaling is essential for maintaining disseminated cancer spheroid survival, wherein overexpression of Netrin pathway genes increases tumor burden in a xenograft model of ovarian cancer. They also show that high netrin expression correlates with poor survival outcomes in ovarian cancer patients. The study provides insights into the biology of netrin signaling in DTC cluster survival and warrants development of therapies to block netrin signaling for treating serous ovarian cancer.

      Strengths:

      - The study identifies Netrin signaling to be important in disseminated cancer spheroid survival

      - A Novel GO-CRISPR methodology was used to find key genes and pathways essential for disseminated cancer cell survival

      Thanks for the endorsement of our work and its importance to metastasis in ovarian cancer.

      Weaknesses:

      - The term dormancy is not fully validated and requires additional confirmation to claim the importance of Netrin signaling in "dormant" cancer survival.

      - Findings shown in the study largely relate to cancer dissemination and DTC survival rather than cancer dormancy.

      Much of the validation of dormancy and cell cycle arrest in HGSOC spheroids, as well as of the culture model, has been published previously and hence was not repeated here. I think this reviewer will appreciate the updated citations and explanations to better illustrate the state of knowledge. We have also added new experiments that further emphasize the dormant state of spheroid cells in culture and xenografts, as well as patient derived spheroids used in this study.

      Reviewer #1 (Recommendations for Authors):

      (1) It is unclear what spheroid/adherent enrichment ratio is and how it ties into genes affecting cell viability. Why is an ER below 1 the criteria for selecting survival genes?

      Our screen uses the ‘guide only’ comparison in each culture condition to establish a gene score under that specific condition. A low adherent score captures genes that are essential under standard culture conditions where cells are proliferating and this can include genes needed for proliferation or other basic functions in cell physiology. A low spheroid score identifies the genes that are most depleted in suspension when cells are growth arrested and this is an indication of cell death in this condition. Since gene knock outs are first established in adherent proliferating conditions, essential genes under these conditions will already start to become depleted from the population before suspension culture. By selecting genes with a ratio of <1 we can identify those that are most relevant to dormant suspension culture conditions. Ultimately, the lowest enrichment ratio scores represent genes that are dispensable in the initial adherent condition, but critical for survival in suspension and this is what we aimed to identify. We’ve updated Figure 1B to illustrate this and we’ve updated the explanation of the enrichment ratio on page 6, lines 144 to 147 of the results.
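      The selection logic described above can be illustrated with a minimal sketch. It assumes each gene already has one score per culture condition and that lower scores mean stronger depletion; the function names, gene labels, and numbers are hypothetical and are not taken from the GO-CRISPR pipeline itself.

```python
def enrichment_ratio(spheroid_score, adherent_score):
    """Spheroid score divided by adherent score for one gene.

    Assumes both scores are positive and that a lower score means
    stronger depletion of the gene's guides in that condition.
    """
    if adherent_score <= 0:
        raise ValueError("adherent score must be positive")
    return spheroid_score / adherent_score


def select_candidates(scores, threshold=1.0):
    """Return genes with ratio below the threshold, lowest ratio first."""
    ratios = {gene: enrichment_ratio(s, a) for gene, (s, a) in scores.items()}
    hits = [gene for gene, ratio in ratios.items() if ratio < threshold]
    return sorted(hits, key=ratios.get)


# Hypothetical (spheroid_score, adherent_score) pairs for illustration only.
example_scores = {
    "GENE_A": (0.2, 0.9),  # tolerated when adherent, depleted in suspension
    "GENE_B": (0.8, 0.8),  # equally scored in both conditions
    "GENE_C": (0.5, 0.6),  # mildly more depleted in suspension
    "GENE_D": (0.3, 0.1),  # already depleted under adherent conditions
}

candidates = select_candidates(example_scores)
```

      In this toy example only the genes scoring lower in suspension than in adherent culture pass the <1 cutoff, and a gene that is already strongly depleted under adherent conditions is excluded.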

      (2) The WB for phospho-p38 in figure 1A for OVCAR8 line does not show increased phosphorylation in the spheroid relative to the adherent. If anything, phospho-p38 appears to be reduced in the spheroid. Can the authors provide a better western blot?

      We’ve updated this blot with a longer exposure, see Figure 1A.  Phosphorylation levels of p38 are essentially unchanged in OVCAR8 cells in suspension culture, although the overall levels of p38 may be slightly reduced in dormant culture conditions.

      (3) How did the authors confirm dormancy apart from western blot for phospho-ERK vs phospho-p38? Authors should add EdU/BrdU staining and/or Ki67 staining to confirm dormancy.

      Previous publications that appear as citations 7, 10, and 33 in the reference list established the growth arrest state of these cells in suspension culture. This included measuring other known markers of dormancy and quiescence such as p27, p130, and reduced cyclin/cdk activity and 3H-thymidine incorporation. In addition, other associated characteristics of dormancy such as EMT and catabolic metabolism have been demonstrated in these culture conditions (see citation 11 and Rafehi et al. Endocr. Relat. Cancer 23;147-59). We’ve added these additional citations to our descriptions of dormant spheroid culture to better clarify the status of these cells in our experiments (see page 6, lines 126-28). To ensure that cells are growth arrested in the experiments shown in this paper, we have updated Figure 1A to include blots of p130 and Ki67 to further emphasize that spheroid cells are not proliferating as the quiescence marker (p130) is high and the proliferative marker (Ki67) is lost in suspension culture.

      (4) Can the authors report spheroid volume over time in culture? How was viability measured?

      We’ve updated the methods (see page 27, line 574) to better highlight the description of cell survival that answers both of these questions. At the ends of experimental time points in both the screen and viability assays we captured live cells by replating on adherent plasticware. We fixed and stained with crystal violet and photographed plates to illustrate the sizes of spheroids (shown in Fig. 2 Supplement 1E, Fig. 6C, and 7D). We subsequently extracted the dye and quantitated it spectrophotometrically to quantitatively compare biomass of viable cells between experiments irrespective of the relatively random shapes of spheroids. We found reattachment and staining in this manner to match traditional viability assays such as CellTiter-Glo in a previous paper (10). Furthermore, biomass never increases in culture and diminishes gradually over time in culture consistent with the non-proliferative state of these experiments. Double checks of this equivalency of viability and reattached biomass measurements, as well as a demonstration that biomass is lost over time, are shown in Fig. 2 Supplement 1E that compares reattached crystal violet staining measurements with CellTiter-Glo for DYRK1A knock out cells over time in culture. In addition, we include a comparison of crystal violet staining of reattached spheroids with trypan blue dye exclusion in Fig. 5G and H. In both cases reattachment and more direct viability assays demonstrate the same conclusion that Netrin signaling supports viability in dormant culture.

      (5) Please show survival significance of Netrin signaling genes in recurrence/relapse free survival to claim importance in cancer dormancy.

      See Fig. 7 Supplement 1C where we include the recurrence free survival data. Netrin-1 and -3 high expressors also have a numerically shorter progression free survival but it is not statistically significant. Netrin-1 overexpression alone is also shown and it shows shorter survival with a P-value of 0.0735. Elevated survival of dormant cells in a residual disease state is expected to increase the chance of relapse and shorten this interval. Thus, this data is consistent with our model, but lacks statistical significance.

      There are many alternative ways to interpret what shorter progression free survival, or overall survival, may mean biologically. Since survival of dormant cells is but one of them, we also added new data to experimentally investigate the role of endogenous Netrin signaling in dormant residual disease in Fig. 6 and described on page 12, lines 266-87.  We used xenograft experiments to show OVCAR8 spheroids form and withdraw from the cell cycle equivalently to suspension culture following intraperitoneal injection.  Furthermore, loss of Netrin signaling due to receptor deletions compromises survival during this early window before disseminated lesions form.  This argues that Netrin signaling contributes to survival during this window of dormancy.  In addition, mice engrafted with mutant cells experience prolonged survival when Netrin signaling is blocked.  Together, these experiments further argue that Netrin signaling supports survival in the dormant, non-proliferative phase, and leads to reduced survival of mice.

      (6) The authors show IHC staining of patient ascites derived HGSOC spheroids. However, no marker for dormancy is shown in these spheroids. Adding Ki67 staining or phospho-ERK vs phospho-p38 would be necessary to confirm cancer dormancy.

      We have added new staining for Ki67 and p130 that compares these markers in HGSOC tumors where Ki67 is high and p130 is low with ascites derived spheroids where staining is the opposite. Importantly, expression of p130 is linked to cellular quiescence and is not found to accumulate in the nucleus of cells that are just transiting through G1.  This confirms that the ascites derived spheroids are dormant.  See Fig. 4A-E and described on page 9, lines 201-7.

      (7) Overall, the findings are interesting in the context of cancer dissemination. There is not enough evidence for cancer dormancy and the importance of Netrin signaling in the survival of cancer dormancy. Overexpression of Netrin increases phosphorylation of ERK, leading one to expect an increase in proliferation. This suggests that Netrin breaks cancer cells out of dormancy, into a proliferative state.

      We have found that the discovery of Netrin activation of MEK-ERK in growth arrested cells is counterintuitive to many cancer researchers. However, this axis exists in other paradigms of Netrin signaling in axon outgrowth that are not proliferation related (see citation 26, Forcet et al. Nature 417; 443-7 as an example). We have added Fig. 5D and descriptions on page 11, lines 244-52 to better clarify that Netrins cannot induce cell proliferation through ERK. Addition of recombinant Netrin-1 can only induce ERK phosphorylation in suspension culture conditions and not in quiescent adherent conditions. The small magnitude of ERK phosphorylation induced by Netrin-1 in suspension compared to treating adherent, quiescent cells with the same concentration of mitogenic EGF further emphasizes that this is not a proliferative signal. Lastly, the new xenograft experiment in Fig. 6A-D (described on page 12, lines 266-81) demonstrates the growth arrested context in which Netrin signaling in dormant spheroids supports viability.

      (8) If authors wish to claim cancer dormancy as the premise of their study, additional confirmatory experiments are required to support their claims. Alternatively, based on the current findings of the study, it would be best to change the premise of the article to Netrin signaling in cancer dissemination and survival of disseminated cancer spheroids rather than cancer dormancy.

      I expect that this reviewer will agree that we have added more than sufficient explanations of background work on HGSOC spheroid dormancy from the literature, as well as new experiments that address their questions about dormancy in our experiments.

      Reviewer #2 (Public Review):

      Summary:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromise the survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      Overall it was thought that this study is of potentially broad interest in as much as it provides previously unappreciated insights into the potential molecular underpinnings of cancer cell dormancy, which has been associated with therapy resistance, disease dissemination, and relapse as well as poor prognosis. Notwithstanding the potential limitations of cellular models in mimicking cancer cell dormancy, it was thought that the authors provided sufficient support for their model that netrin signaling drives survival of non-proliferating ovarian cancer cells and their dissemination. Collectively, it was thought that these findings hold a promise to significantly contribute to the understanding of the molecular mechanisms of cancer cell dormancy and in the long term may provide a molecular basis to address this emerging major issue in the clinical practice.

      Thanks for the kind words about the importance of our work in the broader challenges of cancer treatment.

      Weaknesses:

      Several issues were observed regarding methodology and data interpretation. The major concerns were related to the reliability of modelling cancer cell dormancy. To this end, it was relatively hard to appreciate how the employed spheroid model allows one to distinguish between dormant and e.g., quiescent or even senescent cells. This was in contrast to solid evidence that netrin signaling stimulates abdominal dissemination of ovarian cancer cells in the mouse xenograft and their survival in organoid culture. Moreover, the role of ERK in mediating the effects of netrin signaling in the context of the survival of non-proliferative ovarian cancer cells was found to be somewhat underdeveloped.

      Experiments previously published in citation 7 show that growth arrest in patient ascites derived spheroids is fully reversible and that argued against non-proliferative spheroids being a form of senescence and moved this work into the dormancy field.  We have added extensive new support for our model systems and data to address the counterintuitive aspects of MEK-ERK signaling in survival instead of proliferation. 

      Reviewer #2 (Recommendations for Authors):

      (1) A better characterization of the spheroid model may be warranted, including staining for the markers of quiescence and senescence (including combining these markers with staining for the components of the netrin pathway)

      See Figure 1A and page 6, lines 126-36 where we have added blots for Ki67 and p130 to better emphasize the arrested proliferative state of cells in our screening conditions. We have also added these same controls for patient ascites-derived spheroids in Figure 4 and described on page 9, lines 203-7. One realization from this CRISPR screen, and others in our lab, is that it identifies functionally important aspects of cell physiology and not necessarily ones that are easily explored using commercially available antibodies. Netrin-1 and -3 staining of patient derived spheroids in Fig. 4, as well as cell line spheroids stained in Fig. 4 Supplement 1 further support the relevance of this pathway in dormant cancer cells because Netrins are expressed in the right place at the right time. The Netrin-1 stimulation experiments in Fig. 5C were originally carried out to probe HGSOC cells for functionality of Netrin receptors since we couldn’t reliably detect them by blotting or staining with available antibodies. This demonstrates that this pathway is active in the various HGSOC cell lines we’ve used and specifically, using OVCAR8 cells, we show it is only active in suspension culture conditions.

      (2) In figure 1A it appears that total p38 levels are reduced in some cell lines in spheroid vs. adherent culture. The authors should comment on this.

      These blots have been updated to be more clear.  Overall p38 levels may be reduced in some cell lines and when compared with activation levels of phosphorylated p38 it suggests the fraction of activated p38 is higher. OVCAR8 cells may be an exception where the overall activity level remains approximately the same.

      (3) The authors should perhaps provide a clearer rationale for choosing to focus on the netrin signaling vs. e.g., GPCR signaling, and consider more explicit defining of "primary" vs. "tertiary" categories in Reactome gene set analysis.

      We’ve updated Fig. 1E and the text on page 7, lines 161-5 to illustrate which gene categories identified in the screen belong to which tiers of Reactome categories. It better visualizes why we have investigated the Axon guidance pathway that includes Netrin because it is a highly specific signaling pathway that scores similarly to the broader and less specific categories at the very top of the list. As an aside, the GPCR signaling and GPCR downstream signaling have proven to be fairly intractable categories. As best we can tell the GPCR downstream signaling category is full of MAPK family members and likely represents some redundancy with MAPK further down.

      (4) In figure 3A-C, including factors whose expression did not appear to change between adherent and suspension conditions may be warranted as the internal control. Figure 3D-F may benefit from some sort of quantification.

      The mRNA expression levels are normalized to GAPDH as an internal control. We have updated this figure and re-plotted it as fold change relative to adherent culture cells with statistical comparisons to indicate which are significantly upregulated in suspension culture.
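      As an illustration of the normalization described above, the following sketch computes a fold change relative to adherent culture after dividing each target measurement by its GAPDH measurement. It assumes linear relative expression quantities as inputs; the function name and the example values are hypothetical, and the authors' actual quantification procedure is not specified in this response.

```python
def fold_change_vs_adherent(target_susp, gapdh_susp, target_adh, gapdh_adh):
    """Fold change of a GAPDH-normalized transcript, suspension vs adherent.

    All inputs are assumed to be positive, linear-scale expression values.
    """
    if min(target_susp, gapdh_susp, target_adh, gapdh_adh) <= 0:
        raise ValueError("expression values must be positive")
    normalized_suspension = target_susp / gapdh_susp
    normalized_adherent = target_adh / gapdh_adh
    return normalized_suspension / normalized_adherent


# Hypothetical values: the target rises fourfold while GAPDH is unchanged.
example_fold = fold_change_vs_adherent(8.0, 2.0, 2.0, 2.0)
```

      A value above 1 indicates higher normalized expression in suspension than in adherent culture; whether that difference is significant still requires the statistical comparison across replicates mentioned above.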

      The IHC experiments are now in Fig. 4D-F and show positive staining for Netrin-1 and -3.  Netrin-3 is easiest to see, while Netrin-1 is trickier because the difference with the no primary antibody control isn’t intensity, but the tint of the DAB stain.  We had to counter stain the patient spheroids with Hematoxylin in order for the slide scanner to find the best focal plane and make image registration between sections possible.  This unfortunately makes the Netrin-1 staining rather subtle.  For cell line spheroids in the Fig. 4, Supplement 1 we didn’t need the slide scanner and show negative controls without counter stain that are much more convincing of Netrin-1 detection and reassure us that our staining detects the intended target.  We’ve updated the labels in Fig. 4 and Fig. 4, Supplement 1 for this to be more intuitive.  Unfortunately, relying on the tint of the DAB stain leaves this as a qualitative experiment.

      - In figure 4C-E the authors show that Netrin-1 stimulation induces ERK phosphorylation whereby it is argued that this is a "low-level" stimulation of ERK signaling required for the survival of ovarian cells in the suspension. This is however hard to appreciate, and it was thought that having adherent cells in parallel would be helpful to gauge whether this indeed is a "low level" ERK activity. Moreover, the authors should likely include downstream substrates of ERK (e.g., RSKs) as well as p38 in these experiments. The control experiments for the effects of PD184352 on ERK phosphorylation also appear to be warranted. Finally, performing the experiments with PD184352 in the presence of Netrin-1 stimulation would also be advantageous.

      We have added a new Netrin-1 stimulation experiment in Fig. 4D (described on page 11, lines 244-52) that shows that Netrins can only activate very low levels of ERK phosphorylation in suspension when proliferation is arrested. Netrin-1 stimulation of quiescent adherent cells where stimulation of proliferation is possible shows that Netrins are unable to activate ERK phosphorylation in this condition. In contrast, we also stimulate quiescent adherent OVCAR8 cells with an equal concentration of EGF (a known mitogen) to offer high level ERK phosphorylation as a side by side comparison. I think that this offers clear evidence that Netrin signaling is inconsistent with inducing cell proliferation. We’ve also updated citations in the introduction to include citation 26 that offers a previously reported paradigm of Netrin-ERK signaling in axon outgrowth that is a non-cancer, non-proliferative context to remind readers that Netrins utilize MEK-ERK differently.

      We highlight Netrin-MEK-ERK signaling as key to survival for a number of reasons. First, Netrin signaling in this paradigm does not fit the dependence receptor paradigm where loss of Netrin receptors protects against cell death. Fig. 5B rules this out as receptor loss never offers a survival advantage, but clearly receptor deletions compromise survival in suspension culture. Second, positive Netrin signaling is known to support survival by inactivating phosphorylation of DAPK1. We’ve added this experiment as Fig. 5 Supplement 1D and show that loss of Netrin receptors doesn’t reduce DAPK1 phosphorylation in a time course of suspension culture. Consequently, we conclude this isn’t the survival signal either. Since MEK and ERK family members scored in our screen, we investigated their role in survival. We now show two different MEK inhibitors with different inhibitory mechanisms to confirm that MEK inhibition induces cell death. In addition to the previous PD184352 inhibitor in our first submission, we’ve added Trametinib as well and this is shown in Fig. 5G. Since it is surprising that MEK inhibition can kill instead of just arrest proliferation, we’ve also added another cell death assay in which we show trypan blue dye exclusion as a second look at survival. This is now Fig. 5H. Lastly, we include Trametinib inhibition of ERK phosphorylation in these assays in Fig. 5I. While we leave open what takes place downstream of ERK, our model in Fig. 5J offers a very detailed look at the components upstream.

      - Does inhibition of ERK prevent the abdominal spread of ovarian cancer cells? The authors may feel that this is out of the scope of the study, which I would agree with, but then the claims regarding ERK being the major mediator of the effects of netrin signaling should be perhaps slightly toned down.

      We agree that loss of function xenograft experiments will enhance our discovery of Netrin’s role in dormancy and metastasis. We have added a new Fig. 6 that uses xenografts with Netrin receptor deficient OVCAR8 cells (UNC5 4KO). It demonstrates that two weeks following IP engraftment we can isolate spheroids from abdominal washes and that cells have entered a state of reduced proliferation as determined by lowered expression of Ki67 as well as other proliferation inducing genes. In the case of UNC5 4KO cells, there is significant attrition of these cells as determined by recovering spheroids in adherent culture (Fig. 6C) and by Alu PCR to detect human cells in abdominal washes (Fig. 6D). Lastly, xenografts of UNC5 4KO cells cause much less aggressive disease and significantly extend survival of these mice (Fig. 6E,F). This is not exactly the experiment that the reviewer is asking for, but it is a clear indication that Netrin signaling supports survival in a xenograft model of dormancy.

      - Notwithstanding that this could be deduced from figures 6D and F, it would be helpful if the number of mice used in each experimental group is clearly annotated in the corresponding figure legends. Moreover, indicating the precise statistical tests that were used in the figures would be helpful (e.g., specifying whether ANOVA is one-way, two-way, or?)

      We have added labels to what is now Fig. 8B to indicate the number of animals used for each genotype of cells.  We have also updated figure legends to include more details of statistical tests used in each instance.

    1. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Gitanjali Roy et al. applies deep transfer learning (DEGAS) to assign patient-level disease attributes (metadata) to single cells of T2D and non-diabetic patients, including obese patients. This led to the identification of a singular cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The objective was to identify novel and established genes implicated in T2D and obesity. Their final goal is to validate their findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.

      Strengths:

      This paper is well-written, and the findings are relevant for β-cell heterogeneity in T2D and obesity.

      Weaknesses:

      The validation they provide is not sufficiently strong: no DLK1 immunohistochemistry is shown of obese patient-derived sections. Additional presumptive relevant candidates from this transcriptomic analysis should be screened for, at the protein level.

    2. eLife assessment

      This is a useful study that used DEGAS, a deep transfer learning tool, to identify distinct pancreatic beta cell subpopulations that could be associated with type 2 diabetes (T2D) and/or obesity status. The data supporting the authors' findings is solid and demonstrates that DEGAS will be a helpful tool for analyzing cell-specific transcriptomic phenotypes. This study will be of interest to researchers studying the genetics of T2D.