  1. Sep 2024
    1. eLife assessment

      This important work, leveraging state-of-the-art whole-night sleep EEG-fMRI methods, advances our understanding of the brain states underlying sleep and wakefulness. Despite a small sample size, the authors present convincing evidence for substates within N2 and REM sleep stages, with reliable transition structure, supporting the perspective that there are more than the five canonical sleep/wake states.

    2. Reviewer #1 (Public review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, the authors showed that these findings on dynamic states and their transitions were reproducible over two nights of sleep. These novel findings offer unique information to the field of sleep neuroimaging.

      Comments on revised version:

      Nice work! All my concerns have been addressed, and I have no further suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      Weaknesses are its small sample size, and limited attempts at relating the identified fMRI brain states back to EEG.

      General appraisal:

      The paper's conclusions are generally well-supported, but additional analyses could improve the work further.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A missed opportunity remains the absence of more analyses relating the HMM states back to EEG. While the authors show how power in different EEG bands varies with HMM state (Supplementary Figures 10 and 11) it would be much more informative to show the complete EEG spectra for each of the 21 HMM states, organized by PSG-based sleep/wake state. This would enable answering how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG timecourse at the moment of an HMM class switch (particularly when the PSG stage remains stable). Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed. Furthermore, if band-specific analyses are to be performed, care should be taken to specify bands in accordance with the dominant sleep EEG features (e.g., slow oscillation and sigma/spindle bands are currently missing).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, the authors showed that these findings on dynamic states and their transitions were reproducible over two nights of sleep. These novel findings offer unique information to the field of sleep neuroimaging.

      Weaknesses:

      The only weakness of this study has been acknowledged by the authors: limited sample size.

      We thank the reviewer for the overall enthusiasm for this study.

      Reviewer #1 (Recommendations For The Authors):

      (1) Were there differences in the extent of head motion during sleep among sleep stages? How was the potential motion parameter differences handled during the statistical analyses?

      If there were large head motions that continued for a long time (e.g., longer than 1 minute), how did the authors deal with that scanning session? For an extremely long scanning session (3 hours), how was motion correction conducted? It would be great if the authors could provide more details.

      We found that the N3 sleep stage had the lowest head motion, followed by REM, N2, N1, and lastly Wake. In other words, participants had lower head motion during sleep than during wakefulness. We added this information to the Supplemental Results, copied below.

      We performed standardized motion correction during preprocessing using AFNI regardless of the duration of the scans. We did not include motion parameters in the HMM model. Time frames with excessive head motion (any of the six head motion parameters exceeding 0.3 mm or 0.3 degrees) were censored. Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019).
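      The censoring rule described above can be sketched in a few lines. This is only an illustration, not the authors' AFNI pipeline: the function names `censor_mask` and `percent_retained` are hypothetical, and the sketch assumes each time frame already comes with six motion values (three translations in mm, three rotations in degrees) in whatever form the 0.3 threshold was applied to.

```python
def censor_mask(motion, threshold=0.3):
    """Return one boolean per time frame: True = keep, False = censored.

    `motion` is a list of frames, each holding six motion parameters.
    A frame is censored when any parameter exceeds the threshold in
    absolute value (assumed interpretation of "exceeding 0.3 mm or degree").
    """
    keep = []
    for frame in motion:
        if len(frame) != 6:
            raise ValueError("expected 6 motion parameters per frame")
        keep.append(all(abs(value) <= threshold for value in frame))
    return keep


def percent_retained(keep):
    """Percentage of time frames that survive censoring."""
    return 100.0 * sum(keep) / len(keep)
```

      Applying `percent_retained` separately to the frames of each sleep stage would yield per-stage retention figures of the kind quoted in the Supplemental Results below.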

      In Supplemental Results, “Motion parameters with sleep stages.

      Averaged motion across six motion parameters decreased from wake to light sleep to deep sleep at night 2. For example, mean (standard deviation) motion for each sleep stage is as follows, N1: 0.043 (0.37); N2: 0.039 (0.033); N3: 0.035 (0.031); REM: 0.035 (0.032); Wake: 0.057 (0.052).

      Similarly, the percentage of timepoints retained after censoring increased from wake to light sleep to deep sleep at night 2. N1: 91%; N2: 93%; N3: 96%; REM: 89%; Wake: 90%.”

      In the method section, “Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019). We also found that motion is lower during deep sleep compared to wake, see Supplemental Results.”

      (2) It is possible that the data input for the HMM analyses might vary among participants and between the two nights, how did the authors deal with this issue during statistical analyses?

      This is a great question. We standardized BOLD timecourses for each participant and each night to avoid differences among participants and between two nights. We revised the description in the method section to make this point clear.

      In the method section, “To prepare the data for analysis, we first standardized the participant-specific sets of 300 ROI timecourses (scaled to a mean of 0, and a standard deviation of 1), which were then concatenated across all participants. This standardization was performed separately for each night. ”
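      The standardization step quoted above could look roughly like the following sketch. It is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the population standard deviation is an arbitrary choice (the response does not say which estimator was used), and the function is meant to be called once per night.

```python
import statistics


def zscore(timecourse):
    """Scale one ROI timecourse to mean 0 and standard deviation 1."""
    mean = statistics.fmean(timecourse)
    sd = statistics.pstdev(timecourse)  # population SD; an assumption
    if sd == 0:
        raise ValueError("constant timecourse cannot be standardized")
    return [(x - mean) / sd for x in timecourse]


def standardize_and_concatenate(scans):
    """Standardize each participant's ROI timecourses, then concatenate.

    `scans` holds one entry per participant (for a single night); each
    entry is a list of ROI timecourses. The result is one list of ROI
    timecourses spanning all participants.
    """
    n_rois = len(scans[0])
    concatenated = [[] for _ in range(n_rois)]
    for scan in scans:
        if len(scan) != n_rois:
            raise ValueError("all participants need the same number of ROIs")
        for roi_index, timecourse in enumerate(scan):
            concatenated[roi_index].extend(zscore(timecourse))
    return concatenated
```

      Standardizing before concatenation means that no participant or night contributes a different mean or variance to the concatenated input.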

      (3) Figures 2 and 4, the top part seems to be missing, e.g., "Night 2" in Figure 2, and "N2-related" in Figure 4.

      Thank you for pointing out these errors. We fixed them.

      (4) Figure 3 seems to be more stretched vertically than horizontally.

      We revised the figure to ensure it appears balanced on both sides.

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      The study's weaknesses are its small sample size and the limited attempts at relating the identified fMRI brain states back to EEG.

      We thank the reviewer for the positive feedback and helpful suggestions for this study.

      General appraisal:

      The paper's conclusions are generally well-supported, but some additional analyses and discussions could improve the work.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A somewhat missed opportunity is the absence of more analyses relating the HMM states back to EEG. It would be very helpful to the sleep field to see how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG time course at the moment of an HMM class switch (particularly when the PSG stage remains stable). While the authors did look at slow wave density and various physiological signals in different HMM states, a characterization of the EEG itself in terms of spectral features is missing. Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed.

      We thank the reviewer for this great suggestion. We performed EEG spectral analysis on each HMM state. Results were added to the Supplementary Results and Supplementary Figures 10 and 11 (copied below). Specifically, we confirmed that N3-related states had the highest Delta power and that the Deep-N2 module showed different spectral profiles compared to the Light-N2 module.

      In Supplemental Results: “We conducted spectral analysis for each TR and calculated the average power spectrum for each common EEG brainwave—Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (30-100 Hz)—across the 21 HMM states. See Supplementary Figure 10 and 11 for night 2 and night 1 data, respectively. As expected, we found that N3-related states 8 and 10 had highest Delta power in both nights. In addition, the Deep-N2 module had higher power in Theta and Alpha bands compared to the Light-N2 module.”
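      As a rough illustration of the band-power summary quoted above, the sketch below sums periodogram power within the five listed bands for a single EEG window. The band edges come from the quoted text; everything else is an assumption for illustration, including the naive DFT, the half-open band boundaries, the function name `band_power`, and the idea that one window corresponds to one TR.

```python
import cmath
import math

# Band edges in Hz, taken from the quoted Supplemental Results.
BANDS = {
    "Delta": (0.5, 4.0),
    "Theta": (4.0, 8.0),
    "Alpha": (8.0, 13.0),
    "Beta": (13.0, 30.0),
    "Gamma": (30.0, 100.0),
}


def band_power(signal, fs):
    """Sum periodogram power per band for one window of EEG samples.

    Uses a naive O(n^2) DFT, which is fine for short windows. Bands are
    treated as half-open intervals [low, high) so that no frequency bin
    is counted twice (an assumption; the source does not specify this).
    """
    n = len(signal)
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        bin_power = abs(coeff) ** 2 / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                power[name] += bin_power
    return power
```

      Averaging these per-window values over all windows assigned to a given HMM state would give a per-state band profile comparable in spirit to Supplementary Figures 10 and 11.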

      It is unclear how the presently identified HMM brain states relate to the previously identified NREM and wake states by Stevner et al. (2019), who used a roughly similar approach. This is important, as similar brain states across studies would suggest reproducibility, whereas large discrepancies could indicate a large dependence on particular methods and/or the sample (also see later point regarding generalizability).

      This is a great question. There are some similarities and differences between the current study and Stevner et al. (2019). We discussed this in the Supplementary Discussion. Copied below.

      In the Supplementary Discussion: “Both studies demonstrated that HMM states can be effectively divided into meaningful modules solely based on transition probabilities. Furthermore, both studies indicated that pre-sleep wakefulness differs from post-sleep wakefulness.

      However, despite the similar approaches used, key differences in data acquisition and analysis make it challenging to directly compare HMM states between these two studies. Firstly, Stevner et al. (2019) collected only 1-hour-long sleep data from 18 participants, whereas our current study includes 8-hour-long sleep data from 12 participants for two consecutive nights. As discussed in the main text, full sleep cycling cannot be obtained from 1-hour long sleep due to the lack of REM stage and incomplete sleep cycles. Secondly, in Stevner et al. (2019) (Figure 4e), the four wake-NREM stages had roughly the same duration. In contrast, in our current study (Night 2, Figure 2A), the N2 stage comprises 43% of total sleep, which aligns with the natural N2 composition of nocturnal sleep stages. This discrepancy might explain the different number of N2-related states found in the two studies, with 3 out of 19 in Stevner et al. (2019) versus 13 out of 21 in our current study.”
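      The transition probabilities that both studies use to group HMM states into modules can be illustrated with an empirical estimate from a decoded state sequence. This is a minimal sketch under stated assumptions: in a fitted HMM the transition matrix is a model parameter, whereas here it is simply counted from a hypothetical sequence of integer state labels, and the module-detection step itself is not shown.

```python
def transition_matrix(state_sequence, n_states):
    """Estimate P(next state | current state) from a decoded sequence.

    States are integers in range(n_states). Rows for states that never
    occur as a "current" state are left as all zeros.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(state_sequence, state_sequence[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

      States that transition mostly among themselves form dense blocks in such a matrix, which is the structure that a module analysis exploits.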

      More justice could be done to previous EEG-based efforts moving beyond conventional AASM-defined sleep/wake states. Various EEG studies performed data-driven clustering of brain states, typically indicating more than 5 traditional brain states (e.g., Koch et al. 2014, Christensen et al. 2019, Decat et al. 2022). Beyond that, countless subdivisions of classical sleep stages have been proposed (e.g., phasic/tonic REM, N2 with/without spindles, N3 with global/local slow waves, cyclic alternating patterns, and many more). While these aren't incorporated into standard sleep stage classification, the current manuscript could be misinterpreted to suggest that improved/data-driven classifications cannot be achieved from EEG, which is incorrect.

      We agree with the reviewer that previous EEG-based efforts should be mentioned. We have now added this to the manuscript, copied below.

      In the Discussion section, “Third, we chose to not include EEG features in our data-driven model. However, the current method is not limited to fMRI data and can be applied to EEG data. Given that previous data-driven studies based on EEG data have suggested that there might be more than five traditional sleep stages (Christensen et al., 2019; Decat et al., 2022; Koch et al., 2014), as well as subdivisions within these traditional sleep stages (Brandenberger et al., 2005; Decat et al., 2022; Simor et al., 2020), future studies may apply data-driven models on both fMRI and EEG data. ”

      More discussion of the limitations of the current sample and generalizability would be helpful. A sample of N=12 is no doubt impressive for two nights of concurrent whole-night EEG-fMRI. Still, any data-driven approach can only capture the brain states that are present in the sample, and 12 individuals are unlikely to express all brain states present in the population of young healthy individuals. Add to that all the potentially different or altered brain states that come with healthy ageing, other demographic variables, and numerous clinical disorders. How do the authors expect their results to change with larger samples and/or varying these factors? Perhaps most importantly, I think it's important to mention that the particular number of identified brain states (here 21, and e.g. 19 in Stevner) is not set in stone and will likely vary as a function of many sample- and methods-related factors.

      We thank the reviewer for the great suggestions. We have now included these points when discussing limitations in the Discussion section. We think that an HMM with a larger sample size might produce more fine-grained results, but this remains to be investigated when a more extensive dataset becomes available.

      In the Discussion section, “Secondly, while our study involved a relatively small number of participants (12), it included a large amount of fMRI data (~16 hours scan) per participant. Although the HMM trained on data from 12 participants was robust, the generalizability of the current results to different populations—such as healthy aging individuals and clinical populations—needs to be demonstrated in future studies, particularly with larger sample sizes and more diverse populations.”

      “Fourth, while we selected 21 HMM brain sleep states based on model evaluation parameters in the current study, the exact number of sleep states is not fixed and likely depends on various sample- and methods-related factors, such as sample size and model setups.”

    1. eLife assessment

      The methods and findings of the current work are important and well-grounded. The strength of the evidence presented is convincing and backed up by rigorous methodology. Once the authors elaborate on how to access the app, the work will have far-reaching implications for current clinical practice.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors aimed to develop and validate an automated, deep learning-based system for scoring the Rey-Osterrieth Complex Figure Test (ROCF), a widely used tool in neuropsychology for assessing memory deficits. Their goal was to overcome the limitations of manual scoring, such as subjectivity and time consumption, by creating a model that provides automatic, accurate, objective, and efficient assessments of memory deterioration in individuals with various neurological and psychiatric conditions.

      Strengths:

      Comprehensive Data Collection: The authors collected over 20,000 hand-drawn ROCF images from a wide demographic and geographic range, ensuring a robust and diverse dataset. This extensive data collection is critical for training a generalizable and effective deep learning model.

      Advanced Deep Learning Approach: Utilizing a multi-head convolutional neural network to automate ROCF scoring represents a sophisticated application of current AI technologies. This approach allows for detailed analysis of individual figure elements, potentially increasing the accuracy and reliability of assessments.

      Validation and Performance Assessment: The model's performance was rigorously evaluated against crowdsourced human intelligence and professional clinician scores, demonstrating its ability to outperform both groups. The inclusion of an independent prospective validation study further strengthens the credibility of the results.

      Robustness Analysis Efficacy: The model underwent a thorough robustness analysis, testing its adaptability to variations in rotation, perspective, brightness, and contrast. Such meticulous examination ensures the model's consistent performance across different clinical imaging scenarios, significantly bolstering its utility for real-world applications.

      Appraisal and discussion:

      By leveraging a comprehensive dataset and employing advanced deep learning techniques, they demonstrated the model's ability to outperform both crowdsourced raters and professional clinicians in scoring the ROCF. This achievement represents a significant step forward in automating neuropsychological assessments, potentially revolutionizing how memory deficits are evaluated in clinical settings. Furthermore, the application of deep learning to clinical neuropsychology opens avenues for future research, including the potential automation of other neuropsychological tests and the integration of AI tools into clinical practice. The success of this project may encourage further exploration into how AI can be leveraged to improve diagnostic accuracy and efficiency in healthcare.

      However, the critique regarding the lack of detailed analysis across different patient demographics, the inadequacy of network explainability, and concerns about the selection of median crowdsourced scores as ground truth raises questions about the completeness of their objectives. These aspects suggest that while the aims were achieved to a considerable extent, there are areas of improvement that could make the results more robust and the conclusions stronger.

      Comments on revised version:

      I appreciate the opportunity to review this revised submission. Having considered the other reviews, I believe this study presents an important advance in using AI methods for clinical applications, which is both innovative and has implications beyond a single subfield.

      The authors have developed a system using fundamental AI that appears sufficient for clinical use in scoring the Rey-Osterrieth Complex Figure (ROCF) test. In human neuropsychology, tests that generate scores like this are a key part of assessing patients. The evidence supporting the validity of the AI scoring system is compelling. This represents a valuable step towards evaluating more complex neurobehavioral functions.

      However, one area where the study could be strengthened is in the explainability of the AI methods used. To ensure the scores are fully transparent and consistent for clinical use, it will be important for future work to test the robustness of the approach, potentially by comparing multiple methods. Examining other latent variables that can explain patients' cognitive functioning would also be informative.

      In summary, I believe this study provides an important proof-of-concept with compelling evidence, while also highlighting key areas for further development as this technology moves towards real-world clinical applications.

    3. Reviewer #2 (Public Review):

      The authors aimed to develop and validate a machine-learning-driven neural network capable of automatically scoring the Rey-Osterrieth Complex Figure. They further aimed to assess the robustness of the model to various parameters, such as tilt and perspective shift, in real drawings. The authors used a very large sample of lay workers to score figures and a large sample of trained clinicians to score a subsample of figures. Overall, the authors found their model to have exceptional accuracy and to perform similarly to crowdsourced workers and clinicians with, in some cases, a lower degree of error/score dispersion than clinicians.

    4. Reviewer #3 (Public Review):

      This study presents a valuable method for scoring a neuropsychological test, the ROCFT, by constructing an artificial intelligence model.

      Comments on latest version:

      The authors built the system with fundamental AI methods, and it is sufficient for clinical use in humans. In human neuropsychology, a test that generates a score is fundamental and relatively simple. Neuropsychologists administer many tests to each patient; the present system covers only one of them, so on its own it cannot capture a patient's overall neurological function. The evidence for the scoring is of compelling quality, sufficient for clinical use now, and the field can progress to evaluating other, more complicated human neuropsychological functions. For example, persons with dementia readily change their performance when they experience emotions (worry, boredom, etc.) or notice other stimuli (announcements in the hospital, a nurse walking by, etc.). The ROCF score certainly changes under such conditions, which makes the effort of AI scoring worthwhile. This aspect of human behavior should be captured comprehensively with diverse tests. Therefore, AI scoring of compelling quality is a fundamental step toward the next stage: evaluating the changeable and ambiguous neurobehavior of humans.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment #1: Insufficient Network Analysis for Explainability: The paper does not sufficiently delve into network analysis to determine whether the model's predictions are based on accurately identifying and matching the 18 items of the ROCF or if they rely on global, item-irrelevant features. This gap in analysis limits our understanding of the model's decision-making process and its clinical relevance.

      Response #1: Thank you for your comment. We acknowledge that understanding the decision-making process of AI models is crucial for their acceptance and utility in clinical settings. However, we believe that our current approach, which focuses on providing individual scores for each of the 18 items of the Rey-Osterrieth Complex Figure (ROCF), inherently offers a higher level of explainability and practical utility for clinicians than a network analysis could. Our multi-head convolutional neural network is designed with a dedicated output head for each of the 18 items in the ROCF, and thus provides a separate score for each item. This architecture helps ensure that the model focuses on individual elements rather than relying on global, item-irrelevant features.

      This item-specific approach directly aligns with the traditional clinical assessment method, thereby making the results more interpretable and actionable for clinicians. The individual scores for each item provide detailed insights into a patient's performance. Clinicians can use these scores to identify specific areas of strength and weakness in a patient's visuospatial memory and drawing abilities.

      Furthermore, we evaluated the model's performance on each of the 18 items separately, providing detailed metrics that show consistent accuracy across all items. This item-level performance analysis offers clear evidence that the model is not relying on irrelevant global features but is indeed making decisions based on the specific characteristics of each item. We believe that our approach provides a level of explainability that is directly useful and relevant to clinical practitioners.

      Comment #2: Generative Model Consideration: The critique suggests exploring generative models to model the joint distribution of images and scores, which could offer deeper insights into the relationship between scores and specific visual-spatial disabilities. The absence of this consideration in the study is seen as a missed opportunity to enhance the model's explainability and clinical utility.

      Response #2: Thank you for your thoughtful comment and the suggestion to explore generative models. We appreciate the potential benefits of using generative models to model the joint distribution of images and scores. However, we chose not to pursue this approach in our study for several reasons: First, our primary goal was to develop a model that provides accurate and interpretable scores for each of the 18 individual items in the ROCF figure. Second, generative models, while powerful, would add a layer of complexity that might diminish the clarity and immediate clinical applicability of our results. Generative models (particularly deep learning-based ones) can be challenging to interpret in terms of how they make decisions or why they produce specific outputs. This lack of interpretability can be a concern in critical applications involving neurological and psychiatric disorders. Clinicians require tools that provide clear insights without the need for additional layers of analysis. Our current model provides detailed, item-specific scores that clinicians can directly use to assess visuospatial memory and drawing abilities. Initially, we explored using generative models (i.e., GANs) for data augmentation to address the scarcity of low-score images compared to high-score images. Moreover, for the low-score images, the same score can be achieved by numerous combinations of figure elements. However, due to our extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, and test them on the data from the current prospective study to compare their performance with our results.

      In the revised manuscript, we have included additional sentences discussing the potential use of generative models and their implications for future research.

      “The data augmentation did not include generative models. Initially, we explored using generative models, specifically GANs, for data augmentation to address the scarcity of low-score images compared to high-score images. However, due to the extensive available dataset, we did not observe any substantial performance improvements in our model. Nevertheless, future studies could explore generative models, such as Variational Autoencoders (VAEs) or Bayesian Networks, which can then be tested on the data from the current prospective study and compared with our results.”

      Comment #3: Lack of Detailed Model Performance Analysis Across Subject Conditions: The study does not provide a detailed analysis of the model's performance across different ages, health conditions, etc. This omission raises questions about the model's applicability to diverse patient populations and whether separate models are needed for different subject types.

      Response #3: Thank you for this important comment. Although the initial version of our manuscript already provided detailed “item-specific” and “across total scores” performance metrics, we recognize the importance of including detailed analyses across different patient demographics to enhance the robustness and applicability of our findings. In response to your comment, we have conducted additional analyses that provide a comprehensive evaluation of model performance across various patient demographics, such as age groups, gender, and different neurological and psychiatric conditions. This additional analysis demonstrates the generalizability and reliability of our model across diverse populations. We have included these analyses in the revised manuscript.

      “In addition, we have conducted a comprehensive model performance analysis to evaluate our model's performance across different ROCF conditions (copy and recall), demographics (age, gender), and clinical statuses (healthy individuals and patients) (Figure 4A). These results have been confirmed in the prospective validation study (Supplementary Figure S6). Furthermore, we included an additional analysis focusing on specific diagnoses to assess the model's performance in diverse patient populations (Figure 4B). Our findings demonstrate that the model maintains high accuracy and generalizes well across various demographics and clinical conditions.”

      Comment #4: Data Augmentation: While the data augmentation procedure is noted as clever, it does not fully encompass all affine transformations, potentially limiting the model's robustness.

      Response #4: We appreciate your feedback on our data augmentation strategy. We acknowledge that while our current approach significantly improves robustness against certain semantic transformations, it may not fully cover all possible affine transformations.

      Here, we provide further clarification and justification for our chosen methods and their impact on the model's performance: In our study, we implemented a data augmentation pipeline to enhance the robustness of our model against common and realistic geometric and semantic-preserving transformations. This pipeline included rotations, perspective changes, and Gaussian blur, which we found to be particularly effective in improving the model's resilience to variations in input data. These transformations are particularly relevant for the present application since users in real life are likely to take pictures of drawings that might be slightly rotated or with a slightly tilted perspective. With these intuitions in mind, we randomly transformed drawings during training. Each transformation was a combination of Gaussian blur, a random perspective change, and a rotation with angles chosen randomly between -10° and 10°. These transformations are representative of realistic scenarios where images might be slightly tilted or photographed from different angles. We intentionally did not explicitly address all affine transformations, such as shearing or more complex geometric transformations, because these transformations could alter the score of individual items of the ROCF and would be disruptive to the model.
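      The rotation component of the augmentation described above can be sketched on point coordinates instead of pixels. This is a deliberately reduced illustration, not the authors' pipeline: only the ±10° range comes from the response, the function names are hypothetical, and the Gaussian blur and perspective change are omitted because their parameters are not stated.

```python
import math
import random

MAX_ROTATION_DEG = 10.0  # range stated in the response: -10 to 10 degrees


def sample_rotation_angle(rng):
    """Draw a rotation angle uniformly from the stated range."""
    return rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG)


def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) counterclockwise by angle_deg around (cx, cy)."""
    theta = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(theta) - dy * math.sin(theta),
        cy + dx * math.sin(theta) + dy * math.cos(theta),
    )


def augment_points(points, rng, cx=0.0, cy=0.0):
    """Apply one randomly sampled rotation to every point of a drawing."""
    angle = sample_rotation_angle(rng)
    return angle, [rotate_point(x, y, angle, cx, cy) for x, y in points]
```

      Because a pure rotation preserves distances and angles between figure elements, it leaves item-level scores unchanged, which is the property the response relies on when excluding shearing.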

      As noted in our manuscript and demonstrated in supplementary Figure S1, the data augmentation pipeline significantly improved the model's robustness against rotations and changes in perspective. Importantly, our tablet-based scoring application can further ensure that the photos taken do not exhibit excessive semantic transformations. By leveraging the gyroscope built into the tablet, the application can help users align the images properly, minimizing issues such as excessive rotation or skew. This built-in functionality helps maintain the quality and consistency of the images, reducing the likelihood of significant semantic transformations that could affect model performance.

      Comment #5: Additionally, the rationale for using median crowdsourced scores as ground truth, despite evidence of potential bias compared to clinician scores, is not adequately justified.

      Response #5: Thank you for this valuable comment. Clarifying the rationale behind using the median crowdsourced score as the ground truth is indeed important. To reach high accuracy in predicting individual sample scores of the ROCFs, it is imperative that the scores of the training set are based on a systematic scheme with as little human bias as possible influencing the score. However, our analysis (see results section) and previous work (Canham et al., 2000) suggested that the scoring conducted by clinicians may not be consistent, because the clinicians may be unwittingly influenced by the interaction with the patient/participant or by clinician factors (e.g. motivation and fatigue). For this reason, and because clinician scores were not available for all figures (i.e. for 19% of the 20’225 figures), we did not use the clinicians' scores as ground truth scores. Instead, we trained a large pool (5000 workers) of human internet workers (crowdsourcing) to score ROCF drawings, guided by our self-developed interactive web application. Each element of the figure was scored by several human workers (13 workers on average per figure). We obtained the ground truth for each drawing by computing the median for each item in the figure and then summing the medians to get the total score for the drawing in question. To further ensure high-quality data annotation, we identified and excluded crowdsourcing participants who had a high level of disagreement (>20% disagreement) with the ratings of trained clinicians, who carefully and manually scored a subset of the data in the same interactive web application.

      We chose the median score for several reasons: (1) The median score is less influenced by outliers than the mean. Given the variability of scoring between different clinicians and human workers (see human MSE and clinician MSE), using the median ensures that the ground truth is not skewed by extreme values, leading to more stable and reliable scores. (2) Crowdsourced data do not always follow a normal distribution. In cases where the distribution is skewed or not symmetric, the median can be a more representative measure of the center. (3) The original scoring system involves ordinal scales (0, 0.5, 1, 2). For ordinal scales, the median is often more appropriate than the mean. Finally, by aggregating multiple scores from a large pool of crowdsourced raters, the median provides a consensus that reflects the typical assessment. This approach mitigates the variability introduced by individual rater biases and ensures a more consistent ground truth. In clinical settings, the consensus of multiple expert opinions often serves as the benchmark for assessments. The use of median scores mirrors this practice, providing a ground truth that is representative of collective human judgment.
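
      The aggregation described above (per-item median, then the sum of medians) can be sketched as follows. The function names, the toy ratings, and the way disagreement is counted (fraction of shared items on which a rater differs from the clinician reference) are assumptions for illustration; the response does not specify how the 20% disagreement threshold was computed.

```python
from statistics import median


def ground_truth_score(item_ratings):
    """Sum of per-item median ratings for one drawing.

    item_ratings maps an item identifier to the list of scores that
    individual raters gave to that item.
    """
    return sum(median(scores) for scores in item_ratings.values())


def disagreement_rate(rater_scores, reference_scores):
    """Fraction of shared items on which a rater differs from the reference.

    Assumed definition, used only to illustrate a 20% exclusion threshold.
    """
    shared = [k for k in rater_scores if k in reference_scores]
    if not shared:
        return 0.0
    return sum(rater_scores[k] != reference_scores[k] for k in shared) / len(shared)


# Toy example with three items, each scored on the scale 0, 0.5, 1, 2.
ratings = {
    "item_1": [2, 2, 1, 2, 2],
    "item_2": [0.5, 1, 1, 1, 0],
    "item_3": [0, 0, 0.5, 0, 2],  # the single outlier (2) does not move the median
}
total = ground_truth_score(ratings)
```

      Note that `statistics.median` averages the two middle values when the number of ratings is even, which can produce a value that is not on the ordinal scale; `statistics.median_low` would be one way to stay on the scale if that matters.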

      Canham, R. O., S. L. Smith, and A. M. Tyrrell. 2000. “Automated Scoring of a Neuropsychological Test: The Rey Osterrieth Complex Figure.” Proceedings of the 26th Euromicro Conference. EUROMICRO 2000. Informatics: Inventing the Future. https://doi.org/10.1109/eurmic.2000.874519.

      Reviewer #2:

      Comment #1: There is no detail on how the final scoring app can be accessed and whether it is medical device-regulated.

      Response #1: We appreciate the opportunity to provide more information about the current status and plans for our scoring application. At this stage, the final scoring app is not publicly accessible as it is currently undergoing rigorous beta testing with a select group of clinicians in real-world settings. The feedback from these clinicians is instrumental in refining the app’s features, interface, and overall functionality to improve its usability and user experience. This ensures that the app meets the high standards required for clinical tools. Following the successful completion of the beta testing phase, we aim to seek FDA approval for the scoring app. Achieving this regulatory milestone will guarantee that the app meets the stringent requirements for medical devices, providing an additional layer of confidence in its safety and efficacy for clinical use. Once FDA approval is obtained, we plan to make the app publicly accessible to clinicians and healthcare institutions worldwide. Detailed instructions on how to access and use the app will be provided at that time on our website (https://www.psychology.uzh.ch/en/areas/nec/plafor/research/rfp.html).

      Comment #2: No discussion on the difference in sample sizes between the pre-registration of the prospective study and the results (e.g., aimed for 500 neurological patients but reported data from 288). Demographics for the assessment of the representation of healthy and non-healthy participants were not present.

      Response #2: Thank you for your comment. We believe there might have been a misunderstanding regarding our preregistration details. In the preregistration, we planned to prospectively acquire ROCF drawings from 1000 healthy subjects. Each subject should have drawn two ROCF drawings (copy and memory condition). Consequently, 2000 samples should have been collected. In addition, in our pre-registration plan, we aimed to collect 500 drawings from patients (i.e. 250 patients), not 500 patients as the reviewer suggested (https://osf.io/82796). Thus, in total, the goal was to obtain 2500 ROCF figures. The final prospective data set, which contained 2498 ROCF images from 961 healthy adults and 288 patients, very closely matches the sample size we aimed for in the pre-registration. We do not see a necessity to discuss this negligible discrepancy in the main manuscript. The prospective data set remains substantial and sufficient to test our model on independent data. Importantly, we want to highlight that the test set in the retrospective data set (4045 figures) was also never seen by the model. Both the retrospective and prospective data sets demonstrate substantial global diversity, as the data were collected in 90 different countries. Please note that Supplementary Figures S2 & S3 provide detailed demographics of the participants in the prospectively collected data, present their performance in the copy and (immediate) recall condition across the lifespan, and show the worldwide distribution of the origin of the data.
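
      The sample-size arithmetic in this response can be checked in a few lines. The sketch assumes that every participant in the final data set contributed exactly two drawings (copy and memory condition), which is consistent with the reported total of 2498 images.

```python
DRAWINGS_PER_PARTICIPANT = 2  # copy and memory condition

# Pre-registered target: 1000 healthy subjects x 2 drawings + 500 patient drawings.
planned = 1000 * DRAWINGS_PER_PARTICIPANT + 500

# Reported prospective data set: 961 healthy adults and 288 patients.
collected = (961 + 288) * DRAWINGS_PER_PARTICIPANT

shortfall = planned - collected
```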

      Comment #3: Supplementary Figures S1 & S4 are of poor quality; please increase the resolution.

      Response #3: We apologize for the poor quality of Supplementary Figures S1 and S4 in the initial submission. In the revised version of our submission, we have increased the resolution of both Supplementary Figure S1 and Supplementary Figure S4 to ensure that all details are clearly visible and the figures are of high quality.

      Comment #4: Regarding medical device regulation; if the app is to be used in clinical practice (as it generates a score and classification of performance), I believe such regulation is necessary - but there are ways around it. This should be detailed.

      Response #4: We agree that regulation is essential for any application intended for use in clinical practice, particularly one that generates scores and classifications of performance. As discussed in response #1, the final scoring application is currently undergoing intensive beta testing in real-world settings with a limited group of clinicians and is therefore not publicly accessible at this time. We are fully committed to obtaining the necessary regulatory approvals before the app is made publicly accessible for clinical use. Once the beta testing phase is complete and the app has been refined based on clinician feedback, we will prepare and submit a comprehensive regulatory dossier. This submission will include all necessary data on the app's development, testing, validation, and clinical utility. We are adhering to relevant regulatory standards and guidelines, such as ISO 13485 for medical devices and the FDA's guidance on software as a medical device (SaMD).

      Comment #7: Need to clarify that work was already done and pre-printed in 2022 for the main part of this study, and that this paper contributes to an additional prospective study.

      Response #7: We would like to clarify that the pre-print the reviewer is referring to is indeed the current paper submitted to eLife. The submitted paper includes both the work that was pre-printed in 2022 and the additional prospective study, as you correctly identified.

      Reviewer #3:

      Comment #1: The considerable effort and cost to make the model only for an existing neuropsychological test.

      Response #1: We acknowledge that significant effort and resources were dedicated to developing our model for the Rey-Osterrieth Complex Figure (ROCF) test. Below, we provide a detailed rationale for this investment and the broader implications of our work. The ROCF test is one of the most widely used neuropsychological assessments worldwide, providing critical insights into visuospatial memory and executive function. While the initial effort and cost are substantial, the long-term benefits of an automated, reliable, objective, fast and widely applicable neuropsychological assessment tool justify the investment. The scoring application will significantly reduce the time for scoring the test and thus provide more efficient use of clinical resources, and the potential for broader applications makes this a worthwhile endeavor. The methods and infrastructure developed for this model can be adapted and scaled to other neuropsychological tests and assessments (e.g. Taylor Figure).

      Comment #2: I was truly impressed by how the authors established a system that brings together the methods and fields of diverse specialties in such a remarkable way. I know the primary purpose of the ROCFT. However, beyond the score, specialists observe the following during the ROCFT, and this is what makes the test attractive neuropsychologically: the order and direction of each stroke (e.g., from right to left, from the main structure to the margin or small structures), the process toward completing the figure (e.g., confident speed and concentration), the boldness of strokes, unnatural fragmentation of strokes, whether the figure is placed without deviation on the paper, rotation of the figure itself (before the scanning level), the total size, and the level relative to the age, education, and experience of the patient. These reflect the disease, visuospatial intelligence, executive function, and ability to concentrate. Scores are crucial, but by observing the drawing process, we can obtain diverse facts or parts of symptoms that imply the complications of human behavior.

      Response #2: Thank you for your insightful comments and observations regarding our system for organizing diverse specialties within the ROCFT methodology. We agree that beyond the numerical scores, the detailed observation of the drawing process provides invaluable neuropsychological insights. How strokes are executed, from their direction and placement to the overall completion process, offers a nuanced understanding of factors like spatial orientation, concentration, and executive function. In fact, we are working on a ROCF pen tracking application, which enables the patient to draw the ROCF with a digital pen on a tablet. The tablet can 1) assess the sequence order of drawing the items and the number of strokes, 2) record the exact coordinate of each drawn pixel at each time point of the assessment, 3) measure the duration for each pen stroke as well as total drawing time, and 4) assess the pen stroke pressure. Through this, we aim to extract additional information on processing speed, concentration, and other cognitive domains. However, this development is outside the scope of the current manuscript.

    1. eLife assessment

      This is an important paper that reports in vivo physiological abnormalities in the hippocampus of a rat model of traumatic brain injury (TBI). In this study, the authors focused on changes in theta-gamma phase coupling and action potential entrainment to theta, phenomena hypothesized to be critical for cognition. While the authors provide solid evidence of deficits in both features post-TBI, the study would have been stronger with a more hypothesis-driven approach and consideration of alterations of the animal's behavioral state or sensorimotor deficits beyond memory processes.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigated how traumatic brain injury affects oscillatory and single-unit hippocampal activity in awake-behaving rats.

      Strengths:

      The use of high-density laminar electrodes enabled precise localization of recording sites. To ensure an unbiased, rigorous approach, single-unit analysis was performed by a reviewer who was blind to experimental conditions. A proof of concept study was undertaken to characterize the pathology that resulted from the specific TBI model used in the main study. There was an effort to link abnormalities in hippocampal activity to memory disruption by running a cohort of rats on the Morris Water Maze task.

      Weaknesses:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion. The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported. It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments. There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units. Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate changes in theta-gamma phase-amplitude coupling and action potential entrainment to theta following traumatic brain injury (TBI). Both phenomena are widely hypothesized to be important for cognition, and the authors report deficits in both after TBI. The manuscript is well-written, the figures are well-constructed, and the authors' use of high-level analysis methods for TBI EEG data collected from awake, behaving animals is welcome.

      Major Comments:

      - The animal n's are small (4 sham and 5 injured). In Figure 3, for instance, one wonders if panels D and E might have shown significant differences if more animals had been recorded.

      - The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors studied the effects of traumatic brain injury created by the LFPI procedure on CA1 at the network level. The major findings seem to be that TBI reduces theta and gamma power in CA1, reduces phase-amplitude coupling between the theta and gamma bands, and disrupts the gamma entrainment of interneurons. I think the authors have made some important discoveries that could help advance the understanding of TBI effects at the physiological level. However, more investigation into the relationship of the behavioral and brain states to the observed effects would help clarify the interpretations for the readers.

      Strengths:

      The authors in this study were able to combine behavioral verification of the TBI model with the laminar electrophysiological recordings of the CA1 region to bring forward network-level anomalies such as the temporal coordination of network-level oscillations as well as in the firing of the interneurons. Indeed, it seems that the findings may serve future studies to functionally better understand and/or refine the therapies for the TBI.

      Weaknesses:

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensory-motor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and also, in some cases, to gamma oscillations. So, it is important to disentangle the differences in behavioral variables from the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed reduction in entrainment might, on the one hand, be due to decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or unreliable oscillations in LFP bands such as theta and gamma, rather than a memory encoding- and retrieval-related disruption of spike synchrony to the rhythms; on the other hand, it may simply be due to reduced excitability of the neurons, particularly in the behavioral and brain state in which the effects were observed, rather than a disrupted temporal code. Hence, further investigations dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

    5. Author response:

      We would like to thank the editors and reviewers for their constructive feedback, and we look forward to addressing their comments in the revised manuscript. We also appreciate the acknowledgment that the use of laminar electrodes in awake-behaving animals is an important advancement for the TBI community, and that our results provide a potential physiological link between coalescing TBI pathologies and cognitive deficits. We believe that integrating the reviewer comments will help to make our analyses even more rigorous and will improve the overall manuscript. Please find comments related to specific concerns raised in the public review below:

      The paper is written as if the experiment was exploratory and not hypothesis-driven despite the fact that there is a wealth of experimental evidence about this TBI model that could have informed very specific predictions to test a hypothesis that is only hinted at in the discussion… It is also unclear what the rationale was for recording single units in a novel and familiar environment. Furthermore, this analysis comparing single-unit activity between familiar and novel environments is quite rudimentary. There are much more rigorous analyses to answer the question of how hippocampal single-unit firing patterns differ across changes in environments.

      Previous mechanistic and physiological studies suggested interneuronal dysfunction following TBI, which we hypothesized would disrupt the oscillatory dynamics underlying temporal coding (single-unit entrainment to theta/gamma, phase precession, and phase-amplitude coupling). These are known to support hippocampal-dependent learning and memory tasks such as the Morris Water Maze. While we did not record during a goal-directed behavioral task, the goal of recording in a familiar and a novel environment was to assess remapping across these environments. Unfortunately, occupancy in the two environments was not high enough to rigorously characterize place cell specificity and phase precession or to investigate remapping, although putative place cells were identified. Despite this shortcoming, we were still able to confirm that the spike timing of interneurons relative to hippocampal oscillations was disrupted, which we believe underlies the massive reduction in theta-gamma phase-amplitude coupling reported. This opens the door to more strongly hypothesis-driven, mechanistic studies (i.e. closed-loop stimulation) to alter the spike timing of interneurons relative to theta phase and potentially rescue these effects on phase-amplitude coupling and behavior.
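
      For readers less familiar with spike entrainment, one common way to quantify how tightly spikes lock to an oscillation is the mean resultant length of the spike phases. The sketch below illustrates that general measure only; it is an assumption that a measure of this kind is relevant here, and it is not claimed to be the analysis used in the manuscript. The function name and the toy phase values are hypothetical.

```python
import cmath
import math


def mean_resultant_length(phases_rad):
    """Length of the mean unit vector of a set of phases.

    Values near 0 indicate phases spread around the cycle;
    values near 1 indicate spikes concentrated at one phase.
    """
    if not phases_rad:
        raise ValueError("need at least one phase")
    return abs(sum(cmath.exp(1j * p) for p in phases_rad) / len(phases_rad))


# Toy spike phases (radians) relative to a theta cycle.
locked = [0.1, 0.0, -0.1, 0.05]                          # tightly clustered
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]    # evenly spread

locked_mrl = mean_resultant_length(locked)
spread_mrl = mean_resultant_length(spread)
```

      With few spikes this measure is biased upward, which is one reason why differences in firing rate between groups need to be considered when comparing entrainment.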

      The number of rats used for the spatial working memory experiment is not reported. Some of the statistics are not completely reported… There are details lacking about the number of units recorded per session and per rat, all of which are usually reported in studies that record single units.

      The number of rats used for the spatial working memory task was reported in the text and figure legend where the statistics were reported, but we will ensure that the statistics are more completely reported by including relevant statistical results and parameters beyond the test used and the p-value. Additionally, we will report the number of units recorded per animal.

      Spatial working memory assessment is delegated to a single panel of a supplementary figure. More importantly, there is no effort to dissociate between spatial working memory deficits and other motor, motivational, or sensory deficits that could have been driving the lower "memory score" in the experimental group

      The spatial working memory deficit that we report in the Morris Water Maze is not a novel finding and has been demonstrated numerous times in this TBI model. Our goal in including this was to increase the rigor of the study by verifying this deficit in our hands at the injury level used for these physiology experiments. The dissociation between spatial working memory deficits and other motor, motivational, or sensory deficits from TBI in the Morris Water Maze (e.g. swim speed and escape latency with visible platforms) has been well characterized in this TBI model at many injury levels including more severe injuries than those used in this study. We will address this in the Discussion as it is an important point.

      The text focuses on deficits in the theta and gamma bands, but the reduction in power appears to be broadband (see Figure 1F, especially Pyramidal cell layer panel). Therefore, the overall decrease in broadband (in the injured population) must be normalized between sham and injured animals before a selective comparison between sham and injured animals can be conducted. That is the only way that selective narrow bands i.e., theta and low gamma can be compared between the two cohorts. A brief discussion of the significance of a broadband decrease would be appreciated.

      We agree that there is a broadband downward shift in power following TBI, especially in the pyramidal cell layer. We will include a normalization of the power spectra in order to specifically compare the theta and gamma bands between sham and injured rats, and we will include a discussion of the broadband decrease.
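
      One simple form such a normalization could take is to express band power as a fraction of total broadband power, so that a uniform downward shift cancels out. The sketch below is an illustration under that assumption; the response does not state which normalization will be used, and the band edges, frequency range, and toy spectra are hypothetical.

```python
def band_power(freqs, power, low, high):
    """Sum spectral power over frequency bins with low <= f < high."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


def relative_band_power(freqs, power, band, broadband=(1.0, 100.0)):
    """Band power expressed as a fraction of total broadband power."""
    total = band_power(freqs, power, *broadband)
    if total == 0:
        raise ValueError("broadband power is zero")
    return band_power(freqs, power, *band) / total


# Toy spectra on a 1 Hz grid: the "injured" spectrum is the "sham" spectrum
# scaled down uniformly, i.e. a purely broadband loss of power.
freqs = [float(f) for f in range(1, 100)]
sham = [1.0 / f for f in freqs]
injured = [0.5 * p for p in sham]

THETA = (5.0, 10.0)  # assumed band edges in Hz, for illustration only

sham_theta = relative_band_power(freqs, sham, THETA)
injured_theta = relative_band_power(freqs, injured, THETA)
```

      In this toy case the absolute theta power is halved in the "injured" spectrum while the relative theta power is unchanged, so any remaining difference in relative power in real data would point to a band-specific effect.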

      Discoveries made in the paper and their broad interpretations can be helped with further characterization and comparison among the brain and behavioral states both during immobility and movement. The impact of brain injury in several parts of the brain can alter brain-wide LFP and/or behavior. The altered behavior and/or LFP patterns might then lead to reduced spiking and unreliable LFP oscillations in the hippocampus. Hence, claims made in the abstract such as "These results reveal deficits in information encoding and retrieval schemes essential to cognition that likely underlie TBI-associated learning and memory impairments, and elucidate potential targets for future neuromodulation therapies" do not have enough evidence to test whether the disruptions were information encoding and retrieval related or due to sensory-motor and/or behavioral deficits that could also occur during TBI.

      Movement velocity is already known to be correlated with the entrainment of spikes to the theta rhythm and also, in some cases, to gamma oscillations. So, it is important to disentangle the differences in behavioral variables from the observed effects. As an example, the authors' claims of disrupted temporal coding (as shown in the graphical abstract) might have suffered from these confounds. The observed reduction in entrainment might, on the one hand, be due to decreased LFP power (induced by injury in different brain areas) resulting in altered behavior and/or unreliable oscillations in LFP bands such as theta and gamma, rather than a memory encoding- and retrieval-related disruption of spike synchrony to the rhythms; on the other hand, it may simply be due to reduced excitability of the neurons, particularly in the behavioral and brain state in which the effects were observed, rather than a disrupted temporal code. Hence, further investigations dissociating these factors could help readers mechanistically understand the interesting results observed by the authors.

      We agree that changes in hippocampal physiology that we report could arise due to disrupted inputs from TBI, and this study is inherently limited due to recording exclusively from CA1. We chose to record from the hippocampus due to its importance for learning and memory, and its vulnerability in TBI. Future studies will investigate how hippocampal afferents are affected by injury, and we hope that the layer-specific changes we report will help to inform which inputs may be preferentially disrupted. Importantly, these inputs along with local processing within the hippocampus change drastically depending on the behavior of the animal. We will more rigorously assess movement and the behavioral state of the rats when comparing physiological properties, especially the firing rates reported in Figure 3.

    1. eLife assessment

      In this valuable study, the authors use deep learning models to provide solid evidence that epithelial wounding triggers bursts of cell division at a characteristic distance away from the wound. The documentation provided by the authors should allow other scientists to readily apply these methods, which are particularly appropriate where unsupervised machine-learning algorithms have difficulties.

    2. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep learning models to analyse the dynamics of epithelia. In this way they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strengths:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is compelling.

      The methods presented in this work should prove to be very helpful for quantifying cell proliferation in epithelial tissues.

      We thank the reviewer for the positive comments!

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      Comments on revised version:

      Regarding Reviewer 1's comment on the architecture details, I now understand that the precise architecture (number/type of layers, activation functions, pooling operations, skip connections, upsampling choice...) might have remained relatively hidden from the authors themselves, as the U-net is built automatically by the fast.ai library from a given classical choice of encoder architecture (ResNet34 and ResNet101 here) to generate the decoder part and skip connections.

      Regarding Major point 1, I raised the question of the generalisation potential of the method. I do not think, for instance, that the optimal number of frames to use, or the optimal choice of their time-shift with respect to the division time (t-n, t+m) (not systematically studied here), are generic hyperparameters that can be directly transferred to another setting. This implies that the proposed method will necessarily require re-labeling, re-training and re-optimizing the hyperparameters, which directly influence the network architecture, for each new dataset imaged differently. This limits the generalisation of the method to other datasets, in contrast to other tools developed in the field for other tasks, such as cellpose for segmentation, which has shown true potential for generalisation across various data modalities. I was hoping that the authors would test the robustness of their method themselves, for instance by re-imaging the same tissue with a slightly different acquisition rate, to give more weight to their work.

      We thank the referee for the comments. Regarding this particular biological system, due to photobleaching over long imaging periods (and the availability of imaging systems during the project), we would have difficulty imaging at rates much higher than the 2-minute frame interval we currently use. These limitations are true for many such systems, and it is rarely possible to image rapidly for long periods of time in real experiments. Given this upper limit on frame rate, we could, in principle, sample the data at a lower frame rate by removing time points from the videos, but this typically leads to worse results. With some pilot data, we have tried to use fewer time intervals for our analysis, but they always gave worse results. We found that we need to feed the maximum amount of information available into the model to get the best results (i.e. the fastest frame rate possible, given the data available). Our goal is to teach the neural net to identify dynamic, space-time-localised events from time-lapse videos, in which the duration of an event is a key parameter. Our division events take 10 minutes or less to complete; therefore, we used 5 timepoints in the videos for the deep learning model. If we considered another system with dynamic events of duration T, then we would use T/t timepoints, where t is the minimum time interval (for our data, t = 2 min). For example, if we could image every minute, we would use 10 timepoints. As discussed below, we envision that other users with different imaging setups and requirements may need to retrain the model for their own data; to help with this, we have now provided more detailed instructions on how to do this (see later).
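
      The T/t rule of thumb stated above can be written as a small helper. The function name is hypothetical, and rounding up when T is not an exact multiple of t is an assumption that the response does not address.

```python
import math


def n_timepoints(event_duration_min, frame_interval_min):
    """Number of frames needed to span an event of duration T at interval t (T / t)."""
    if event_duration_min <= 0 or frame_interval_min <= 0:
        raise ValueError("durations must be positive")
    # Assumed: round up so that the frames cover the whole event.
    return math.ceil(event_duration_min / frame_interval_min)


current_setup = n_timepoints(10, 2)   # 10-minute divisions imaged every 2 minutes
faster_setup = n_timepoints(10, 1)    # the same events imaged every minute
```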

      In this regard, and because the authors claimed to provide clear instructions on how to reuse their method or adapt it to a different context, I delved deeper into the code and, to my surprise, felt that we are far from the coding practice of what a well-documented and accessible tool should be.

      To start with, one has to be relatively accustomed with Napari to understand how the plugin must be installed, as the only thing given is a pip install command (that could be typed in any terminal without installing the plugin for Napari, but has to be typed inside the Napari terminal, which is mentioned nowhere). Surprisingly, the plugin was not uploaded on Napari hub, nor on PyPI by the authors, so it is not searchable/findable directly, one has to go to the Github repository and install it manually. In that regard, no description was provided in the copy-pasted templated files associated to the napari hub, so exporting it to the hub would actually leave it undocumented.

      We thank the referee for suggesting the example of (DeXtrusion, Villars et al. 2023). We have endeavoured to produce similarly-detailed documentation for our tools. We now have clear instructions for installation requiring only minimal coding knowledge, and we have provided a user manual for the napari plug-in. This includes information on each of the options for using the model and the outputs they will produce. The plugin has been tested by several colleagues using both Windows and Mac operating systems.

      Author response image 1.

      Regarding now the python notebooks, one can fairly say that the "clear instructions" that were supposed to enlighten the code are really minimal. Only one notebook, "trainingUNetCellDivision10.ipynb", actually has some comments; the others have (almost) none, nor a title to help the unskilled programmer delving into the script to guess what it should do. I doubt that a biologist who does not have a strong computational background will manage to adapt the method to their own dataset (which seems to me unavoidable for the reasons mentioned above).

      Within the README file, we have now included information on how to retrain the models with helpful links to deep learning tutorials (which, indeed, some of us have learnt from) for those new to deep learning. All Jupyter notebooks now include more comments explaining the models.

      Finally regarding the data, none is shared publicly along with this manuscript/code, such that if one doesn't have a similar type of dataset - that must be first annotated in a similar manner - one cannot even test the networks/plugin for its own information. A common and necessary practice in the field - and possibly a longer lasting contribution of this work - could have been to provide the complete and annotated dataset that was used to train and test the artificial neural network. The basic reason is that a more performant, or more generalisable deep-learning model may be developed very soon after this one and for its performance to be fairly compared, it requires to be compared on the same dataset. Benchmarking and comparison of methods performance is at the core of computer vision and deep-learning.

      We thank the referee for these comments. We have now uploaded all the data used to train the models and to test them, as well as all the data used in the analyses for the paper. This includes many videos that were not used for training but were analysed to generate the paper’s results. The link to these data sets is provided in our GitHub page (https://github.com/turleyjm/cell-division-dl-plugin/tree/main). In the folder for the data sets and in the GitHub repository, we have included the Jupyter notebooks used to train the models and these can be used for retraining. We have made our data publicly available at Zenodo dataset https://zenodo.org/records/10846684 (added to last paragraph of discussion). We have also included scripts that can be used to compare the model output with ground truth, including outputs highlighting false positives and false negatives. Together with these scripts, models can be compared and contrasted, both in general and in individual videos. Overall, we very much appreciate the reviewer’s advice, which has made the plugin much more user-friendly and, hopefully, easier for other groups to train their own models. Our contact details are provided, and we would be happy to advise any groups that would like to use our tools.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep-learning models to analyse the dynamics of epithelia. In this way, they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after the healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strength:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is solid.

      Weakness:

      Some aspects of the deep-learning models remained unclear, and the authors might want to think about adding details. First of all, for readers not being familiar with deep-learning models, I would like to see more information about ResNet and U-Net, which are at the base of the new deep-learning models developed here. What is the structure of these networks?

      We agree with the Reviewer and have included additional information on page 8 of the manuscript, outlining some background information about the architecture of ResNet and U-Net models.

      How many parameters do you use?

      We apologise for this omission and have now included the number of parameters and layers in each model in the methods section on page 25.

      What is the difference between validating and testing the model? Do the corresponding data sets differ fundamentally?

      The difference between ‘validating’ and ‘testing’ the model is that validating data is used during training to determine whether the model is overfitting. If the model is performing well on the training data but not on the validating data, this is a key signal that the model is overfitting, and changes will need to be made to the network/training method to prevent this. The testing data is used after all the training has been completed, to test the performance of the model on fresh data it has not been trained on. We have removed reference to the validating data in the main text to make it simpler and added this explanation to the methods. There is no fundamental (or experimental) difference between each of the labelled data sets; rather, they are collected from different biological samples. We have now included this information in the Methods text on page 24.
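
      As a minimal sketch of the distinction drawn above, the following splits labelled videos into training, validating and testing sets by biological sample, so that no sample contributes to more than one set. The function name, the 15% fractions and the video identifiers are hypothetical; the response does not state how the authors partitioned their data.

```python
import random


def split_by_sample(sample_ids, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Assign whole biological samples to train / validation / test sets.

    The fractions are illustrative assumptions, not values from the paper.
    """
    ids = sorted(set(sample_ids))
    if len(ids) < 3:
        raise ValueError("need at least three samples for three non-empty sets")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    n_val = max(1, round(len(ids) * val_fraction))
    test = ids[:n_test]                  # held out until training is finished
    val = ids[n_test:n_test + n_val]     # monitored during training for overfitting
    train = ids[n_test + n_val:]         # used to fit the weights
    return train, val, test


videos = [f"video_{i}" for i in range(10)]  # hypothetical sample names
train, val, test = split_by_sample(videos)
print(len(train), len(val), len(test))
```

      Splitting at the level of whole samples, rather than individual frames, keeps the testing data independent of anything the model saw during training.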

      How did you assess the quality of the training data classification?

      These data were generated and hand-labelled by an expert with many years of experience in identifying cell divisions in imaging data, to give the ground truth for the deep learning model.

      Reviewer #1 (Recommendations For The Authors):

      You repeatedly use 'new', 'novel' as well as 'surprising' and 'unexpected'. The latter are rather subjective and it is not clear based on what prior knowledge you make these statements. Unless indicated otherwise, it is understood that the results and methods are new, so you can delete these terms.

      We have deleted these words, as suggested, for almost all cases.

      p.4 "as expected" add a reference or explain why it is expected.

      A reference has now been included in this section, as suggested.

      p.4 "cell divisions decrease linearly with time" Only later (p.10) it turns out that you think about the density of cell divisions.

      This has been changed to "cell division density decreases linearly with time".

      p.5 "imaging is largely in one plane" while below "we generated a 3D z-stack" and above "our in vivo 3D image data" (p.4). Although these statements are not strictly contradictory, I still find them confusing. Eventually, you analyse a 2D image, so I would suggest that you refer to your in vivo data as being 2D.

      We apologise for the confusion here; the imaging data was initially generated using 3D z-stacks but this 3D data is later converted to a 2D focused image, on which the deep learning analysis is performed. We are now more careful with the language in the text.

      p.7 "We have overcome (...) the standard U-Net model" This paragraph remains rather cryptic to me. Maybe you can explain in two sentences what a U-Net is or state its main characteristics. Is it important to state which class you have used at this point? Similarly, what is the exact role of the ResNet model? What are its characteristics?

      We have included more details on both the ResNet and U-Net models and how our model incorporates properties from them on Page 8.

      p.8 Table 1 Where do I find it? Similarly, I could not find Table 2.

      These were originally located in the supplemental information document, but have been moved to the main manuscript.

      p.9 "developing tissue in normal homeostatic conditions" Aren't homeostatic and developing contradictory? In one case you maintain a state, in the other, it changes.

      We agree with the Reviewer and have removed the word ‘homeostatic’.

      p.9 "Develop additional models" I think 'models' refers to deep learning models, not to physical models of epithelial tissue development. Maybe you can clarify this?

      Yes, this is correct; we have phrased this better in the text.

      p.12 "median error" median difference to the manually acquired data?

      Yes, and we have made this clearer in the text, too.

      p.12 "we expected to observe a bias of division orientation along this axis" Can you justify the expectation? Elongated cells are not necessarily aligned with the direction of a uniaxially applied stress.

      Although this is not always the case, we have now included additional references to previous work from other groups which demonstrated that wing epithelial cells do become elongated along the P/D axis in response to tension.

      p.14 "a rather random orientation" Please, quantify.

      The division orientations are quantified in Fig. 4F,G; we have now changed our description from ‘random’ to ‘unbiased’.

      p.17 "The theories that must be developed will be statistical mechanical (stochastic) in nature" I do not understand. Statistical mechanics refers to systems at thermodynamic equilibrium, stochastic to processes that depend on, well, stochastic input.

      We have clarified that we are referring to non-equilibrium statistical mechanics (the study of macroscopic systems far from equilibrium, a rich field of research with many open problems and applications in biology).

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      In general, novelty over previous work does not seem particularly important. From a methodological point of view, the models are based on generic architectures of convolutional neural networks, with minimal changes, and on ideas already explored in general. The authors seem to have missed much (most?) of the literature on the specific topic of detecting mitotic events in 2D timelapse images, which has been published in more specialized journals or Proceedings (TPAMI, CVPR, etc.; see references below). Even though the image modality or biological structure may be different (non-fluorescent images sometimes), I don't believe it makes a big difference. How the authors' approach compares to this previously published work is not discussed, which prevents me from objectively assessing the true contribution of this article from a methodological perspective.

      On the contrary, some competing works have proposed methods based on newer - and generally more efficient - architectures specifically designed to model temporal sequences (Phan 2018, Kitrungrotsakul 2019, 2021, Mao 2019, Shi 2020). These natural candidates (recurrent networks, long short-term memory (LSTM), gated recurrent units (GRU), or even more recently transformers), coupled to CNNs, are not even mentioned in the manuscript, although they have proved their generic superiority for inference tasks involving time series (Major point 2). Even though the original idea/trick of exploiting the different channels of RGB images to address the temporal aspect might seem smart in the first place - as it reduces the task of changing/testing a new architecture to a minimum - I guess that CNNs trained this way may not generalize very well to videos where the temporal resolution is changed slightly (Major point 1). This could be quite problematic as each new dataset acquired with a different temporal resolution or temperature may require manual relabeling and retraining of the network. In this perspective, recent alternatives (Phan 2018, Gilad 2019) have proposed unsupervised approaches, which could largely reduce the need for manual labeling of datasets.

      We thank the reviewer for their constructive comments. Our goal is to develop a cell detection method that has a very high accuracy, which is critical for practical and effective application to biological problems. The algorithms need to be robust enough to cope with the difficult experimental systems we are interested in studying, which involve densely packed epithelial cells within in vivo tissues that are continuously developing, as well as repairing. In response to the above comments of the reviewer, we apologise for not including these important papers from the division detection and deep learning literature, which are now discussed in the Introduction (on page 4).

      A key novelty of our approach is the use of multiple fluorescent channels to increase information for the model. As the referee points out, our method benefits from using and adapting existing highly effective architectures. Hence, we have been able to incorporate deeper models than some others have previously used. An additional novelty is using this same model architecture (retrained) to detect cell division orientation. For future practical use by us and other biologists, the models can easily be adapted and retrained to suit experimental conditions, including different multiple fluorescent channels or number of time points. Unsupervised approaches are very appealing due to the potential time saved compared to manual hand labelling of data. However, the accuracy of unsupervised models is currently much lower than that of supervised models (as shown in Phan 2018) and, most importantly, well below the levels needed for practical use in analysing inherently variable (and challenging) in vivo experimental data.

      Regarding the other convolutional neural networks described in the manuscript:

      (1) The one proposed to predict the orientation of mitosis performs a regression task, predicting a probability for the division angle. The architecture, which must be different from a simple Unet, is not detailed anywhere, so the way it was designed is difficult to assess. It is unclear if it also performs mitosis detection, or if it is instead used to infer orientation once the timing and location of the division have been inferred by the previous network.

      The neural network used for U-NetOrientation has the same architecture as U-NetCellDivision10 but has been retrained to complete a different task: finding division orientation. Our workflow is as follows: firstly, U-NetCellDivision10 is used to find cell divisions; secondly, U-NetOrientation is applied locally to determine the division orientation. These points have now been clarified in the main text on Page 14.

      (2) The one proposed to improve the quality of cell boundary images before segmentation is nothing new, it has now become a classic step in segmentation, see for example Wolny et al. eLife 2020.

      We have cited similar segmentation models in our paper and thank the referee for this additional one. We had made an improvement to the segmentation models, using GFP-tagged E-cadherin, a protein localised in a thin layer at the apical boundary of cells. So, while this is primarily a 2D segmentation problem, some additional information is available in the z-axis as the protein is visible in 2-3 separate z-slices. Hence, we supplied this 3-focal plane input to take advantage of the 3D nature of this signal. This approach has been made more explicit in the text (Pages 14, 15) and Figure (Fig. 2D).

      As a side note, I found it a bit frustrating to realise that all the analysis was done in 2D while the original images are 3D z-stacks, so a lot of the 3D information had to be compressed and has not been used. A novelty, in my opinion, could have resided in the generalisation to 3D of the deep-learning approaches previously proposed in that context, which are exclusively 2D, in particular, to predict the orientation of the division.

      Our experimental system is a relatively flat 2D tissue with the orientation of the cell divisions consistently in the xy-plane. Hence, a 2D analysis is most appropriate for this system. With the successful application of the 2D methods already achieving high accuracy, we envision that extension to 3D would only offer a slight increase in effectiveness as these measurements have little room for improvement. Therefore, we did not extend the method to 3D here. However, of course, this is the next natural step in our research as 3D models would be essential for studying 3D tissues; such 3D models will be computationally more expensive to analyse and more challenging to hand label.

      Concerning the biological application of the proposed methods, I found the results interesting, showing the potential of such a method to automatise mitosis quantification for a particular biological question of interest, here wound healing. However, the deep learning methods/applications that are put forward as the central point of the manuscript are not particularly original.

      We thank the referee for their constructive comments. Our aim was not only to show the accuracy of our models but also to show how they might be useful to biologists for automated analysis of large datasets, which is a—if not the—bottleneck for many imaging experiments. The ability to process large datasets will improve robustness of results, as well as allow additional hypotheses to be tested. Our study also demonstrated that these models can cope with real in vivo experiments where additional complications such as progressive development, tissue wounding and inflammation must be accounted for.

      Major point 1: generalisation potential of the proposed method.

      The neural network model proposed for mitosis detection relies on a 2D convolutional neural network (CNN), more specifically on the Unet architecture, which has become widespread for the analysis of biology and medical images. The strategy proposed here exploits the fact that the input of such an architecture is natively composed of several channels (originally 3 to handle the 3 RGB channels, which is actually a holdover from computer vision, since most medical/biological images are gray images with a single channel), to directly feed the network with 3 successive images of a timelapse at a time. This idea is, in itself, interesting because no modification of the original architecture had to be carried out. The latest 10-channel model (U-NetCellDivision10), which includes more channels for better performance, required minimal modification to the original U-Net architecture but also simultaneous imaging of cadherin in addition to histone markers, which may not be a generic solution.

      We believe we have provided a general approach for practical use by biologists that can be applied to a range of experimental data, whether that is based on varying numbers of fluorescent channels and/or timepoints. We envisioned that experimental biologists are likely to have several different parameters permissible for measurement based on their specific experimental conditions e.g., different fluorescently labelled proteins (e.g. tubulin) and/or time frames. To accommodate this, we have made it easy and clear in the code on GitHub how these changes can be made. While the model may need some alterations and retraining, the method itself is a generic solution as the same principles apply to very widely used fluorescent imaging techniques.
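
      The frames-as-channels idea discussed above can be sketched as follows: consecutive timepoints, each carrying one image per fluorescent channel, are flattened into a single fixed-length channel list for a 2D network. This is an illustrative sketch only; the function name, the placeholder image labels and the reading of the 10-channel model as 5 timepoints × 2 fluorescent channels are assumptions rather than the authors' implementation.

```python
def frames_as_channels(video, n_timepoints):
    """Build sliding windows whose channels are consecutive timepoints.

    video[t][c] is the image for timepoint t and fluorescent channel c.
    Each returned window is a flat list of n_timepoints * n_channels images.
    """
    if n_timepoints < 1 or n_timepoints > len(video):
        raise ValueError("n_timepoints must be between 1 and the video length")
    windows = []
    for start in range(len(video) - n_timepoints + 1):
        window = [
            image
            for t in range(start, start + n_timepoints)
            for image in video[t]
        ]
        windows.append(window)
    return windows


# Placeholder strings stand in for 2D images (hypothetical channel names).
video = [[f"nuclei_t{t}", f"boundaries_t{t}"] for t in range(7)]
windows = frames_as_channels(video, n_timepoints=5)
print(len(windows), len(windows[0]))
```

      Because the window length fixes the number of input channels, changing the number of timepoints or fluorescent markers changes the input layer, which is why retraining is needed in that case.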

      Since CNN-based methods accept only fixed-size vectors (fixed image size and fixed channel number) as input (and output), the length or time resolution of the extracted sequences should not vary from one experience to another. As such, the method proposed here may lack generalization capabilities, as it would have to be retrained for each experiment with a slightly different temporal resolution. The paper should have compared results with slightly different temporal resolutions to assess its inference robustness toward fluctuations in division speed.

      If multiple temporal resolutions are required for a set of experiments, we envision that the model could be trained over a range of these different temporal resolutions. Of course, the temporal resolution that requires the largest vector would be chosen as the model's fixed number of input channels. Given the depth of the models used, and the potential to easily increase this by replacing resnet34 with resnet50 or resnet101, the model would likely be able to cope with this, although we have not specifically tested this. (page 27)

      Another approach (not discussed) consists in directly convolving several temporal frames using a 3D CNN (2D+time) instead of a 2D one, in order to detect a temporal event. Such an idea shares some similarities with the proposed approach, although in this previous work (Ji et al. TPAMI 2012 and, for split detection, Nie et al. CVPR 2016) convolution is performed spatio-temporally, which may present advantages. How does the authors' method compare to such an (also very simple) approach?

      We thank the Reviewer for this insightful comment. The text now discusses this (on Pages 8 and 17). Key differences between the models include our incorporation of multiple light channels and the use of much deeper models. We suggest that our method allows for an easy and natural extension to use deeper models for even more demanding tasks, e.g. distinguishing between healthy and defective divisions. We also tested our method with ‘difficult conditions’ such as when a wound is present; despite the challenges imposed by the wound (including the discussed reduction in fluorescent intensities near the wound edge), we achieved higher accuracy than Nie et al., who used a low-density in vitro system (their accuracy of 78.5% compared to our F1 score of 0.964).
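
      Since the comparison above sets an accuracy against an F1 score, it may help to recall how the F1 score is computed from detection counts. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from either study.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Standard precision, recall and F1 for a detection task."""
    detected = true_pos + false_pos   # everything the model called a division
    actual = true_pos + false_neg     # every division in the ground truth
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=5, false_neg=10)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

      Unlike accuracy, the F1 score does not count true negatives, so the two figures are not measured on the same scale and the comparison should be read as indicative rather than exact.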

      Major point 2: innovatory nature of the proposed method.

      The authors' idea of exploiting existing channels in the input vector to feed successive frames is interesting, but the natural choice in deep learning for manipulating time series is to use recurrent networks or their newer and more stable variants (LSTM, GRU, attention networks, or transformers). Several papers exploiting such approaches have been proposed for the mitotic division detection task, but they are not mentioned or discussed in this manuscript: Phan et al. 2018, Mao et al. 2019, Kitrungrotsakul et al. 2019, Shi et al. 2020.

      An obvious advantage of an LSTM architecture combined with CNN is that it is able to address variable length inputs, therefore time sequences of different lengths, whereas a CNN alone can only be fed with an input of fixed size.

      LSTM architectures may produce similar accuracy to the models we employ in our study; however, due to the high degree of accuracy we already achieve with our methods, it is hard to see how they would improve the understanding of the biology of wound healing that we have uncovered. Hence, they may provide an alternative way to achieve similar results from analyses of our data. It would also be interesting to see how LSTM architectures would cope with the noisy and difficult wounded data that we have analysed. We agree with the referee that these alternative models could more easily accommodate differences in division time (see discussion on Page 20). Nevertheless, we imagine that after selecting a sufficiently large time/fluorescent channel input, biologists could likely train our model to cope with a range of division lengths.

      Another advantage of some of these approaches is that they rely on unsupervised learning, which can avoid the tedious relabeling of data (Phan et al. 2018, Gilad et al. 2019).

      While these are very interesting ideas, we believe these unsupervised methods would struggle under the challenging conditions within our and others' experimental imaging data. The epithelial tissue examined in the present study possesses a particularly high density of cells with overlapping nuclei compared to the other experimental systems these unsupervised methods have been tested on. Another potential problem with these unsupervised methods is the difficulty in distinguishing dynamic debris and immune cells from mitotic cells. Once again, despite our experimental data being more complex and difficult, our methods perform better than other methods designed for simpler systems, as in Phan et al. 2018 and Gilad et al. 2019; for example, analysis performed on lower-density in vitro and unwounded tissues gave best single-video F1 scores of 0.768 and 0.829 for unsupervised and supervised methods, respectively (Phan et al. 2018). We envision that having an F1 score above 0.9 (and preferably above 0.95) would be crucial for practical use by biologists; hence, we believe supervision is currently still required. We expect that retraining our models for use in other experimental contexts will require smaller hand-labelled datasets, as they will be able to take advantage of transfer learning (see discussion on Page 4).

      References :

      We have included these additional references in the revised version of our Manuscript.

      Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1), 221-231. >6000 citations

      Nie, W. Z., Li, W. H., Liu, A. A., Hao, T., & Su, Y. T. (2016). 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 55-62).

      Phan, H. T. H., Kumar, A., Feng, D., Fulham, M., & Kim, J. (2018). Unsupervised two-path neural network for cell event detection and classification using spatiotemporal patterns. IEEE Transactions on Medical Imaging, 38(6), 1477-1487.

      Gilad, T., Reyes, J., Chen, J. Y., Lahav, G., & Riklin Raviv, T. (2019). Fully unsupervised symmetry-based mitosis detection in time-lapse cell microscopy. Bioinformatics, 35(15), 2644-2653.

      Mao, Y., Han, L., & Yin, Z. (2019). Cell mitosis event analysis in phase contrast microscopy images using deep learning. Medical image analysis, 57, 32-43.

      Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Takemoto, S., Yokota, H., Ipponjima, S., ... & Chen, Y. W. (2019). A cascade of 2.5 D CNN and bidirectional CLSTM network for mitotic cell detection in 4D microscopy image. IEEE/ACM transactions on computational biology and bioinformatics, 18(2), 396-404.

      Shi, J., Xin, Y., Xu, B., Lu, M., & Cong, J. (2020, November). A Deep Framework for Cell Mitosis Detection in Microscopy Images. In 2020 16th International Conference on Computational Intelligence and Security (CIS) (pp. 100-103). IEEE.

      Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., ... & Kreshuk, A. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. Elife, 9, e57613.

    1. Joint Public Review

      Summary:

      The authors sought to elucidate the mechanism by which infections increase sleep in Drosophila. Their work is important because it further supports the idea that the blood-brain barrier is involved in brain-body communication, and because it advances the field of sleep research. Using knock-down and knock-out of cytokines and cytokine receptors specifically in the endocrine cells of the gut (cytokines) as well as in the glia forming the blood-brain barrier (BBB) (cytokines receptors), the authors show that cytokines, upd2 and upd3, secreted by entero-endocrine cells in response to infections increase sleep through the Dome receptor in the BBB. They also show that gut-derived Allatostatin (Alst) A promotes wakefulness by inhibiting Alst A signaling that is mediated by Alst receptors expressed in BBB glia. Their results suggest there may be additional mechanisms that promote elevated sleep during gut inflammation.

      The authors suggest that upd3 is more critical than upd2, which is not sufficiently addressed or explained. In addition, the study uses the gut's response to reactive oxygen molecules as a proxy for infection, which is not sufficiently justified. Finally, further verification of some fundamental tools used in this paper would further solidify these findings making them more convincing.

      Strengths:

      (1) The work addresses an important topic and proposes an intriguing mechanism that involves several interconnected tissues. The authors place their research in the appropriate context and reference related work, such as literature about sickness-induced sleep, ROS, the effect of nutritional deprivation on sleep, sleep deprivation and sleep rebound, upregulated receptor expression as a compensatory mechanism in response to low levels of a ligand, and information about Alst A.

      (2) The work is, in general, supported by well-performed experiments that use a variety of different tools, including multiple RNAi lines, CRISPR, and mutants, to dissect both signal-sending and receiving sides of the signaling pathway.

      (3) The authors provide compelling evidence that shows that endocrine cells from the gut are the source of the upd cytokines that increase daytime sleep, that the glial cells of the BBB are the targets of these upds, and that upd action causes the downregulation of Alst receptors in the BBB via the Jak/Stat pathways.

      Weaknesses:

      (1) There is a limited characterization of cell types in the midgut which are classically associated with upd cytokine production.

      (2) Some of the main tools used in this manuscript to manipulate the gut without influencing the brain (e.g., Voilà and Voilà + R57C10-GAL80) are not directly shown to leave gene expression in the brain unaffected. This is critical for a manuscript delving into inter-organ communication, as even limited expression in the brain may lead to wrong conclusions.

      (3) The model of gut inflammation used by the authors is based on the increase in reactive oxygen species (ROS) obtained by feeding flies food containing 1% H2O2. The authors support the use of this model only weakly, citing two papers (refs. 26 and 27): The paper by Jiang et al. (ref. 26) shows that the infection by Pseudomonas entomophila induces cytokine responses upd2 and 3, which are also induced by the Jnk pathway. In addition, no mention of ROS could be found in Buchon et al. (ref. 27); this is a review that refers to results showing that ROS are produced by the NADPH oxidase DUOX as part of the immune response to pathogens in the gut. Thus, there is no strong support for the use of this model.

      (4) Likewise, there is no support for the use of ROS in the food instead of a direct infection by pathogenic bacteria. Furthermore, it is known that ROS damages the gut epithelium, which in turn induces the expression of the cytokines studied. Thus, the effects observed may not reflect the response to infection. In addition, Majcin Dorcikova et al. (2023). Circadian clock disruption promotes the degeneration of dopaminergic neurons in male Drosophila. Nat Commun. 2023 14(1):5908. doi: 10.1038/s41467-023-41540-y report that the feeding of adult flies with H2O2 results in neurodegeneration if associated with circadian clock defects. Thus, it would be important to discuss or present controls that show that the feeding of H2O2 does not cause neuronal damage.

      (5) The novelty of the work is difficult to evaluate because of the numerous publications on sleep in Drosophila. Thus, it would be very helpful to read from the authors how this work is different and novel from other closely related works such as: Li et al. (2023) Gut AstA mediates sleep deprivation-induced energy wasting in Drosophila. Cell Discov. 23;9(1):49. doi: 10.1038/s41421-023-00541-3.

    1. eLife assessment

      This valuable contribution follows past descriptions of ciliation defects, potentially linked to cholinergic neuronal dysfunction, associated with mutated G2019S Lrrk2 expression. The strength of evidence is considered solid and broadly supportive of the claims concerning well-characterized cilia changes in cholinergic neurons over time in the model; however, additional work may be required to define the specificity of the pRab12 antibody in the IHC technique, dependence on LRRK2, and clarification of the cilia phenotype in sporadic PD brains that exists (for the moment) only in a non-peer-reviewed pre-print, despite the prominence of these (preliminary) results highlighted in the abstract and text of the current manuscript. It is hoped that the authors will begin to address the feedback provided by the expert reviewers to help provide a more mechanistic basis for the audience interested in cholinergic defects associated with Parkinson's disease.

    2. Reviewer #1 (Public review):

      Summary:

      This study represents valuable insight into the potential contribution of ciliation deficits and cholinergic neuron survival in an etiologically appropriate Parkinson's disease mouse model. The evidence presented is convincing, employing a validated methodology to assess measures across multiple brain regions and time points, with adequate observation numbers. Similarities between some of the data here and human patients further validate the model, and the study provides numerous avenues to aid future advances.

      Strengths:

      Overall, this study presents a thorough analysis of ciliary defects and cell loss in cholinergic neurons throughout the brain in the LRRK2 G2019S knockin mouse model of Parkinson's disease. The authors aimed to characterize ciliary defects in areas not only implicated in PD but also in cholinergic neuron function. Additionally, they repeated measures across age and sex, presenting a body of work that is more readily translatable to human disease states. The strengths of the paper included the breadth of brain regions tested and additional mechanistic contributions of LRRK2 that may correlate to their observed phenotypes. The study conveys to the reader the ciliary phenotype observed in all the cholinergic neurons assessed throughout the brains of knock-in LRRK2 mutant mice. Importantly, the pattern of changes is, in some instances, strikingly similar to PD, which strengthens the case for construct and face validation of the G2019S knock-in mouse model. Future investigations of the physiological and behavioural correlates/consequences of these changes will inform ongoing and, as yet untried, therapeutic intervention attempts.

      Weaknesses:

      At times, the claims are only partially substantiated by how the data are presented, e.g., inappropriate statistics within an age (t-tests, not ANOVA) and a lack of comparison between ages (despite referring to the progress of a phenotype). More appropriate statistical analyses and revisions to the data presentation are required to substantiate basic and more 'progressive' conclusions. Further, distributing the central claim over 10 figures, many of which could be compressed into a couple of single figures (e.g., cell counts in all regions and ciliation), dilutes the impact. Also, a summary graphic showing the brain regions affected by ciliation alterations and cell loss at young, middle, and old age in the GS mice would be hugely beneficial. This peer would like to see more discussion of how the observed changes would impact circuit-level function and more speculation of the underlying mechanisms leading to the deficits. Minor changes to the abstract and introduction (to include more detail in the rationale and supporting evidence) are recommended, as summaries of existing literature are vague and could flow better between one statement and the next.
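
      The request for ANOVA rather than per-age t-tests can be illustrated with a minimal sketch of a balanced two-factor (genotype x age) design. This is not the authors' analysis: the function name `two_way_anova_f` and the data layout are hypothetical, the design is assumed to be balanced, and the sketch returns only F statistics (p-values would need an F distribution from a statistics package).

```python
from statistics import mean


def two_way_anova_f(cells):
    """Return F statistics for a balanced two-factor design.

    `cells` maps (genotype, age) -> list of observations. Every combination
    of genotype and age must be present with the same number of
    observations (n >= 2). The layout is a hypothetical illustration.
    """
    genotypes = sorted({g for g, _ in cells})
    ages = sorted({a for _, a in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(genotypes) * len(ages):
        raise ValueError("every genotype/age combination must be present")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced with n >= 2 per cell")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    geno_mean = {g: mean(cell_mean[(g, a)] for a in ages) for g in genotypes}
    age_mean = {a: mean(cell_mean[(g, a)] for g in genotypes) for a in ages}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_geno = len(ages) * n * sum((geno_mean[g] - grand) ** 2 for g in genotypes)
    ss_age = len(genotypes) * n * sum((age_mean[a] - grand) ** 2 for a in ages)
    ss_inter = n * sum(
        (cell_mean[(g, a)] - geno_mean[g] - age_mean[a] + grand) ** 2
        for g in genotypes
        for a in ages
    )
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_geno, df_age = len(genotypes) - 1, len(ages) - 1
    df_error = len(genotypes) * len(ages) * (n - 1)
    # Raises ZeroDivisionError if there is no within-cell variance.
    ms_error = ss_error / df_error
    return {
        "genotype": (ss_geno / df_geno) / ms_error,
        "age": (ss_age / df_age) / ms_error,
        "genotype_x_age": (ss_inter / (df_geno * df_age)) / ms_error,
    }
```

      The interaction term is the one that speaks to a 'progressive' phenotype: it asks whether the genotype difference changes with age, which separate t-tests within each age cannot show.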

    3. Reviewer #2 (Public review):

      Summary:

      LRRK2 has previously been shown to affect cilia formation and stability both in vitro and in vivo, in striatal cholinergic interneurons, in both transgenic mice and in human post-mortem brain samples from subjects carrying one of the LRRK2 pathogenic mutations: G2019S. In the current study, Brahmia and colleagues have conducted a comprehensive assessment of G2019S knock-in mice to address some gaps in the field, specifically: extending analysis to additional cholinergic neurons across 3 time points and determining the functional consequences of the ciliation deficits. They find that primary cilia are lost in all cholinergic neurons, with basal forebrain cholinergic neurons displaying an early onset (in 4-5-month-old mice) compared with other regions. They also show early dystrophic changes in cholinergic axons derived from basal forebrain and brainstem cholinergic neurons and age-dependent cholinergic cell loss in select forebrain and brainstem nuclei.

      Strengths:

      This is a comprehensive and careful analysis of ciliary deficits and their downstream consequences, which we assume are deficits in innervation and cell loss.

      Weaknesses:

      This study is observational and does not address the underlying mechanisms. The data on pRab12, although downstream of LRRK2, does not clearly address this and, instead, raises more questions than answers: e.g., is there really differentiation from Rab10 and its phosphorylation or is it primarily due to the limitations of pRab10 antibodies with regards to the lack of suitability of this antibody in mouse brain sections (could immunoblots on brain punches have been performed to overcome this?). Are Rab10, Rab12, and LRRK2 expressed at different levels in the vulnerable cell types? Plenty of recent high-quality single-cell/single nuclear RNA-seq data could have been used to address such a fundamental question. LRRK2 small molecule inhibitors are available and progressing in the clinic. They could/should have been used to demonstrate the LRRK2 dependence, reversibility, and timing of therapeutic intervention. The authors suggest that the mouse data mirror (and potentially explain) the cholinergic loss in PD patient brains, but this is not measured in the current work (the authors do acknowledge this limitation and suggest that this is an important further study). There are some recent human data (Khan et al 2024 PMID: 38293195, BioRxiv, which the authors cite) showing loss of primary cilia and cholinergic neurons in sporadic PD (no evidence of aberrant LRRK2 activity) and, interestingly, this is not further exacerbated in G2019S carriers, which may suggest a more complex underlying mechanism.

    4. Reviewer #3 (Public review):

      Summary:

      The authors described cilia deficits, phospho-Rab12 accumulation, dystrophic axons in cholinergic neurons, and loss of the cholinergic neurons in the mouse brains of G2019S-LRRK2 knock-in mice, a preclinical animal model for Parkinson's disease. They showed that the above changes associated with cholinergic neurons are age-dependent and region-specific. The observation is interesting considering the neuron-type-specific effect of the LRRK2-G2019S in mice.

      Strengths:

      The observations are important and show neuron type-specific effects of the PD mutation of LRRK2 relevant to PD pathologies.

      Weaknesses:

      The authors may over-interpret the data, and the study may lack mechanistic investigation.

    1. eLife assessment

      In this manuscript, Griesius et al. analyze the dendritic integration properties of NDNF and OLM interneurons, and the current dataset suggests that even though both cell types display supralinear NMDA receptor-dependent synaptic integration, this may be associated with dendritic calcium transients only in NDNF interneurons. These findings are important because they could shed light on the functional diversity of different classes of interneurons in the mouse neocortex and hippocampus, which in turn can have major implications for understanding information flow in complex neural circuits. They are currently considered incomplete, however, due to: (i) the large variability and small sample size of multiple datasets, which prevents a finer evaluation of cellular and molecular mechanisms accounting for the difference in the integrative properties of different interneuron types; (ii) lack of control experiments to rule out that the effect of the NMDA antagonist AP5 on synaptic integration is confounded by potential phototoxic damage; (iii) lack of a precise control of the uncaging location.

    2. Reviewer #1 (Public review):

      The manuscript by Griesius et al. addresses the dendritic integration of synaptic input in cortical GABAergic interneurons (INs). Dendritic properties, passive and active, of principal cells have been extensively characterized, but much less is known about the dendrites of INs. The limited information is particularly relevant in view of the high morphological and physiological diversity of IN types. The few studies that investigated IN dendrites focused on parvalbumin-expressing INs. In fact, in a previous study, the authors examined dendritic properties of PV INs, and found supralinear dendritic integration in basal, but not in apical dendrites (Cornford et al., 2019 eLife).

      In the present study, complementary to the prior work, the authors investigate whether dendrite-targeting IN types, NDNF-expressing neurogliaform cells, and somatostatin(SOM)-expressing O-LM neurons, display similar active integrative properties by combining clustered glutamate-uncaging and pharmacological manipulations with electrophysiological recording and calcium imaging from genetically identified IN types in mouse acute hippocampal slices.

      The main findings are that NDNF IN dendrites show strong supralinear summation of spatially- and temporally-clustered EPSPs, which is changed into sublinear behavior by bath application of NMDA receptor antagonists, but not by Na+-channel blockers. L-type calcium channel blockers abolished the calcium transients associated with the supralinear behavior but had no or only weak effect on EPSP summation. SOM IN dendrites showed similar, albeit weaker NMDA-dependent supralinear summation, but no supralinear calcium transients were detected in these INs. In summary, the study demonstrates that different IN types are endowed with active dendritic integrative mechanisms, but show qualitative and quantitative divergence in these mechanisms.

      While the research is conceptually not novel, it constitutes an important incremental gain in our understanding of the functional diversity of GABAergic INs. In view of the central roles of IN types in network dynamics and information processing in the cortex, results and conclusions are of interest to the broader neuroscience community.
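
      For readers less familiar with the terminology, supra- and sublinear summation can be expressed as the percent deviation of the measured compound EPSP from the arithmetic sum of the individual EPSPs. The sketch below assumes this common definition; the manuscript may quantify nonlinearity differently, and the function name and arguments are hypothetical.

```python
def summation_nonlinearity(measured_mv, unitary_mv):
    """Percent deviation of a compound EPSP from the linear prediction.

    measured_mv: amplitude of the compound EPSP (mV) evoked by
                 near-synchronous uncaging at all locations.
    unitary_mv:  amplitudes (mV) of the EPSPs evoked at each location alone.

    Positive values indicate supralinear summation, negative values
    sublinear summation, and 0 a purely linear sum.
    """
    expected = sum(unitary_mv)  # arithmetic (linear) prediction
    if expected <= 0:
        raise ValueError("summed unitary EPSP amplitude must be positive")
    return 100.0 * (measured_mv - expected) / expected
```

      Under this definition, a compound EPSP twice as large as the arithmetic sum gives a nonlinearity of 100%, and one that falls short of the sum gives a negative value.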

      The experiments are well designed, and closely follow the approach from the previous publication in parts, enabling direct comparison of the results obtained from the different IN types. The data is convincing and the conclusions are well-supported, and the manuscript is very well-written.

      I see only a few open questions and some inconsistencies in the presentation of the data in the figures (see details below).

    3. Reviewer #2 (Public review):

      Summary:

      Griesius et al. investigate the dendritic integration properties of two types of inhibitory interneurons in the hippocampus: those that express NDNF+ and those that express somatostatin. They found that both neurons showed supralinear synaptic integration in the dendrites, blocked by NMDA receptor blockers but not by blockers of Na+ channels. These experiments are critically overdue and very important because knowing how inhibitory neurons are engaged by excitatory synaptic input has important implications for all theories involving these inhibitory neurons.

      Strengths:

      (1) Determined the dendritic integration properties of two fundamental types of inhibitory interneurons.

      (2) Convincing demonstration that supra-threshold integration in both cell types depends on NMDA receptors but not on Na+ channels.

      Weaknesses:

      It is unknown whether highly clustered synaptic input, as used in this study (and several previous studies), occurs physiologically.

    4. Reviewer #3 (Public review):

      Summary:

      The authors study the temporal summation of caged EPSPs in dendrite-targeting hippocampal CA1 interneurons. There are some descriptive data presented, indicating non-linear summation, which seems to be larger in dendrites of NDNF expressing neurogliaform cells versus OLM cells. However, the underlying mechanisms are largely unclear.

      Strengths:

      Focal 2-photon uncaging of glutamate is a nice and detailed method to study temporal summation of small potentials in dendritic segments.

      Weaknesses:

      (1) NMDA-receptor signaling in NDNF-IN. The authors nicely show that temporal summation in dendrites of NDNF-INs is to a certain extent non-linear. However, this non-linearity varies massively from cell to cell (or dendrite to dendrite) from 0% up to 400% (Figure S2). The reason for this variability is totally unclear. Pharmacology with AP5 hints towards a contribution of NMDA receptors. However, the authors claim that the non-linearity is not dependent on EPSP amplitude (Figure S2), which should be the case if NMDA-receptors are involved. Unfortunately, there are no voltage-clamp data of NMDA currents similar to the previous study. These would help to see whether NMDA-receptor contribution varies from synapse to synapse to generate the observed variability. Furthermore, the NMDA- and AMPA-currents would help to compare NDNF with the previously characterized PV cells and would help to contribute to our understanding of interneuron function.

      (2) Sublinear summation in NDNF-INs. In the presence of AP5, the temporal summation of caged EPSPs is sublinear. That is potentially interesting. The authors claim that this might be dependent on the diameter of dendrites. Many voltage-gated channels can mediate such things as well. To conclude the contribution of dendritic diameter, it would be helpful to at least plot the extent of sublinearity in single NDNF dendrites versus the dendritic diameter. Otherwise, this statement should be deleted.

      (3) Nonlinear EPSP summation in OLM-IN. The authors do similar experiments in dendrite-targeting OLM-INs and show that the non-linear summation is smaller than in NDNF cells. The reason for this remains unclear. The authors claim that this is due to the larger dendritic diameter in OLM cells. However, there is no analysis. The minimum would be to correlate non-linearity with dendritic diameter in OLM-cells. Very likely there is an important role of synapse density and glutamate receptor density, which was shown to be very low in proximal dendrites of OLM cells and strongly increase with distance (Guirado et al. 2014, Cerebral Cortex 24:3014-24, Gramuntell et al. 2021, Front Aging Neurosci 13:782737). Therefore, the authors should perform a set of experiments in more distal dendrites of OLM cells with diameters similar to the diameters of the NDNF cells. Even better would be if the authors would quantify synapse density by counting spines and show how this density compares with non-linearity in the analyzed NDNF and OLM dendrites.

      (4) NMDA in OLM. Similar to the NDNF cells, the authors claim the involvement of NMDA receptors in OLM cells. Again there seems to be no dependence on EPSP amplitude, which is not understandable at this point (Figure S3). Even more remarkable is the fact that the authors claim that there is no dendritic calcium increase after activation of NMDA receptors. Similar to NDNF-cell analysis there are no NMDA currents in OLMs. Unfortunately, no calcium imaging experiments were shown at all. Why? Are there calcium-impermeable NMDA receptors in OLM cells? To understand this phenomenon the minimum is to show some physiological signature of NMDA-receptors, for example, voltage-clamp currents. Furthermore, it would be helpful to systematically vary stimulus intensity to see some calcium signals with larger stimulation. In case there is still no calcium signal, it would be helpful to measure reversal potentials with different ion compositions to characterize the potentially 'Ca2+ impermeable' voltage-dependent NMDA receptors in OLM cells.

    1. eLife assessment

      This study used electrophysiology and imaging to show that the majority of excitatory cells in the dentate gyrus of adult mice have very slow oscillations during non-rapid eye movement (NREM) sleep. The oscillations were influenced by serotonin when it was released during NREM sleep. Moreover, the serotonin receptor type 1a mediated the effect, and reducing these receptors impaired a type of memory. The significance of the study is important and the strength of the evidence is solid, but revisions to the figures and making conclusions more consistent with the data could improve the significance and strength of evidence.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory.

      Strengths and Weaknesses:

      The authors used state-of-the-art techniques to carry out these experiments. Given that the functional role of infraslow rhythm still remains to be studied, this study provides convincing evidence of the role of DG cells in regulating infraslow rhythm, sleep microarchitecture, and memory.

      I have a few minor comments.

      (1) Decreased infraslow rhythm during NREMs in the 5ht1a KO mice is striking. It would be helpful to know whether sleep-wake states, MAs, and transitions to REMs are changed.

      (2) It would be interesting to discuss whether the magnitude in changes of infraslow rhythm strength is correlated with memory performance (Figure 6).

      (3) The authors should cite the Oikonomou Neuron paper that describes slow oscillatory activity of DRN SERT neurons during NREM sleep.

      (4) The authors should clarify how they define the phasic pattern of the photometry signal.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single-cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep.

      The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and

      (2) The GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy, and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Figure 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      b. In Figure 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, the readers can find 8 Ca dips which are not associated with MAs from Figure 1E. If MAs were clustered, please describe this properly.

      c. In Figure 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      d. In Figure 1C, please provide line plots connecting the same session. This request applies to all related figures.

      e. In Figure 2C, the significant increase during REM and the same level during NREM are not convincing. In Figure 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, in Figure 2E, the 4th MA bout seems to be the wake bout because the EMG increase lasts more than 15 seconds.

      f. Figure 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      g. In Figure 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Figure 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Figure 4), which reduces the reliability of this study.
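
      The correlation requested in point (2) could be examined per animal, for example by relating ISO strength to freezing in the CFC test. The following is a minimal sketch of a Pearson correlation coefficient; the function name and the choice of per-animal measures are assumptions for illustration and are not taken from the manuscript.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples.

    xs: e.g. per-animal ISO strength (hypothetical measure).
    ys: e.g. per-animal freezing percentage (hypothetical measure).
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

      A significant correlation would still not establish causality, but its absence would weaken the proposed link between DG ISO disruption and impaired fear memory.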

    4. Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has been previously linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells to microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, micro-arousals, and sensory sensitivity.

      Weaknesses:

      (1) The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      a. The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.". However, the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      b. Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      (2) The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Figure 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      (3) Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      (4) While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross-correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have been reported also at 0.02 Hz associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1a knockdown on page 7 (Figure 6), how is cortical EEG affected? Is ISO still seen in EEG but attenuated in DG?

      (5) The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in sigma band to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Figure 1 or G as well as broader sleep architecture are not affected.

      (6) On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA-correlated activity. I would like to see the equivalent of Figure 1,2 G panels with the 5-HT1a manipulation.

    1. eLife assessment

      This important study by Wong et al. addresses a longstanding question in the field of associative learning regarding how a motivationally relevant event can be inferred from prior learning based on neutral stimulus-stimulus associations. The research provides convincing behavioral and neurophysiological evidence to address this important question. The manuscript will be interesting for researchers in behavioral and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      This study is an important follow-up to their prior work - Wong et al. (2019), starting with clear questions and hypotheses, followed by a series of thoughtful and organized experiments. The method and results are convincing. Experiment 1 demonstrated the sensory preconditioned fear with few (8) or many (32) sound-light pairings. Experiments 2A and 2B showed the role of PRh NMDA receptors during conditioning for online integration, revealing that this contribution is present only after a few sound-light pairings, not after many sound-light pairings. Experiments 3A and 3B showed the contribution of PRh-BLA communication to online integration, again only after a few but not after many. Contrary to Experiments 3A and 3B, Experiments 4A and 4B showed the contribution of PRh-BLA communication to integration at test only after many but not few sound-light pairings.

      Strengths:

      Throughout the manuscript, the methods and results are clearly organized and described, and the use of statistics is solid, all contributing to the overall clarity of the research. The discussion section was also well-written, effectively comparing the current research with the prior work and offering insightful interpretations and potential future directions for this line of research. I have only a limited number of concerns about some results and some details of experiments/statistics.

      Weaknesses:

      Could you provide further interpretation regarding line 171: the observation that sensory preconditioned fear increased with the number of sound-light pairings? Was this increase due to better sound-light association learning during Stage 1? Additionally, were there any experimental differences between Experiment 1 and the other experiments that might explain why freezing was higher in the P32 group compared to the P8 group? This pattern seemed to be absent in the other experiments. If we consider the hypothesis that the online integration mechanism is more active with fewer pairings and the chaining mechanism at the test is more prominent with many pairings, we wouldn't expect a difference between the P8 and P32 groups. Given the relatively small sample size in Experiment 1, the authors might consider conducting a cross-experiment analysis or something similar to investigate this further.

    3. Reviewer #2 (Public review):

      This manuscript builds on the authors' earlier work, most recently Wong et al. 2019, in which they showed the importance of the perirhinal cortex (PRh) during the first-order conditioning stage of sensory preconditioning. Sensory preconditioning requires learning between two neutral stimuli (S2-S1) and subsequent development of a conditioned response to one of the neutral stimuli after pairing of the other stimulus with a motivationally relevant unconditioned stimulus (S1-US). One highly debated question regarding the mechanisms of learning of sensory preconditioning has been whether conditioned responses evoked by the indirectly trained stimulus (S2) occur through a mediated representation at the time of the first-order US training, or whether the conditioned responses develop through a chained evoked representation (S2--> S1 --> US) at the time of test. The authors' prior findings provided strong evidence for PRh being involved in mediated learning during the first-order training. They showed that protein synthesis was required during the first-order S1-US learning to support the conditioned response to the indirectly trained stimulus (S2) at the test.

      One question remaining following the previous paper was whether certain conditions may promote a chaining mechanism over mediated learning, as there is some evidence for chained representations at the time of the test. In this paper, the authors directly address this important question and find unambiguous results that the extent of training during the preconditioning stage impacts the involvement of PRh during the first-order conditioning or stage 2. They show that putative blockade of synaptic changes in PRh, using an NMDA antagonist, disrupts responding to the preconditioned cue at test during shorter duration preconditioning training (8 trials), but not during extended training (32 trials). They also show that this is the case for communication between the PRh and BLA during the same stage of training using a contralateral inactivation approach. This confirms their previous findings in 2019 of connectivity between these regions for the short-duration training, while they observe here for the first time that this is not the case for extended training. Finally, they show that with extended training, communication between BLA and the PRh is required at the final test of the preconditioned stimulus, but not for the short duration training.

      The results are clear and extremely consistent across experiments within this paper as well as with earlier work. The experiments here are thorough, and well-conceived, and address an important and highly debated question in the field regarding the neural and psychological mechanisms underlying sensory preconditioning. This work is highly impactful for the field as the debate over mediated versus chaining mechanisms has been an important topic for more than 70 years.

    4. Reviewer #3 (Public review):

      The authors tested whether the number of stimulus-stimulus pairings alters whether preconditioned fear depends on online integration during the formation of the stimulus-outcome memory or during the probe test/mobilization phase, when the original stimulus, which was never paired with aversive events, elicits fear via chaining of stimulus-stimulus and stimulus-outcome memories. They found that sensory preconditioning was successful with either 8 or 32 stimulus-stimulus pairings. Perirhinal cortex NMDA receptor blockade during stimulus-outcome learning impaired preconditioning following 8 but not 32 pairings during preconditioning. Therefore, perirhinal cortex NMDA activity is required for online integration or mediated learning. Perirhinal-basolateral amygdala disconnection during stimulus-outcome learning had nearly identical effects with the same interpretation: these areas communicate during stimulus-outcome learning, and this online communication is required for later expressing preconditioned fear. Disconnection prior to the probe test, when chaining might occur, had different effects: it impaired the expression of preconditioned fear in rats that received 32, but not 8, pairings during preconditioning. The study has several strengths and provides a thoughtful discussion of future experiments. The study is highly impactful and significant; the authors were successful in describing the behavioral and neurobiological mechanisms of mediated learning versus chaining in sensory preconditioning, which is often debated in the learning field. Therefore, this study will have a significant impact on the behavioral neurobiology and learning fields.

      Strengths:

      Careful, rigorous experimental design and statistics.

      The discussion leaves open questions that are very much worth exploring. For example - why did perirhinal-amygdala disconnection prior to the probe have no effect in the 8-pairing group, when bilateral perirhinal inactivation did (in Wong et al, 2019)? The authors propose that perirhinal cortex outputs bypass the amygdala during the probe test, which is an excellent hypothesis to test.

      The authors provide evidence that both mediated learning and chaining occur.

      Weaknesses:

      This is inherent to all neural interference and behavioral experiments: biological/psychological functions do not typically operate binarily. There is no single clear number or parameter at which mediated learning or chaining happens, and both probably happen to some extent. Addressing this is even more difficult given behavioral variability across subjects, implant sites, etc. Thus, this is not so much a weakness particular to this study as much as an existential problem, which the authors were able to work around with careful experimental design and appropriate controls.

    1. eLife assessment

      This important work combines theory and experiment to assess how humans make decisions about sequences of pairs of correlated observations. The normative theory for evidence integration in correlated environments will be informative for future investigations. However, the developed theory and data analysis seem currently incomplete: it remains to be seen if the derived decision strategy is indeed normative, or only an approximation thereof, and behavioral modelling would benefit from the assessment of alternative models.

    2. Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants mis-estimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimation of the correlation magnitude corresponds to the normative strategy.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways: (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.
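
      The point in (2) can be illustrated with a minimal sketch. It assumes a textbook diffusion model (accumulated evidence x at time t, Gaussian noise with standard deviation sigma, and a symmetric prior over a discrete set of drift magnitudes); the function name and all parameter values are hypothetical and are not taken from the paper under review. With a single known drift magnitude, the log posterior odds depend only on x; once the magnitude is unknown and must be averaged over, the same x maps to different log odds at different times.

```python
import math

def log_posterior_odds(x, t, drifts, priors, sigma=1.0):
    """Log odds of 'positive drift' vs 'negative drift' given accumulated
    evidence x at time t, averaging over unknown drift magnitudes.
    Illustrative assumption: the path likelihood for drift mu is
    proportional to exp(mu * x / sigma^2 - mu^2 * t / (2 * sigma^2))."""
    def log_marginal(sign):
        terms = [
            math.log(p) + sign * mu * x / sigma**2 - mu * mu * t / (2 * sigma**2)
            for mu, p in zip(drifts, priors)
        ]
        m = max(terms)  # log-sum-exp for numerical stability
        return m + math.log(sum(math.exp(v - m) for v in terms))
    return log_marginal(+1) - log_marginal(-1)

# Single, known evidence strength: log odds equal 2 * mu * x / sigma^2,
# independent of elapsed time.
single_early = log_posterior_odds(1.0, 0.5, [1.0], [1.0])
single_late = log_posterior_odds(1.0, 5.0, [1.0], [1.0])

# Mixed, unknown evidence strengths: the same x is less convincing later
# in the trial, because slow accumulation suggests a weak-evidence trial.
mixed_early = log_posterior_odds(1.0, 0.5, [0.2, 2.0], [0.5, 0.5])
mixed_late = log_posterior_odds(1.0, 5.0, [0.2, 2.0], [0.5, 0.5])
```

      In this sketch a fixed bound on x is a fixed bound on the log odds only in the single-strength case, which is the concern raised above about whether a fixed-bound DDM remains normative when evidence strength and correlation vary unpredictably across trials.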

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for mis-estimation of the underlying correlations.
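
      The direction of this weighting can be checked with a small sketch. It assumes a simple generative model that may differ from the authors' exact formulation: each pair (x1, x2) is bivariate Gaussian with common mean +mu or -mu, common standard deviation sigma, and correlation rho. All function names and numbers are illustrative. Under these assumptions the log-likelihood ratio of a pair reduces to 2 * mu * (x1 + x2) / (sigma^2 * (1 + rho)), so the weight on the summed pair grows as rho becomes more negative and shrinks as rho becomes more positive.

```python
import math

def log_density(x1, x2, mean, sigma, rho):
    """Log density of a bivariate Gaussian with equal means, equal
    standard deviations, and correlation rho (assumed model)."""
    z1 = (x1 - mean) / sigma
    z2 = (x2 - mean) / sigma
    quad = (z1 * z1 - 2 * rho * z1 * z2 + z2 * z2) / (1 - rho * rho)
    norm = math.log(2 * math.pi * sigma * sigma * math.sqrt(1 - rho * rho))
    return -0.5 * quad - norm

def pair_log_lr_direct(x1, x2, mu, sigma, rho):
    """Log-likelihood ratio (mean +mu vs mean -mu) from the densities."""
    return log_density(x1, x2, mu, sigma, rho) - log_density(x1, x2, -mu, sigma, rho)

def pair_log_lr_closed_form(x1, x2, mu, sigma, rho):
    """Same quantity after simplification: the pair sum is scaled by 1 / (1 + rho)."""
    return 2 * mu * (x1 + x2) / (sigma**2 * (1 + rho))

# Same observed pair, three correlation conditions (illustrative values).
x1, x2, mu, sigma = 0.4, 0.1, 0.3, 1.0
llr = {rho: pair_log_lr_closed_form(x1, x2, mu, sigma, rho) for rho in (-0.6, 0.0, 0.6)}
```

      For a pair whose sum favours the positive hypothesis, `llr[-0.6] > llr[0.0] > llr[0.6]`, matching the qualitative statement that anticorrelated pairs should be weighted more heavily and positively correlated pairs less heavily.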

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. 
It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p.20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a posthoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). 
There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

    1. eLife assessment

      This important work identifies a non-autophagic role for ATG5 in lysosomal repair and the trafficking of the glucose transporter GLUT1 to the cell surface, mediated through the retromer complex. The evidence supporting the conclusions is solid.

    1. eLife assessment

      Supported by convincing data, this valuable study demonstrates that the Chitinase 3-like protein 1 (Chi3l1) interacts with gut microbiota and protects animals from intestinal injury in a laboratory colitis model. The revised manuscript sufficiently addressed the reviewers' comments. The work will be of interest to scientists studying crosstalk between gut microbiota and inflammatory diseases.

    2. Reviewer #1 (Public review):

      The manuscript by Chen et al. investigated the interaction between CHI3L1, a chitinase-like protein in glycosyl hydrolase family 18, and gut bacteria in the mucosal layers. The authors provided evidence to document the direct interaction between CHI3L1 and peptidoglycan, a major component of the bacterial cell wall. In doing so, Chi3l1 produced by gut epithelial cells regulates the balance of the gut microbiome and diminishes DSS-induced colitis, potentially through the colonization of protective gram-positive bacteria such as Lactobacillus.

      The study is the first to systematically document the interactions between Chi3l1 and the microbiome. Convincing data were shown to characterize the imbalance of gram-positive bacteria in the newly generated gut epithelial-specific Chi3l1-deficient mice. Comprehensive FMT experiments were performed to demonstrate the contributions of the gut microbiome using the mouse colitis model. The manuscript is strengthened by additional mechanistic studies concerning the binding between Chi3l1 and peptidoglycan, and by discussion of the existing body of literature demonstrating detrimental roles of Chi3l1 in mouse IBD models, which conflict with the current study.

    3. Reviewer #2 (Public review):

      Chen et al. investigated the regulatory mechanism of bacterial colonization in the intestinal mucus layer in mice and its implications for intestinal diseases. They demonstrated that Chi3l1 is a protein produced and secreted by intestinal epithelial cells into the mucus layer in response to the gut microbiota, which has a turnover effect on facilitating the colonization of gram-positive bacteria in the mucosa. The data also indicate that Chi3l1 interacts with the peptidoglycan of the bacterial cell wall, supporting the colonization of beneficial bacterial strains such as Lactobacillus, and that deficiency in Chi3l1 predisposes mice to colitis. The inclusion of a small but pertinent piece of human data helps to solidify their findings in mice.

      Overall, the experiments were appropriately designed and executed with precision. The revised manuscript represents a significant improvement over the initial version. The inclusion of new, higher-resolution images provides stronger support for the conclusions drawn. Additionally, statistical analyses of the imaging data, as recommended, have been integrated. The authors have effectively addressed the majority of the reviewers' suggestions and criticisms, making this version well-suited for publication.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) In Figure 1, it is curious that the authors only chose E. coli and Staphylococcus sciuri to test the induction of Chi3l1. What about other bacteria? Why does only E. coli but not Staphylococcus sciuri induce Chi3l1 production? It does not prove that the gut microbiome induces the expression of Chi3l1. If it is the effect of LPS, does it trigger a cell death response or inflammatory responses that are known to induce Chi3l1 production? What is the role of peptidoglycan in this experiment? Also, it is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in this figure.

      Thank you for your valuable feedback and insightful suggestions. In our study, we tried to identify bacteria from murine gut contents and feces using 16S sequencing. However, only E. coli and Staphylococcus sciuri were identified (Figure 1D). Consequently, our experiments were limited to these two bacterial strains. While we have not tested other bacteria, our data suggest that not all bacteria can induce the expression of Chi3l1. Given that E. coli is Gram-negative and Staphylococcus sciuri is Gram-positive, we hypothesized that the difference in their ability to induce Chi3l1 expression might be due to variations between Gram-negative and Gram-positive bacteria, such as the presence of lipopolysaccharides (LPS).

      To test this hypothesis, we used LPS to induce Chi3l1 expression. Consistent with our hypothesis, LPS successfully induced Chi3l1 expression (Figure 1F&G). Additionally, we observed that Chi3l1 expression is significantly upregulated in specific pathogen-free (SPF) mice compared to germ-free mice (Figure 1A), demonstrating that the gut microbiome induces the expression of Chi3l1.

      Although we have not examined cell death or inflammatory responses, the protective role of Chi3l1 shown in Figure 5 suggests that any such responses would be mild and negligible. Regarding the role of peptidoglycan in the induction of Chi3l1 expression in DLD-1 cells, we have not yet explored this aspect. However, we agree with your suggestion that it would be worthwhile to investigate this in future experiments.

      We have also made the suggested modifications to the labeling (Figure 1A) and the clarification in the revised manuscript accordingly (page 3, Line 95-96; Line 102-106).

      Thank you again for your constructive feedback.

      (2) In Figure 2, the binding between Chi3l1 and PGN needs better characterization, regarding the affinity and how it compares with the binding between Chi3l1 and chitin. More importantly, it is unclear how this interaction could facilitate the colonization of gram-positive bacteria.

      Thank you for your insightful suggestions and we have performed the suggested experiments and included the results in the revised manuscript (Figure 2E-G, page 3-4, Line 132-146).

      Our results indicate that Chi3l1 interacts with PGN in a dose-dependent manner (Figure 2E). In contrast, the binding between Chi3l1 and chitin did not exhibit dose dependency (Figure 2E). These findings suggest a specific and distinct binding mechanism for Chi3l1 with PGN compared to chitin.

      We conducted DLD-1 cell-bacteria adhesion experiments, using GlmM mutant (PGN synthesis mutant) and K12 (wild-type) bacteria to test their adhesion capabilities. The results showed that the adhesion ability of the GlmM mutant to cells significantly decreased (Figure 2F). Additionally, after knocking down Chi3l1 in DLD-1 cells, we observed decreased bacterial adhesion (Figure 2G). These findings suggest that the interaction between Chi3l1 and PGN plays a crucial role in bacterial adhesion.

      (3) In Figure 3, the abundance of Firmicutes and other gram-positive species is lower in the knockout mice. What is the rationale for choosing Lactobacillus in the following transfer experiments?

      We appreciate your thorough review. Among the Gram-positive bacteria that we have sequenced and analyzed, Lactobacillus occupies the largest proportion. Given the significant presence and established benefits of Lactobacillus, we chose it for the subsequent transfer experiments to leverage its known properties and availability, thereby ensuring the robustness and reproducibility of our findings. This is supported by the study referenced below.

      Lamas B, Richard ML, Leducq V, Pham HP, Michel ML, Da Costa G, Bridonneau C, Jegou S, Hoffmann TW, Natividad JM, Brot L, Taleb S, Couturier-Maillard A, Nion-Larmurier I, Merabtene F, Seksik P, Bourrier A, Cosnes J, Ryffel B, Beaugerie L, Launay JM, Langella P, Xavier RJ, Sokol H. CARD9 impacts colitis by altering gut microbiota metabolism of tryptophan into aryl hydrocarbon receptor ligands. Nat Med. 2016 Jun;22(6):598-605. doi: 10.1038/nm.4102. Epub 2016 May 9. PMID: 27158904; PMCID: PMC5087285.

      (4) FDAA-labeled E. faecalis colonization is decreased in the knockouts. Is it specific for E. faecalis, or it is generally true for all gram-positive bacteria? What about the colonization of gram-negative bacteria?

      Thank you for your insightful suggestions. We have investigated the colonization of Gram-negative bacteria using OP50-mCherry (a strain of E. coli that expresses mCherry) and included the results in the updated manuscript (Supplementary Figure 3B, page 5, lines 197-200). We performed rectal injection of both wild-type and Chil1-/- mice with OP50-mCherry, and found that Chil1-/- mice had much higher colonization of E. coli compared to wild-type mice.

      (5) In Figure 5, the fact that FMT did not completely rescue the phenotype may point to the role of host cells in the processes. The reason that Lactobacillus transfer did completely rescue the phenotypes could be due to the overwhelming protective role of Lactobacillus itself, as the experiments were missing Villin-cre mice transferred with Lactobacillus.

      Thank you for your valuable feedback and thorough review. In our study, pretreatment with antibiotics in mice to eliminate gut microbiota demonstrated that IEC∆Chil1 mice exhibited a milder colitis phenotype (Supplementary Figure 4). This suggests that Chi3l1-expressing host cells are likely to play a detrimental role in colitis. Consequently, the failure of FMT to completely rescue the phenotype is likely due to the incomplete preservation of bacteria in the feces during the transfer experiment.

      We agree with your assessment of the protective role of Lactobacillus. This also explains the significant difference in colitis phenotype between Villin-cre and IEC∆Chil1 mice (Figure 5B-E), as Lactobacillus levels are significantly lower in IEC∆Chil1 mice (Figure 4F). Given the severity of colitis in Villin-cre mice at 7 days post-DSS, even if Lactobacillus were transferred back to these mice, it would be unlikely to result in a significant improvement.

      (6) Conflicting literature demonstrating the detrimental roles of Chi3l1 in mouse IBD models needs to be acknowledged and discussed.

      Thank you for your insightful suggestions. We have included additional discussion in the revised manuscript (pages 6-7, lines 258-274).

      Reviewer #2 (Public Review):

      (1) Images are of great quality but lack proper quantification and statistical analysis. Statements such as "substantial increase of Chi3l1 expression in SPF mice" (Fig.1A), "reduced levels of Firmicutes in the colon lumen of IEC ∆ Chil1" (Fig.3F), "Chil1-/- had much lower colonization of E.faecalis" (Fig.4G), or "deletion of Chi3l1 significantly reduced mucus layer thickness" (Supplemental Figure 3A-B) are subjective. Since many conclusions were based on imaging data, the authors must provide reliable measures for comparison between conditions, as long as possible, such as fluorescence intensity, area, density, etc, as well as plots and statistical analysis.

      Thank you for your insightful suggestions. We have performed the suggested statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 4G. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent between the two groups.

      (2) In the fecal/Lactobacillus transplantation experiments, oral gavage of Lactobacillus to IEC∆Chil1 mice ameliorated the colitis phenotype, by preventing colon length reduction, weight loss, and colon inflammation. These findings seem to go against the notion that Chi3l1 is necessary for the colonization of Lactobacillus in the intestinal mucosa. The authors could speculate on how Lactobacillus administration is still beneficial in the absence of Chi3l1. Perhaps, additional data showing the localization of the orally administered bacteria in the gut of Chi3l1-deficient mice would clarify whether Lactobacillus are more successfully colonizing other regions of the gut, but not the mucus layer. Alternatively, later time points of 2% DSS challenge, after Lactobacillus transplantation, would suggest whether the gut colonization by Lactobacillus, and therefore the milder colitis phenotype, is sustained for longer periods in the absence of Chi3l1.

      Thank you for your thorough review and insightful suggestions. Since we pretreated mice with antibiotics, the intestinal mucus layer is likely damaged, according to a previous study (PMID: 37097253). Therefore, gavaged Lactobacillus cannot colonize the mucus layer. Moreover, existing studies have shown that the protective effect of Lactobacillus is mainly derived from its metabolites or bacterial cell components, rather than from the living bacteria themselves (PMID: 36419205, PMID: 27516254).

      Zhan M, Liang X, Chen J, Yang X, Han Y, Zhao C, Xiao J, Cao Y, Xiao H, Song M. Dietary 5-demethylnobiletin prevents antibiotic-associated dysbiosis of gut microbiota and damage to the colonic barrier. Food Funct. 2023 May 11;14(9):4414-4429. doi: 10.1039/d3fo00516j. PMID: 37097253.

      Montgomery TL, Eckstrom K, Lile KH, Caldwell S, Heney ER, Lahue KG, D'Alessandro A, Wargo MJ, Krementsov DN. Lactobacillus reuteri tryptophan metabolism promotes host susceptibility to CNS autoimmunity. Microbiome. 2022 Nov 23;10(1):198. doi: 10.1186/s40168-022-01408-7. PMID: 36419205.

      Piermaría J, Bengoechea C, Abraham AG, Guerrero A. Shear and extensional properties of kefiran. Carbohydr Polym. 2016 Nov 5;152:97-104. doi: 10.1016/j.carbpol.2016.06.067. Epub 2016 Jun 23. PMID: 27516254.

      Reviewer #3 (Public Review):

      The claim that mucus-associated Ch3l1 controls colonization of beneficial Gram-positive species within the mucus is not conclusive. The study should take into account recent discoveries on the nature of mucus in the colon, namely its mobile fecal association and complex structure based on two distinct mucus barrier layers coming from proximal and distal parts of the colon (PMID: ). This impacts the interpretation of how and where Ch3l1 is expressed and gets into the mucus to promote colonization. It also impacts their conclusions because the authors compare fecal vs. tissue mucus, but most of the mucus would be attached to the feces. Of the mucus that was claimed to be isolated from the WT and IEC Ch3l1 KO, this was not biochemically verified. Such verification (e.g. through Western blot) would increase confidence in the data presented. Further, the study relies upon relative microbial profiling, which can mask absolute numbers, making the claim of reduced overall Gram-positive species in mice lacking Ch3l1 unproven. It would be beneficial to show more quantitative approaches (e.g. Quantitative Microbial Profiling, QMP) to provide more definitive conclusions on the impact of Ch3l1 loss on Gram+ microbes.

      You raise an excellent point about the data interpretation, and we appreciate your insightful suggestions. We have included a discussion of the recent discoveries in the revised manuscript (pages 7-8, lines 304-312). According to these discoveries, the mucus in the proximal colon forms a primary encapsulation barrier around fecal material, while the mucus in the distal colon forms a secondary barrier. Our findings indicate that Chi3l1 is expressed throughout the entire colon, including the proximal, middle, and distal sections (see Author response image 1 below; note that the Chi3l1 detection in colon presented in the manuscript is from the middle section). This suggests that Chi3l1 likely promotes bacterial colonization across the entire colon. Although most mucus is expelled with feces, the constant production of mucus and the minimal presence of Chi3l1 in feces (Figure 4C) indicate that Chi3l1 continuously plays a role in promoting the colonization of microbiota.

      Author response image 1.

      Chi3l1 is expressed in the proximal and distal colon. Immunofluorescence staining on proximal and distal colon sections to detect Chi3l1 (red) expression. Nuclei were detected with DAPI (blue). Scale bars, 50 μm.

      To isolate the mucus layer, we followed the method in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Although we did not find a suitable marker representative of the mucus layer for western blotting, we performed protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in established studies (see Author response image 2 below).

      Author response image 2.

      Comparison of mucus layer proteins identified by mass spectrometry between our team and the Hansson team. Mucus layer proteins identified by mass spectrometry by our team and by the Hansson team (PMID: 19432394) are compared.

      Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). Therefore, our data is reliable to some extent.

      Other weaknesses lie in the execution of the aims, leaving many claims incompletely substantiated. For example, much of the imaging data is challenging for the reader to interpret due to it being unfocused, too low of magnification, not including the correct control, and not comparing the same regions of tissues among different in vivo study groups. Statistical rigor could be better demonstrated, particularly when making claims based on imaging data. These are often presented as single images without any statistics (i.e. analysis of multiple images and biological replicates). These images include the LTA signal differences, FISH images, Enterococcus colonization, and mucus thickness.

      Thank you for your thorough review and insightful suggestions. We have performed the recommended statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). We have also added arrows in Figure 2B to make the figure easier to understand. Additionally, we repeated some key experiments to show the same regions of tissues among different groups. We will upload higher resolution figures during the revision. Thank you again for your constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in Figure 1.

      Thank you for your insightful suggestion. We have made the suggested modifications to the labeling (revised manuscript, Figure 1A).

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written, but it would benefit from a critical reading to correct some typos and small grammar issues. Histological and IF images would be more informative if they contained arrows and labels guiding the reader's attention to what the authors want to show. More details about the structures shown in the figures should be included in the legends.

      Thank you for your thorough review and insightful suggestions. We have revised the manuscript to correct noticeable typos and grammar issues. Arrows have been added to Figure 2A&B to make the figures easier to understand. Additionally, we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, lines 730-733).

      Minor points:

      • Page 1, line 36: Please correct "mice models" to "mouse models".

      Thank you for your insightful suggestion and we have made the suggested correction in the revised manuscript (page 1, line 41).

      • Page 3, line 110: "by comparing the structure of chitin with that of peptidoglycan (PGN), a component of bacterial cells walls, we observed that they have similar structures (Fig.2A)". Although both structures are shown side-by-side, no similarities are mentioned or highlighted in the text, figure, or legend.

      Thank you for your insightful suggestion. We have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, lines 730-733).

      • Fig.5C and Fig.5G: y axis brings "weight (%)". I believe the authors mean "weight change (%)"?

      We agree with your suggestion and have corrected the labeling accordingly (revised manuscript, Figure 5C and G).

      • Page 8: Genotyping method is described as a protocol. Please modify it.

      Thank you for your constructive suggestion. We have modified the genotyping method in the revised manuscript (page 8, lines 339-349).

      • Please expand on the term "scaffold model" used in the abstract and discussion.

      Thank you for your thorough review. In this model, Chi3l1 acts as a key component of the scaffold. By binding to bacterial cell wall components like peptidoglycan, Chi3l1 helps anchor and organize bacteria within the mucus layer. This interaction facilitates the colonization of beneficial bacteria such as Lactobacillus, which are important for gut health. We have included a fuller description of the scaffold model in the revised manuscript (page 6, lines 248-250).

      • The Discussion section often recapitulates the description of the results, which makes the text repetitive.

      Thank you for your constructive suggestion. We have removed unnecessary description of results from the Discussion section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Major comments

      (1) Figure 1A. The staining is very faint and hard to see. The reader cannot be certain those are Ch3l1-positive cells. Higher magnification is needed.

      Thank you for your insightful suggestion. We have included higher-resolution images in Figure 1A of the revised manuscript.

      (2) The mucus is produced largely by the proximal colon, is adherent to the feces, and is mobile with the feces (PMID: 33093110). Therefore it is important to determine where the Ch3l1 is being expressed to be released into the lumen. Further Ch3l1 expression studies need to be done in both the proximal and distal colon.

      Thank you for your thorough review and insightful suggestions. We have addressed this part in our public review. Additionally, we agree with your suggestions and will conduct further studies on Chi3l1 expression in both the proximal and distal colon.

      (3) Figure 1B. The image is out of focus for the Ileum, and the DAPI signal needs to be brought up for the colon. Which part of the colon is this? The UEA1+ cells do not really look like goblet cells. A better image with clearer goblet cells is needed.

      Thank you for your constructive suggestions. In the revised manuscript, we have included higher-resolution images (Figure 1B). The middle colon (approximately 3 to 4 cm distal from the cecum) was harvested for staining. In addition to UEA-1, we utilized anti-MUC2 antibody to label goblet cells in this colon segment (see Author response image 3 below). The patterns of goblet cells identified by UEA-1 or MUC2 antibodies are similar. The UEA-1-positive cells shown in Figure 1B are presumed to be goblet cells.

      Author response image 3.

      Goblet cell distribution in the middle colon. Goblet cells in the middle segment of the colon (approximately 3 to 4 cm distal from the cecum) were detected by fluorescence staining with UEA-1 (green) and an anti-MUC2 antibody (red). Scale bar, 50 μm. Representative images are shown from three mice individually stained for each marker.

      (4) Figure 1G. There needs to be some counterstain or contrast imaging to show evidence that cells are present in the untreated sample.

      Thank you for your insightful suggestions. We have annotated the cells present in the untreated sample based on the overexposure in the revised manuscript (Figure 1G).

      (5) Figure 3B. Is this absolute quantification? How were the data normalized to allow comparison of microbial loads?

      Thank you for your thorough review. Figure 3B presents absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we amplified a short segment (179 bp) of the 16S rRNA gene using conserved 16S rRNA-specific primers and OP50 (a strain of E. coli) as the template. After gel extraction and concentration measurement, the PCR products were diluted to gradient concentrations (0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48 pg/µl). These gradient concentrations were used as templates for qPCR to generate a standard curve relating Ct values to bacterial concentration. The standard curve was then used to calculate the bacterial concentration in the samples. The data presented in Figure 3B represent the weight of bacteria per milligram of sample, calculated as (bacterial concentration × bacterial volume) / (weight of feces or gut content).
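
      The standard-curve calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the Ct values are hypothetical (synthesized from an assumed slope and intercept), the function names are invented for the example, and the volume is assumed to be the sample volume in µl that the response calls "bacterial volume".

```python
import math

# Dilution series quoted in the response (pg/µl).
STANDARDS_PG_PER_UL = [0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48]


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * ln(concentration) + intercept."""
    xs = [math.log(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to recover a concentration (pg/µl) from a Ct."""
    return math.exp((ct - intercept) / slope)


def bacteria_per_mg(ct, slope, intercept, volume_ul, sample_mg):
    """(concentration x volume) / sample weight, giving pg of bacteria per mg."""
    return concentration_from_ct(ct, slope, intercept) * volume_ul / sample_mg


# Hypothetical Ct readings for the standards, generated from assumed
# curve parameters purely so that the example is self-contained.
ASSUMED_SLOPE, ASSUMED_INTERCEPT = -1.5, 14.0
hypothetical_cts = [
    ASSUMED_SLOPE * math.log(c) + ASSUMED_INTERCEPT for c in STANDARDS_PG_PER_UL
]

slope, intercept = fit_standard_curve(STANDARDS_PG_PER_UL, hypothetical_cts)
```

      With real data, the measured Ct of each standard would replace `hypothetical_cts`, and each sample Ct would be passed to `bacteria_per_mg` together with its own volume and weight.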

      (6) Figure 3D. The major case is made for a dramatic reduction in Gram+ species, but Figure 1D does not show a dramatic change. Is this difference significant?

      Thank you for your thorough review. We are not sure that we fully understand your question. However, there was no significant difference in Figure 3D. The conclusion that Gram+ species are dramatically reduced is based on the LTA staining, Firmicutes FISH, individual species comparisons between WT and KO mice, and bacterial qPCR results taken together (Figure 3E-H).

      (7) Figures 3E and 3F. These stainings are alone not convincing of reduced Gram+ in the KOs. Some stats are required for these images. An independent complementary method is also needed to quantify these with statistics since this data is so central to the study's conclusions.

      Thank you for your constructive suggestions. We have included statistical analysis in the revised manuscript (Figure 3E and F). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 3E. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent with the Firmicutes FISH results. A complementary method, bacterial qPCR, has also been employed to quantify these (Figure 4E, F). Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments.

      (8) Figure 3G. To make quantitative conclusions, the authors need to do quantitative microbial profiling (QMP) of the microbiota. Relative abundance masks absolute numbers, which could be increased. There are qPCR-based QMP platforms the authors could use (PMIDs: 31940382, 33763385).

      Thank you for your constructive suggestions. Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). In addition to the original bacterial qPCR data presented in the manuscript, we included another bacterial genus, Turicibacter. Consistent with the 16S rRNA sequencing analysis data, qPCR results showed that Turicibacter was more abundant in IECΔChil1 mice than in Villin-cre mice (revised manuscript, Supplementary Figure 3A, page 4, lines 171-173). Therefore, our data is reliable to some extent.

      (9) Figure 4B. The data nicely shows Ch3l1 in mucus. However, no data supports the authors' main claim that Ch3l1 binds Gram-positive bacteria in situ. Dual staining of Ch3l1 with a Firmicutes probe would be supportive to show this interaction is happening in vivo.

      You raise an excellent point, and we agree with your suggestion that we should confirm Chi3l1 binding to Gram-positive bacteria in situ. During the study, we attempted dual staining of Chi3l1 with a universal bacterial 16S FISH probe several times, but we were unsuccessful. Despite various optimizations of the protocol, we were only able to detect bacteria, not Chi3l1. It appears that the antibody is not suitable for this method.

      (10) Figures 4D - F. Because mucus is associated with feces (PMID: ), the data with feces likely contains both Muc2/mucus and Feces. Therefore, it is unclear what the "mucus" is referring to in these figures. To support the authors' conclusions, there needs to be some validation that mucus was purified in the assays. This must be confirmed at a minimum by PAS staining on SDS PAGE gel (should be very high molecular weight) or Western blot with UEA lectin.

      Thank you for your insightful suggestions. As mentioned in the public review, the mucus layer was isolated following the protocol described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, after harvesting the middle colon from the mice, we cut open the colon longitudinally. After removing the gut contents, the lumen was vigorously rinsed in PBS while holding one end with forceps. The pellet obtained after centrifuging the rinsate was used as our mucus sample. Fresh feces were collected immediately after the mice defecated in a new, empty cage. We performed Western blot analysis to detect UEA lectin but were unsuccessful.

      However, as noted in the public review, we conducted protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in these established studies.

      (11) Figure 4E/F: The units of measurement are in pg/cm2, implying picogram per area. Can the authors please explain what this unit is referring to?

      We are grateful for your thorough review. The unit pg/cm² represents picograms per square centimeter. Figures 4E and 4F present absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we harvested a 3 × 0.5 cm section of colon and a 9 × 0.4 cm section of ileum. We then collected the mucus layer as previously described (response to question 10). We measured bacterial concentration as described in the response to question 5 using the equation y = -1.53ln(x) + 13.581, where x represents the bacterial concentration and y represents the Ct value. After obtaining the bacterial concentration, we multiplied it by the volume of the rinsate and divided it by the area to obtain the pg/cm² values used in the figures.
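
      As a worked illustration of this unit, the quoted equation can be inverted to x = exp((13.581 - y) / 1.53) and the result scaled by rinsate volume and tissue area. The Python sketch below is only an illustration under stated assumptions: x is taken to be in pg/µl (the unit of the dilution series in the response to question 5), the rinsate volume is a hypothetical value in µl, and the function names are invented for the example.

```python
import math

# Standard-curve parameters quoted in the response: y = -1.53 * ln(x) + 13.581
SLOPE = -1.53
INTERCEPT = 13.581

# Tissue sections quoted in the response (length, width in cm).
COLON_SECTION_CM = (3.0, 0.5)
ILEUM_SECTION_CM = (9.0, 0.4)


def concentration_from_ct(ct):
    """Invert y = SLOPE * ln(x) + INTERCEPT to get x (assumed pg/µl) from a Ct."""
    return math.exp((ct - INTERCEPT) / SLOPE)


def pg_per_cm2(ct, rinsate_volume_ul, length_cm, width_cm):
    """Bacterial mass per unit mucosal area: concentration x volume / area."""
    area_cm2 = length_cm * width_cm
    return concentration_from_ct(ct) * rinsate_volume_ul / area_cm2


# Hypothetical example: a Ct equal to the intercept corresponds to 1 pg/µl;
# with an assumed 150 µl of rinsate from the 1.5 cm² colon section this
# gives about 100 pg/cm².
example_colon = pg_per_cm2(13.581, 150.0, *COLON_SECTION_CM)
```

      Because the slope is negative, a lower Ct maps to a higher concentration, so samples that amplify earlier yield larger pg/cm² values for the same rinsate volume and area.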

      (12) Figure 5E. Normal tissues appear to be from different colon regions from colitis tissues: the "Normal" looks like the proximal colon, while "Colitis" looks like the Distal colon. They cannot be directly compared.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5E to compare the same region of the colons.

      (13) Similarly, in Figure 5I it appears different colon regions are being compared between groups: Proximal colon in the bottom panels, and distal in the top panels. Since the proximal colon is less damaged by DSS, this data could be misleading.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5I to compare the same region of the colons.

      (14) In the DSS studies, are the VillinCre and IEC Chit3l1 mice co-housed littermates?

      Thank you for your insightful suggestion. In the DSS studies, the Villin-Cre and IECΔChil1 mice are not co-housed littermates. However, they are derived from the same lineage and are housed in the same rack within the same room of the animal facility.

      (15) Supplementary Figure 3: Mucus thickness images; are they representative? Stats are needed on multiple mice to support the claim that the mucus is thinner.

      Thank you for your insightful suggestion. The images are representative of four mice per group. We have now included the statistical analysis in Supplementary Figure 3C&D of the revised manuscript.

      Minor

      (1) Introduction: Reference to "mucosal layer": "Mucosal" and "Mucus" are different things. "Mucosal" refers to the epithelium, lamina propria, and muscularis mucosa. "Mucus" refers to the secreted mucus gel, the focus of the authors' study. Therefore, the statement "mucosal layer" is not proper. "Mucosal layer" should be changed to "mucus layer."

      Thank you for your constructive suggestions; we have learned a lot from them. We have replaced “mucosal layer” with “mucus layer” in the revised manuscript.

      (2) Line 366 and related lines: Feces cannot be "dissolved". "Resuspended" is a better term.

      Thank you for your constructive suggestion. We have changed “dissolved” to “resuspended” in the revised manuscript.

      (3) Lines 36-37 and 43-44 are redundant to each other.

      Thank you for your constructive suggestion. We have removed lines 36-37 in the revised manuscript.

    1. eLife assessment

      This study provides useful evidence substantiating a role for long noncoding RNAs in liver metabolism and organismal physiology. Using murine knockout and knock-in models, the authors invoke a previously unidentified role for the lncRNA Snhg3 in fatty liver. The revised manuscript has improved and most studies are backed by solid evidence but the study was found to be incomplete and will require future studies to substantiate some of the claims.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors investigate the contributions of the long noncoding RNA Snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator, SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and as a general rule the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

    3. Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al. found that lncRNA Snhg3 was one of the most down-regulated Snhgs by high fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on an HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3-interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on the PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how lncRNA Snhg3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors investigate the contributions of the long noncoding RNA Snhg3 in liver metabolism and MAFLD. The authors conclude that liver-specific loss or overexpression of Snhg3 impacts hepatic lipid content and obesity through epigenetic mechanisms. More specifically, the authors invoke that nuclear activity of Snhg3 aggravates hepatic steatosis by altering the balance of activating and repressive chromatin marks at the Pparg gene locus. This regulatory circuit is dependent on a transcriptional regulator, SND1.

      Strengths:

      The authors developed tissue-specific lncRNA knockout and KI models. This effort is certainly appreciated as few lncRNA knockouts have been generated in the context of metabolism. Furthermore, lncRNA effects can be compensated in a whole organism or show subtle effects in acute versus chronic perturbation, rendering the focus on in vivo function important and highly relevant. In addition, Snhg3 was identified through a screening strategy and as a general rule the authors attempt to follow unbiased approaches to decipher the mechanisms of Snhg3.

      Weaknesses:

      Despite efforts at generating a liver-specific knockout, the phenotypic characterization is not focused on the key readouts. Notably missing are rigorous lipid flux studies and targeted gene expression/protein measurement that would underpin why loss of Snhg3 protects from lipid accumulation. Along those lines, claims linking the Snhg3 to MAFLD would be better supported with careful interrogation of markers of fibrosis and advanced liver disease. In other areas, significance is limited since the presented data is either not clear or rigorous enough. Finally, there is an important conceptual limitation to the work since PPARG is not established to play a major role in the liver.

      We thank the reviewer for the comment. As the reviewer notes, the manuscript still has some shortcomings. We have added a description of some of them to the Discussion section (third paragraph on p17 and first paragraph on p18).

      We agree with the reviewer's comment that there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section (first paragraph on p13).

      Reviewer #2 (Public Review):

      Through RNA analysis, Xie et al. found that lncRNA Snhg3 was one of the most down-regulated Snhgs by high fat diet (HFD) in mouse liver. Consequently, the authors sought to examine the mechanism through which Snhg3 is involved in the progression of metabolic dysfunction-associated fatty liver diseases (MASLD) in HFD-induced obese (DIO) mice. Interestingly, liver-specific Snhg3 knockout reduced, while Snhg3 over-expression potentiated, fatty liver in mice on an HFD. Using the RNA pull-down approach, the authors identified SND1 as a potential Snhg3-interacting protein. SND1 is a component of the RNA-induced silencing complex (RISC). The authors found that Snhg3 increased SND1 ubiquitination to enhance SND1 protein stability, which then reduced the level of repressive chromatin H3K27me3 on the PPARg promoter. The upregulation of PPARg, a lipogenic transcription factor, thus contributed to hepatic fat accumulation.

      The authors propose a signaling cascade that explains how LncRNA sngh3 may promote hepatic steatosis. Multiple molecular approaches have been employed to identify molecular targets of the proposed mechanism, which is a strength of the study. There are, however, several potential issues to consider before jumping to the conclusion.

      (1) First of all, it's important to ensure the robustness and rigor of each study. The manuscript was not carefully put together. The image qualities for several figures were poor, making it difficult for the readers to evaluate the results with confidence. The biological replicates and numbers of experimental repeats for cell-based assays were not described. When possible, the entire immunoblot imaging used for quantification should be presented (rather than showing n=1 representative). There were multiple mis-labels in figure panels or figure legends (e.g., Fig. 2I, Fig. 2K and Fig. 3K). The b-actin immunoblot image was reused in Fig. 4J, Fig. 5G and Fig. 7B with different exposure times. These might be from the same cohort of mice. If the immunoblots were run at different times, the loading control should be included on the same blot as well.

      We thank the reviewer for the detailed comment. We have provided clearer figures in the revised manuscript.

      The biological replicates and numbers of experimental repeats for cell-based assays have been added to the manuscript.

      The entire immunoblot images used for quantification have been provided in the primary data.

      The original Figure 2I, Figure 2K, and Figure 3K have been revised and replaced with new Figure 2F, 2H, and 3H, and their corresponding figure legends have also been corrected in the revised manuscript.

      The protein levels of CD36, PPARγ and β-ACTIN were examined at the same time, and we have revised the manuscript accordingly; please see revised Figure 7B and C.

      (2) The authors can do a better job in explaining the logic for how they came up with the potential function of each component of the signaling cascade. Snhg3 is down-regulated by HFD. However, the evidence presented indicates its involvement in promoting steatosis. In Fig. 1C, one would expect PPARg expression to be up-regulated (when Snhg3 was down-regulated). If so, the physiological observation conflicts with the proposed mechanism. In addition, SND1 is known to regulate RNA/miRNA processing. How do the authors rule out this potential mechanism? How about the hosted snoRNA, Snord17? Is it involved in the progression of MASLD?

      We thank the reviewer for the detailed comment. In this study, although the expression of Snhg3 was decreased in DIO mice, Snhg3 deficiency decreased the expression of hepatic PPARγ and alleviated hepatic steatosis in DIO mice, and Snhg3 overexpression induced the opposite effect. This led us to speculate that the downregulation of Snhg3 in DIO mice might be a protective stress response to a high nutritional state, although the specific details remain to be clarified. This is probably similar to FGF21 and GDF15, whose endogenous expression and circulating levels are elevated in obese humans and mice despite their beneficial effects on obesity and related metabolic complications (Keipert and Ost, 2021). We have added this content to the Discussion section; please see the second paragraph on p12.

      SND1 has multiple roles through associating with different types of RNA molecules, including mRNA, miRNA, circRNA, dsRNA and lncRNA. We agree with the reviewer's suggestion that the potential mechanism by which SND1/lncRNA-Snhg3 is involved in hepatic lipid metabolism needs to be further investigated. We have also discussed this limitation in the manuscript; please refer to the third paragraph on p17 in the Discussion section.

      Snhg3 serves as the host gene for intronic U17 snoRNAs, which are H/ACA snoRNAs. A previous study found that the cholesterol trafficking phenotype was not due to reduced Snhg3 expression, but rather to haploinsufficiency of U17 snoRNA (Jinn et al., 2015). Additionally, knockdown of U17 snoRNA in vivo protected against hepatic steatosis and lipid-induced oxidative stress and inflammation (Sletten et al., 2021). In this study, the expression of U17 snoRNA decreased in the liver of DIO Snhg3-HKO mice and remained unchanged in the liver of DIO Snhg3-HKI mice, but overexpression of U17 snoRNA had no effect on the expression of SND1 and PPARγ (figure supplement 5A-C), indicating that Snhg3-induced hepatic steatosis was independent of U17 snoRNA. We have discussed this in the revised manuscript; please refer to p15 of the Discussion section.

      References

      JINN, S., BRANDIS, K. A., REN, A., CHACKO, A., DUDLEY-RUCKER, N., GALE, S. E., SIDHU, R., FUJIWARA, H., JIANG, H., OLSEN, B. N., SCHAFFER, J. E. & ORY, D. S. 2015. snoRNA U17 regulates cellular cholesterol trafficking. Cell Metab, 21, 855-67. DOI:10.1016/j.cmet.2015.04.010, PMID:25980348

      KEIPERT, S. & OST, M. 2021. Stress-induced FGF21 and GDF15 in obesity and obesity resistance. Trends Endocrinol Metab, 32, 904-915. DOI:10.1016/j.tem.2021.08.008, PMID:34526227

      SLETTEN, A. C., DAVIDSON, J. W., YAGABASAN, B., MOORES, S., SCHWAIGER-HABER, M., FUJIWARA, H., GALE, S., JIANG, X., SIDHU, R., GELMAN, S. J., ZHAO, S., PATTI, G. J., ORY, D. S. & SCHAFFER, J. E. 2021. Loss of SNORA73 reprograms cellular metabolism and protects against steatohepatitis. Nat Commun, 12, 5214. DOI:10.1038/s41467-021-25457-y, PMID:34471131

      (3) The role of PPARg in fatty liver diseases might be a rodent-specific phenomenon. PPARg agonist treatment in humans may actually reduce ectopic fat deposition by increasing fat storage in adipose tissues. The relevance of the finding to human diseases should be discussed.

      We thank the reviewer for the detailed comment. We agree with the reviewer's comment that there are still conflicting conclusions about the role of PPARγ in MASLD. We have discussed this in the Discussion section; please see the first paragraph on p13.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not have further recommendations beyond what I mentioned in the original review. The authors have not adequately addressed all the issues but the manuscript has improved and the overall strength of evidence is now solid from incomplete.

      We appreciate the positive feedback from the reviewer. While the updated manuscript has improved significantly, we recognize that it remains incomplete and that additional details regarding Snhg3 will need to be addressed in our future studies. Moreover, we have discussed these potential weaknesses in the Discussion section (please refer to the third paragraph on p17 and the first paragraph on p18).

      Reviewer #2 (Recommendations For The Authors):

      The authors have provided explanations and some new data to clarify the comments from the first submission. They have also included the original immunoblots for all the experimental repeats. The CHX protein stability results shown in Fig. 5J were not consistent between experiments, perhaps because the difference was subtle. The results on PPARg protein expression were not clearcut. The inclusion of a PPARg knockdown control would be helpful to validate the specificity of the antibody. Of note, the immunoblots used for Fig. 5I (PA treated) repeats 2, 4 and 1 were identical to those of Fig. 7F repeats 3, 1 and 5. The authors should provide an explanation for the potential issue.

      We thank the reviewer for the further comments and suggestions. We agree with the reviewer's comment about Snhg3-mediated SND1 protein stability. In this study, Snhg3 promoted the protein, not mRNA, level of SND1, but only subtly increased SND1 protein stability. We revised the description in the manuscript, “Meanwhile, Snhg3 regulated the protein, not mRNA, expression of SND1 in vivo and in vitro by mildly promoting the stability of SND1 protein (Figures 5G-I).” This revision can be found in the second paragraph on p9. While our findings indicated that Snhg3 can influence SND1 expression at the protein level, we acknowledge the possibility of additional mechanisms contributing to this complex regulatory network. Therefore, further investigation is necessary to clarify whether Snhg3 regulates SND1 protein expression through other potential mechanisms. In light of this, we have added this point to the Discussion section. Please refer to the second paragraph on p16.

      In this study, the protein level of PPARγ (molecular weight ~57 kDa) was detected using an anti-PPARγ antibody (Abclonal, Cat. A11183), which has been used to determine PPARγ protein expression in 13 published papers, as shown on the ABclonal Technology Co., Ltd. website (https://abclonal.com.cn/catalog/A11183). The specificity of this antibody has been validated in Zhang’s study by PPARγ knockdown (Zhang et al., 2019). In our study, hepatic PPARγ protein sometimes showed two bands (~57 kDa and >75 kDa) using this antibody. It is well established that the PPARγ gene encodes two protein isoforms (PPARγ1, a 477 amino acid protein, and PPARγ2, a 505 amino acid protein) via differential promoter usage and alternative splicing (Gene: Pparg (ENSMUSG00000000440) - Transcript comparison - Mus_musculus - Ensembl genome browser 112) (Hernandez-Quiles et al., 2021). The molecular weight difference between PPARγ1 and PPARγ2 is about 3 kDa. Therefore, we consider that the band larger than 75 kDa in our study is likely nonspecific. In line with the reviewer’s suggestion, the antibody’s specificity could be further validated by knockdown or knockout of PPARγ in the future.

      We thank the reviewer for the detailed comment. In this study, we tested the effect of Snhg3 overexpression on SND1 protein level and the effect of Snhg3 or SND1 overexpression on PPARγ protein level in Hepa1-6 cells by transfecting with Snhg3, SND1 and the control, respectively. The results indicated that overexpression of Snhg3 promoted the protein levels of SND1 and PPARγ, and overexpression of SND1 also induced the protein level of PPARγ. For clarity of presentation, we first showed that overexpression of Snhg3 promoted the protein level of SND1 in Figure 5I, and then demonstrated that the overexpression of Snhg3 or SND1 elicited PPARγ expression in Figure 7F. However, we acknowledge that this order of presentation may cause confusion. In fact, these experiments were repeated multiple times, and we have provided the new original western blot data and analysis for Figure 5I (PA treatment) for further clarification.
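
      The ~3 kDa figure quoted above can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the response; the function name is hypothetical, and the result is only an approximation of the true isoform masses.

```python
# Rough mass estimate from sequence length.
# Assumption (not from the response): ~110 Da average mass per amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0

def approx_mass_kda(n_residues: int) -> float:
    """Hypothetical helper: approximate protein mass in kDa from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0

# Isoform lengths as given in the response: PPARγ1 = 477 aa, PPARγ2 = 505 aa.
pparg1_kda = approx_mass_kda(477)
pparg2_kda = approx_mass_kda(505)
isoform_diff_kda = pparg2_kda - pparg1_kda

print(f"PPARγ1 ≈ {pparg1_kda:.1f} kDa, PPARγ2 ≈ {pparg2_kda:.1f} kDa")
print(f"difference ≈ {isoform_diff_kda:.1f} kDa")
```

      Under this approximation, a 28-residue difference cannot account for a band above 75 kDa, which is consistent with the response's reading that the upper band is not the second isoform.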

      References

      HERNANDEZ-QUILES, M., BROEKEMA, M. F. & KALKHOVEN, E. 2021. PPARgamma in Metabolism, Immunity, and Cancer: Unified and Diverse Mechanisms of Action. Front Endocrinol (Lausanne), 12, 624112. DOI:10.3389/fendo.2021.624112, PMID:33716977

      ZHANG, Z., ZHAO, G., LIU, L., HE, J., DARWAZEH, R., LIU, H., CHEN, H., ZHOU, C., GUO, Z. & SUN, X. 2019. Bexarotene Exerts Protective Effects Through Modulation of the Cerebral Vascular Smooth Muscle Cell Phenotypic Transformation by Regulating PPARgamma/FLAP/LTB(4) After Subarachnoid Hemorrhage in Rats. Cell Transplant, 28, 1161-1172. DOI:10.1177/0963689719842161, PMID:31010302

    1. eLife assessment

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors provide convincing data to demonstrate that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The findings are valuable as they contribute to our understanding of the molecular mechanisms underlying the interaction between gliding motility and the bacterial cell wall.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy derived from the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a framework for future work to address mechanical force transmission from the cytoplasm through the cell envelope to the cell surface.

      Strengths:

      (i) Convincing evidence shows AgmT functions as a LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant.

      (ii) The authors show that 13 other LTGs found in M. xanthus are not required for A-motility.

      (iii) Authors show agmT mutants develop morphological changes in response to treatment with a beta-lactam antibiotic, mecillinam.

      (iv) The use of single molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds.

      (v) The authors understand the limitations of their work and do not overinterpret their data.

      Weaknesses:

      The authors provided more experiments and clearly addressed my prior concerns in their revised manuscript.

    3. Reviewer #2 (Public review):

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in failure of focal adhesion complexes to properly interact with the cell wall.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy produced by the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a framework for future work on mechanical force transmission between the cytoplasm and the cell surface. 

      Strengths: 

      (1) Convincing evidence shows AgmT functions as an LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant. 

      (2) The authors show that agmT mutants develop morphological changes in response to treatment with a β-lactam antibiotic, mecillinam. 

      (3) The use of single-molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds. 

      (4) The authors understand the limitations of their work and do not overinterpret their data. 

      Weaknesses: 

      (1) A clear model of AgmT's role in gliding motility or interactions with other A-motility proteins is not provided. Instead, speculative roles for how AgmT enzymatic activity could facilitate bFAC function in A-motility are discussed. 

      We thank the reviewer for this comment. We have added a new figure, Fig. 6, and updated the Discussion to propose a mechanism, “rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (2) Although agmT mutants do not swarm, in-depth phenotypic analysis is lacking. In particular, do individual agmT mutant cells move, as found with other swarming defective mutants, or are agmT mutants completely nonmotile, as are motor mutants? 

      We thank the reviewer for bringing up an important question. Prompted by this question, we analyzed the gliding phenotype of the ΔagmT pilA mutant at the single-cell level. We found that the ΔagmT pilA cells are not completely static. Instead, they move for less than half a cell length before pausing or reversing. We went on to quantify the velocity and gliding persistency and found that the gliding phenotype of the ΔagmT pilA cells matches the prediction for bFACs that lose the connection between the inner subcomplexes and PG.  

      We then imaged individual ∆agmT pilA- cells on a 1.5% agar surface at 10-s intervals using bright-field microscopy. To our surprise, instead of being static, individual ∆agmT pilA- cells displayed slow movements, with frequent pauses and reversals (Video 1). To quantify the effects of AgmT, we measured the velocity and gliding persistency (the distances cells traveled before pauses and reversals) of individual cells. Compared to the pilA- cells that moved at 2.30 ± 1.33 μm/min (n = 46) with high persistency (Video 2 and Fig. 2C, D), ∆agmT pilA- cells moved significantly slower (0.88 ± 0.62 μm/min, n = 59) and less persistently (Video 1 and Fig. 2C, D). Such aberrant gliding motility is distinct from the “hyper reversal” phenotype due to the polarity regulators. Although hyper-reversing cells constitutively switch their moving directions, they usually maintain gliding velocity at the wild-type level27. Instead, the slow and “slippery” gliding of the ∆agmT pilA- cells matches the prediction that when the inner complexes of bFACs lose connection with PG, bFACs can only generate short and inefficient movements19. Our data indicate that AgmT is not an essential component of the bFACs. Thus, AgmT is likely to regulate the assembly and stability of bFACs, especially their connection with PG.
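
      To make the two readouts above concrete, here is a minimal sketch of how velocity and gliding persistency could be computed from a single-cell track. It is not the authors' analysis pipeline: the function name, the 1D position representation, and the pause threshold are all assumptions chosen for illustration; only the 10-s frame interval and the definition of persistency (distance traveled before a pause or reversal) come from the response.

```python
from typing import List, Tuple

FRAME_INTERVAL_S = 10.0  # imaging interval stated in the response
PAUSE_STEP_UM = 0.05     # assumption: smaller per-frame steps count as a pause

def velocity_and_persistency(positions_um: List[float]) -> Tuple[float, List[float]]:
    """Hypothetical helper. positions_um is a cell's position along its track
    (in μm) at each frame. Returns the mean speed in μm/min and the list of
    distances traveled before each pause or reversal."""
    steps = [b - a for a, b in zip(positions_um, positions_um[1:])]
    total_time_min = len(steps) * FRAME_INTERVAL_S / 60.0
    velocity = sum(abs(s) for s in steps) / total_time_min

    runs: List[float] = []
    current = 0.0   # distance accumulated in the ongoing run
    direction = 0   # +1 or -1 while moving, 0 after a pause
    for s in steps:
        if abs(s) < PAUSE_STEP_UM:
            # A pause ends the ongoing run.
            if current > 0:
                runs.append(current)
            current, direction = 0.0, 0
            continue
        step_dir = 1 if s > 0 else -1
        if direction != 0 and step_dir != direction:
            # A reversal ends the ongoing run and starts a new one.
            runs.append(current)
            current = 0.0
        direction = step_dir
        current += abs(s)
    if current > 0:
        runs.append(current)
    return velocity, runs

# Made-up example track: two steps forward, one pause, two steps backward.
example_velocity, example_runs = velocity_and_persistency(
    [0.0, 0.5, 1.0, 1.0, 0.6, 0.2]
)
print(example_velocity, example_runs)
```

      In this sketch the mean speed includes paused frames, so frequent pauses lower the velocity as well as the run lengths; an analysis that excludes pauses from the velocity would separate the two effects.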

      (3) The bioinformatic and comparative genomics analysis of agmT is incomplete. For example, the sequence relationships between AgmT, MltG, and the 13 other LTG proteins in M. xanthus are not clear. Is E. coli MltG the closest homolog to AgmT? Their relationships could be addressed with a phylogenetic tree and/or sequence alignments. Furthermore, are there other A-motility genes in proximity to agmT? Similarly, does agmT show specific co-occurrences with the other A-motility genes across genera/species?  

      We answered the first question in the Discussion (it was in the first Results section in the previous version), “Both M. xanthus AgmT and E. coli MltG belong to the YceG/MltG family, which is the first identified LTG family that is conserved in both Gram-negative and positive bacteria25,41. About 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25. The unique inner membrane localization of this family and the fact that AgmT is the only M. xanthus LTG that belongs to this family (Table S2) could partially explain why it is the only LTG that contributes to gliding motility”.

      For the second, we added one sentence in the Results, “No other motility-related genes are found in the vicinity of agmT”.

      For the third question, we do not believe a co-occurrence analysis is necessary. Because M. xanthus gliding is very unique but “about 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25”, gliding should show no co-occurrence with the YceG/MltG family LTGs.

      (4) Related to iii, what about the functional relationship of the endogenous 13 LTG genes? Although knockout mutants were shown to be motile, presumably because AgmT is present, can overexpression of them, similar to E. coli MltG, complement an agmT mutant? In other words, why does MltG complement and the endogenous LTG proteins appear not to be relevant? 

      We thank the reviewer for this question, which prompted us to think about the uniqueness of AgmT more carefully. AgmT is unique for its inner-membrane localization, rather than its activity. We answered this question in the Discussion, “LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands”. We then moved on to propose a possible mechanism, “E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”. 

      (5) Based on Figure 2B, overexpression of MltG enhances A-motility compared to the parent strain and the agmT-PAmCh complemented strain, is this actually true? Showing expanded swarming colony phenotypes would help address this question. 

      We thank the reviewer for bringing up an important question. Prompted by this question, we analyzed the effects of MltG expression at the single-cell level. We found that “Consistent with its LTG activity, the expression of MltGEc restored gliding motility of the ΔagmT pilA- cells on both the colony (Fig. 2B) and single-cell (Fig. 2C, D) levels. Interestingly, in the absence of sodium vanillate, the leakage expression of MltGEc using the vanillate-inducible promoter was sufficient to compensate for the loss of AgmT. A plausible explanation of this observation is that as E. coli grows much faster (generation time 20 - 30 min) than M. xanthus (generation time ~4 h), MltGEc could possess significantly higher LTG activity than AgmT. Induced by 200 μM sodium vanillate, the expression of MltGEc further, but not significantly, increased the velocity and gliding persistency (Fig. 2B-D). Importantly, the expression of MltGEc failed to restore gliding motility in the agmTEAEA pilA cells, even in the presence of 200 μM sodium vanillate (Fig. 2B). Consistent with the mecillinam resistance assay (Fig. 3C), this result suggests that AgmTEAEA still binds to PG and that in the absence of its LTG activity, AgmT does not anchor bFACs to PG”. These results are shown in the new panels C and D in Figure 2. 

      (6) Cell flexibility is correlated with gliding motility function in M. xanthus. Since AgmT has LTG activity, are agmT mutants less flexible than WT cells and is this the cause of their motility defect? 

      We thank the reviewer for bringing up an important question. We frequently saw cells that lack AgmT making S-turns and U-turns under the microscope. We used a GRABS assay to quantify cell stiffness and found that neither the absence of AgmT nor the expression of MltGEc affects cell stiffness. We added this result to the manuscript, “The assembly of bFACs produces wave-like deformation on cell surface6,37, suggesting that their assembly may require a flexible PG layer2,6,11,12. As a major contributor to cell stiffness, PG flexibility affects the overall stiffness of cells38. To test the possibility that AgmT and MltGEc facilitate bFAC assembly by reducing PG stiffness, we adopted the GRABS assay38 to test whether the lack of AgmT and the expression of MltGEc affect cell stiffness. To quantify changes in cell stiffness, we simultaneously measured the growth of the pilA-, ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells in a 1% agarose gel infused with CYE and in liquid CYE, and calculated the GRABS scores of the ΔagmT pilA- and ΔagmT Pvan-MltGEc pilA- cells using the pilA- cells as the reference, where positive and negative GRABS scores indicate increased and decreased stiffness, respectively (see Materials and Methods and Ref38). The GRABS scores of the ΔagmT pilA- and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells were -0.06 ± 0.04 and -0.10 ± 0.07 (n = 4), respectively, indicating that neither AgmT nor MltGEc affects cell stiffness significantly. Whereas PG flexibility could still be essential for gliding, AgmT and MltGEc do not regulate bFAC assembly by modulating PG stiffness. Instead, these LTGs could connect bFACs to PG by generating structural features that are irrelevant to PG stiffness”.      

      Reviewer #2 (Public Review): 

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in the failure of focal adhesion complexes to properly interact with the cell wall. 

      An interpretive weakness is the rather direct role attributed to AgmT in focal adhesion assembly. While their data clearly show that AgmT is important, it is unclear whether this is the direct consequence of AgmT somehow promoting bFAC binding to PG or just an indirect consequence of changed cell wall architecture without AgmT. In E. coli, an MltG mutant has increased PG strain length, suggesting that M. xanthus's PG architecture may likewise be compromised in a way that precludes AglR binding to the cell wall. However, this distinction would be very difficult to establish experimentally. MltG has been shown to associate with active cell wall synthesis in E. coli in the absence of protein-protein interactions, and one could envision a similar model in M. xanthus, where active cell wall synthesis is required for focal adhesion assembly, and MltG makes an important contribution to this process. 

      Based on the data that AgmT does not assemble into bFACs and that heterologous MltGEc substitutes for M. xanthus AgmT in gliding, we believe that AgmT facilitates the proper assembly of bFACs indirectly. At the end of the Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates proper bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue:  

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes for AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The last sentence of the Discussion implies that anchoring LTG (AgmT) in the inner membrane is important. I did not see this mentioned about AgmT. Does it contain an inner membrane anchoring domain? Along these lines, the AgmT and MltG proteins appear to be of different sizes (Figure 1A). Please clarify, perhaps including full-length sequence alignment and/or domain architecture for these proteins. 

      We revised the first paragraph in the Results and clarified, “Among these genes, agmT (ORF K1515_0491023) was predicted to encode an inner membrane protein with a single N-terminal transmembrane helix (residues 4 – 25) and a large “periplasmic solute-binding” domain22.”

      We thank the reviewer for spotting the mistake in Fig. 2A. The E. coli MltG sequence shown in the alignment starts from residue 158, instead of 88. We have corrected this mistake in the figure. M. xanthus AgmT and E. coli MltG are of similar sizes, with 239 and 240 amino acids, respectively. 

      In Figure 3 legend, define D3. 

      The definition of D3 was added to the figure legend.

      Figure 4A shows 100-frame composite micrographs, but no time interval between frames is given. 

      The imaging frequency, 10 Hz, was stated in the text. We also added this information into the figure legend.

      Line 98, the term "Especially" does not flow well, change to "This includes the characteristic..." or similar. 

      We deleted “especially” from the sentence.

      Line 179, "not" is not accurate, replace with "rarely." 

      Changed.

      Line 188, add a qualifier, "proper" before "bFACs assembly." 

      Added.

      Lines 196 and 202, provide the sizes of each protein in these fusion constructs. 

      We added these numbers to the figure legend.

      In Figure 5A add arrows to identify each band. State in legend whether this is a denaturing gel, if so, why are AgmT-PAmCherry homodimers present?

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers in SDS-PAGE are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

      Line 207, "near evenly along cell bodies" does not seem consistent with Figure 5B as there looks to be an enrichment of AgmT at cell poles. 

      We have replaced panel 5B with more typical images. Due to the shape difference between cell poles and the cylindrical nonpolar regions, many surface-associated proteins could appear “enriched” at cell poles. This effect was very obvious in Fig. 5B, possibly due to the unevenness of the agar surface. We examined our data carefully and did not find significant polar enrichment. Compared to AglZ that significantly enriches at poles and forms evenly-spaced clusters along the cell body, the localization of AgmT is completely different.  

      Lines 252 and 260, change "Fig. 5B" to "Fig. 5C." 

      We apologize for these mistakes. They have been corrected.

      Line 266, insert "the" before "cell envelope." 

      Added.

      Line 278, insert "presumably" between "AgmT generates (small openings)" 

      Corrected.

      Reviewer #2 (Recommendations For The Authors): 

      - Major comment: I would rephrase conclusions regarding a direct role of AgmT in focal adhesion assembly since these data are indirect (AglR binding to the cell wall is reduced in the absence of AgmT - this could also be interpreted as the absence of AgmT causing altered cell wall architecture that precludes AglR binding). Example: I don't think the data support line 222 "AgmT connects bFACs to PG", perhaps rephrased to accommodate more agnostic explanations. Likewise, line 308 states that MltG has been "adopted" by the gliding motility machinery. This conclusion cannot be drawn from the data presented. 

      We agree with the reviewer that the conclusions should be stated precisely. At the end of Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue: 

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      However, we believe that the conclusion that “AgmT connects bFACs to PG" still stands true. Although AgmT is not likely to interact with the gliding machinery directly, its activity does increase the binding between bFACs and PG. 

      We agree with the reviewer that “adopt” may not be the best word to describe AgmT’s function in gliding. In the revised manuscript, we changed the phrase to “contributes to gliding motility”. 

      - Line 35: define "bFAC" at first use. 

      Fixed.

      - Figure 2: Mention in the caption why the pilA mutation is significant. Also, make more clear what one is supposed to see. You could include an arrow showing motile cells extruding from the colony edge, and mark + label the edge of the colony. 

      Following the reviewer’s recommendations, we described the motility phenotypes in detail in the main text, “On a 1.5% agar surface, the pilA- cells moved away from colony edges both as individuals and in “flare-like” cell groups, indicating that they were still motile with gliding motility. In contrast, the ∆aglR pilA- cells that lack an essential component in the gliding motor, were unable to move outward from the colony edge and thus formed sharp colony edges. Similarly, the ∆agmT pilA- cells also formed sharp colony edges, indicating that they could not move efficiently with gliding (Fig. 2B)”. 

      We also added a schematic block into panel B and two sentences into the legend, “To eliminate S-motility, we further knocked out the pilA gene that encodes pilin for type IV pilus. Cells that move by gliding are able to move away from colony edges.” 

      - Figure 3 caption. Mecillinam concentration should presumably be µg/mL, not g/mL?

      Also, remove the ".van,." in the second to last line. 

      We apologize for these mistakes. We have corrected them in the figure legend. 

      - Line 212 - at this point in the manuscript, the fact that AgmT likely does not assemble into bFACs is quite well established, so I would start this paragraph with something like "As an additional test, we...". 

      Revised as the reviewer recommended.

      - Figure 5C - this assay needs a protein loading control. How about whole-cell AglR before pelleting PG? 

      We do have a whole-cell loading control, which we have added into the revised figure.

      - Figure 5A - how are the dimers visible? Is this a native gel? If so, please add to the Methods section (I would find information on Western Blot there, but not on gel electrophoresis). 

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that are resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmT-PAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins remaining as oligomers are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

    1. eLife assessment

      This useful study describes a single set of label-chase mass spectrometry experiments to confirm the molecular function of YafK as a peptidoglycan hydrolase, and to describe the timing of its attachment to the peptidoglycan. Confirmation of the molecular function of YafK is helpful for further studies to examine the function and regulation of the outer membrane-peptidoglycan link in bacteria. The evidence supporting the molecular function of YafK and that lpp molecules are shuffled on and off the peptidoglycan is solid; however, some of the other data remain incomplete in the revised version. The work will be of interest to researchers studying lipoproteins in Gram-negative bacteria.

    2. Reviewer #1 (Public review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      Additionally, the susceptibility of strains to detergents like SDS can be tested to provide much-needed physiological context to the study.

      In summary, the authors should consider revising the manuscript to improve clarity, substantiate their claims with more detailed evidence, and include additional experimental results that provide necessary physiological context to their study.

      Comments on the revised version:

      Regarding my comments from the last review on a new figure on OMV analysis, the authors have redirected me to their previous response and have not performed the suggested control blots. I do not follow their argument that this is for a specialized audience. I do not have any more comments.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The authors present data on outer membrane vesicle (OMV) production in different mutants, but they state that this is beyond the scope of the current manuscript, which I disagree with. This data could provide valuable physiological context that is otherwise lacking. The preliminary blots suggest that YafK does not alter OMV biogenesis. I recommend repeating these blots with appropriate controls, such as blotting for proteins in the culture media, an IM protein, periplasmic protein and an OM protein to strengthen the reliability of these findings. Including this data in the manuscript, even if it does not directly support the initial hypothesis, would enhance the physiological relevance of the study. Currently, the manuscript relies completely on the experimental setup (labeling-mass spec) previously developed by the authors, which limits the broader scope and interpretability of this study.

      As stated in the previous response to the reviewers, MBP and RpoA were indeed used in the western blot experiments as appropriate controls for periplasmic and cytoplasmic proteins, respectively. The open review process of eLife has enabled us to include additional data from experiments suggested by the reviewers. We think that this mode of publication is appropriate in the present case for the reporting of the requested analysis of OMVs. Indeed, these data are of interest only to a rather specialized audience.

      Reviewer #2 (Public Review):  

      Weaknesses:

      Figure 3 and 4 - why are the data shown here only two biological replicates, when there are 3-5 replicates shown in table S1 and S2? This makes it seem like you are cherry picking your favorite replicates. Please present the data as the mean of all the replicates performed, with error shown on the graph.

      We apologize for forgetting to update the legend to Figures 3 and 4. In the modified version, we have indicated that the values used for the plots are the average of three to five replicates. The full set of data together with the means and standard deviations appear in Tables S1 and S2. We would like to keep the current presentation of the data because introducing standard deviations in these figures would compromise the legibility of the data.

      This work will have a moderate impact on the field of research in which the connections between the OM and peptidoglycan are being studied in E. coli. Since lpp is not widely conserved in gram negatives, the impact across species is not clear. The authors do not discuss the impact of their work in depth.

      We have already answered this comment in the first response to the reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. They found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data. Strengths of the study include the computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps.

      Previous models of replay or theta sequences focused on circuit plasticity and usually required a pre-existing place map input from the external environment via upstream structures. However, those models failed to explain how networks support rapid sequential coding of novel environments or simply transferred the question to the upstream structure. On the contrary, the current proposed model required minimal spatial inputs and was aimed at elucidating how a preconfigured structure gave rise to preplay, thereby facilitating the sequential encoding of future novel environments.

      In this model, the fundamental units for spatial representation were clusters within the network. Sequential representation was achieved through the balance of cluster isolation and their partial overlap. Isolation resulted in a self-reinforced assembly representation, ensuring stable spatial coding. On the other hand, overlap-induced activation transitions across clusters, enabling sequential coding.

      This study is important when considering that previous models mainly focused on plasticity and experience-related learning, while this model provided us with insights into how network architecture could support rapid sequential coding with large capacity, upon which learning could occur efficiently with modest modification via plasticity.

      I found this research very inspiring and, below, I provide some comments aimed at improving the manuscript. Some of these comments may extend beyond the scope of the current study, but I believe they raise important questions that should be addressed in this line of research.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail given that, in its current form, it risks suggesting that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we have revised the text to clarify that the random clustering is random with respect to any future, novel environment (lines 111-114 and 710-712).

      Lines 111-114: “To reconcile these experimental results, we propose a model of intrinsic sequence generation based on randomly clustered recurrent connectivity, wherein place cells are connected within multiple overlapping clusters that are random with respect to any future, novel environment.”

      Lines 710-712: “Our results suggest that the preexisting hippocampal dynamics supporting preplay may reflect general properties arising from randomly clustered connectivity, where the randomness is with respect to any future, novel experience.”

      The cause of clustering could be prior experiences (e.g. Bourjaily and Miller, 2011) or developmental programming (e.g. Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022), and we have modified lines 116 and 714-718 to state this.

      Lines 116: Added citation of “Perin et al., 2011”

      Lines 714-718: “Synaptic plasticity in the recurrent connections of CA3 may primarily serve to reinforce and stabilize intrinsic dynamics, which could be established through a combination of developmental programming (Perin et al., 2011; Druckmann et al., 2014; Huszar et al., 2022) and past experiences (Bourjaily and Miller, 2011), rather than creating spatial maps de novo.”

      We thank the reviewer for suggesting that the results of Liu et al., 2021 strengthen the support for our modeling motivations. We agree, and we now cite their finding that the hippocampal representations of novel environments emerged rapidly but were initially generic and showed greater discriminability from other environments with repeated experience in the environment (lines 130-134).

      Lines 130-134: “Further, such preexisting clusters may help explain the correlations that have been found in otherwise seemingly random remapping (Kinsky et al., 2018; Whittington et al., 2020) and support the rapid hippocampal representations of novel environments that are initially generic and become refined with experience (Liu et al., 2021).”

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Explanation 1 is correct: Our cluster-activation analyses (Figure 5) showed that the parameter values that generate preplay correspond to the parameter regions that support sustained cluster activity over multiple decoding time bins, which led us to the conclusion of the reviewer’s first proposed explanation.
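      The reviewer's first explanation can be illustrated with a toy example. The sketch below is not the authors' analysis: it assumes a hypothetical event in which clusters A and B each dominate several consecutive decoding bins, and it counts how often the dominant cluster switches before and after the time bins are shuffled. The function names and the event itself are illustrative assumptions.

```python
import random

def count_switches(labels):
    """Number of adjacent time bins whose dominant cluster differs."""
    return sum(a != b for a, b in zip(labels, labels[1:]))

def time_bin_shuffle(labels, rng):
    """Return a copy of the event with its time bins randomly permuted."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical event: cluster A dominates 5 bins, then cluster B dominates 5 bins.
event = ["A"] * 5 + ["B"] * 5
rng = random.Random(0)

# Count switches in many time-bin shuffles of the same event.
shuffled_switches = [
    count_switches(time_bin_shuffle(event, rng)) for _ in range(1000)
]
mean_shuffled_switches = sum(shuffled_switches) / len(shuffled_switches)

print(count_switches(event))    # the intact event has a single A-to-B switch
print(mean_shuffled_switches)   # shuffled events switch several times on average
```

      Because the intact event contains one sustained transition while shuffled events alternate between clusters, a decoder reading out cluster-biased positions would see a more sequential trajectory in the intact event than in its shuffles.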

      We have now added additional analyses supporting the conclusion that cluster-wise activity is the main driver of preplay rather than individual cell-identity (Figures 6 and 7). In Figure 6 we show that cluster-identity alone is sufficient to produce significant preplay by performing decoding after shuffling cell identity within clusters, and in Figure 7 we show that this result holds true when considering the sequence of spiking activity within population bursts rather than the spatial decoding.

      Lines 495-515: The pattern of preplay significance across the parameter grid in Figure 4f shows that preplay only occurs with modest cluster overlap, and the results of Figure 5 show that this corresponds to the parameter region that supports transient, isolated cluster-activation. This raises the question of whether cluster-identity is sufficient to explain preplay. To test this, we took the sleep simulation population burst events from the fiducial parameter set and performed decoding after shuffling cell identity in three different ways. We found that when the identity of all cells within a network is randomly permuted the resulting median preplay correlation shift is centered about zero (t-test 95% confidence interval, -0.2018 to 0.0012) and preplay is not significant (distribution of p-values is consistent with a uniform distribution over 0 to 1, chi-square goodness-of-fit test p=0.4436, chi-square statistic=2.68; Figure 6a). However, performing decoding after randomly shuffling cell identity between cells that share membership in a cluster does result in statistically significant preplay for all shuffle replicates, although the magnitude of the median correlation shift is reduced for all shuffle replicates (Figure 6b). The shuffle in Figure 6b does not fully preserve a cell’s cluster identity because a cell that is in multiple clusters may be shuffled with a cell in either a single cluster or with a cell in multiple clusters that are not identical. Performing decoding after doing within-cluster shuffling of only cells that are in a single cluster results in preplay statistics that are not statistically different from the unshuffled statistics (t-test relative to median shift of un-shuffled decoding, p=0.1724, 95% confidence interval of -0.0028 to 0.0150 relative to the reference value; Figure 6c). Together these results demonstrate that cluster-identity is sufficient to produce preplay.

      Lines 531-551: While cluster-identity is sufficient to produce preplay (Figure 6b), the shuffle of Figure 6c is incomplete in that cells belonging to more than one cluster are not shuffled. Together, these two shuffles leave room for the possibility that individual cell-identity may contribute to the production of preplay. It might be the case that some cells fire earlier than others, both on the track and within events. To test the contribution of individual cells to preplay, we calculated for all cells in all networks of the fiducial parameter point their mean relative spike rank and tested if this is correlated with the location of their mean place field density on the track (Figure 7). We find that there is no relationship between a cell’s mean relative within-event spike rank and its mean place field density on the track (Figure 7a). This is the case when the relative rank is calculated over the entire network (Figure 7, “Within-network”) and when the relative rank is calculated only with respect to cells with the same cluster membership (Figure 7, “Within-cluster”). However, because preplay events can proceed in either track direction, averaging over all events would average out the sequence order of these two opposite directions. We performed the same correlation but after reversing the spike order for events with a negative slope in the decoded trajectory (Figure 7b). To test the significance of this correlation, we performed a bootstrap significance test by comparing the slope of the linear regression to the slope that results when performing the same analysis after shuffling cell identities in the same manner as in Figure 6. We found that the linear regression slope is greater than expected relative to all three shuffling methods for both the within-network mean relative rank correlation (Figure 7c) and the within-cluster mean relative rank correlation (Figure 7d).

      Lines 980-1000:

      “Cell identity shuffled decoding

      We performed Bayesian decoding on the fiducial parameter set after shuffling cell identities in three different manners (Figures 6 and 7). To shuffle cells in a cluster-independent manner (“Across-network shuffle”), we randomly shuffled the identity of cells during the sleep simulations. To shuffle cells within clusters (“Within-cluster shuffle”), we randomly shuffled cell identity only between cells that shared membership in at least one cluster. To shuffle cells within only single clusters (“Within-single-cluster shuffle”), we shuffled cells in the same manner as the within-cluster shuffle but excluded any cells from the shuffle that were in multiple clusters.

      To test for a correlation between spike rank during sleep PBEs and the order of place fields on the track (Figure 7), we calculated for each excitatory cell in each network of the fiducial parameter set its mean relative spike rank and correlated that with the location of its mean place field density on the track (Figure 7a). To account for event directionality, we calculated the mean relative rank after inverting the rank within events that had a negatively sloped decoded trajectory (Figure 7b). We calculated mean relative rank for each cell relative to all cells in the network (“Within-network mean relative rank”) and relative to only cells that shared cluster membership with the cell (“Within-cluster mean relative rank”). We then compared the slope of the linear regression between mean relative rank and place field location against the slope that results when applying the same analysis to each of the three methods of cell identity shuffles for both the within-network regression (Figure 7c) and the within-cluster regression (Figure 7d).”
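      The three cell-identity shuffles described in this Methods excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cluster membership is assumed to be a boolean matrix `membership` of shape (cells, clusters), the within-cluster shuffles are realized as random pairwise swaps between cells that share a cluster (one possible reading of the text), and all function names are hypothetical.

```python
import numpy as np

def across_network_shuffle(membership, rng):
    """Permute the identity of all cells, ignoring cluster membership."""
    return rng.permutation(membership.shape[0])

def within_cluster_shuffle(membership, rng, single_only=False):
    """Swap identities only between cells sharing at least one cluster.

    Returns perm, where perm[i] is the cell whose identity is given to cell i.
    With single_only=True, cells belonging to more than one cluster are left
    untouched (the "within-single-cluster" variant).
    """
    n_cells = membership.shape[0]
    m = membership.astype(int)
    shares = (m @ m.T) > 0                  # True where two cells share a cluster
    n_memberships = m.sum(axis=1)
    eligible = n_memberships == 1 if single_only else n_memberships >= 1
    unswapped = eligible.copy()
    perm = np.arange(n_cells)
    for i in rng.permutation(n_cells):
        if not unswapped[i]:
            continue
        partners = np.flatnonzero(unswapped & shares[i])
        partners = partners[partners != i]
        if partners.size == 0:
            continue                        # no valid partner left; keep identity
        j = rng.choice(partners)
        perm[i], perm[j] = j, i             # swap the two identities
        unswapped[[i, j]] = False
    return perm

# Hypothetical network: 30 cells, 4 clusters, random overlapping membership.
rng = np.random.default_rng(0)
membership = rng.random((30, 4)) < 0.4

perm_across = across_network_shuffle(membership, rng)
perm_within = within_cluster_shuffle(membership, rng)
perm_single = within_cluster_shuffle(membership, rng, single_only=True)
```

      Applying one of these permutations to the rows of a spike-count array (for example `counts[perm_within]`, with `counts` a hypothetical cells-by-time-bins array) reassigns cell identities before decoding.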

      We also now show that the sequence of cluster-activation in events with 3 active clusters does not match the sequence of cluster biases on the track above chance levels and that events with fewer active clusters have the largest increase in median weighted decode correlation (Figure 5—figure supplement 1), showing that the reviewer’s second explanation is not the case.

      Lines 466-477: “The results of Figure 5 suggest that cluster-wise activation may be crucial to preplay. One possibility is that the random overlap of clusters in the network spontaneously produces biases in sequences of cluster activation which can be mapped onto any given environment. To test this, we looked at the pattern of cluster activations within events. We found that sequences of three active clusters were not more likely to match the track sequence than chance (Figure 5—figure supplement 1a). This suggests that preplay is not dependent on a particular biased pattern in the sequence of cluster activation. We then asked if the number of clusters that were active influenced preplay quality. We split the preplay events by the number of clusters that were active during each event and found that the median preplay shift relative to shuffled events with the same number of active clusters decreased with the number of active clusters (Spearman’s rank correlation, p=0.0019, ρ=-0.13; Figure 5—figure supplement 1b).”

      Lines 1025-1044:

      “Active cluster analysis

      To quantify cluster activation (Figure 5), we calculated the population rate for each cluster individually as the mean firing rate of all excitatory cells belonging to the cluster smoothed with a Gaussian kernel (15 ms standard deviation). A cluster was defined as ‘active’ if at any point its population rate exceeded twice that of any other cluster during a PBE. An active cluster’s duration of activation was defined as the duration for which it was the most active cluster.

      To test whether the sequence of activation in events with three active clusters matched the sequence of place fields on the track, we performed a bootstrap significance test (Figure 5—figure supplement 1). For all events from the fiducial parameter set that had three active clusters, we calculated the fraction in which the sequence of the active clusters matched the sequence of the clusters’ left vs right bias on the track in either direction. We then compared this fraction to the distribution expected from randomly sampling sequences of three clusters without replacement.

      To determine if there was a relationship between the number of active clusters within an event and its preplay quality we performed a Spearman’s rank correlation between the number of active clusters and the normalized absolute weighted correlation across all events at the fiducial parameter set. The absolute weighted correlations were z-scored based on the absolute weighted correlations of the time-bin shuffled events that had the same number of active clusters.”
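      The active-cluster criterion quoted above can be sketched in a few lines. This is an illustrative reading rather than the authors' code: it assumes per-cluster population rates are already available as an array of shape (clusters, time bins), interprets "twice that of any other cluster" as exceeding twice the rate of every other cluster at the same time point, and uses a truncated Gaussian kernel; the function names and the toy rates are hypothetical.

```python
import numpy as np

def gaussian_smooth(rates, sigma_bins):
    """Smooth each row with a normalized Gaussian kernel truncated at 4 sigma.

    Each row must be longer than the kernel (8 * sigma_bins + 1 bins).
    """
    half_width = int(np.ceil(4 * sigma_bins))
    t = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()
    return np.array([np.convolve(row, kernel, mode="same") for row in rates])

def active_clusters(cluster_rates, dt_ms, sigma_ms=15.0, ratio=2.0):
    """Return the active clusters and, for each, how long it was the most active.

    cluster_rates: array (n_clusters, n_bins) of mean excitatory rate per cluster.
    """
    smoothed = gaussian_smooth(cluster_rates, sigma_ms / dt_ms)
    winner = smoothed.argmax(axis=0)        # most active cluster in each bin
    active, durations_ms = [], {}
    for c in range(smoothed.shape[0]):
        others = np.delete(smoothed, c, axis=0).max(axis=0)
        if np.any(smoothed[c] > ratio * others):
            active.append(c)
            durations_ms[c] = float(np.sum(winner == c) * dt_ms)
    return active, durations_ms

# Hypothetical event sampled at 1 ms: clusters 0 and 1 burst in turn,
# while cluster 2 stays near baseline and never dominates by a factor of two.
dt_ms = 1.0
rates = np.ones((3, 300))
rates[2] = 1.5
rates[0, 50:120] = 20.0
rates[1, 150:220] = 20.0

active, durations_ms = active_clusters(rates, dt_ms)
print(active, durations_ms)
```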

      We also now add control simulations showing that without the cluster-dependent bias the population burst events no longer significantly decode as preplay (Figure 4—figure supplement 4e).

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate: a network similar to a ring lattice can still have a balance of cluster isolation and cluster overlap while having a small SWI due to long paths between some node pairs. Both cluster structure and long-range connections could contribute to SWI. The authors only discuss the necessity of cluster structure, but why long-range connections are important should also be discussed. I guess long-range connections could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We have added a figure illustrating different types of networks and their corresponding SWI (Figure 1—figure supplement 1) and a corresponding description in the main text (lines 228-234).

      Lines 228-234: “A ring lattice network (Figure 1—figure supplement 1a) exhibits high clustering but long path lengths between nodes on opposite sides of the ring. In contrast, a randomly connected network (Figure 1—figure supplement 1c) has short path lengths but lacks local clustered structure. A network with small world structure, such as a Watts-Strogatz network (Watts and Strogatz, 1998) or our randomly clustered model (Figure 1—figure supplement 1b), combines both clustered connectivity and short path lengths. In our clustered networks, for a fixed connection probability the SWI increases with more clusters and lower cluster participation…”
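
      As an illustration of the contrast drawn in the quoted passage, the following is a minimal sketch (not the authors' code) that builds a ring lattice, a partially rewired Watts-Strogatz-style network, and a fully rewired network, and reports clustering, mean path length, and a small-world index. The normalised SWI form used here is the one we understand Neal (2017) to describe, and the single lattice and fully rewired graphs used as references, the network size, and all function names are assumptions made for this example.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Undirected ring in which every node links to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def watts_strogatz(n, k, p, rng):
    """Ring lattice whose edges are each rewired to a random target with probability p."""
    adj = ring_lattice(n, k)
    for d in range(1, k // 2 + 1):
        for i in range(n):
            if rng.random() < p:
                j = (i + d) % n
                free = [t for t in range(n) if t != i and t not in adj[i]]
                if free:
                    t = rng.choice(free)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(t)
                    adj[t].add(i)
    return adj

def clustering(adj):
    """Mean local clustering coefficient (nodes with fewer than two neighbours count as 0)."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        deg = len(nb)
        if deg < 2:
            continue
        links = sum(1 for a in range(deg) for b in range(a + 1, deg)
                    if nb[b] in adj[nb[a]])
        total += 2.0 * links / (deg * (deg - 1))
    return total / len(adj)

def mean_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def small_world_index(adj, lattice, rand):
    """Assumed SWI form: product of normalised path-length and clustering terms."""
    L, C = mean_path_length(adj), clustering(adj)
    L_l, C_l = mean_path_length(lattice), clustering(lattice)
    L_r, C_r = mean_path_length(rand), clustering(rand)
    return ((L - L_l) / (L_r - L_l)) * ((C - C_r) / (C_l - C_r))

rng = random.Random(0)
n, k = 100, 6                                   # hypothetical network size and degree
lattice = ring_lattice(n, k)                    # high clustering, long paths
small_world = watts_strogatz(n, k, 0.1, rng)    # a few long-range shortcuts
rand = watts_strogatz(n, k, 1.0, rng)           # fully rewired stand-in for a random graph
for name, g in [("ring lattice", lattice), ("small world", small_world), ("random", rand)]:
    print(name, round(clustering(g), 3), round(mean_path_length(g), 3),
          round(small_world_index(g, lattice, rand), 3))
```

      By construction the two reference graphs score zero on this index, and only a network that keeps lattice-like clustering while gaining short paths scores well above zero, which is the point the reviewer raises about long-range connections.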

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We have modified lines 690-692 to clarify that that statement is specific to our model.

      Lines 690-692: “In our clustered network structure, such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index (SWI, Figure 1g; Neal, 2015; Neal, 2017).”

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      We have added a brief explanation of this in the text in lines 281-284.

      Lines 281-284: “During simulated sleep, sparse, stochastic spiking spontaneously generates sufficient excitement within the recurrent network to produce population burst events resembling preplay (Figure 2d-f)”

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Our clusters correspond to functional assemblies in that cells that share a cluster membership have more-similar place fields and are more likely to reactivate together during population burst events. In Author response image 1, we show for an example network at the fiducial parameter set the Pearson correlation between all pairs of place fields, split by whether the cells share membership in a cluster (blue) or do not (red).

      Author response image 1.

      We expect an assembly analysis would identify assemblies similarly to the experimental data, but we see this additional analysis as a future direction. We have added a description of this correspondence in the text at lines 134-137.

      Lines 134-137: “Such clustered connectivity likely underlies the functional assemblies that have been observed in hippocampus, wherein groups of recorded cells have correlated activity that can be identified through independent component analysis (Peyrache et al., 2010; Farooq et al., 2019).”
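
      The pairwise comparison described in this response (place-field correlations split by shared cluster membership) could be computed along the following lines. This is a minimal sketch with hypothetical data structures and function names, not the authors' analysis code.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation between two equal-length rate vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def split_correlations(place_fields, memberships):
    """Split pairwise place-field correlations by shared cluster membership.

    place_fields: one firing-rate vector per cell (assumed non-constant)
    memberships:  one set of cluster ids per cell
    """
    shared, unshared = [], []
    for i, j in combinations(range(len(place_fields)), 2):
        r = pearson(place_fields[i], place_fields[j])
        if memberships[i] & memberships[j]:
            shared.append(r)
        else:
            unshared.append(r)
    return shared, unshared

# Toy example: cells 0 and 1 share cluster 0; cell 2 shares no cluster with them.
fields = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
clusters = [{0}, {0, 1}, {2}]
shared, unshared = split_correlations(fields, clusters)
```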

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally. We report here preliminary results supporting this as an interesting future direction.

      Author response image 2.

      We performed a similar analysis to that reported in Figure 3C of Dragoi and Tonegawa, 2013. We determined the statistical significance of each event individually for each of the two environments by testing whether the decoded event’s absolute weighted correlation exceeded the 99th percentile of the corresponding shuffle events. We then fit a linear regression to the fraction of events that were significant for each of the two tracks and that were significant for either of the two tracks (left panel of above figure). We then estimated the track capacity as the number of tracks at the point where the linear regression reached 100% of the network capacity. We find that applying this analysis to our fiducial parameter set returns an estimate of ~8.6 tracks (Dragoi and Tonegawa, 2013, found ~15 tracks).
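
      The extrapolation step can be sketched as follows, assuming the regression is fit to three points: the significant fraction for each single track (at one track) and the fraction significant for either track (at two tracks). The function names and the example fractions are hypothetical and are not the values behind the reported estimate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def track_capacity(frac_track_a, frac_track_b, frac_either):
    """Number of tracks at which the fitted line reaches a fraction of 1.0."""
    xs = [1, 1, 2]                                  # number of tracks considered
    ys = [frac_track_a, frac_track_b, frac_either]  # fraction of significant events
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return float("inf")   # the fraction never grows, so no finite capacity
    return (1.0 - intercept) / slope

# Hypothetical fractions, for illustration only.
print(track_capacity(0.12, 0.14, 0.24))
```

      With only two simulated tracks the slope rests on very few points, which is consistent with the caveat below about the accuracy of the linear extrapolation.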

      We performed this same analysis for each parameter point in our main parameter grid (right panel of above figure). The parameter region that produces significant preplay (Figure 4f) corresponds to the region that has a track capacity of approximately 8-25 tracks. In the parameter grid region that does not produce preplay, the estimated track capacity approaches the high values that this analysis would produce when applied to events that are significant only at the false-positive rate. This analysis is based on the assumption that each preplay event would significantly correspond to at least one future event. Interesting interpretation issues arise when applying this analysis to parameter regions that do not produce statistically significant preplay, which we leave to future directions to address.

      We note two differences between our analysis here and that in Dragoi and Tonegawa, 2013. First, their track capacity analysis was performed on spike sequences rather than decoded spatial sequences, which is the focus of our manuscript. Second, they recorded rats exploring three novel tracks, while in our manuscript we only simulated two novel tracks, which reduces the accuracy of our linear extrapolation of track capacity.

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a spiking network model with clustered neurons produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We thank the reviewer for pointing out this important limitation. We see extensive testing of the time vs space coding properties of this network as a future direction, but we have performed simulations that demonstrate the robustness of place field coding to variations in traversal speeds and added the results as a supplemental figure (Figure 3—figure supplement 1).

      Lines 332-336: “To verify that our simulated place cells were more strongly coding for spatial location than for elapsed time, we performed simulations with additional track traversals at different speeds and compared the resulting place fields and time fields in the same cells. We find that there is significantly greater place information than time information (Figure 3—figure supplement 1).”

      Lines 835-841: “To compare coding for place vs time, we performed repeated simulations for the same networks at the fiducial parameter point with 1.0x and 2.0x of the original track traversal speed. We then combined all trials for both speed conditions to calculate both place fields and time fields for each cell from the same linear track traversal simulations. The place fields were calculated as described below (average firing rate within each of the fifty 2-cm long spatial bins across the track) and the time fields were similarly calculated but for fifty 40-ms time bins across the initial two seconds of all track traversals.”
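
      The binning described in the quoted Methods text can be illustrated with a short sketch for a single cell. The bin sizes come from the text above; the running speeds (25 and 50 cm/s), the constant-speed assumption, the requirement that every traversal lasts at least 2 s, and all names are assumptions made for this example rather than details of the authors' simulations.

```python
N_BINS = 50
PLACE_BIN_CM = 2.0   # fifty 2-cm bins, i.e. a 100-cm track
TIME_BIN_S = 0.040   # fifty 40-ms bins, i.e. the first 2 s of each traversal

def place_and_time_fields(spikes, speeds_cm_s):
    """Return (place_field, time_field) in Hz for one cell.

    spikes:      list of (trial_index, seconds_since_trial_start)
    speeds_cm_s: constant running speed of each trial (assumed)
    """
    place_counts = [0] * N_BINS
    time_counts = [0] * N_BINS
    for trial, t in spikes:
        position_cm = speeds_cm_s[trial] * t
        place_bin = int(position_cm // PLACE_BIN_CM)
        time_bin = int(t // TIME_BIN_S)
        if 0 <= place_bin < N_BINS:
            place_counts[place_bin] += 1
        if 0 <= time_bin < N_BINS:
            time_counts[time_bin] += 1
    # Occupancy per bin, pooled over trials: a trial at speed v spends
    # PLACE_BIN_CM / v seconds in each place bin and TIME_BIN_S in each time bin.
    place_occupancy = sum(PLACE_BIN_CM / v for v in speeds_cm_s)
    time_occupancy = TIME_BIN_S * len(speeds_cm_s)
    place_field = [c / place_occupancy for c in place_counts]
    time_field = [c / time_occupancy for c in time_counts]
    return place_field, time_field

# Hypothetical cell that fires once at 26.5 cm on a slow and on a fast traversal.
speeds = [25.0, 50.0]              # assumed 1.0x and 2.0x speeds in cm/s
spikes = [(0, 1.06), (1, 0.53)]    # the same position is reached at different times
place_field, time_field = place_and_time_fields(spikes, speeds)
```

      In this toy case the spikes fall in a single place bin but in two separate time bins, which is the signature of position coding that pooling trials across speeds is meant to expose.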

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to variations in feedforward input is important. We have added new simulation results (Figure 4—figure supplement 4) showing that the existence of preplay in our model is robust to variations in the form of input.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Lines 413-420: “To test the robustness of our results to variations in input types, we simulated alternative forms of spatially modulated feedforward inputs. We found that with no parameter tuning or further modifications to the network, the model generates robust preplay with variations on the spatial inputs, including inputs of three linearly varying cues (Figure 4—figure supplement 4a) and two stepped cues (Figure 4—figure supplement 4b-c). The network is impaired in its ability to produce preplay with binary step location cues (Figure 4—figure supplement 4d), when there is no cluster bias (Figure 4—figure supplement 4e), and at greater values of cluster participation (Figure 4—figure supplement 4f).”

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 9b (original Figure 7b) the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson et al., 1996; Pavlides et al., 2019).

      Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for a low level of correlation in the model is because cluster membership is combinatorial, whereby cells that share membership in one cluster can also belong to separate/distinct other clusters, rendering their activity less correlated than might be anticipated.

      We have added text at lines 627-630 clarifying these points.

      Lines 628-631: “Cells that share membership in a cluster will have some amount of correlation in their remapping due to the cluster-dependent cue bias, which is consistent with experimental results (Hampson et al., 1996; Pavlides et al., 2019), but the combinatorial nature of cluster membership renders the overall place field map correlations low (Figure 9b).”

      Reviewer #3 (Public Review):

      Summary:

      This work offers a novel perspective on the question of how hippocampal networks can adaptively generate different spatial maps and replays/preplays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. Unlike previous modeling attempts, the authors do not pre-tune their model neurons to any particular place fields. Instead, they build a random, moderately-clustered network of excitatory (and some inhibitory) cells, similar to CA3 architecture. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that weakly connected cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with real ones. In summary, the model demonstrates that spontaneous activity within a small-world structured network can generate place cells and replays without the need for pre-configured maps.

      Strengths:

      This work addresses an important question in hippocampal dynamics. Namely, how can hippocampal networks quickly generate new place cells when a novel environment is introduced? And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Other models depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. This modeling work proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity and border-cell inputs, avoiding the need for pre-set spatial maps or plasticity rules. The proposal that small-world architecture is key for place cells and preplays to adapt to new spatial environments is novel and of potential interest to the computational and experimental community.

      The authors do a good job of thoroughly examining some of the features of their model, with a strong focus on excitatory cell connectivity. Perhaps the most valuable conclusion is that replays require the successive activation of different cell clusters. Small-world architecture is the optimal regime for such a controlled succession of activated clusters.

      The use of pre-existing electrophysiological data adds particular value to the model. The authors convincingly show that the simulated place cells and preplay events share many important features with those recorded in CA1 (though CA3 ones are similar).

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input. We show in Figure 4—figure supplement 4 that our preplay results are robust to several variations in the location-cue inputs. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      We apologize for a lack of clarity if we have caused confusion about the type of inputs and if we implied an absence of spatially-tuned information in the network. In order for place fields to appear the network must receive spatial information, which we model as linearly-varying cues and illustrate in Figure 1b and describe in the caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683). Such input is not place-field like, as the small bias to any cell linearly decreases from one boundary of the track or the other.

      The cluster-dependent bias, which is also described in the same lines (Figure 1 caption (original lines 156-157), Results (original lines 189-190 & 497-499), and Methods (original lines 671-683)), only affects the strength of the spatial cues that are present during simulated run periods. Crucially, this cluster-dependent bias is absent during sleep simulations when preplay occurs, which is why preplay can equally correlate with place field sequences in any context.

      We have modified the text (lines 207-210, 218, and 824-827) to clarify these points. We have also added results from a control simulation (Figure 4—figure supplement 4e) showing that preplay is not generated in the absence of the cluster-dependent bias.

      Lines 207-210: “This bias causes cells that share cluster memberships to have more similar place fields during the simulated run period, but, crucially, this bias is not present during sleep simulations so that there is no environment-specific information present when the network generates preplay.”

      Line 218: “Second, to incorporate cluster-dependent correlations in place fields, a small…”

      Lines 824-827: “The addition of this bias produced correlations in cells’ spatial tunings based on cluster membership, but, importantly, this bias was not present during the sleep simulations, and it did not lead to high correlations of place-field maps between environments (Figure 9b).”

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells. We have added a brief discussion of this to the manuscript.

      Lines 733-739: “Additionally, the in vivo microcircuitry of CA3 is complex and includes aspects such as nonlinear dendritic computations and a variety of inhibitory cell types (Rebola et al., 2017). This microcircuitry is crucial for explaining certain aspects of hippocampal function, such as ripple and gamma oscillogenesis (Ramirez-Villegas et al., 2017), but here we have focused on a minimal model that is sufficient to produce place cell spiking activity that is consistent with experimentally measured place field and preplay statistics.”

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we have modified lines 114-116 to address the clustered connectivity reported in CA3.

      Lines 114-116: “Such clustering is a common motif across the brain, including the CA3 region of the hippocampus (Guzman et al., 2016) as well as cortex (Song et al., 2005), …”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Based on Figure 3, the place fields are not uniformly distributed in the maze. Meanwhile, based on Figure 1b and Methods, the total input seems to be uniform across the maze. Why does the uniform total external input lead to nonuniform network activities?

      While the total input to the network is constant across the maze, the input to any individual cell can peak only at either end of the track. All excitatory cells receive input from both the left-cue and the right-cue with different input strengths. By chance and due to the cluster-dependent bias some cells will have stronger input from one cue than the other and will therefore be more likely to have a place field toward that side of the track. However, no cell receives a peak of input in the center of the track. We have modified lines 141-143 to clarify this.

      Lines 141-143: “While the total input to the network is constant as a function of position, each cell only receives a peak in its spatially linearly varying feedforward input at one end of the track.”

      (2) I find these sentences confusing: "...we expected that the set of spiking events that significantly decode to linear trajectories in one environment (Figure 4) should decode with a similar fidelity in another environment..." (Lines 513-515) and "As expected... but not with the place fields of trajectories from different environments (Figure 7c)" (Lines 517-520). What is the expectation for cross-environment decoding? Should they be similar or different? Also, in Figure 7c, the example is not fully convincing. In the figure caption, it states that decoding is significant in the top row but not in the bottom row, but they look similar across rows.

      Original lines 513-515 refer to the entire set of events, while original lines 517-520 refer to one example event. The sleep events are simulated without any track-specific information present, so the degree to which preplay occurs when decoding based on the place fields of a specific future track should be independent of any particular track when considering the entire set of decoded PBEs, as shown in Figure 9d (original Figure 7). However, because there is strong remapping across tracks (Figure 9b), an individual event that shows a strong decoded trajectory based on the place fields of one track (Figure 9c, top row) should show chance levels of a decoded trajectory when decoded with the place fields of an alternative track (Figure 9c, bottom row).

      We have revised lines 643-650 for clarity, and we have added statistics for the events shown in Figure 9c.

      Lines 644-651: “Since the place field map correlations are high for trajectories on the same track and near zero for trajectories on different tracks, any individual event would be expected to have similar decoded trajectories when decoding based on the place fields from different trajectories in the same environment and dissimilar decoded trajectories when decoding based on place fields from different environments. A given event with a strong decoded trajectory based on the place fields of one environment would then be expected to have a weaker decoded trajectory when decoded with place fields from an alternative environment (Figure 9c).”

      Lines 604-608: “(c) An example event with a statistically significant trajectory when decoded with place fields from Env. 1 left (absolute correlation at the 99th percentile of time-bin shuffles) but not when decoded with place fields of the other trajectories (78th, 45th, and 63rd percentiles, for Env. 1 right, Env. 2 left, and Env. 2 right, respectively).”

      (3) In Methods, the equation at line 610, E in the last term should be E_ext.

      We modeled the feedforward inputs as excitatory connections with the same reversal potential as the recurrent excitatory connections, so E is the proper value.

      (4) Equation line 617 states that conductances follow exponential decay, but the initial conductances of g_I, g_E, and g_SRA are not specified.

      We have added a description of the initial values in lines 760-764.

      Lines 760-764: “Initial feed-forward input conductances were set to values approximating their steady-state values by randomly selecting values from a Gaussian with a mean of   and a standard deviation of . Initial values of the recurrent conductances and the SRA conductance were set to zero.”

      (5) In the parameter table below line 647, W_E-E, W_E-I, and W_I-E are not described in the text.

      We have clarified in lines 757-760 that the step increase in conductance corresponds to these parameter values.

      Lines 757-760: “A step increase in conductance occurs at the time of each spike by an amount corresponding to the connection strength for each synapse (W_E-E for E-to-E connections, W_E-I for E-to-I connections, and W_I-E for I-to-E connections), or by  for .”

      (6) On line 660, "...Each environment and the sleep session had unique context cue input weights...". Does that mean that within a sleep session, the network received the same context input? How strongly are the sleep dynamics driven by that context input rather than by intrinsic dynamics? Usually, sleep activity is high dimensional, what would happen if the input during sleep is more stochastic?

      Yes, within a sleep session each network receives a single set of context inputs, which are implemented as independent Poisson spike trains (so being independent, in small time-windows the dimensionality is equal to the number of neurons). The effects of any particular set of sleep context cue inputs should be minor, since the standard deviation of the input weights, , is small. Further, because the preplay analysis is performed across many networks at each parameter point, the observation of preplay is independent of any particular realization of either the recurrent network or the sleep context inputs.

      Further exploring the effects of more biophysically realistic neural dynamics during simulated sleep is an interesting future direction.

      (7) One bracket is missing in the denominator in line 831.

      We have fixed this error.

      Line 1005: “)” -> “()”

      Reviewer #2 (Recommendations For The Authors):

      - I would suggest the authors cite Chenkov et al 2017, PLOS Comp Bio, in which "replay" sequences were produced in clustered networks, and discuss how their work differs.

      We have included a contrast of our model to that of Chenkov et al., 2017 in lines 73-78.

      Lines 73-78: “Related to replay models based on place-field distance-dependent connectivity is the broader class of synfire-chain-like models. In these models, neurons (or clusters of neurons) are connected in a 1-dimensional feed-forward manner (Diesmann et al., 1999; Chenkov et al., 2017). The classic idea of a synfire-chain has been extended to include recurrent connections, such as by Chenkov et al., 2017; however, such models still rely on an underlying 1-dimensional sequence of activity propagation.”

      - Figure legend 2e says "replay", should be "preplay".

      We have fixed this error.

      Line 255: “(e) Example preplay event…”

      - How much does the context cue affect the result? e.g. Is sleep notably different with different sleep context cues?

      As discussed above in our response to Reviewer 1, the context cue weights have a small standard deviation, , which means that differences in the effects of different realizations of the context inputs are small. Different sets of context cues will cause cells to have slightly higher or lower spiking rates during sleep simulations, but because there is no correlation between the sleep context cue and the place field simulations there should be no effect on preplay quality.

      - Figure 4 should include a control with a single cluster.

      We thank the reviewer for this suggestion and have added additional control simulations.

      In our model, the recurrent structure of a network with a single cluster is equivalent to a cluster-less random network. Additionally, any network where cluster participation equals the number of clusters is equivalent to a cluster-less random network, since all neurons belong to all clusters and can therefore potentially connect to any other neuron. Such a condition corresponds to a diagonal boundary where the number of clusters equals the cluster participation, which occurs at higher values of cluster participation than we had shown in our primary parameter grid.

      We now include simulation results that extend to this boundary, corresponding to cluster-less networks (Figure 4—figure supplement 4f). Networks at these parameter points do not show preplay. See our earlier response for the new text associated with Figure 4—figure supplement 4.
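
      The equivalence argument above (a network in which every cell belongs to every cluster is a cluster-less random network) can be checked with a small sketch. It assumes, as a simplification, that each cell belongs to a fixed integer number of clusters and that only cells sharing at least one cluster are eligible to connect; the function names and network sizes are hypothetical.

```python
import random
from itertools import combinations

def assign_clusters(n_cells, n_clusters, participation, rng):
    """Give every cell `participation` distinct clusters chosen uniformly at random."""
    return [set(rng.sample(range(n_clusters), participation)) for _ in range(n_cells)]

def eligible_fraction(memberships):
    """Fraction of cell pairs that share at least one cluster and so may connect."""
    pairs = list(combinations(range(len(memberships)), 2))
    eligible = sum(1 for i, j in pairs if memberships[i] & memberships[j])
    return eligible / len(pairs)

rng = random.Random(1)
n_cells, n_clusters = 50, 5    # hypothetical sizes for illustration
for participation in (1, 2, 5):
    members = assign_clusters(n_cells, n_clusters, participation, rng)
    print(participation, round(eligible_fraction(members), 3))
```

      When participation equals the number of clusters, every pair is eligible, so cluster membership no longer constrains which cells can connect.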

      - The results of Figure 4 are very noisy. I would recommend increasing the sampling, both in terms of the number of population events in each condition and the number of conditions.

      We have run simulations for longer durations (300 seconds) and with more networks (20) to produce more accurate empirical values for the statistics calculated across the parameter grids in Figures 3 and 4. Our additional simulations (Figure 4—figure supplement 4) provide support that the parameter region of preplay significance is reliable.

      Lines 831-833: “For the parameter grids in Figures 3 and 4 we simulated 20 networks with 300 s long sleep sessions in order to get more precise empirical estimates of the simulation statistics.”

      - It's not entirely clear what's different between the analysis described in lines 334-353, and the preplay analysis in Figure 2. In general, the description of this result was difficult to follow, as it included a lot of text that would be better served in the methods.

      In Figure 2 we first introduce the Bayesian decoding method, but it is not until Figure 4 that the shuffle-based significance testing is first introduced. We have simplified the description of the shuffle comparison in lines 371-375 and now refer the reader to the methods for details.

      Lines 371-375: “We find significant preplay in both our reference experimental data set (Shin et al., 2019; Figure 4a, b; see Figure 4—figure supplement 1 for example events) and our model (Figure 4c, d) when analyzed by the same methods as Farooq et al., 2019, wherein the significance of preplay is determined relative to time-bin shuffled events (see Methods). For each detected event we calculated its absolute weighted correlation. We then generated 100 time-bin shuffles of each event, and for each shuffle recalculated the absolute weighted correlation to generate a null distribution of absolute weighted correlations.”
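
      The shuffle comparison quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each event is available as a decoded posterior matrix (position bins by time bins), the function names are hypothetical, and the per-event p-value at the end is an added convenience that the quoted text does not describe.

```python
import numpy as np

def weighted_correlation(posterior):
    """Correlation between position and time, weighted by decoded probability.

    posterior: 2-D array of shape (n_position_bins, n_time_bins).
    """
    n_pos, n_time = posterior.shape
    pos = np.arange(n_pos)[:, None]     # position-bin index, as a column
    time = np.arange(n_time)[None, :]   # time-bin index, as a row
    w = posterior / posterior.sum()     # normalise weights to sum to 1
    mean_pos = (w * pos).sum()
    mean_time = (w * time).sum()
    cov = (w * (pos - mean_pos) * (time - mean_time)).sum()
    var_pos = (w * (pos - mean_pos) ** 2).sum()
    var_time = (w * (time - mean_time) ** 2).sum()
    return cov / np.sqrt(var_pos * var_time)

def shuffle_test(posterior, n_shuffles=100, rng=None):
    """Compare an event's |weighted correlation| with time-bin shuffles.

    Returns (observed, null, p): the observed absolute weighted
    correlation, the null values from the shuffles, and an illustrative
    per-event p-value (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(weighted_correlation(posterior))
    n_time = posterior.shape[1]
    null = np.empty(n_shuffles)
    for k in range(n_shuffles):
        # Permute the order of the time bins, leaving each bin's posterior intact.
        shuffled = posterior[:, rng.permutation(n_time)]
        null[k] = abs(weighted_correlation(shuffled))
    p = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, null, p
```

      Calling `shuffle_test(posterior, n_shuffles=100)` on each detected event gives the null values that can then be pooled across events for comparison with the observed distribution.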

      - Many of the figures have low text resolution (e.g. Figure 6).

      We have now fixed this.

      - How does the clustered small world network compare to e.g. a small world ring network as used in Watts and Strogatz 1998?

      As described in our above response to Reviewer 1's fourth point, we have added a supplementary figure (Figure 1—figure supplement 1, with corresponding text) comparing our model with the Watts-Strogatz model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 5 would benefit from a plot of the overlap of activated clusters per event.

      In our cluster activation analysis in Figure 5, we defined a cluster as “active” if at any point in the event its population rate was twice that of any other clusters’. We used this definition—which permits no overlap of activated clusters—rather than a definition based on a z-scoring of the rate, because we determined that preplay required periods of spiking dominated by individual clusters.

      Author response image 3.

      The choice of such a definition is supported by our observation that most spiking activity within an event is dominated by whichever cluster is most active at each point in time. In the left panel of the above figure we show the distribution of the average fraction of spikes within each event that came from the most active cluster at each point in time. The right panel shows the distribution of the average across time within each event of the ratio of the population activity rate of the most active cluster to the second most active cluster. The data for both panels comes from all events at the fiducial parameter set.
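
      As a rough sketch of the cluster-activation definition and of the dominance ratio described above, assuming the population rates of one event are stored as a clusters-by-time-bins array with at least two clusters. The function names are hypothetical, and reading "twice that of any other cluster" as "at least twice the highest rate among the other clusters" is an assumption.

```python
import numpy as np

def active_clusters(rates, factor=2.0):
    """Flag clusters that dominate the event in at least one time bin.

    rates: array of shape (n_clusters, n_time_bins), n_clusters >= 2.
    A cluster counts as active if, in some bin, its rate is non-zero and
    at least `factor` times the highest rate among the other clusters.
    """
    rates = np.asarray(rates, dtype=float)
    ordered = np.sort(rates, axis=0)
    top, second = ordered[-1], ordered[-2]   # highest and second-highest rate per bin
    leader = rates.argmax(axis=0)            # index of the most active cluster per bin
    dominated = (top > 0) & (top >= factor * second)
    active = np.zeros(rates.shape[0], dtype=bool)
    active[leader[dominated]] = True
    return active

def dominance_ratio(rates):
    """Mean over time bins of (top rate / second-highest rate).

    Bins where the second-highest rate is zero are skipped; returns NaN
    if no bin remains.
    """
    ordered = np.sort(np.asarray(rates, dtype=float), axis=0)
    top, second = ordered[-1], ordered[-2]
    valid = second > 0
    if not valid.any():
        return float("nan")
    return float(np.mean(top[valid] / second[valid]))
```

      With `factor` greater than 1, at most one cluster can satisfy the criterion in any given bin, which is why this definition permits no overlap of activated clusters at a single moment.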

      Author response image 4.

      Rather than overlapping at a given moment in time, clusters might have overlap in their probability of being active at some point within an event. We do find that there is a small but significant correlation in cluster co-activation. For each network we calculated the activation correlation across events for each pair of clusters (example network shown in the left panel). We compared the distribution of resulting absolute correlations against the values that result after shuffling the correlations between cluster activations (right panel, all correlations for all networks from the fiducial parameter point).
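
      One way such a co-activation comparison could be set up is sketched below. It assumes a binary events-by-clusters activation matrix in which every cluster is active in some events and inactive in others. The shuffle shown, which permutes each cluster's activations independently across events, is an assumed procedure and may differ from the one used for the figure. Function names are hypothetical.

```python
import numpy as np

def pairwise_abs_correlations(activations):
    """Absolute Pearson correlation of activation for every cluster pair.

    activations: 0/1 array of shape (n_events, n_clusters).
    Returns a 1-D array with one value per unordered pair of clusters.
    """
    corr = np.corrcoef(np.asarray(activations, dtype=float), rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)   # each pair once, no diagonal
    return np.abs(corr[upper])

def shuffled_abs_correlations(activations, rng=None):
    """Same statistic after permuting each cluster's column independently.

    The permutation keeps each cluster's activation probability but
    removes any event-wise association between clusters.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(activations, dtype=float)
    shuffled = np.column_stack([rng.permutation(a[:, j]) for j in range(a.shape[1])])
    return pairwise_abs_correlations(shuffled)
```

      Repeating `shuffled_abs_correlations` many times gives a null distribution against which the observed absolute correlations can be compared.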

      Figures 4e/f are referred to as 4c/d in the text (pg 14).

      We have fixed this error.

      Lines 400-412: “4c” -> “4e” and “4d” -> “4f”

    2. eLife assessment

      This study presents an important finding on the spontaneous emergence of structured activity in artificial neural networks endowed with specific connectivity profiles. The evidence supporting the claims of the authors is convincing, providing direct comparison between the properties of the model and neural data although investigating more naturalistic inputs to the network would have strengthened the main claims. The work will be of interest to systems and computational neuroscientists studying the hippocampus and memory processes.

    3. Reviewer #1 (Public review):

      Summary:

      An investigation of the dynamics of a neural network model characterized by sparsely connected clusters of neuronal ensembles. The authors found that such a network could intrinsically generate sequence preplay and place maps, with properties like those observed in the real-world data.

      Strengths:

      Computational model and data analysis supporting the hippocampal network mechanisms underlying sequence preplay of future experiences and place maps.
      The revised version of the manuscript addressed all my comments and as a result is significantly improved.

      Weaknesses:

      None noted

    4. Reviewer #2 (Public review):

      Summary:

      The authors show that a spiking network model with clustered connectivity produces intrinsic spike sequences when driven with a ramping input, which are recapitulated in the absence of input. This behavior is only seen for some network parameters (neuron cluster participation and number of clusters in the network), which correspond to those that produce a small world network. By changing the strength of ramping input to each network cluster, the network can show different sequences.

      Strengths:

      A strength of the paper is the direct comparison between the properties of the model and neural data.

      Weaknesses:

      My main critique of the paper relates to the form of the input to the network. Specifically, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. In order to address this concern, the authors would need to test the spatial tuning of their network in 2-dimensional environments, and with different kinds of input from a population of neurons that have a range of degree of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

    5. Reviewer #3 (Public review):

      This work offers a novel perspective to the question of how hippocampal networks can adaptively generate different spatial maps and replays of the corresponding place cells, without any such maps pre-existing in the network architecture or its inputs. And how can these place cells preplay their sequences even before the environment is experienced? Previous models required pre-existing spatial representations to be artificially introduced, limiting their adaptability to new environments. Others depended on synaptic plasticity rules which made remapping slower than what is seen in recordings. In contrast, this modeling study proposes that quickly-adaptive intrinsic spiking sequences (preplays) and spatially tuned spiking (place cells) can be generated in a network through randomly clustered recurrent connectivity. By simulating spatial exploration through border-cell-like synaptic inputs, the model generates place cells for different "environments" without the need to reconfigure its synaptic connectivity or introduce plasticity. By simulating sleep-like random synaptic inputs, the model generates sequential activations of cells, mimicking preplays. These "preplays" require small-world connectivity, so that cell clusters are activated in sequence. Using a set of electrophysiological recordings from CA1, the authors confirm that the modeled place cells and replays share many features with recorded ones.

      Many features of the model are thoroughly examined, and conclusions are overall convincing (within the simple architecture of the model). Even though the modeled connectivity applies more closely to CA3, it remains unclear whether CA3 recapitulates the proposed small world architecture.

      In any case, the proposal that a small-world-structured, clustered network can generate flexible place cells and replays without the need for pre-configured maps is novel and of potential interest to a wide computational and experimental community.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment: I find that the eLife assessment mentions “statistical analyses are yet to be carried out to support statements of statistical significance” while the reviewers mention that the data are compelling and results are technically solid. Besides, all observations in the manuscript are presented with robust and established norms of statistical analysis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The use of data from before COVID-19 is both a strength and a weakness. Because COVID had effects on vascular health and had higher death rates for groups with the comorbidities of interest here, it has likely shifted the demographics in ways that would shift the results in unpredictable ways if the analysis were repeated with current data. This can be a strength in providing a reference point for studying those changes as well as allowing researchers to study differences between regions without the complication of different public health responses adding extra variation to the data. On the other hand, it limits the usefulness of the data in research concerned with the current status of the various populations.

      We completely agree with the observation, but were restricted as the purpose was to use the most robust and technically qualified data from GBD. The post COVID19 GBD data has not yet been released, but I am sure the observations made in the study can help in guiding the issues in the post COVID era too, because genetics is not going to change in these population groups.

      However, we did highlight this aspect of COVID19 even in our original version and also in the revised version.

      Reviewer #2 (Public Review):

      Weaknesses:

      The presentation is not focused. It is important to include p-values for all comparisons and focus the presentation on the main effects from the dataset analysis.

      Significant p-values were reported only for the public health data, to identify and distinguish differences in incidence, prevalence and mortality across world populations. These differences have often been interpreted from a socio-economic point of view, whereas our manuscript presents the reasons for differences in the main condition (stroke) and its comorbid conditions among different ethnicities from a genetic perspective. This genetic perspective was further explored to identify unique ethnic-specific variants and their patterns of linkage disequilibrium in distinguishing the phenotypic variations. Considering the volume and diversity of both the public health and GWAS data, several directions are possible, but for presentation we focused only on the most distinguishing and established phenotypic differences. I am sure this will open up avenues for several future investigations, including on COVID, as the reviewers have also highlighted. All observations were presented with robust and established norms of statistical analysis.


      The following is the authors’ response to the original reviews.

      Thanks for the constructive observations on the strengths and weaknesses of our manuscript. Interestingly, some of the weaknesses mentioned here also turn out to be strengths of the article. For example, COVID-19 has been mentioned by the reviewer as a driver of increased mortality in stroke and some comorbid conditions. Firstly, I must clarify that our data are from the pre-COVID era, and we indeed mention that in the COVID era, COVID-19 might differentially impact the risk of stroke. This differential influence on the comorbidities of stroke is likely to be shaped by the underlying genetics of stroke and its comorbidities.

      I have tried to address the concerns raised by the reviewers, which ideally do not impact the original manuscript. The statistical limitation raised regarding P-values has been clarified here. However, certain minor concerns, such as abbreviations, have been resolved in the revised manuscript. My responses to the weaknesses and the reviewers’ comments are given below.

      Reviewer #1 (Public Review):

      Strengths:

      The data provided here will provide a foundation for a lot of future research into the causes of the observed correlations as well as whether the observed differences in comorbidities across regions have clinically relevant effects on risk management.

      Weaknesses:

      • As with any cross-national analysis of rates, the data is vulnerable to differences in classification and reporting across jurisdictions.

      GBD data is the most robust and most comprehensive data resource which has been used and accepted globally in predicting the health metrics statistics.

      GBD data indeed considers normalisations, regarding classification and reporting.

      To the best of our knowledge this is the best available resource to consider all health metrics analysis.

      • Furthermore, given the increased death rate from COVID-19 associated with many of these comorbid conditions and the long-term effects of COVID-19 infection on vascular health, it is expected that many of the correlations observed in this dataset will shift along with the shifting health of the underlying populations.

      I must clarify that we have used data prior to COVID-19.

      But yes the patterns after COVID19 will shift due to the impact of covid. This makes the study even more relevant as the comorbid conditions of stroke are also the risk drivers for COVID19 and mortality. This shift has been reported by some authors, which has been discussed in the discussion.

      Therefore, understanding the genetic factors underlying stroke and its comorbid conditions might help in resolving how COVID19 might differentially impact on health outcome.

      We did highlight this aspect of COVID19 even in our original version.

      Introduction 1st para:

      “It is the accumulated risk of comorbid conditions that enhances the risk of stroke further. Are these comorbid conditions differentially impacted by socio-economic factors and ethnogeographic factors. This was clearly evident in COVID era, when COVID-19 differentially impacted the risk of stroke, possibly due to its differential influence on the comorbidities of stroke.”

      Discussion 3rd para:

      “Studies reported reduction in life expectancy in 31 of 37 high-income countries, deduced to be due to COVID-19 1. However, it would be unfair to ignore the comorbid conditions which could also be the critical determinants for reduced life expectancy in these countries.”

      Recommendations For The Authors:

      On page 5, the authors make a note about Africa and the Middle East having the highest ASMR for high SBP and comment about the relative populations of these regions. The populations of the regions are irrelevant to the rate.

      Since the study concerns the comorbid factors of stroke and their impact on mortality, relative burden seems critical. This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Results section 2:

      “Ethno-regional differences in mortality and prevalence of stroke and its major comorbid conditions

      We observed interesting patterns of ASMRs of stroke, its subtypes and its major comorbidities across different regions over the years as shown in figure 1a, table 1 and supplementary files S2 & S3. When assessed in terms of ranks, high SBP is the most fatal condition followed by IHD in all regions, except Oceania (OCE) where IHD and high SBP swap ranks. Africa (AFR; 206.2/100000, 95%UI 177.4-234.2) and Middle East (MDE; 198.6/100000, 95%UI 162.8-234.4) have the highest ASMR for high SBP, even though they rank as only the third and sixth most populous continents (fig. S2), respectively.”

      On page 17, the authors are alarmed by a large ratio between prevalence rates and mortality rates for certain conditions. This is confusing since this indicates that these conditions are not as dangerous as the other conditions.

      This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Discussion para 1:

      “While the global stroke prevalence is nearly 15 times its mortality rate, prevalence of comorbid conditions such as high SBP, high BMI, CKD, T2D are alarmingly 150- to 500-fold higher than their mortality rates. These comorbid conditions can drastically affect the outcome of stroke.”

      In Figure 4, the colors are not defined.

      In the STRUCTURE plot, colours are assigned for each value of K and do not directly refer to any population. The plot instead distinguishes the stratification of populations for each K. Reference: Ramasamy, R.K., Ramasamy, S., Bindroo, B.B. et al. STRUCTURE PLOT: a program for drawing elegant STRUCTURE bar plots in user friendly interface. SpringerPlus 3, 431 (2014). https://doi.org/10.1186/2193-1801-3-431

      Reviewer #2 (Public Review):

      Strengths:

      The idea is interesting and the data are compelling. The results are technically solid.

      The authors identify specific genetic loci that increase the risk of a stroke and how they differ by region.

      Weaknesses:

      The presentation is not focused. It would be better to include p-values and focus presentation on the main effects of the dataset analysis.

      I presume the comment is made with reference to results with significant p-values.

      P-values are given in the main text wherever we refer to a significant decrease or increase relative to global rates or over time. For comparisons within 2019, P-values are based on regional rates versus the global rates of 2019 (Supplementary tables S2a for mortality and S3a for prevalence). For comparisons between years, P-values are based on 2019 rates versus 2009 rates (Supplementary tables S2b for mortality and S3b for prevalence). P-values for proportional mortality and proportional prevalence (Supplementary tables S4 and S5) are also based on global rates.

      Recommendations For The Authors:

      It would be better to minimize the use of acronyms. Often one has to go back to decipher what the acronym stands for. It is fine to have acronyms in figure legends, if necessary. However, at least for regions, please do not use acronyms.

      In the revised version we have tried to minimise the use of acronyms.

      We have removed the acronyms for regions and elsewhere wherever possible; however, the disease acronyms have been maintained as per the GBD terms.

      Please focus the presentation on the main results. Currently, the presentation wanders and repeats itself a lot.

      Since the manuscript tries to address the global and regional rates of prevalence, mortality and its relationship to genetic correlates, it is difficult not to repeat the same to stress the significant observations coming out of different analysis methods. This might reflect on some amount of repetitiveness but the intention was to stress the significant observations.

      I would also recommend acknowledging and discussing socioeconomic factors earlier in the manuscript.

      Current mention happens in 3rd para of Discussion

      “The changing dynamics of stroke or its comorbid conditions can be attributed to multitude of factors. Often global burden of stroke has been discussed from the point of view of socio-economic parameters. Studies indicate that half of the stroke-related deaths are attributable to poor management of modifiable risk factors 8,9. However, we observe that different socio-economic regions are driven by different risk factors.”

    1. eLife assessment

      This study presents a fundamental finding on how m6A levels are controlled, invoking a consolidated model where degradation of modified RNAs in the cytoplasm plays a primary role in shaping m6A patterns and dynamics, rather than needing active regulation by m6A erasers and other related processes. The evidence is compelling and uses transcriptome-wide data and mechanistic modeling. However, it is possible that m6A-erasers will have roles in specific developmental contexts or conditions, so this model may not apply universally.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with more and more recent studies showing how m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of performing extensive evaluations on how much RNA metabolic events could explain an observed m6A change.

      Weaknesses:

      It is essential to notice that m6ADyn does not exactly recapitulate the observed m6A changes. First, this can be due to m6ADyn's limitations. The authors do a great job in the Discussion highlighting these limitations. Indeed, they mention how m6ADyn cannot interpret m6A's implications on nuclear degradation or splicing and cannot model more complex scenario predictions (i.e., a scenario in which m6A both impacts export and degradation) or the contribution of single sites within a gene.

      Secondly, since predictions do not exactly recapitulate the observed m6A changes, "active" regulatory events may still play a partial role in regulating m6A changes. The authors themselves highlight situations in which data do not support m6ADyn predictions. Active mechanisms to control m6A degradation levels or mRNA export levels could exist and may still play an essential role.

      (1) "We next sought to assess whether alternative models could readily predict the positive correlation between m6A and nuclear localization and the negative correlations between m6A and mRNA stability. We assessed how nuclear decay might impact these associations by introducing nuclear decay as an additional rate, δ. We found that both associations were robust to this additional rate (Supplementary Figure 2a-c)."
      Based on the data, I would say that model 2 (m6A-dep + nuclear degradation) is better than model 1. The discussion of these findings in the Discussion could help clarify how to interpret this prediction. Is nuclear degradation playing a significant role, more than expected by previous studies?

      (2) The authors classify m6A levels as "low" or "high," and it is unclear how "low" differs from unmethylated mRNAs.

      (3) The authors explore whether m6A changes could be linked with differences in mRNA subcellular localization. They tested this hypothesis by looking at mRNA changes during heat stress, a complex scenario to predict with m6ADyn. According to the collected data, heat shock is not associated with dramatic changes in m6A levels. However, the authors observe a redistribution of m6A mRNAs during the treatment and recovery time, with highly methylated mRNAs being retained in the nucleus, being associated with a shorter half-life, and being transcriptionally induced by HSF1. Based on this observation, the authors use m6ADyn to predict the contribution of RNA export, RNA degradation, and RNA transcription to the observed m6A changes. However:

      (a) Do the authors have a comparison of m6ADyn predictions based on the assumption that RNA export and RNA transcription may change at the same time?

      (b) They arbitrarily set the global reduction of export to 10%, but I'm not sure we can completely rule out whether m6A mRNAs have an export rate during heat shock similar to the non-methylated mRNAs. What happens if the authors simulate that the block in export could be preferential for m6A mRNAs only?

      (c) The dramatic increase in the nuclear:cytoplasmic ratio of mRNA upon heat stress may not reflect the overall m6A mRNA distribution upon heat stress. It would be interesting to repeat the same experiment in METTL3 KO cells. Of note, m6A mRNA granules have been observed within 30 minutes of heat shock. Thus, some m6A mRNAs may still be preferentially enriched in these granules for storage rather than being directly degraded. Overall, it would be interesting to understand the authors' position relative to previous studies of m6A during heat stress.

      (d) Gene Ontology analysis based on the top 1000 PC1 genes shows an enrichment of GOs involved in post-translational protein modification more than GOs involved in cellular response to stress, which is highlighted by the authors and used as justification to study RNA transcriptional events upon heat shock. How do the authors think that GOs involved in post-translational protein modification may contribute to the observed data?

      (e) Additionally, the authors first mention that there is no dramatic change in m6A levels upon heat shock, "subtle quantitative differences were apparent," but then mention a "systematic increase in m6A levels observed in heat stress". It is unclear which systematic increase they are referring to. Are the authors referring to previous studies? It is confusing in the field what exactly is going on after heat stress. For instance, in some papers, a preferential increase of 5'UTR m6A has been proposed rather than a systematic and general increase.

    3. Reviewer #2 (Public review):

      Dierks et al. investigate the impact of m6A RNA modifications on the mRNA life cycle, exploring the links between transcription, cytoplasmic RNA degradation, and subcellular RNA localization. Using transcriptome-wide data and mechanistic modelling of RNA metabolism, the authors demonstrate that a simplified model of m6A primarily affecting cytoplasmic RNA stability is sufficient to explain the nuclear-cytoplasmic distribution of methylated RNAs and the dynamic changes in m6A levels upon perturbation. Based on multiple lines of evidence, they propose that passive mechanisms based on the restricted decay of methylated transcripts in the cytoplasm play a primary role in shaping condition-specific m6A patterns and m6A dynamics. The authors support their hypothesis with multiple large-scale datasets and targeted perturbation experiments. Overall, the authors present compelling evidence for their model which has the potential to explain and consolidate previous observations on different m6A functions, including m6A-mediated RNA export.
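
      To make the passive-model argument concrete, here is a toy steady-state calculation. It is not m6ADyn itself. It assumes a two-compartment scheme in which RNA is produced in the nucleus, exported to the cytoplasm, and degraded there, with methylated and unmethylated transcripts treated as separate pools that differ only in cytoplasmic degradation. The parameter names `alpha` (production), `beta` (export) and `gamma` (cytoplasmic degradation) are illustrative assumptions; only the nuclear decay rate `delta` is named in the reviews above.

```python
def steady_state(alpha, beta, gamma, delta=0.0):
    """Steady-state nuclear and cytoplasmic RNA levels for one transcript pool.

    Solves dN/dt = alpha - (beta + delta) * N = 0 and
           dC/dt = beta * N - gamma * C = 0.
    """
    nuclear = alpha / (beta + delta)
    cytoplasmic = beta * nuclear / gamma
    return nuclear, cytoplasmic

def methylated_fractions(alpha_m, alpha_u, beta, gamma_m, gamma_u, delta=0.0):
    """Fraction of methylated RNA in the nucleus and in the cytoplasm.

    Suffix _m marks the methylated pool, _u the unmethylated pool.
    Export and nuclear decay are assumed identical for both pools.
    """
    n_m, c_m = steady_state(alpha_m, beta, gamma_m, delta)
    n_u, c_u = steady_state(alpha_u, beta, gamma_u, delta)
    return n_m / (n_m + n_u), c_m / (c_m + c_u)

# Illustrative values only: methylated RNA decays twice as fast in the cytoplasm.
nuc_frac, cyt_frac = methylated_fractions(1.0, 1.0, 1.0, 2.0, 1.0)
```

      In this toy scheme the nuclear-to-cytoplasmic ratio of a pool is `gamma / beta`, so faster cytoplasmic decay of methylated RNA alone leaves a higher methylated fraction in the nucleus than in the cytoplasm, and that ratio does not depend on `delta`.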

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript works with a hypothesis where the overall m6A methylation levels in cells are influenced by mRNA metabolism (sub-cellular localization and decay). The basic assumption is that m6A causes mRNA decay and this happens in the cytoplasm. They go on to experimentally test their model to confirm its predictions. This is confirmed by sub-cellular fractionation experiments which show high m6A levels in the nuclear RNA. Nuclear localized RNAs have higher methylation. Using a heat shock model, they demonstrate that RNAs with increased nuclear localization or transcription, are methylated at higher levels. Their overall argument is that changes in m6A levels are rather determined by passive processes that are influenced by RNA processing/metabolism. However, it should be considered that erasers have their roles under specific environments (early embryos or germline) and are not modelled by the cell culture systems used here.

      Strengths:

      This is a thought-provoking series of experiments that challenge the idea that active mechanisms of recruitment or erasure are major determinants for m6A distribution and levels.

    1. eLife assessment

      The authors made a useful finding that Zizyphi spinosi semen, a traditional Chinese medicine, has demonstrated excellent biological activity and potential therapeutic effects against Alzheimer's disease (AD). The researchers presented the effects, but the research evidence for the mechanism was incomplete. The main claims were only partially supported.

    2. Reviewer #1 (Public review):

      Summary:

      The study shows that Zizyphi spinosi semen (ZSS), particularly its non-extracted simple crush powder, has significant therapeutic effects on neurodegenerative diseases. It removes Aβ, tau, and α-synuclein oligomers, restores synaptophysin levels, enhances BDNF expression and neurogenesis, and improves cognitive and motor functions in mouse AD, FTD, DLB, and PD models. Additionally, ZSS powder reduces DNA oxidation and cellular senescence in normal-aged mice, increases synaptophysin, BDNF, and neurogenesis, and enhances cognition to levels comparable to young mice.

      Weaknesses:

      (1) While the study demonstrates that ZSS has protective effects across a wide range of animal models, including AD, FTD, DLB, PD, and both young and aged mice, it is broad and lacks a detailed investigation into the underlying mechanisms. This is the most significant concern.

      (2) The authors highlight that the non-extracted simple crush powder of ZSS shows more substantial effects than its hot water extract and extraction residue. However, the manuscript provides very limited data comparing the effects of these three extracts.

      (3) The authors have not provided a rationale for the dosing concentrations used, nor have they tested the effects of the treatment in normal mice to verify its impact under physiological conditions.

      (4) Regarding the assessment of cognitive function in mice, the authors only utilized the Morris Water Maze (MWM) test, which includes a five-day spatial learning training phase followed by a probe trial. The authors focused solely on the learning phase. However, it is relevant to note that data from the learning phase primarily reflects the learning ability of the mice, while the probe trial is more indicative of memory. Therefore, it is essential that probe trial data be included for a more comprehensive analysis. A justification should be included to explain why the latency of 1st is about 50s not 60s.

      (5) The BDNF immunohistochemical staining in the manuscript appears to be non-specific.

      (6) The central pathological regions in PD are the substantia nigra and striatum. Please replace the staining results from the cortex and hippocampus with those from these regions in the PD model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors studied the effects of hot water extract, extraction residue, and non-extracted simple crush powder of ZSS in diseased or aged mice. It was found that ZSS played an anti-neurodegenerative role by removing toxic proteins, repairing damaged neurons, and inhibiting cell senescence.

      Strengths:

      The authors studied the effects of ZSS in different transgenic mice and analyzed the different states of ZSS and the effects of different components.

      Weaknesses:

      The authors' study lacked an in-depth exploration of mechanisms, including changes in intracellular signal transduction, drug targets, and drug toxicity detection.

    4. Reviewer #3 (Public review):

      ZSS has been widely used in Traditional Chinese Medicine as a sleep-promoting herb. This study tests the effects of ZSS powder and extracts on AD, PD, and aging, and broad protective effects were revealed in mice.

      However, this work did not include a mechanistic study or target data on ZSS, and PK data were also not included. Studies of mechanisms or targets and a PK study are suggested. A human PK study is preferred over one in mice or rats, e.g., to identify the main active ingredients and their plasma concentrations, as a basis for studying the pharmacological mechanisms of ZSS.

    1. eLife assessment

      Utilizing transgenic lineage tracing techniques, tissue clearing-based advanced imaging, and three-dimensional reconstruction of serial slices, the authors comprehensively mapped the distribution atlas of NFATc1+ and PDGFR-α+ cells in dental and periodontal mesenchyme and tracked their in vivo fate trajectories. This important work extends our understanding of NFATc1+ and PDGFR-α+ cells in dental and periodontal mesenchyme homeostasis, and should have an impact on clinical application and investigation. The strength of evidence is compelling: the authors employed CRISPR/Cas9-mediated gene editing to generate two dual recombination systems and mapped the NFATc1+ and PDGFR-α+ cells residing in dental and periodontal mesenchyme, their capacity for progeny cell generation, and their inclusive, exclusive, and hierarchical relations in homeostasis, generating a spatiotemporal atlas of these skeletal stem cell populations.

    2. Reviewer #1 (Public Review):

      In this study, Yang et al. investigated the locations and hierarchies of NFATc1+ and PDGFRα+ cells in dental and periodontal mesenchyme. By combining intersectional and exclusive reporters, they attempted to distinguish among NFATc1+PDGFRα+, NFATc1+PDGFRα-, and NFATc1-PDGFRα+ cells. Using tissue clearing and serial section-based 3D reconstruction, they mapped the distribution atlas of these cell populations. Through DTA-induced ablation of PDGFRα+ cells, they demonstrated the crucial role of PDGFRα+ cells in the formation of the odontoblast cell layer and periodontal components.

      Main issues:

      (1) The authors did not quantify the contribution of PDGFRα+ cells or NFATc1+ cells to dental and periodontal lineages in PDGFRαCreER; Nfatc1DreER;LGRT mice. Zsgreen+ cells represented PDGFRα+ cells and their lineages. Tomato+ cells represented NFATc1+ cells and their lineages. Tomato+Zsgreen+ cells represented NFATc1+PDGFRα+ cells and their lineages. Conducting immunostaining experiments with lineage markers is essential to determine the physiological contributions of these cells to dental and periodontal homeostasis.

      (2) The authors attempted to use PDGFRαCreER; Nfatc1DreER;IR1 mice to illustrate the hierarchies of NFATc1+ and PDGFRα+ cells. According to the principle of the IR1 reporter, it requires sequential induction of PDGFRα-CreER and Nfatc1-DreER to investigate their genetic relationship. Upon induction by tamoxifen, NFATc1+PDGFRα- cells and NFATc1-PDGFRα+ cells were labeled by Tomato and Zsgreen, respectively. However, the reporter expression of NFATc1+PDGFRα+ cells was uncertain, most likely random. Therefore, the hierarchical relationship of NFATc1+ and PDGFRα+ cells cannot be reliably determined from PDGFRαCreER; Nfatc1DreER; IR1 mice.

    3. Reviewer #2 (Public Review):

      Summary:

      Yang et al. present an article investigating the spatiotemporal atlas of NFATc1+ and PDGFR-α+ cells within the dental and periodontal mesenchyme. The study explores their capacity for progeny cell generation and their relationships - both inclusive and hierarchical - under homeostatic conditions. Utilizing the Cre/loxP-Dre/Rox system to construct tool mice, combined with tissue transparency and continuous tissue slicing for 3D reconstruction, the researchers effectively mapped the distribution of NFATc1+ and PDGFR-α+ cells. Additionally, in conjunction with DTA mice, the study provides preliminary validation of the impact of PDGFR-α+ cells on dental pulp and periodontal tissues. Primarily, this study offers an in-situ distribution atlas for NFATc1+ and PDGFR-α+ cells but provides limited information regarding their origin, fate differentiation, and functionality.

      Strengths:

      (1) Tissue transparency techniques and continuous tissue slicing for 3D reconstruction, combined with transgenic mice, provide high-quality images and rich, reliable data.

      (2) The Cre/loxP and Dre/Rox systems used by the researchers are powerful and innovative.

      (3) The IR1 lineage tracing model is significantly important for investigating cellular differentiation pathways.

      (4) This study provides effective spatial distribution information of NFATc1+/PDGFR-α+ cell populations in the dental and periodontal tissues of adult mice.

      Weaknesses:

      (1) In the functional experiment section, the investigation into the role of NFATc1+/PDGFR-α+ cell populations is somewhat lacking.

      (2) The author mentions that 3D reconstruction of consecutive tissue slices can provide more detailed information on cell distribution, so what is the significance of using tissue-clearing techniques in this article?

      (3) After reading the entire article, it is confusing whether the purpose of the article is to explore the distribution and function of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues, or to compare the differences between tissue clearing techniques and 3D reconstruction of continuous histological slices using NFATc1+/PDGFR-α+ cells?

      (4) The researchers did not provide a clear definition of the cell types of NFATc1+/PDGFR-α+ cells in teeth and periodontal tissues.

      (5) In studies related to long bones, the author defines the NFATc1+/PDGFR-α+ cell population as SSCs, which as a stem cell group should play an important role in tooth development or injury repair. However, the distribution patterns and functions of the NFATc1+/PDGFR-α+ cell population in these two conditions have not been discussed in this study.

    4. Reviewer #3 (Public Review):

      Summary:

      This groundbreaking study provided the most advanced transgenic lineage tracing and advanced imaging techniques in deciphering dental/periodontal mesenchyme cells. In this study, authors utilized CRISPR/Cas9-mediated transgenic lineage tracing techniques to concurrently demonstrate the inclusive, exclusive, and hierarchical distributions of NFATc1+ and PDGFR-α+ cells and their lineage commitment in dental and periodontal mesenchyme.

      Strengths:

      In combination with tissue clearing-based advanced imaging and three-dimensional reconstruction of slices, the distribution and hierarchical relationship of NFATc1+ and PDGFR-α+ cells and their progeny cells plainly emerged, which undoubtedly broadens our understanding of their in vivo fate trajectories in craniomaxillofacial tissue. Also, the experimental design is comprehensive and well-executed, and the results are convincing and compelling.

      Weaknesses:

      Minor modifications could be made to the paper, including more details on the advantages of the methodology used by the authors in this study, compared to other studies.

    1. eLife assessment

      In this fundamental study, the authors describe a new data processing pipeline that can be used to discover causal interactions from time-lapse imaging data. The utility of this pipeline was convincingly illustrated using tumor-on-chip ecosystem data. The newly developed pipeline could be used to better understand cell-cell interactions and could also be applied to perform temporal causal discovery in other areas of science, meaning this work could potentially have a wide range of applications.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents a data processing pipeline to discover causal interactions from time-lapse imaging data, and convincingly illustrates it on a challenging application: the analysis of tumor-on-chip ecosystem data.

      The core of the discovery module is the authors' original tMIIC method, which is shown in the supplementary material to compare favourably with two state-of-the-art methods on synthetic temporal data from a 15-node network.

      Strengths:

      This paper tackles the problem of learning causal interactions from temporal data, which is an open problem in the presence of latent variables.

      The core of the method tMIIC of the authors is nicely presented in connection to Granger-Schreiber causality and to the novel graphical conditions used to infer latent variables and based on a theorem about transfer entropy.

      tMIIC compares favourably to PC and PCMCI+ methods using different kernels on synthetic datasets generated from a network of 15 nodes.

      The paper includes a full application to tumor-on-chip cellular ecosystem data, covering cancer cells, immune cells, cancer-associated fibroblasts, endothelial cells, and anti-cancer drugs, with convincing inference results with respect to both known and novel effects between those components and their contact.

      The code and dataset are available online for the reproducibility of the results.

      Weaknesses:

      The references to "state-of-the-art methods" concerning the inference of causal networks should be more precise by giving citations in the main text, and better discussed in general terms, both in the first section and in the section of presentation of CausalXtract. It is only in the legend of the figures of the supplementary material that we get information.

      Of course, a comparison on one's own synthetic datasets can always be criticized, but this is rather due to the absence of a common benchmark, and I would recommend that the authors explicitly propose their datasets as a benchmark to the community.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose a methodology to perform causal (temporal) discovery. The approach appears to be robust and is tested in two different scenarios: one related to live-cell imaging data, and another using synthetic (mathematically defined) time series data. They compare the performance of their method against another well-known method using metrics such as F-score, precision, and recall.

      Strengths:

      Performance and robustness. The text is clear and concise. The authors provide the code for review.

      Weaknesses:

      One concern could be the applicability of the method in other areas, such as climate and the economy. For those areas, public data are available, and it might be interesting to test how the method performs with this kind of data.

    1. Reviewer #1 (Public Review):

      Summary:

      Both flies and mammals have D1-like and D2-like dopamine receptors, yet the role of D2-like receptors in Drosophila learning and memory remains underexplored. The paper by Qi et al. investigates the role of the D2-like dopamine receptor D2R in single pairs of dopaminergic neurons (DANs) during single-odor aversive learning in the Drosophila larva. First, they use confocal imaging to screen driver strains with expression in only single pairs of dopaminergic neurons. Next, they use thermogenetic manipulations of one pair of DANs (DAN-c1) to implicate DAN-c1 activity during larval aversive learning. They then use confocal imaging to demonstrate expression of D2R in the DANs and mushroom body of the larval brain. Finally, they show that optogenetic activation during training phenocopies D2R knockdown in these neurons: aversive learning is impaired when DAN-c1 is targeted, while appetitive and aversive learning are impaired when the mushroom body is manipulated. Qi et al. thus propose a model in which D2R limits excessive dopamine release to facilitate successful olfactory learning.

      Strengths:

      The paper reproduces prior findings by Qi and Lee (2014), which demonstrated that D2R knockdown in DL1 DANs or the mushroom body impairs aversive olfactory learning in Drosophila larvae. The authors extended this previous work by screening 57 GAL4 drivers to identify tools that drive expression in individual DANs and used one of the tools, the R76F02-AD; R55C10-DBD driver, to manipulate DAN-c1 neurons with greater specificity. They also show that GFP-tagged D2R is expressed in most DANs and the mushroom body. Although the authors only train larvae with a single odor, they demonstrate that driving D2R knockdown in DAN-c1 neurons impairs aversive learning, as do other loss-of-function manipulations of DAN-c1 neurons.

      Weaknesses:

      The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons. Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

    2. eLife assessment

      This study presents a valuable finding on the role of dopamine receptor D2R in dopaminergic neurons DAN-c1 and mushroom body neurons (Y201-GAL4 pattern) on aversive and appetitive conditioning. The evidence supporting the claims of the authors is solid and promotes the investigation using fly larvae, which have interesting advantages in the time required for obtaining experimental animals and the use of optogenetics. The work will be of interest to researchers studying neuronal control of behaviour and learning and memory in general.

    3. Reviewer #2 (Public Review):

      Summary:

      The study aimed to functionally identify individual DANs that mediate larval olfactory learning. The authors then searched for DAN-specific driver strains that mark single dopaminergic neurons and can subsequently be used to target genetic manipulations to those neurons. 56 GAL4 drivers identifying dopaminergic neurons were found (Table 1), and three of them drive the expression of GFP in a single dopaminergic neuron per third-instar larval brain hemisphere. The DAN driver R76F02-AD;R55C10-DBD appears to drive expression in a dopaminergic neuron innervating the lower peduncle (LP), which would be DAN-c1.

      The split-GFP reconstitution across synaptic partners (GRASP) technique was used to investigate the "direct" synaptic connections from DANs to the mushroom body. Potential synaptic contacts between DAN-c1 and MB neurons (at the lower peduncle) were detected.

      Single-odor associative learning was then tested with thermogenetic tools (Shi-ts1 and TrpA1). When larvae were trained at 34 °C, complete inactivation of dopamine release from DAN-c1 with Shibirets1 impaired aversive learning (Figure 2h), while Shibirets1 did not affect learning when larvae were trained at room temperature (22 °C). When paired with a gustatory stimulus (QUI or SUC), activation of DAN-c1 during training impairs both aversive and appetitive learning (Figure 2k).

      They examined the expression pattern of D2R in fly brains and found it in dopaminergic neurons and the mushroom body (Figure 3). To inspect whether the pattern of GFP signals indeed reflected the expression of D2R, three D2R enhancer driver strains (R72C04, R72C08, and R72D03-GAL4) were crossed with the GFP-tagged D2R strain.

      D2R knockdown (UAS-RNAi) in dopaminergic neurons driven by TH-GAL4 impaired larval aversive learning. Using a microRNA strain (UAS-D2R-miR), a similar deficit was observed. Crossing the GFP-tagged D2R strain with a DAN-c1-mCherry strain demonstrated the expression of D2R in DAN-c1 (Figure 4a). Knockdown of D2R in DAN-c1 impaired aversive learning with the odorant pentyl acetate, while appetitive learning was unaffected (Figure 4e). Sensory and motor functions appear not to be affected by D2R suppression.

      To exclude possible chronic effects of D2R knockdown during development, optogenetics was used: ChR2 was expressed in DAN-c1, and blue light was applied at distinct stages of the learning protocol. Optogenetic activation of DAN-c1 during training impaired aversive learning but not appetitive learning (Figure 5b-d).

      Knockdown of D2Rs in MB neurons by D2R-miR impaired both appetitive and aversive learning (Figure 6a). Activation of MBNs during training impairs both larval aversive and appetitive learning.

      Finally, based on the data, the authors propose a model in which effective learning requires a balanced level of activity between D1R and D2R (Figure 7).

      Strengths:

      The work is well written, clear, and concise. The authors use well-documented strategies to identify GAL4 drivers with expression in a single DAN and to assess behavioral performance in larvae with distinct genetic tools, including thermogenetics and optogenetics in behaving animals. Altogether, the study expands our understanding of the role of D2R in DAN-c1 and MB neurons in the larval brain.

      Weaknesses:

      It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shi-ts1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2k). In this case, the thermo-induction appears to impair behavioral performance in all three conditions (QUI, DW, and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2k). Alternatively, the authors need to provide a better explanation of this point.

      The study provides insight into the role of D2R in associative learning, expanding our understanding, and might be a reference similar to previous key findings (Qi and Lee, 2014, https://doi.org/10.3390/biology3040831).

    4. Reviewer #3 (Public Review):

      It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).

      As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odour side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... an R.I. of 1.0 would result.
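
      The arithmetic behind this concern can be sketched as follows. The manuscript's equation is not reproduced in this review, so the form used here, (odour side minus blank side) divided by the number of larvae counted, is an assumption, as are the function and parameter names.

```python
def response_index(n_odour, n_blank, n_middle=0, count_middle=False):
    """Hypothetical response index: (odour - blank) / larvae counted.

    Assumed form only; the manuscript's exact equation is not shown here.
    With count_middle=False, middle-zone larvae are left out of the
    denominator, which is the practice the reviewer objects to.
    """
    denominator = n_odour + n_blank + (n_middle if count_middle else 0)
    if denominator == 0:
        raise ValueError("no larvae counted")
    return (n_odour - n_blank) / denominator


# The reviewer's scenario: 30 larvae, 1 on the odour side, 29 in the middle zone.
ri_excluding_middle = response_index(1, 0, n_middle=29)
ri_including_middle = response_index(1, 0, n_middle=29, count_middle=True)

print(f"middle zone excluded: {ri_excluding_middle:.3f}")
print(f"middle zone included: {ri_including_middle:.3f}")
```

      Under this assumed formula, a single larva on the odour side yields the maximal score when the middle zone is excluded, whereas dividing by all 30 animals yields a score close to zero.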

      Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      To implicate a role of dopamine in DANs, previous work used e.g. RNAi against the dopamine-synthesizing TH enzyme (Rohwedder et al, cited).

      It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

    5. Author response:

      Reviewer #1 (Public Review):

      Weakness #1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As introduced in the Results section, we screened 57 driver strains selected from previous studies: each had been reported either to label a single dopaminergic neuron (DAN), i.e., one pair, in larvae, or to label only several DANs in the adult brain, suggesting that it might label a single DAN in larvae. In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae [4], while the strains in Figure 1e, f, and g were reported to label only several DANs in adult brains [5,6]. We examined these strains, and only some of them labeled single DANs in 3rd instar larval brains (Figure 1f, g, h, l, and m). Among them, only the strains in Figure 1f and h labeled a single DAN per brain hemisphere without labeling non-DANs. The other strains labeled non-DANs in addition to single DANs (Figure 1g, l, and m). Taking the ventral nerve cord (VNC) into consideration, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, the strain in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is a strain from our screen that labels only a single DAN in 3rd instar larval brains. We still describe the others (Figure 1g, h, l, and m) as strains labeling single DANs, although they also label one to several non-DANs. Figure 1 mainly shows the strains labeling single DANs. The labeling patterns of the other screened driver strains are summarized in Table 1. Since brain images of the remaining 47 strains are all available, we will state in Figure S1 that additional brain images can be provided upon request.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single DA neuron in each brain hemisphere. Additional GFP (+) signals were often observed, but they did not come from cell bodies of DANs because they were not stained by a TH antibody. These additional GFP (+) signals were mainly neurites, including axonal terminals, but could also be false positive signals or weakly stained non-neuronal cell bodies. This conclusion was based on analysis of a total of 22 larval brains. We will add this to the text or the Figure S1 caption. An enlarged inset of the GFP (+) signals will also be added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for the suggestion. MB320C mainly labels PPL1-γ1pedc in the adult brain, with one or two other weakly labeled cells. It will be interesting to investigate the pattern of this driver in 3rd instar larval brains. If it only covers DAN-c1, we can try to knock down D2R in this strain to check whether it reproduces our results. This will be an interesting fly strain to test, but we believe that it is not necessary for our current manuscript, as the DAN-c1 driver is very specific (for details, refer to our response to Reviewer #3). However, this line will be very useful for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images showing DANs in the brains of the SS02160 driver cross. However, Extended Data Figure 1 of Eschbach et al. (2020) shows four strongly labeled neurons in each brain hemisphere [9], indicating that this driver does not label only one neuron, DAN-c1.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree that the words ‘necessary’ and ‘sufficient’ are too exclusive for other neurons. As mentioned in the Discussion part, we do think other dopaminergic neurons may also be involved in larval aversive learning. We are going to re-phrase these words by replacing them with more logically appropriate words, such as ‘important’, ‘essential’, or ‘mediating’.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is a great point! Yes, we cannot rule out the possibility that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine. The experimental results with TRPA1 could be caused by depletion of dopamine, or DA inactivation due to prolonged depolarization or adaptation. However, we still think that our hypothesis on the over-excitation of DAN-c1 is more consistent with our experimental results and other published data. Our justification is as follows:

      (1) Associative learning occurs only when the CS and US are paired. In wild-type larvae, a specific odor (the conditioned stimulus, CS, such as pentyl acetate) depolarizes a subset of Kenyon cells in the mushroom body, while the gustatory unconditioned stimulus (US, quinine) induces dopamine release from DAN-c1 onto the lower peduncle (LP) compartment of the mushroom body (Figure 7a). Only when the CS and US are paired do the calcium influx caused by the CS and the Gαs activated by dopamine binding to D1R turn on rutabaga, a mushroom body-specific adenylyl cyclase that acts as the coincidence detector in associative learning (Figure 7d).

      (2) Rutabaga converts ATP into cAMP, activating the PKA signaling pathway and modifying the synaptic strength from mushroom body neurons (MBN, also called Kenyon cells) to the mushroom body output neurons (MBON, Figure 7d). This change in synaptic strength leads to learned responses when the same odor appears again.

      (3) In our work, we found that D2R is expressed in DAN-c1 and that knockdown of D2R in DAN-c1 impairs larval aversive learning. As D2R reduces cAMP levels and neuronal excitability (ref. 3), we hypothesized that knockdown of D2R in DAN-c1 would remove the inhibition exerted by the D2R autoreceptor and lead to more dopamine (DA) release than in wild-type larvae when the US (quinine) is delivered. The elevated DA release, together with the calcium influx caused by the CS, increases the cAMP level in MBNs, which leads to the learning deficit (over-excitation, Figure 7b). dunce mutant larvae, which have excessive cAMP, showed deficient aversive learning, supporting our hypothesis (ref. 2).

      (4) Our TRPA1 results can be explained by this over-excitation hypothesis. When DAN-c1 was activated (34 °C) in the distilled water group, the artificial activation mimicked the gustatory activation by quinine, and the larvae showed aversive learning responses towards the odor (Figure 2k, DW group). When DAN-c1 was activated (34 °C) in the sucrose group, the artificial activation again mimicked the gustatory activation by quinine, so the larvae showed a learning response combining both appetitive and aversive learning (Figure 2k, SUC group).

      (5) When DAN-c1 was activated (34 °C) in the quinine group, the artificial activation and the gustatory activation by quinine together led to elevated DA release from DAN-c1. During training, this elevated DA caused over-excitation of MBNs, leading to failure of aversive learning (Figure 2k, QUI group), a phenotype similar to that of larvae with D2R knockdown in DAN-c1.

      (6) Similarly, optogenetic activation of DAN-c1 during aversive training leads to elevated DA release from DAN-c1 (gustatory activation by quinine plus artificial activation). This would also cause over-excitation of MBNs and lead to failure of aversive learning. Artificial activation at other stages (resting or testing) does not cause elevated DA release during training, so aversive learning was not affected (Figure 5b).

      (7) However, when optogenetic activation was applied during training, we did not observe aversive learning responses in the distilled water group or a reduction in the sucrose group (Figure 5c, Figure 5d). Our explanation is that the optogenetic stimulus we applied was too strong, so DAN-c1 had already released elevated DA in both groups. Aversive learning in these groups was therefore already impaired, and the larvae simply showed the learning responses corresponding to distilled water or sucrose.

      (8) We also applied this over-excitation approach to MBNs. As MBNs mediate both appetitive and aversive learning, over-excitation of MBNs led to deficits in both types of learning, which is consistent with our hypothesis (Figure 6).

      In summary, we hypothesized that DAN-c1 restricts DA release via activation of D2R, which is important for larval aversive learning. D2R knockdown or artificial activation of DAN-c1 during training would induce elevated DA release, leading to over-excitation of MBNs and failure of aversive learning.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      Just like the example of TH-GAL4, it is possible that the D2R driver strains may partially reflect the expression pattern of endogenous D2R in larval brains. When we crossed the D2R driver strains with the GFP-tagged D2R strain, however, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3 c to h). In addition, D2R knockdown with D2R-miR directly supported that the GFP-tagged D2R strain reflected the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains supported the expression pattern we observed from the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody from Draper et al. (ref. 10). We tried the same antibody but were not able to observe clear signals after staining. It may not be specific for neurons in the fly larval brain, or our staining protocol may not suit this antibody.

      Unfortunately, we were not able to find the Lam (1999) paper.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We also think that other DANs may be involved in aversive learning. When we re-analyzed the learning assay data, D2R knockdown in DAN-g1 with miR appeared to partially affect aversive learning when larvae were trained with pentyl acetate (Figure S4e). We are going to build single statistic panels for DAN-g1 and DAN-d1. However, neither larvae with miR-mediated D2R knockdown in DAN-g1 trained with propionic acid (Figure S5a) nor larvae with RNAi-mediated D2R knockdown in DAN-g1 trained with pentyl acetate (Figure S5b) showed an aversive learning deficit. We will add paragraphs about this to both the Results and Discussion sections.

      Reviewer #2 (Public Review):

      Weakness #1: It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shi^ts1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2k). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Public Reviewer #1.

      Reviewer #3 (Public Review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and the suggestions. For the strain R76F02AD; R55C10DBD, we examined 22 third instar larval brains expressing GFP or Syt-GFP and Den-mCherry, and all of them clearly labeled DAN-c1. Half of them labeled only DAN-c1; the rest had 1 to 5 weakly labeled somata without neurites. Strongly labeled cells (1 or 2) appeared only rarely. These non-DAN-c1 neurons were seldom dopaminergic. In the VNC, 8 out of 12 preparations labeled no cells, and 3 had 2-4 strongly labeled cells. These data support that R76F02AD;R55C10DBD exclusively labels DAN-c1 in third instar larval brains.

      The expression pattern of R76F02AD; R55C10DBD and of D2R throughout the larval body is an interesting question. However, our main focus was on the central nervous system and learning behaviors in fruit fly larvae; we may investigate this question in the future.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens). As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted this single-odor larval learning paradigm from Honjo et al. (refs. 1, 2), who first designed and performed it for larval olfactory associative learning. To address the reviewer’s question about potential non-associative effects of the 30-min quinine or sucrose exposure, we defend the paradigm primarily on the basis of their results (2005 and 2009). When they presented the odorant to larvae after training, only larvae that had received paired training with both the odor and an unconditioned stimulus (quinine or sucrose) showed learning responses. Larvae exposed for 30 min to the odorant alone or to the unconditioned stimulus alone did not respond to the odor differently from the naïve group (refs. 1, 2). To validate that this paradigm induces associative learning responses, they also tested it from three aspects:

      (1) The odor responses are associative. Honjo et al. showed that only pairing the odorant with an unconditioned stimulus induced the corresponding attraction or repulsion of larvae to the odor. Neither the odorant alone, the unconditioned stimulus alone, nor temporal dissociation of the odorant and the unconditioned stimulus induced learning responses.

      (2) The odor responses are odor specific. When a second odorant that was not used for training was applied, larvae showed learning responses only to the odor paired with the unconditioned stimulus. This result ruled out a general olfactory suppression and indicates that larvae can discriminate odors and specifically alter their responses to the odor paired with the unconditioned stimulus. Although two-odor reciprocal training was not used, these results show the association between the unconditioned stimulus and the paired odor.

      (3) Well-known learning-deficit mutants did not show learned responses in this learning paradigm. Honjo et al. tested mutants (e.g., rut and dnc) that show learning deficits at the adult stage in the two-odor reciprocal learning paradigm. These mutant larvae also failed to show learning responses when tested with the single-odor larval learning paradigm.

      (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid), as well as two D2R knockdown strains (UAS-miR and UAS-RNAi for D2R). We obtained similar results for larvae with D2R knockdown in DAN-c1. In addition, our naïve olfactory, naïve gustatory, and locomotion data ruled out the possibility that the responses were caused by impaired sensory or motor functions. Comparison with the control group (odor paired with distilled water) ruled out potential effects of habituation, if any existed. All these results support that this single-odor learning paradigm is reliable for assessing the learning abilities of Drosophila larvae, and that the failure of the R.I. to decrease when larvae with D2R knockdown in DAN-c1 were trained with quinine paired with the odorant is caused by a deficit in aversive learning ability. We will add a paragraph to address this in the Discussion.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      This is a good question. We allowed 5 min during the testing stage for the larvae to wander on the testing plate. Under most conditions, more than half of the larvae (>50%) explore the plate, and the rest may stay in the middle zone (and are not counted). We used 25-50 larvae in each learning assay, so around 10-30 larvae end up in the two semicircular areas. Indeed, based on our raw data, an R.I. of 1 seldom appears; most R.I.s fall in the range from -0.2 to 0.8. We admit that the R.I. equation is not linear, so it changes more steeply as it approaches -1 and 1. However, as most values fall in the range from -0.2 to 0.8, we think these ‘border effects’ can be neglected if enough larvae (10-30) enter the calculation.
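      The disagreement in Weakness #3 is about the denominator of the index, which a short calculation makes concrete. The sketch below is illustrative only: the exact R.I. equation is not reproduced in this exchange, so the form (odor side minus control side, divided by the larvae counted) is an assumption inferred from the reviewer's example, and `response_index`, `n_middle`, and `count_middle` are hypothetical names.

```python
def response_index(n_odor, n_control, n_middle=0, count_middle=False):
    """Preference score from larval counts in a two-sided choice test.

    Assumed form (not quoted from the manuscript):
        (n_odor - n_control) / denominator
    The denominator is either the larvae that left the middle zone or,
    if count_middle is True, every larva on the plate.
    """
    denominator = n_odor + n_control
    if count_middle:
        denominator += n_middle
    if denominator == 0:
        raise ValueError("no larvae to score")
    return (n_odor - n_control) / denominator


# Reviewer's extreme case: 30 larvae, 1 on the odor side, 29 in the middle.
print(response_index(1, 0, n_middle=29))                     # 1.0
print(response_index(1, 0, n_middle=29, count_middle=True))  # about 0.033

# Hypothetical case in the range the authors describe:
# 40 larvae, 24 of which leave the middle zone.
print(response_index(16, 8, n_middle=16))                    # about 0.333
print(response_index(16, 8, n_middle=16, count_middle=True)) # 0.2
```

      Under this assumed form, the two denominators agree only when no larva stays in the middle zone, and the gap widens as fewer larvae leave it, which is why the number of larvae entering the calculation matters for both positions.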

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibire^ts1 gene encodes a thermosensitive mutant of dynamin. Expressing this mutant in target neurons blocks neurotransmitter release at ambient temperatures above 30 °C, because it represses vesicle recycling (ref. 1). It is a widely used tool for examining whether a target neuron is involved in a specific physiological function. We cannot rule out that Shibire^ts1-insensitive routes of dopamine release exist. However, blocking dopamine release from DAN-c1 with Shibire^ts1 already changed the learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      For the second question, about potential co-transmitter release, we think it is a great question. Recently, Yamazaki et al. reported that co-transmitters in the dopaminergic system modulate adult olfactory memories in Drosophila (ref. 11), and we cannot rule out roles for co-released neurotransmitters or neuropeptides in larval learning. Ideally, observing real-time changes in dopamine release from DAN-c1 in wild-type and TH knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support that dopamine is important for Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both adult and larval appetitive and aversive learning (refs. 12, 13). In our work, D2R in the mushroom body played important roles in both larval appetitive and aversive learning (Figure 6a). All this evidence reveals the importance of dopamine in Drosophila olfactory associative learning. In addition, too much remains unknown about co-released neurotransmitters and neuropeptides, as well as their potentially complex interactions and crosstalk. We believe that investigating co-released neurotransmitters and neuropeptides is beyond the scope of this study at this time.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all controls we used were homozygous parental strains. They did not show abnormal behaviors in learning, naïve sensory, or locomotion assays. The only exception is the control for DAN-c1: larvae from the homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed from affecting the learning ability, we used heterozygous R76F02AD; R55C10DBD/wildtype as the control, which showed normal learning, naïve sensory and locomotion abilities (Figure 4e to i).

      For Figure 4d, it is a column graph to quantify the efficiency of D2R knockdown with miR. Because we need to induce and quantify the knockdown effect in specific DANs (DM1), only TH-GAL4 can be used as the control group, rather than UAS-D2R-miR.

      For the missing control groups in Figure S4e and S5c, we have shown them in other Figures (Figure 4e). We will re-organize the figures to make them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

      We will read through this paper and try to incorporate it as a possible explanation for the learning mechanisms. As we introduced in the Discussion section, the learning mechanism is quite complex, combining non-linear neuronal circuits and multiple signaling pathways that respond to complex environmental learning contexts. We will try to develop a better hypothesis that best reconciles our results with published data.

      References

      (1) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (2) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (3) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (4) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (5) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (6) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (7) Hartenstein, V., Cruz, L., Lovick, J. K. & Guo, M. Developmental analysis of the dopamine-containing neurons of the Drosophila brain. J Comp Neurol 525, 363-379 (2017). https://doi.org/10.1002/cne.24069

      (8) Aso, Y. et al. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014). https://doi.org/10.7554/eLife.04577

      (9) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (10) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (11) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (12) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (13) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

    1. eLife assessment

      This valuable study characterizes the variability in spacing and direction of entorhinal grid cells and shows how this variability can be used to disambiguate locations within an environment. These claims are supported by solid evidence, yet some aspects of the methodology should be clarified. This study will be of interest to neuroscientists working on spatial navigation and, more generally, on neural coding.

    2. Reviewer #1 (Public review):

      Summary:

      The present paper by Redman et al. investigated the variability of grid cell properties in the MEC by analyzing publicly available large-scale neural recording data. Although previous studies have proposed that grid spacing and orientation are homogeneous within the same grid module, the authors found a small but robust variability in grid spacing and orientation across grid cells in the same module. The authors also showed, through model simulations, that such variability is useful for decoding spatial position.

      Strengths:

      The results of this study provide novel and intriguing insights into how grid cells compose the cognitive map in the axis of the entorhinal cortex and hippocampus. This study analyzes large data sets in an appropriate manner and the results are solid.

      Weaknesses:

      A weakness of this paper is that the scope of the study may be somewhat narrow, as this study focused only on the variability of spacing and orientation across grid cells. I would suggest some additional analysis or discussion that might increase the value of the paper.

      (1) Is the variability in grid spacing and orientation that the authors found intrinsically organized or is it shaped by experience? Previous research has shown that grid representations can be modified through experience (e.g., Boccara et al., Science 2019). To understand the dynamics of the network, it would be important to investigate whether robust variability exists from the beginning of the task period (recording period) or whether variability emerges in an experience-dependent manner within a session.

      (2) It is important to consider the optimal variability size. The larger the variability, the better it is for decoding. On the other hand, as the authors state in the Discussion, it is assumed that variability does not exist in the continuous attractor model. Although this study describes that it does not address how such variability fits the attractor theory, it would be better if more detailed ideas and suggestions were provided as to what direction the study could take to clarify the optimal size of variability.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents an interesting and useful analysis of grid cell heterogeneity, showing that the experimentally observed heterogeneity of spacing and orientation within a grid cell module can allow more accurate decoding of location from a single module.

      Strengths:

      I found the statistical analysis of the grid cell variability to be very systematic and convincing. I also found the evidence for enhanced decoding of location based on between-cell variability within a module to be convincing and important, supporting their conclusions.

      Weaknesses:

      (1) Even though theoreticians might have gotten the mistaken impression that grid cells are highly regular, this might be due to an overemphasis on regularity in a subset of papers. Most experimentalists working with grid cells know that many if not most grid cells show high variability of firing fields within a single neuron, though this analysis focuses on variability between neurons. In response to this comment, the authors should tone down and modify their statements about what are the current assumptions of the field (and if possible provide a short supplemental section with direct quotes from various papers that have made these assumptions).

      (2) The authors state that "no characterization of the degree and robustness of variability in grid properties within individual modules has been performed." It is always dangerous to speak in absolute terms about what has been done in scientific studies. It is true that few studies have had the number of grid cells necessary to make comparisons within and between modules, but many studies have clearly shown the distribution of spacing in neuronal data (e.g. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Hardcastle et al., 2015) so the variability has been visible in the data presentations. Also, most researchers in the field are well aware that highly consistent grid cells are much rarer than messy grid cells that have unevenly spaced firing fields. This doesn't hurt the importance of the paper, but they need to tone down their statements about the lack of previous awareness of variability (specific locations are noted in the specific comments).

      (3) The methods section needs to have a separate subheading entitled "How grid cells were assigned to modules" that clearly describes how the grid cells were assigned to a module (i.e. was this done by Gardner et al., or done as part of this paper's post-processing?).

    4. Reviewer #3 (Public review):

      Summary:

      Redman and colleagues analyze grid cell data obtained from public databases. They show that there is significant variability in spacing and orientation within a module. They show that the difference in spacing and orientation for a pair of cells is larger than the one obtained for two independent maps of the same cell. They speculate that this variability could be useful to disambiguate the rat position if only information from a single module is used by a decoder.

      Strengths:

      The strengths of this work lie in its conciseness, clarity, and the potential significance of its findings for the grid cell community, which has largely overlooked this issue for the past two decades. Their hypothesis is well stated and the analyses are solid.

      Weaknesses:

      On the side of weaknesses, we identified two aspects of concern. First, alternative explanations for the main result exist that should be explored and ruled out. Second, the authors' speculation about the benefits of variability in angle and spacing for spatial coding is not particularly convincing, although this issue does not diminish the importance or impact of the results.

      Major comments:

      (1) One possible explanation of the dispersion in lambda (not in theta) could be variability in the typical width of the field. For a fixed spacing, wider fields might push the six fields around the center of the autocorrelogram toward the outside, depending on the details of how exactly the position of these fields is calculated. We recommend authors show that lambda does not correlate with field width, or at least that the variability explained by field width is smaller than the overall lambda variability.

      (2) An alternative explanation could be related to what happens at the borders. The authors tackle this issue in Figure S2 but introduce a different way of measuring lambda based on three fields, which in our view is not optimal. We recommend showing that the dispersions in lambda and theta remain invariant as one removes the border-most part of the maps but estimating lambda through the autocorrelogram of the remaining part of the map. Of course, there is a limit to how much can be removed before measures of lambda and theta become very noisy.

      (3) A third possibility is slightly more tricky. Some works (for example Kropff et al, 2015) have shown that fields anticipate the rat position, so every time the rat traverses them they appear slightly displaced opposite to the direction of movement. The amount of displacement depends on the velocity. Maps that we construct out of a whole session should be deformed in a perfectly symmetric way if rats traverse fields in all directions and speeds. However, if the cell is conjunctive, we would expect a deformation mainly along the cell's preferred head direction. Since conjunctive cells have all possible preferred directions, and many grid cells are not conjunctive at all, this phenomenon could create variability in theta and lambda that is not a legitimate one but rather associated with the way we pool data to construct maps. To rule away this possibility, we recommend the authors study the variability in theta and lambda of conjunctive vs non-conjunctive grid cells. If the authors suspect that this phenomenon could explain part of their results, they should also take into account the findings of Gerlei and colleagues (2020) from the Nolan lab, that add complexity to this issue.

      (4) The results in Figure 6 are correct, but we are not convinced by the argument. The fact that grid cells fire in the same way in different parts of the environment and in different environments is what gives them their appeal as a platform for path integration since displacement can be calculated independently of the location of the animal. Losing this universal platform is, in our view, too much of a price to pay when the only gain is the possibility of decoding position from a single module (or non-adjacent modules) which, as the authors discuss, is probably never the case. Besides, similar disambiguation of positions within the environment would come for free by adding to the decoding algorithm spatial cells (non-hexagonal but spatially stable), which are ubiquitous across the entorhinal cortex. Thus, it seems to us that - at least along this line of argumentation - with variability the network is losing a lot but not gaining much.

      (5) In Figure 4 one axis has markedly lower variability. Is this always the same axis? Can the authors comment more on this finding?

      (6) The paper would gain in depth if maps coming out of different computational models could be analyzed in the same way.

      (7) Similarly, it would be very interesting to expand the study with some other data to understand if between-cell delta_theta and delta_lambda are invariant across environments. In a related matter, is there a correlation between delta_theta (delta_lambda) for the first vs for the second half of the session? We expect there should be a significant correlation, it would be nice to show it.
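      The split-half check suggested in point (7) could be sketched as follows. This is a minimal illustration, assuming that each cell's grid orientation has already been estimated separately for the first and second half of the session; the function names and the numerical values are hypothetical and do not come from the manuscript.

```python
from math import sqrt


def deviations_from_mean(values):
    """Per-cell deviation from the population mean (e.g. delta_theta)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-cell orientations (degrees), one entry per grid cell.
theta_first_half = [7.1, 8.4, 6.2, 9.0, 7.7]
theta_second_half = [7.3, 8.1, 6.5, 8.8, 7.9]

# A high correlation would indicate that each cell's offset from the
# module mean is stable within a session rather than estimation noise.
split_half_r = pearson_r(
    deviations_from_mean(theta_first_half),
    deviations_from_mean(theta_second_half),
)
```

      The same comparison applies unchanged to delta_lambda, or to the same cells recorded in two environments.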

    5. Author response:

      We thank the reviewers for their time and thoughtful comments. We are encouraged that all reviewers found our work novel and clear. We will submit a full revision to address all the points the reviewers made. Below, we briefly highlight a few clarifications and planned analyses to address major concerns; all other concerns raised by the reviewers will also be addressed in the revision.

      Reviewers #1 and #3 asked whether the variability in grid properties emerged with experience/time in the environment. We agree that this is an interesting question, and we will re-analyze the data to explore whether between-cell variability increases with time within a session. However, we note that since the rats were already familiarized to the environment for 10-20 sessions prior to the recordings, there may be limited additional changes in between-cell variability between recording sessions. In one case, two sessions from the same rat were recorded on consecutive days (R11/R12 and R21/R22) - these sessions did not show any difference in variability. 

      Reviewer #2 noted that the variability in grid properties is known to experimentalists. We will tone down our discussion on the current assumptions in the field to accurately reflect this awareness in the community. However, we would like to emphasize that the lack of work carefully examining the robustness of this variability has prevented a firm understanding of whether this is an inherent property of grid cells or due to noise. The impact of this can be seen in theoretical neuroscience work where a considerable number of articles (including recent publications) start with the assumption that all grid cells within a module have identical properties, with the exception of phase shift and noise. In addition, since grid cells are assumed to be identical in the computational neuroscience community, there has been little work on quantifying how much variability a given model produces. This makes it challenging to understand how consistent different models are with our observations. We believe that making these limitations of previous work clear is important to properly conveying our work’s contribution. 

      Reviewer #3 asked whether the variability in grid properties could be driven by cells that were conjunctively tuned with head direction. We agree that this is an interesting hypothesis and will explore this by performing new analysis. We note that, as reported by Gardner et al. (2022), only 19 of the 168 cells in recording session R12 are conjunctive. Even if these cells are included in the same proportion as pure grid cells by our inclusion criteria (which appears unlikely, given that conjunctive cells may be less reliable across splits of the data), then approximately 9 out of the 82 cells we analyzed would be conjunctive. Therefore, we expect it to be unlikely that they are the main source of the variability we find. However, we will test this in our revised manuscript.
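      The estimate of roughly 9 conjunctive cells follows from simple proportionality. A short sketch of the arithmetic, using only the counts quoted above (the function name is ours, for illustration):

```python
def expected_conjunctive(n_conjunctive, n_recorded, n_analyzed):
    """Expected number of conjunctive cells among the analyzed cells,
    assuming they pass the inclusion criteria at the same rate as
    pure grid cells."""
    return n_conjunctive / n_recorded * n_analyzed


# Counts quoted in the response for session R12.
expected = expected_conjunctive(n_conjunctive=19, n_recorded=168, n_analyzed=82)
expected_rounded = round(expected)
```

      Because conjunctive cells may pass the inclusion criteria less often than pure grid cells, this figure is best read as an upper estimate.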

      Reviewer #3 asked whether the “price” paid in having grid property variability was too high for the modest gain in ability to encode local space. We agree that losing the continuous attractor network (CAN) structure, and the ability to path integrate, would be a very large loss. However, we do not believe that the variability we observe necessarily destroys either CAN or path integration. We argue this for two reasons. First, the data we analyzed [from Gardner et al. (2022)] is exactly the data set that was found to have toroidal topology and therefore viewed to be in line with a major prediction of CANs. Thus, the amount of variability in grid properties does not rule out the underlying presence of a continuous attractor. Second, path integration may still be possible with grid cells that have variable properties. To illustrate this, and to address another comment from Reviewer #3, we have begun to analyze the distribution of grid properties in a recurrent neural network (RNN) model trained to perform path integration (Sorscher et al., 2019). This RNN model, in addition to others (Banino et al., 2018; Cueva and Wei, 2018), has been found to develop grid cells and there is evidence that it develops CANs as the underlying circuit mechanism (Sorscher et al., 2023). We find that the grid cells that emerge from this model exhibit variability in their grid spacings and orientations. This illustrates that path integration (the very task the RNN was trained to perform) is possible using grid cells with variable properties.

    1. eLife assessment

      In this useful study, the authors show that N-acetylation of synuclein increases clustering of synaptic vesicles in vitro and that this effect is mediated by enhanced interaction with lysophosphatidylcholine. While the evidence for enhanced clustering is largely solid, the biological significance remains unclear.

    2. Reviewer #1 (Public review):

      ⍺-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It has been known for many years that synuclein binds to synaptic vesicles, but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes generally being much lower than in the experiments reported here. Since no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included, it remains unclear to what extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I, or associate with the unfolded regions of VAMP, both of which were reported to cause vesicle clustering.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      ⍺-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

      We thank the reviewer for appreciating the importance of our work and his/her positive comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors employed DLS to quantify the percentage of SV clustering in Fig. 1c and d. As DLS usually measures particle size distribution, I am not sure how the data was plotted in Fig. 1c and d. It would be great to show a representative raw dataset here.

      We thank the reviewer for the comment. To address this, we have included four representative DLS datasets of different α-Syn variants mediating SV clustering for clarification (Author response image 1). In addition to presenting the particle distribution based on light-scattering intensity, DLS can convert the intensity into a particle size distribution based on particle number counts. In our analysis, particle diameters around 50 nm are considered to represent single SV species, whereas diameters larger than 120 nm indicate SV clusters. Specifically, as shown in Author response image 1, adding Ac-α-syn to a homogeneous SV sample altered the distribution from one single SV particle species (Author response image 1d) to three distinct species (Author response image 1a); this resulted in 68.5% of the particles being single SVs and 31.5% being SV clusters.

      Author response image 1.

      Representative raw dataset of α-Syn-mediated synaptic vesicle (SV) clustering monitored by dynamic light scattering (DLS). The grey-colored rows represent small particles (< 5 nm) that contributed zero to the particle number count.
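      The conversion from a number-weighted DLS size distribution to a clustering percentage can be sketched as below. This is an illustrative reading of the criteria stated above, not the authors' analysis code: the cutoffs come from the response (particles below 5 nm ignored, diameters above 120 nm counted as clusters), while the function name and the per-species values are hypothetical, chosen only so that the totals match the reported 68.5% and 31.5%.

```python
NOISE_MAX_NM = 5.0        # particles below this contribute zero (from the response)
CLUSTER_MIN_NM = 120.0    # diameters above this are counted as SV clusters


def clustering_percentages(species):
    """Split a number-weighted size distribution into single SVs and clusters.

    species: list of (diameter_nm, number_percent) pairs.
    Returns (single_percent, cluster_percent), renormalized to 100%.
    Assumption: everything between the two cutoffs is treated as single SVs.
    """
    single = 0.0
    cluster = 0.0
    for diameter_nm, number_percent in species:
        if diameter_nm < NOISE_MAX_NM:
            continue  # small-particle rows are ignored
        if diameter_nm > CLUSTER_MIN_NM:
            cluster += number_percent
        else:
            single += number_percent
    total = single + cluster
    if total == 0:
        raise ValueError("no particles above the noise cutoff")
    return 100.0 * single / total, 100.0 * cluster / total


# Hypothetical three-species distribution plus one small-particle row.
example_species = [(2.0, 0.0), (50.0, 68.5), (150.0, 20.0), (400.0, 11.5)]
single_pct, cluster_pct = clustering_percentages(example_species)
```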

      (2) Syn-lipid interactions are known to be altered by mutations involved in neurodegenerative diseases. I am wondering how those mutations will affect SV clustering mediated by the interaction of LPC with N-acetylated syn.

      We thank the reviewer for the insightful comment. Our data indicate that N-acetylation enhances the binding of the N-terminal region of α-syn to LPC, thereby facilitating SV clustering. This enhancement benefits from the fact that N-acetylation effectively neutralizes the positive charge of α-syn’s N-terminal region, promoting its insertion into LPC-rich membranes through hydrophobic interactions. Therefore, we envision that any mutation that weakens membrane binding capability of the N-terminal unmodified α-Syn may decrease SV clustering mediated by the interaction between the Ac-α-syn and LPC.

      In separate work (doi: 10.1093/nsr/nwae182, Fig. S8), we compared the binding affinity of LPC with wild-type N-terminal unmodified α-syn and six Parkinson’s disease (PD) familial mutants (A30P, E46K, H50Q, G51D, A53E, and A53T). Among these, only the A30P mutation showed a significant decrease in binding with LPC. Furthermore, using the same single vesicle assay setup, in another paper (doi: 10.1073/pnas.2310174120, Fig. 4C), we demonstrated that the A30P-mutated α-Syn lost its ability to facilitate SV clusters. Therefore, among the six PD mutations, the A30P mutation may significantly impact the SV clustering mediated by the interaction between Ac-α-syn and LPC.

      (3) The crosslinking data in Fig. 4 was obtained using LPC or PS liposomes. I am wondering if these results truly mimic physiological conditions. Could the authors use SVs for these experiments?

      We thank the reviewer for the suggestion. To elucidate the mechanistic differences between N-terminal unmodified α-syn and N-acetylated α-syn, we utilized pure LPC and PS liposomes for clarity. Using natural-source SVs, which contain many synaptic proteins, could complicate or obscure the interaction patterns of Ac-α-syn due to potential crosstalk with other SV proteins. Additionally, the complex lipid environment of SV membranes would not help us decipher the specific molecular mechanism by which Ac-α-Syn facilitates SV clustering through LPC.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It has been known for many years that synuclein binds to synaptic vesicles, but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes being low, questioning the significance of the findings. Oddly, no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included. It remains unclear to which extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I that were reported to cause vesicle clustering.

      We appreciate the reviewers’ insightful comments. Indeed, in another paper (doi: 10.1093/nsr/nwae182), employing a conventional α-Syn pull-down assay and an LC-MS lipidomics method, we found that α-Syn has a preference for binding to lysophospholipids across in vivo and in vitro systems. Additionally, by comparing the lipid compositions of mouse brains, SVs and SV lipid-raft membranes, we found LPC levels to be twice as high in SVs compared to brain homogenates, and twice as high in lipid-raft membranes compared to non-lipid-raft membranes. Altogether, these findings emphasize the physiological relevance of understanding the mechanism by which Ac-α-syn mediates SV clustering through LPC.

      Liquid-liquid phase separation has been implicated in the assembly and maintenance of SV clusters, and we believe that the SV cluster liquid phase is interconnected by highly abundant proteins with multivalent low-affinity interactions. Besides the previously discovered protein-protein interactions between α-Syn and synapsin (doi: 10.1016/j.jmb.2021.166961) or VAMP2 (doi: 10.1038/s41556-024-01456-1) that contribute to SV condensates, protein-lipid interactions between α-Syn and acidic phospholipids or LPC may also play a role. Furthermore, post-translational modifications, such as N-acetylation of α-Syn, may also contribute to SV condensates.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 2, the authors indicate that for the binding assay both vesicle populations, the immobilized "acceptor" and the superfused "donor" population were labeled with different fluorescent dyes whereas in the text it is stated that the immobilized acceptor liposomes were unlabeled. Please clarify. Moreover, a control is missing showing that binding indeed depends on the immobilised liposome fraction and does not occur in their absence. This control is important because due to the long incubation times non-specific adsorption may occur which may be enhanced by adding destabilizing LPC or charged PS to the membrane.

      We thank the reviewer for pointing out this inconsistency. To avoid signal leakage from a high concentration of DiD vesicles upon green laser irradiation, we immobilized unlabeled vesicles. We have revised the Figure 2a as well as the figure caption.

      Regarding the control mentioned by the reviewer, we agree that non-specific binding could occur with the long incubation. In fact, the layer of highly dense liposomes (100 μM) immobilized on the imaging surface also serves to reduce non-specific interactions. In the absence of this layer of immobilized liposomes, we did see a high level of non-specific binding that significantly impacted our experiments. Therefore, we need to perform clustering experiments in the presence of immobilized liposomes.

    1. eLife assessment

      The manuscript introduces an important and innovative non-AI computational method for segmenting noisy grayscale images, with a particular focus on identifying immunostained potassium ion channel clusters. This method significantly enhances accuracy over basic threshold-based techniques while remaining user-friendly and accessible, even for researchers with limited computational backgrounds. The evidence supporting the method's efficacy is convincing. Its practical application and ease of use make it a tool that will benefit a wide range of laboratories.

    2. Reviewer #1 (Public review):

      The manuscript introduces a valuable and innovative non-AI computational method for segmenting noisy grayscale images, with a particular focus on identifying immunostained potassium ion channel clusters.

      Strengths:

      (1) Applicability and Usability: The method is exceptionally accessible to biologists and researchers without advanced computational expertise. It offers a highly practical alternative to AI-based methods, which often require significant training data and computational resources, making it an excellent choice for a broader range of laboratories.

      (2) Proof-of-Concept: The manuscript provides compelling evidence through multiple experiments, showcasing the method's superior performance over traditional threshold-based techniques, particularly in noisy environments. The dual immuno-electron microscopy experiments further reinforce the robustness and effectiveness of this approach.

      (3) Clarity and Methodology: The manuscript is exceptionally well-written, with clear and concise descriptions that effectively highlight the method's advantages. The detailed figures and comprehensive references greatly enhance the manuscript's credibility and strongly support the claims made.

      Weaknesses:

      The manuscript does not include comparisons with more advanced segmentation techniques, particularly those based on artificial intelligence. While the authors have provided a rationale for this decision, including such comparisons could have enriched the discussion and offered additional insights. Additionally, there are some concerns about the computational demands of the method, especially when applied to large-scale or 3D image analysis. Although the authors have shared some computational data, further optimization or practical recommendations would enhance the method's utility. Initially, the manuscript lacked a data and code availability statement, which could have limited the method's accessibility. However, this issue has since been resolved, with the code now being made available to the community. Lastly, while the findings related to Kv4.2 in the thalamus are noteworthy, they might achieve even greater impact if presented in a separate paper. Nevertheless, the authors have chosen to retain these results within the current manuscript to strengthen the overall narrative and relevance.

      We appreciate that the authors have provided thorough explanations for their original choices. These justifications offer a clearer understanding of their approach and the reasons behind the presentation of the data.

      Conclusion:

      The revised manuscript successfully addresses the majority of the reviewers' concerns, presenting a strong case for the proposed segmentation method. The method's ease of use for non-experts in AI, combined with its proven effectiveness in proof-of-concept experiments, positions it as a valuable addition to the field. While the manuscript could benefit from incorporating comparisons with more advanced segmentation methods and offering a more detailed discussion of computational requirements, it remains a robust contribution. The decision to include the Kv4.2 findings within the paper is well-justified by the authors, though these results could potentially have an even greater impact if published separately.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy.

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.

      This weakness is eliminated in the revision, which now provides the approach as a Matlab tool.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study describes a new computational method for unsupervised (i.e., non-artificial intelligence) segmentation of objects in grayscale images that contain substantial noise, to differentiate object, no object, and noise. Such problems are essential in biology because they are commonly encountered in the analysis of microscope images of biological samples and have recently been addressed by artificial intelligence, especially by deep neural networks. However, training artificial intelligence for specific sample images is a difficult task and not every biological laboratory can handle it. Therefore, the proposed method is particularly appealing to laboratories with little computational background. The method was shown to achieve better performance than a threshold-based method for artificial and natural test images. To demonstrate the usability, the authors applied the method to high-power confocal images of the thalamus for the identification and quantification of immunostained potassium ion channel clusters formed in the proximity of large axons in the thalamic neuropil and verified the results in comparison to electron micrographs.

      Strengths:

      The authors claim that the proposed method has higher pixel-wise accuracy than the threshold-based method when applied to gray-scale images with substantial noise.

      Since the method does not use artificial intelligence, training and testing are not necessary, which would be appealing to biologists who are not familiar with machine learning technology.

      The method does not require extensive tuning of adjustable parameters (trying different values of "Moran's order") given that the size of the object in question can be estimated in advance.

      We appreciate the positive assessment of our approach.

      Weaknesses:

      It is understood that the strength of the method is that it does not depend on artificial intelligence and therefore the authors wanted to compare the performance with another non-AI method (i.e. the threshold-based method; TBM). However, the TBM used in this work seems too naive to be fairly compared to the expensive computation of "Moran's I" used for the proposed method. To provide convincing evidence that the proposed method advances object segmentation technology and can be used practically in various fields, it should be compared to other advanced methods, including AI-based ones, as well.

      Protein localization studies revealed that protein distributions are frequently inhomogeneous in a cell. This is very common in neurons, which are highly polarized cell types with distinct axo-somato-dendritic functions. Moreover, due to the nature of the cell-to-cell interactions among neurons (e.g. electrical and chemical synapses) the cell membrane includes highly variable microdomains with unique protein assemblies (i.e. clusters). Protein clusters are defined as membrane segments with higher protein densities compared to neighboring membrane regions. However, protein density can continuously change between “clusters” and “non-clusters”. As a consequence, differentiating proteins involved vs not involved in clusters is a challenging task. Indeed, our analysis showed that the boundaries of protein clusters varied remarkably when 23 human experts delineated them.

      Despite the fact that protein clusters can only be vaguely defined, numerous studies have demonstrated the functional relevance of inhomogeneous protein distribution. Thus, there is a high relevance and need for an observer-independent, “operative” segmentation method that can be accomplished and compared among different conditions and specimens. The strength of the Moran’s I analysis we propose here, as pointed out by our reviewers and editors, is that it can extract the relevant signals from an image generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.

      In AI-based analysis the ground truth is known by an observer, and using a large training set AI learns to extract the relevant information for image segmentation. As outlined above, however, the “ground truth” cannot be unequivocally defined for protein clusters. There is no doubt that, with sufficient resource investment, an AI-based analysis of the same problem could be developed. In our view, however, in an average laboratory setting generating a training set using hundreds of images examined by many experts may not be feasible. Moreover, generalization from one training set to another set of clusters, resistance to noise, or robustness to different levels of background could also not be guaranteed.

      This method was claimed to be better than the TBM when the noise level was high. Related to the above, TBMs can be used in association with various denoising methods as a preprocess. It is questionable whether the claim is still valid when compared to the methods with adequate complexity used together with denoising. Consider for example, Weigert et al. (2018) https://doi.org/10.1038/s41592-018-0216-7; or Lehtinen et al (2018) https://doi.org/10.48550/arXiv.1803.04189.

      In Weigert et al., AI was trained with high-quality images of the same object obtained with extreme photon exposure in a confocal microscope. As delineated above, without training, AI systems cannot be used for such purposes. The Lehtinen paper is unfortunately no longer available at this doi.

      We must emphasize that in our work we did not intend to compare the image segmentation method based on local Moran’s I with all other available segmentation techniques. Rather, we wanted to demonstrate a straightforward method of grouping pixels with similar intensities and in spatial proximity which does not require a priori knowledge of the objects. We used TBM to benchmark the method. We agree that with more advanced TBM methods the difference between Moran’s and TBM might have been smaller. The critical component here is, however, that even with the most advanced TBM an artificial threshold needs to be defined. The optimal threshold may change from sample to sample depending on the experimental conditions, which makes quantification questionable. Moran’s method overcomes this problem and allows more objective segmentation of images even if the exact conditions (background labeling, noise, intensity etc.) are not identical among the samples.
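      For readers unfamiliar with the statistic, the idea of grouping bright pixels that also have bright neighbors can be illustrated with a small pure-Python sketch of local Moran's I. This is not the authors' MATLAB implementation. It assumes a square neighborhood of radius `order` with equal, row-standardized weights and simply keeps "high-high" pixels (above-mean pixels with a positive local Moran's I); the manuscript's actual weighting and significance criteria may differ, and all names here are hypothetical.

```python
def local_morans_i(image, order=1):
    """Local Moran's I for each pixel of a 2D list of intensities.

    I_i = (z_i / m2) * mean(z_j over neighbors j), with z = x - mean(x)
    and m2 = mean(z^2). Neighbors are all pixels within a square window
    of radius `order`, excluding the pixel itself (an assumed weighting).
    Returns (moran, z) as 2D lists.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(row) for row in image) / n
    z = [[value - mean for value in row] for row in image]
    m2 = sum(v * v for row in z for v in row) / n
    if m2 == 0:
        raise ValueError("image is constant; local Moran's I is undefined")
    moran = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                z[rr][cc]
                for rr in range(max(0, r - order), min(rows, r + order + 1))
                for cc in range(max(0, c - order), min(cols, c + order + 1))
                if (rr, cc) != (r, c)
            ]
            spatial_lag = sum(neighbors) / len(neighbors)
            moran[r][c] = z[r][c] * spatial_lag / m2
    return moran, z


def high_high_mask(image, order=1):
    """True where a pixel is above the mean and positively autocorrelated."""
    moran, z = local_morans_i(image, order)
    return [
        [moran[r][c] > 0 and z[r][c] > 0 for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Toy image: a 2x2 bright cluster plus one isolated bright "noise" pixel.
toy = [[0.0] * 6 for _ in range(6)]
for r, c in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    toy[r][c] = 10.0
toy[0][0] = 10.0
mask = high_high_mask(toy, order=1)
```

      In this toy example the isolated bright pixel has only dark neighbors, so its local Moran's I is negative and it is excluded without choosing an intensity threshold, whereas a simple threshold at any level below 10 would keep it.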

      The computational complexity of the method, determined by the convolution matrix size (Moran's order), linearly increases as the object size increases (Fig. S2b). Given that the convolution must be run separately for each pixel, the computation seems quite demanding for scale-up, e.g. when the method is applied for 3D image volumes. It will be helpful if the requirement for computer resources and time is provided.

      Here we provide the required data concerning the hardware and the computational time:

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60 GHz (2594 MHz), 4 cores, 64 GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Author response table.

      Computation times:

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by David et al. describes a novel image segmentation method, implementing Local Moran's method, which determines whether the value of a datapoint or a pixel is randomly distributed among all values, in differentiating pixel clusters from the background noise. The study includes several proof-of-concept analyses to validate the power of the new approach, revealing that implementation of Local Moran's method in image segmentation is superior to threshold-based segmentation methods commonly used in analyzing confocal images in neuroanatomical studies.

      Strengths:

      Several proof-of-concept experiments are performed to confirm the sensitivity and validity of the proposed method. Using composed images with varying levels of background noise and analyzing them in parallel with the Local Moran's or a Threshold-Based Method (TBM), the study is able to compare these approaches directly and reveal their relative power in isolating clustered pixels.     

      Similarly, dual immuno-electron microscopy was used to test the biological relevance of a colocalization that was revealed by Local Moran's segmentation approach on dual-fluorescent labeled tissue using immuno-markers of the axon terminal and a membrane-protein (Figure 5). The EM revealed that the two markers were present in terminals and their post-synaptic partners, respectively. This is a strong approach to verify the validity of the new approach for determining object-based colocalization in fluorescent microscopy. 

      The methods section is clear in explaining the rationale and the steps of the new method (however, see the weaknesses section). Figures are appropriate and effective in illustrating the methods and the results of the study. The writing is clear; the references are appropriate and useful.

      We are grateful for the constructive assessment of our results.

      Weaknesses:

      While the steps of the mathematical calculations to implement Local Moran's principles for analyzing high-resolution images are clearly written, the manuscript currently does not provide a computation tool that could facilitate easy implementation of the method by other researchers. Without a user-friendly tool, such as an ImageJ plugin or a code, the use of the method developed by David et al by other investigators may remain limited.

      The code for the analysis is now available online as a user-friendly MATLAB script at: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m

      Recommendations for the authors:

      Summary of reviews:

      Both reviewers acknowledge the potential significance and practicality of the newly proposed image segmentation method. This method uses Local Moran's principles, offering an advantage over traditional intensity thresholding approaches by providing more sensitivity, particularly in reducing background noise and preserving biologically relevant pixels.

      Strengths Highlighted:

      • The proposed method can provide more accurate results, especially for grayscale images with significant noise.

      • The method is not dependent on artificial intelligence, making it appealing for researchers with minimal computational background.    

      • The approach can operate without the need for extensive tuning, given that the size of the object is known.

      • Several proof-of-concept experiments were carried out, revealing the effectiveness of the method in comparison with the threshold-based segmentation methods.

      • The manuscript is clear in terms of methodology, and the results are supported by effective illustrations and references.

      Weaknesses Noted:

      • The study lacked a comparative analysis with advanced segmentation methods, especially those that employ artificial intelligence.

      See our response above to the same question of Reviewer 1.

      • There are concerns about computational complexity, especially when dealing with larger data sets or 3D image volumes.

      See our response about the calculations of computation times above to the similar question of Reviewer 1.

      • Both reviewers noted the absence of a data/code availability statement in the manuscript, which might restrict the method's adoption by other researchers.

      The code availability is provided now.

      • Reviewer 2 suggested that some results, particularly related to Kv4.2 in the thalamus, might be better presented in a separate study due to their significance.

      We thank our reviewers for this suggestion. We carefully evaluated the pros and cons of publishing the Kv4.2 data separately. We finally decided to keep the segmentation and experimental data together for the following reason. We believe that the ultrastructural localization provides strong experimental proof for the relevance of our novel segmentation method. In order to make the potassium channel data more visible, we added a subtitle to the title. In this manner, we think scientists interested in the imaging method as well as those interested in the neurobiology will both find and cite the paper. The new title now reads:

      “An image segmentation method based on the spatial correlation coefficient of Local Moran’s I - identification of A-type potassium channel clusters in the thalamus.”

      Reviewer Recommendations:

      (1) Provide details about the data and program code availability.

      See our response above

      (2) Offer practical recommendations and provide clarity on software packages and coding for the proposed method to enhance its adoption.

      Done.

      (3) Consider presenting the findings about Kv4.2 in the thalamus separately as they hold significant importance on their own.

      See our response above

      Given the reviews, the proposed image segmentation method presents a promising advancement in the domain of image analysis. The technique offers tangible benefits, especially for researchers dealing with biological microscopy data. However, for this method to see a broader application, it's imperative to provide clearer practical guidance and make data or code easily accessible. Additionally, while the findings regarding Kv4.2 in the thalamus are intriguing, they might achieve more impact if detailed in a dedicated paper.

      Reviewer #1 (Recommendations For The Authors):

      The availability of data or program code was not stated in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) While the principles of the method are explained clearly in a step-by-step fashion in the Methods section, the practical aspects of running sequential computations over a large matrix of pixel values are not well described. It would be very useful if the authors could provide recommendations on how to set the data structure and clarify which software and programming package for Local Moran's analysis they used. In addition, providing the code for the sequential implementation described in the Methods section would facilitate the adoption of the method by other researchers, and thus, the impact of the study. Currently, there is no data or code availability statement included in the manuscript.

      See our response above.

      (2) Figure 4 illustrates an experiment in which transmission electron microscopy and freeze-fracture replica labeling approaches were used to demonstrate that a potassium channel marker, Kv4.2 was selective to synapses forming on larger caliber dendrites in the thalamus. As impressive as the EM approaches utilized in this figure are, the results of this experiment have a somewhat tangential bearing on the segmentation method that is the focus of this study. In fact, the experiments illustrated in Figure 5, dual immuno-EM, are more than sufficient to confirm what the dual-confocal imaging coupled with Local Moran's segmentation analysis reveals. Furthermore, the author's findings about the localization and selectivity of Kv4.2 in the thalamus are too important and exciting to bury in a paper focusing on the methodology. Those results may have a wider impact if they are presented and discussed in a separate experimental paper.

      See our response above

    1. eLife assessment

      This useful experiment seeks to better understand how memory interacts with incoming visual information to effectively guide human behavior. Using several methods, the authors identify two distinct pathways relating visual processing to the default mode network: one that emphasizes semantic cognition, and the other, spatial cognition. The evidence presented is solid and will be of interest to cognitive and systems neuroscientists.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gonzalez Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or more other semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or other object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based processing dissociation.

      (2) The manuscript analyzes data from multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the general convergence of the results reported in this manuscript.

      Weaknesses:

      (1) The manuscript makes strong distinctions between spatial processing and other forms of semantic processing. However, it is not clear if scenes are uniquely different from other stimulus categories, such as faces or tools. As is noted by the authors in their revised discussion section, the design of the experiment does not allow for a category-level generalization beyond scenes. The dichotomization of semantic and spatial information invoked throughout the manuscript should be read with this limitation in mind.

      (2) Although the term "objects" is used by the authors to refer to the stimuli placed in scenes, it is a mixture of other stimulus categories, including various types of animals, tools, and other manmade objects. Different regions along the ventral stream are thought to process these different types of stimuli (e.g., Martin, 2007, Ann Rev Psychol), but as they are not being modeled separately, the responses associated with "object" processing in this manuscript are necessarily blurring across known distinctions in functional neuroanatomy.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Gonzalez Alam et al. report a series of functional MRI results about the neural processing from the visual cortex to high-order regions in the default-mode network (DMN), compiling evidence from task-based functional MRI, resting-state connectivity, and diffusion-weighted imaging. Their participants were first trained to learn the association between objects and rooms/buildings in a virtual reality experiment; after the training was completed, in the task-based MRI experiment, participants viewed the objects from the earlier training session and judged if the objects were in the semantic category (semantic task) or if they were previously shown in the same spatial context (spatial context task). Based on the task data, the authors utilised resting-state data from their previous studies, visual localiser data also from previous studies, as well as structural connectivity data from the Human Connectome Project, to perform various seed-based connectivity analyses. They found that the semantic task causes more activation of various regions involved in object perception while the spatial context task causes more activation in various regions for place perception, respectively. They further showed that those object perception regions are more connected with the frontotemporal subnetwork of the DMN while those place perception regions are more connected with the medial-temporal subnetwork of the DMN. Based on these results, the authors argue that there are two main pathways connecting the visual system to high-level regions in the DMN, one linking object perception regions (e.g., LOC) leading to semantic regions (e.g., IFG, pMTG), the other linking place perception regions (e.g., parahippocampal gyri) to the entorhinal cortex and hippocampus.

      Below I provide my takes on (1) the significance of the findings and the strength of evidence, (2) my guidance for readers regarding how to interpret the data, as well as several caveats that apply to their results, and finally (3) my suggestions for the authors.

      (1) Significance of the results and strength of the evidence

      I would like to praise the authors for, first of all, trying to associate visual processing with high-order regions in the DMN. While many vision scientists focus specifically on the macroscale organisation of the visual cortex, relatively few efforts are made to unravel how neural processing in the visual system goes on to engage representations in regions higher up in the hierarchy (a nice precedent study that looks at this issue is by Konkle and Caramazza, 2017). We all know that visual processing goes beyond the visual cortex, potentially further into the DMN, but there's no direct evidence. So, in this regard, the authors made a nice try to look at this issue.

      We thank the reviewer for their positive feedback and for their very thoughtful and thorough comments, which have helped us to improve the quality of the paper.

      Having said this, the authors' characterisation of the organisation of the visual cortex (object perception/semantics vs. place perception/spatial contexts) does not go beyond what has been known for many decades by vision neuroscience. Specifically, over the past two decades, numerous proposals have been put forward to explain the macroscale organisation of the visual system, particularly the ventrolateral occipitotemporal cortex. A lateral-medial division has been reliably found in numerous studies. For example, some researchers found that the visual cortex is organised along the separation of foveal vision (lateral) vs. peripheral vision (medial), while others found that it is structured according to faces (lateral) vs. places (medial). Such a bipartite division is also found in animate (lateral) vs. inanimate (medial), small objects (lateral) vs. big objects (medial), as well as various cytoarchitectonic and connectomic differences between the medial side and the lateral side of the visual cortex. Some more recent studies even demonstrate a tripartite division (small objects, animals, big objects; see Konkle and Caramazza, 2013). So, in terms of their characterisation of the visual cortex, I think Gonzalez Alam et al. do not add any novel evidence to what the community of neuroscience has already known.

      The aim of our study was not to provide novel evidence about visual organisation, but rather to understand how these well-known visual subdivisions are related to functional divisions in memory-related systems, like the DMN. We agree that our study confirms the pattern observed by numerous other studies in visual neuroscience.  

      However, the authors' effort to link visual processing with various regions of the DMN is certainly novel, and their attempt to gather converging evidence with different methodologies is commendable. The authors are able to show that, in an independent sample of resting-state data, object-related regions are more connected with semantic regions in the DMN while place-related regions are more connected with navigation-related regions in the DMN, respectively. Such patterns reveal a consistent spatial overlap with their Kanwisher-type face/house localiser data and also concur with the HCP white-matter tractography data. Overall, I think the two pathways explanation that the authors seek to argue is backed by converging evidence. The lack of travelling wave type of analysis to show the spatiotemporal dynamics across the cortex from the visual cortex to high-level regions is disappointing though because I was expecting this type of analysis would provide the most convincing evidence of a 'pathway' going from one point to another. Dynamic causal modelling or Granger causality may also buttress the authors' claim of pathway because many readers, like me, would feel that there is not enough evidence to convincingly prove the existence of a 'pathway'.

      By ‘pathway’ we are referring to a pattern of differential connectivity between subregions of visual cortex and subregions of DMN, suggesting there are at least two distinct routes between visual and heteromodal regions. However, these routes don’t have to reflect a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. We have now clarified this in the discussion section of the manuscript. We agree it would be interesting to characterise the spatiotemporal dynamics of neural propagation along our pathways, and we have incorporated this proposal into the future directions section.

      “One important caveat is that we have not investigated the spatiotemporal dynamics of neural propagation along the pathways we identified between visual cortex and DMN. The dissociations we found in task responses, intrinsic functional connectivity and white matter connections all support the view that there are at least two distinct routes between visual and heteromodal DMN regions, yet this does not necessarily imply that there is a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. It would be interesting in future work to characterise the spatiotemporal dynamics of neural propagation along visual-DMN pathways using methods optimised for studying the dynamics of information transmission, like Granger causality or travelling wave analysis.”

      We have also edited the wording of sentences in the introduction and discussion that we thought might imply directionality or transmission of information along these pathways, or to clarify the nature of the pathways (please see a couple of examples below):

      In the Introduction:

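      “We identified dissociable pathways of connectivity from different parts of visual cortex to DMN subsystems -> We identified dissociable pathways of connectivity between different parts of visual cortex and DMN subsystems“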

      In the Discussion:

      “…pathways from visual cortex to DMN -> …pathways between visual cortex and DMN“.

      (2) Guidance to the readers about interpretation of the data

      The organisation of the visual cortex and the organisation of the DMN historically have been studied in parallel with little crosstalk between different communities of researchers. Thus, the work by Gonzalez Alam et al. has made a nice attempt to look at how visual processing goes beyond the realm of the visual cortex and continues into different subregions of the DMN.

      While the authors of this study have utilised multiple methods to obtain converging evidence, there are several important caveats in the interpretation of their results:

      (1) While the authors choose to use the term 'pathway' to call the inter-dependence between a set of visual regions and default-mode regions, their results have not convincingly demonstrated a definitive route of neural processing or travelling. Instead, the findings reveal a set of DMN regions are functionally more connected with object-related regions compared to place-related regions. The results are very much dependent on masking and thresholding, and the patterns can change drastically if different masks or thresholds are used.

      We would like to qualify that our findings do not only reveal a set of any “DMN regions that are functionally more connected with object-related regions compared to place-related regions”. Instead, we show a double dissociation based on our functional task responses: DMN regions that were more responsive to semantic decisions about objects are more functionally and structurally connected to visual regions more activated by perceiving objects, while DMN regions that were more responsive to spatial decisions are more connected to visual regions activated by the contrast of scene over object perception.

      We do not believe that the thresholding or masking involved in generating seeds strongly affected our results. We are reassured of this by two facts:

      (1) We re-analysed the resting-state data using a stricter clustering threshold and this did not change the pattern of results (see response below).

      (2) In response to a point by reviewer #2, we re-analysed the data eroding the masks of the MT-DMN, and this also didn’t change the pattern of results (please see response to reviewer 2).

      In this way, our results are robust to variations in mask shape/size and thresholding.
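
      One way to picture the threshold-robustness check described here is sketched below. It converts one-sided p-values into cut-offs under a large-sample normal approximation (which gives roughly the T = 1.65 and T = 3.1 values the authors quote later in this response for p = .05 and p = .001) and then computes a spatial (Pearson) correlation between a map thresholded at each level. The function names and the toy map values are hypothetical, and this is not the procedure implemented in CONN.

```python
from math import sqrt
from statistics import NormalDist


def z_cutoff(p_one_sided):
    """Cut-off for a one-sided p-value, assuming a standard normal statistic."""
    return NormalDist().inv_cdf(1.0 - p_one_sided)


def threshold(values, cutoff):
    """Keep values at or above the cut-off and set the rest to zero."""
    return [v if v >= cutoff else 0.0 for v in values]


def spatial_correlation(map_a, map_b):
    """Pearson correlation between two flattened maps of equal length."""
    n = len(map_a)
    if n != len(map_b) or n < 2:
        raise ValueError("maps must have the same length (at least 2)")
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / sqrt(var_a * var_b)


lax_cutoff = z_cutoff(0.05)      # close to 1.65
strict_cutoff = z_cutoff(0.001)  # close to 3.1

# Hypothetical statistic values for eight voxels (illustration only).
toy_map = [0.2, 1.0, 1.8, 2.5, 3.5, 4.2, 5.0, 0.5]
lax_map = threshold(toy_map, lax_cutoff)
strict_map = threshold(toy_map, strict_cutoff)
similarity = spatial_correlation(lax_map, strict_map)
```

      A correlation close to 1 between the two versions would indicate that the stricter threshold trims the map without changing its overall spatial pattern, which is the sense of robustness claimed here.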

      (2) Ideally, if the authors could demonstrate the dynamics between the visual cortex and DMN in the primary task data, it would be very convincing evidence for characterising the journey from the visual cortex to DMN. Instead, the current connectivity results are derived from a separate set of resting state data. While the advantage of the authors' approach is that they are able to verify certain visual regions are more connected with certain DMN regions even under a task-free situation, it falls short of explaining how these regions dynamically interact to convert vision into semantic/spatial decision.

      We agree that a valuable future direction would be to collect evidence of spatiotemporal dynamics of propagation of information along these pathways. This could be the focus of future studies designed to this aim, and we have suggested this in the manuscript based on the reviewer’s suggestion. Furthermore, as stated above, we have now qualified our use of the term ‘pathway’ in the manuscript to avoid confusion.

      “These pathways refer to regions that are coupled, functionally or structurally, together, providing the potential for communication between them.”

      (3) There are several results that are difficult to interpret, such as their psychophysiological interactions (PPI), representational similarity analysis, and gradient analysis. For example, typically for PPI analysis, researchers interrogate the whole brain to look for PPI connectivity. Their use of targeted ROI is unusual, and their use of spatially extensive clusters that encompass fairly large cortical zones in both occipital and temporal lobes as the PPI seeds is also an unusual approach. As for the gradient analysis, the argument that the semantic task is higher on Gradient 1 than the spatial task based on the statistics of p-value = 0.027 is not a very convincing claim (unhelpfully, the figure on the top just shows quite a few blue 'spatial dots' on the hetero-modal end which can make readers wonder if the spatial context task is really closer to the unimodal end or it is simply the authors' statistical luck that they get a p-value under 0.05). While it is statistically significant, it is weak evidence (and it is not pertinent to the main points the authors try to make).

      To streamline the manuscript, we have moved the PPI and RSA results to the Supplementary Materials. However, we believe the gradient analysis is highly pertinent to understanding the functional separation of these pathways. In the revision, we show not only that the contrast between the Semantic and Spatial tasks was significant but also that the majority of participants exhibited a pattern consistent with the result we report. To show the results more clearly, we have added a supplementary figure (Figure S8) focussed on comparisons at the participant level.

      This figure shows the position in the gradient for each peak per participant per task. The peaks for each participant across tasks are linked with a line. Cases where the pattern was reversed are highlighted with dashed lines (7/27 participants in each gradient). This allows the reader and reviewer to verify in how many cases, at the individual level, the pattern of results reported in the text held (see “Supplementary Analysis: Individual Location of pathways in whole-brain gradients”).  
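
      As an illustration of how such participant-level consistency could be summarised, the sketch below computes an exact two-sided sign test. This is not an analysis reported by the authors: the count of 20 consistent participants is simply 27 minus the 7 reversed cases mentioned above, the null hypothesis of a 50/50 split is an assumption, and the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_consistent, n_total):
    """Exact two-sided sign test p-value under a 50/50 null (no ties)."""
    if not 0 <= n_consistent <= n_total or n_total == 0:
        raise ValueError("counts must satisfy 0 <= n_consistent <= n_total, n_total > 0")
    # Use the larger of the two counts so the upper tail is always summed.
    k = max(n_consistent, n_total - n_consistent)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2.0 * tail)


# 27 participants, 7 of whom showed the reversed pattern in a given gradient.
p_value = sign_test_p(27 - 7, 27)
```

      A sign test only uses the direction of each participant's difference and ignores its size, so it is a coarse complement to the group-level contrast rather than a replacement for it.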

      (3) My suggestion for the authors

      There are several conceptual-level suggestions that I would like to offer to the authors:

      (1)  If the pathway explanation is the key argument that you wish to convey to the readers, an effective connectivity type of analysis, such as Granger causality or dynamic causal modelling, would be helpful in revealing there is a starting point and end point in the pathway as well as revealing the directionality of neural processing. While both of these methods have their issues (e.g., Granger causality is not suitable for haemodynamic data, DCM's selection of seeds is susceptible to bias, etc), they can help you get started to test if the path during task performance does exist. Alternatively, travelling wave type of analysis (such as the results by Raut et al. 2021 published in Science Advances) can also be useful to support your claims of the pathway.

      As we have stated above, we agree with the reviewer that, given the pattern of results obtained in our work, analyses that characterise the spatiotemporal dynamics of transmission of information along the pathways would be of interest. However, we are concerned that our data is not well-optimised for these analyses.

      (2)  I think the thresholding for resting state data needs to be explained - by the look of Figure 2E and 3E, it looks like whole-brain un-thresholded results, and then you went on to compute the conjunction between these un-thresholded maps with network templates of the visual system and DMN. This does not seem statistically acceptable, and I wonder if the conjunction that you found would disappear and reappear if you used different thresholds. Thus, for example, if the left IFG cluster (which you have shown to be connected with the visual object regions) would disappear when you apply a conventional threshold, this means that you need to seriously consider the robustness of the pathway that you seek to claim... it may be just a wild goose that you are chasing.

      We believe the reviewer might be confused regarding the procedure we followed to generate the ROIs used in the pathways connectivity analysis. As stated in the last paragraph of the “Probe phase” and “Decision phase” results subsections, the maps the reviewer is referring to (Fig. 3E, for example) were generated by seeding the intersection of our thresholded univariate analysis (Fig. 3A) with network templates. In the case of Fig 3E, these are the Semantic>Spatial decision results after thresholding, intersected with Yeo DMN (MT, FT and Core, combined). These seeds were then entered into a whole-brain seed-based spatial correlation analysis, which was thresholded and cluster-corrected using the defaults of CONN. The same is true for Fig. 2E, but using the thresholded Probe phase Semantic>Context regions. Thus, we do not believe the objections to statistical rigour the reviewer is raising apply to our results.
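
      The seed-generation logic described in this response can be sketched in a few lines: threshold a task map, intersect it with a network template to obtain a seed, then correlate the seed's mean time series with every voxel. This is a simplified illustration on flat lists with made-up numbers. The function names are hypothetical, the cut-off of 3.1 is only an example value, and the actual analysis was run in CONN with its own preprocessing and cluster correction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def seed_mask(stat_map, cutoff, template):
    """Voxels that survive the threshold AND fall inside the network template."""
    return [stat >= cutoff and bool(t) for stat, t in zip(stat_map, template)]


def seed_connectivity(timeseries, mask):
    """Correlate the mean time series of the seed with every voxel."""
    seed_voxels = [ts for ts, keep in zip(timeseries, mask) if keep]
    if not seed_voxels:
        raise ValueError("the seed mask is empty")
    n_time = len(seed_voxels[0])
    seed_mean = [
        sum(ts[t] for ts in seed_voxels) / len(seed_voxels) for t in range(n_time)
    ]
    return [pearson(seed_mean, ts) for ts in timeseries]


# Hypothetical data for four voxels (illustration only).
task_stats = [3.5, 2.9, 0.4, 3.8]   # task contrast statistic per voxel
dmn_template = [1, 1, 0, 0]         # 1 = voxel belongs to the network template
mask = seed_mask(task_stats, 3.1, dmn_template)

resting_timeseries = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
]
connectivity = seed_connectivity(resting_timeseries, mask)
```

      The point of the two-step design is that thresholding is applied to the task map before the seed is defined, so the connectivity map is computed from an already thresholded seed rather than from an un-thresholded conjunction.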

      The thresholding of the resting-state data itself was explained in the Methods (Spatial Maps and Seed-to-ROI Analysis). As stated above, we thresholded using the default of the CONN software package we used (cluster-forming threshold of p=.05, equivalent to T=1.65). For increased rigour, we reproduced the thresholded maps from Figs 2E and 3E further increasing the threshold from p=.05, equivalent to T=1.65, to p=.001, equivalent to T=3.1. The resulting maps were very similar, showing minimal change with a spatial correlation of r > .99 between the strict and lax threshold versions of the maps for both the probe and decision seeds. This can be seen in Figure 2E and Figure 33E, which depict the maps produced with stricter thresholding. These maps can also be downloaded from the Neurovault collection, and the re-analysis is now reported in the Supplementary Materials (see section “Supplementary Analysis: Resting-state maps with stricter thresholding”).

      Probe phase (compare with Fig. 2E):

      (3) There are several analyses that are hard to interpret and you can consider only reporting them in the supplementary materials, such as the PPI results and representational similarity analysis, as none of these are convincing. These analyses do not seem to add much value to make your argument more convincing and may elicit more methodological critiques, such as statistical issues, the set-up of your representational theory matrix, and so on.

      We have moved the PPI and RSA results to the supplementary materials. We agree this will help us streamline the manuscript.  

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or multiple semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as two parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based dissociation.

      (2) The manuscript contains a large number of analyses across multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the overall pattern of results reported in this work.

      We thank the reviewer for their remarks on the strengths of our manuscript.

      Weaknesses:

      (1) Throughout the manuscript, a strong distinction is drawn between semantic and spatial processing. However, given that only objects and spatial contexts were employed in the primary experiment, it is not clear that a broader conceptual distinction is warranted between "semantic" and "spatial" cognition. There are multiple grounds for concern regarding this basic premise of the manuscript.

      a. One can have conceptual knowledge of different types of scenes or spatial contexts. A city street will consistently differ from a beach in predictable ways, and a kitchen context provides different expectations than a living room. Such distinctions reflect semantic knowledge of scene-related concepts, but in the present work spatial and "all other" semantic information are considered and discussed as distinct and separate.

      The “building” contexts we created were arbitrary, containing beds, desks and an assortment of furniture that did not reflect usual room distributions, i.e., a kitchen next to a dining room. We have made this aspect of our stimuli clearer in the Materials section of the task. 

      “The learning phase employed videos showing a walk-through for twelve different buildings (one per video), shot from a first-person perspective. The videos and buildings were created using an interior design program (Sweet Home 3D). Each building consisted of two rooms: a bedroom and a living room/office, with an ajar door connecting the two rooms. The order of the rooms (1st and 2nd) was counterbalanced across participants. Each room was distinctive, with different wallpaper/wall colour and furniture arrangements. The building contexts created by these rooms were arbitrary, containing furniture that did not reflect usual room distributions (i.e., a kitchen next to a dining room), to avoid engaging further conceptual knowledge about frequently-encountered spatial contexts in the real world.”

      To help the reviewer and readers to verify this and come to their own conclusions, we have also added the videos watched by the participants to the OSF collection.

      “A full list of pictures of the object and location stimuli employed in this task, as well as the videos watched by the participants can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      We agree that scenes or spatial contexts have conceptual characteristics, and we actually manipulated conceptual information about the buildings in our task, in order to assess the neural underpinnings of this effect. In half of the buildings, the rooms/contexts were linked through the presence of items that shared a common semantic category (our “same category building” condition): this presented some conceptual scaffolding that enabled participants to link two rooms together. These buildings could then be contrasted with “mixed category buildings” where this conceptual link between rooms was not available. We found that right angular gyrus was important in the linking together of conceptual and spatial information, in the contrast of same versus mixed category buildings.

      b. As a related question, are scenes uniquely different from all other types of semantic/category information? If faces were used instead of scenes, could one expect to see different regions of the visual cortex coupling with task-defined face > object ROIs? The current data do not speak to this possibility, but as written the manuscript suggests that all (non-spatial) semantic knowledge should be processed by the FT-DMN.

      Thanks for raising this important point. Previous work suggests that the human visual system (and possibly the memory system, as suggested by Deen and Freiwald, 2021) is sensitive to perceptual categories important to human behaviour, including spatial, object and social information. Previous work (Silson et al., 2019; Steel et al., 2021) has shown domain-specific regions in visual regions (ventral temporal cortex; VTC) whose topological organisation is replicated in memory regions in medial parietal cortex (MPC) for faces and places. In these studies, adding objects to the analyses revealed regions sensitive to this category sandwiched between those responsive to people and places in VTC, but not in MPC. However, consistent with our work, the authors find regions sensitive to memory tasks for places and objects (as well as people) in the lateral surface of the brain. 

      Our study was not designed to probe every category in the human visual system, and therefore we cannot say what would happen if we contrasted social judgments about faces with semantic judgments about objects. We have added this point as a limitation and future direction for research:

      “Likewise, further research should be carried out on memory-visual interactions for alternative domains. Our study focused on spatial location and semantic object processing and therefore cannot address how other categories of stimuli, such as faces, are processed by the visual-to-memory pathways that we have identified. Previous work has suggested some overlap in the neurobiological mechanisms for semantic and social processing (Andrews-Hanna et al., 2014; Andrews-Hanna & Grilli, 2021; Chiou et al., 2020), suggesting that the FT-DMN pathway may be highlighted when contrasting both social faces and semantic objects with spatial scenes. On the other hand, some researchers have argued for a ‘third pathway’ for aspects of social visual cognition (Pitcher & Ungerleider, 2021; Pitcher, 2023). Future studies that probe other categories will be able to confirm the generality (or specificity) of the pathways we described.”

      c. Recent precision fMRI studies characterizing networks corresponding to the FT-DMN and MTL-DMN have associated the former with social cognition and the latter with scene construction/spatial processing (DiNicola et al., 2020; 2021; 2023). This is only briefly mentioned by the authors in the current manuscript (p. 28), and when discussed, the authors draw a distinction between semantic and social or emotional "codes" when noting that future work is necessary to support the generality of the current claims. However, if generality is a concern, then emphasizing the distinction between object-centric and spatial cognition, rather than semantic and spatial cognition, would represent a more conservative and better-supported theoretical point in the current manuscript.

      We appreciate this comment and we have spent quite a bit of time considering what the most appropriate terminology would be. The distinction between object and spatial cognition is largely appropriate to our probe phase, although we feel this label is still misleading for two reasons:

      First, we used a range of items from different semantic categories, not just “objects”, although we have used that term as a shorthand to refer to the picture stimuli we presented. The stimuli include both animals (land animals, marine animals and birds) and man-made objects (tools, musical instruments and sports equipment). This category information is now more prominent in the rationale (Introduction) and the Methods to avoid confusion.

      Interested readers can also review our “object” stimuli in the OSF collection associated with this manuscript:

      Introduction: “…participants learned about virtual environments (buildings) populated with objects belonging to different, heterogeneous, semantic categories, both man-made (tools, musical instruments, sports equipment) and natural (land animals, marine animals, birds).”

      Methods:

      “A full list of pictures of the object and location stimuli employed in this task can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      Secondly, we manipulated the task demands so that participants were making semantic judgments about whether two items were in the same category, or spatial judgments about whether two rooms had been presented in the same building. Our use of the terms “semantic” and “spatial” was largely guided by the tasks that participants were asked to perform.

      We have revised the terminology used in the discussion to reflect this more conservative term. However, since the task performed was semantic in nature (participants had to judge whether items belonged to semantic categories), we have modified the term proposed by the reviewer to “object-centric semantics”, which we hope will avoid confusion.  

      (2) Both the retrosplenial/parieto-occipital sulcus and parahippocampal regions are adjacent to the visual network as defined using the Yeo et al. atlas, and spatial smoothness of the data could be impacting connectivity metrics here in a way that qualitatively differs from the (non-adjacent) FT-DMN ROIs. Although this proximity is a basic property of network locations on the cortical surface, the authors have several tools at their disposal that could be employed to help rule out this possibility. They might, for instance, reduce the smoothing in their multi-echo data, as the current 5 mm kernel is larger than the kernel used in Experiment 2's single-echo resting-state data. Spatial smoothing is less necessary in multi-echo data, as thermal noise can be attenuated by averaging over time (echoes) instead of space (see Gonzalez-Castillo et al., 2016 for discussion). Some multi-echo users have eschewed explicit spatial smoothing entirely (e.g., Ramot et al., 2021), just as the authors of the current paper did for their RSA analysis. Less smoothing of E1 data, combined with a local erosion of either the MTL-DMN or VIS masks (or both) near their points of overlap in the RSFC data, would improve confidence that the current results are not driven, at least in part, by spatial mixing of otherwise distinct network signals.

      A: The proximity of visual peripheral and DMN-C networks is a property of these networks’ organisation (Silson et al., 2019; Steel et al., 2021), and we agree the potential for spatial mixing of the signal due to this adjacency is a valid concern. Altering the smoothing kernel of the multi-echo data would not address this issue though, since no connectivity analyses were performed in task data. The reviewer is right about the kernel size for task data (5mm), but not about the single echo RS data, which actually has lower spatial resolution (6mm). 

      Since this objection is largely about the connectivity analysis, we re-analysed the RS data by shrinking the size of the visual probe and DMN decision ROIs for the context task using fslmaths. We eroded the masks until the smallest gap between them exceeded the size of our 6mm FWHM smoothing kernel, which eliminates the potential for spatial mixing of signals due to ROI adjacency. The eroded ROIs can be consulted in the OSF collection associated with this project (see component “ROI Analysis/Revision_ErodedMasks”). The results, presented in the supplementary materials as “Eroded masks replication analysis”, confirmed the pattern of findings reported in the manuscript (see SM analysis below). We did not erode the respective ROIs for the semantic task, given that adjacency is not an issue there.
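
      The stopping rule described above (erode until the smallest gap between the two masks exceeds the smoothing kernel) can be illustrated with a toy sketch. This is not the fslmaths pipeline used for the re-analysis: the box-shaped masks, the 2 mm voxel size, the six-neighbour erosion rule and all function names are assumptions made purely for illustration.

```python
from itertools import product
from math import dist

# Offsets to the six face-adjacent neighbours of a voxel (assumed erosion kernel).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def erode(mask):
    """One erosion step: keep voxels whose six face neighbours are all in the mask."""
    return {
        v for v in mask
        if all((v[0] + dx, v[1] + dy, v[2] + dz) in mask for dx, dy, dz in NEIGHBOURS)
    }


def min_gap_mm(a, b, voxel_mm):
    """Smallest centre-to-centre distance in mm between any voxel of a and any voxel of b."""
    return min(dist(p, q) for p in a for q in b) * voxel_mm


def erode_until_separated(a, b, fwhm_mm, voxel_mm):
    """Erode both masks step by step until their smallest gap exceeds the kernel FWHM."""
    while a and b and min_gap_mm(a, b, voxel_mm) <= fwhm_mm:
        a, b = erode(a), erode(b)
    if not a or not b:
        raise ValueError("a mask vanished before the gap exceeded the kernel size")
    return a, b


def box(x0, x1, size=5):
    """Hypothetical box-shaped mask: x in [x0, x1), y and z in [0, size)."""
    return set(product(range(x0, x1), range(size), range(size)))


# Two touching toy masks on an assumed 2 mm grid, separated for a 6 mm FWHM kernel.
a, b = erode_until_separated(box(0, 7), box(7, 14), fwhm_mm=6.0, voxel_mm=2.0)
print(len(a), len(b), min_gap_mm(a, b, 2.0))
```

      The sketch erodes both masks symmetrically. Eroding only one mask, or eroding only near the points of adjacency as the reviewer suggested, would be equally compatible with the description above.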

      “Eroded masks replication analysis:

      The Visual-to-DMN ANOVA showed main effects of seed (F(1,190)=22.82, p<.001), ROI (F(1,190)=9.48, p=.002) and a seed by ROI interaction (F(1,190)=67.02, p<.001). Post-hoc contrasts confirmed there was stronger connectivity between object probe regions and semantic versus spatial context decision regions (t(190)=3.38, p<.001), and between scene probe regions and spatial context versus semantic decision regions (t(190)=-7.66, p<.001).

      The DMN-to-Visual ANOVA confirmed this pattern: again, there was a main effect of ROI (F(1,190)=4.3, p=.039) and a seed by ROI interaction (F(1,190)=57.59, p<.001), with post-hoc contrasts confirming stronger intrinsic connectivity between DMN regions implicated in semantic decisions and object probe regions (t(190)=5.06, p<.001), and between DMN regions engaged by spatial context decisions and scene probe regions (t(190)=3.25, p=.001).”

      (3) The authors identify a region of the right angular gyrus as demonstrating a "potential role in integrating the visual-to-DMN pathways." This would seem to imply that lesion damage to right AG should produce difficulties in integrating "semantic" and "spatial" knowledge. Are the authors aware of such a literature? If so, this would be an important point to make in the manuscript as it would tie in yet another independent source of information relevant to the framework being presented. The closest of which I am aware involves deficits in cued recall performance when associates consisted of auditory-visual pairings (Ben-Zvi et al., 2015), but that form of multi-modal pairing is distinct from the "spatial-semantic" integration forwarded in the current manuscript.

      This is a very interesting observation. There is a body of literature pointing to AG (more often left than right) as an integrator of multimodal information: It has been shown to integrate semantic and episodic memory, contextual information and cross-modality content.

      The Contextual Integration Model (Ramanan et al., 2017) proposes that AG plays a crucial role in multimodal integration to build context. Within this model, information that is essential for the representation of rich, detailed recollection and construction (like who, when, and, crucially for our findings, what and where) is processed elsewhere, but integrated and represented in the AG. In line with this view, Bonnici et al (2016) found AG engagement during retrieval of multimodal episodic memories, and that multivariate classifiers could differentiate multimodal memories in AG, while unimodal memories were represented in their respective sensory areas only. Recent work examining semantic processing in temporally-extended narratives using multivariate approaches concurs with a key role of left AG in context integration (Branzi et al., 2020).

      In addition to context integration, other lines of work suggest a role of AG as an integrator across modalities, more specifically. Recent perspectives suggest a role of AG as a dynamic buffer that allows combining distinct forms of information into multimodal representations (Humphreys et al., 2021), which is consistent with the result in our study of a region that brings together semantic and spatial representations in line with task demands. Others have proposed a role of the AG as a central connector hub that links three semantic subsystems, including multimodal experiential representation (Xu et al., 2017). Causal evidence of the role of AG in integrating multimodal features has been provided by Yazar et al (2017), who studied participants performing memory judgements of visual objects embedded in scenes, where the name of the object was presented auditorily. TMS to AG impaired participants’ ability to retrieve context features across multiple modalities. However, these studies do not single out specifically right AG.

      Some recent proposals suggest a causal role of right AG as a key region in the early definition of a context for the purpose of sensemaking, for which integrating semantic information with many other modalities, including vision, may be a crucial part (Seghier, 2023). TMS studies suggest a causal role for the right AG in visual attention across space (Olk et al. 2015, Petitet et al. 2015), including visual search and the binding of stimulus- and response-characteristics that can optimise it (Bocca et al. 2015). TMS over the right AG disrupts the ability to search for a target defined by a conjunction of features (Muggleton et al. 2008) and affects decision-making when visuospatial attention is required (Studer et al. 2014). This suggests that the AG might contribute to perceptual decision-making by guiding attention to relevant information in the visual environment (Studer et al. 2014). These, taken together, suggest a causal role of right AG in controlling attention across space and integrating content across modalities in order to search for relevant information.

      Most of this body of research points to left, rather than right, AG as a key region for integration, but we found regions of right AG to be important when semantic and spatial information could be integrated. We might have observed involvement of the right AG in our study, as opposed to the more-often reported left, given that people have to integrate semantic information with spatial context, which relies heavily on visuospatial processes predominantly located in right hemisphere regions (cf. Sormaz et al., 2017), which might be more strongly connected to right than left AG. 

      Lastly, we are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. We have added as a recommendation that patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration. We have added the following to the discussion:

      “We found a region of the right AG that was potentially important for integrating semantic and spatial context information. Previous research has established a key role of the AG in context integration (Ramanan et al., 2017; Bonnici et al., 2016; Branzi et al., 2020) and specifically, in guiding multimodal decisions and behaviour (Humphreys et al., 2021; Xu et al., 2017; Yazar et al., 2017). Although some recent proposals suggest a causal role of right AG in the early establishment of meaningful contexts, allowing semantic integration across modalities (Seghier, 2023; Olk et al., 2015, Petitet et al., 2015; Bocca et al., 2015; Muggleton et al. 2008), the majority of this research points to left, rather than right, AG as a key region for integration. However, we might have observed involvement of the right AG in our study given that people were integrating semantic information with spatial context, and right-lateralised visuospatial processes (cf. Sormaz et al., 2017) might be more strongly connected to right than left AG. We are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. Patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) I mentioned the numerous converging analyses reported in this manuscript as a strength. However, in practice, it also results in numerous dense figures (routinely hitting 7-8 sub-panels) and results paragraphs which, as currently presented, are internally coherent but are not assembled into a "bigger picture" until the discussion. Readers may have an easier time with the paper if introductions to the different analyses ("probe phase", "decision phase", etc.) also include a bigger-picture summary of how the specific analysis is contributing to the larger argument that is being constructed throughout the manuscript. This may also help readers to understand why so many different analysis approaches and decisions were employed throughout the manuscript, why so many different masks were used, etc.

      Thank you for this suggestion. We agree that the range of analyses and their presentation can make digesting them difficult. To address this, we have outlined our analyses rationale at the beginning of the results as a sort of “big picture” summary that links all analyses together, and added introductory paragraphs to each analysis that needed them (namely, the probe, decision, and pathway connectivity analyses, as the gradient and integration analyses already had introductory paragraphs describing their rationale, and the PPI/RSA analyses were moved to supplementary materials), linking them to the summary, which we reproduce below:

      “To probe the organisation of streams of information between visual cortex and DMN, our neuroimaging analysis strategy consisted of a combination of task-based and connectivity approaches. We first delineated the regions in visual cortex that are engaged by the viewing of probes during our task (Figure 2), as well as the DMN regions that respond when making decisions about those probes (Figure 3): we characterised both by comparing the activation maps with well-established DMN and object/scene perception regions, analysed the pattern of activation within them, their functional connectivity and task associations. Having characterised the two ends of the stream, we proceeded to ask whether they are differentially linked: are the regions activated by object probe perception more strongly linked to DMN regions that are activated when making semantic decisions about object probes, relative to other DMN regions? Is the same true for the spatial context probe and decision regions? We answered this question through a series of connectivity analyses (Figure 4) that examined: 1) if the functional connectivity of visual to DMN regions (and DMN to visual regions) showed a dissociation, suggesting there are object semantic and spatial cognition processing ‘pathways’; 2) if this pattern was replicated in structural connectivity; 3) if it was present at the level of individual participants, and, 4) we characterised the spatial layout, network composition (using influential RS networks) and cognitive decoding of these pathways. Having found dissociable pathways for semantic (object) and spatial context (scene) processing, we then examined their position in a high-dimensional connectivity space (Figure 5) that allowed us to document that the semantic pathway is less reliant on unimodal regions (i.e., more abstract) while the spatial context pathway is more allied to the visual system. Finally, we used uni- and multivariate approaches to examine how integration between these pathways takes place when semantic and spatial information is aligned (Figure 6).”

      (2) At various points, figures are arranged out of sequence (e.g., panel d is referenced after panel g in Figure 2) or are missing descriptions of what certain colors mean (e.g., what yellow represents in Figure 6d). This is a minor issue, but one that's important and easy to address in future revisions.

      We thank the reviewer for bringing this issue to our attention. We have added descriptions for the yellow colour to the figure legends of Figures 6 and 7 (now in supplementary materials, Figure S9).

      We have also edited the text to follow a logical sequence with respect to referencing the panels in Figures 2 and 3, where panel d is now referenced after panel c. Lastly, we reorganised the layout of Figure 4 to follow the description of the results in the text.

    1. eLife assessment

      This important study shows a significant role for Musashi-2 (Msi2) in lung adenocarcinoma. The authors provided solid data that support the requirement for Msi2 in tumor growth and progression, although the study would have been strengthened by including more patient samples and additional evidence regarding Msi2+ cells being more responsive to transformation. These findings are of interest to both the lung cancer and the RNA binding protein fields.

    2. Reviewer #1 (Public Review):

      Summary:

      Here, the authors, Barber AG et al, developed a new mouse model and investigated the importance of Musashi-2 in lung cancer. Specifically, they found that Musashi-2 is important for lung cancer cells, as it controls cancer cell growth and also regulates several genes that themselves control cancer cell growth. The development of a new Musashi-2 mouse model is a plus: it confirmed the importance of Musashi-2 for lung cancer survival and identified several genes controlled by Musashi-2 that are important for lung cancer growth. Additionally, they demonstrated that Musashi-2 overexpression, which is tracked by GFP, is preferred in lung adenocarcinoma cells. The data are rigorous and only minor revisions are requested.

      Strengths:

      The authors achieved their goals by developing a new Musashi-2 mouse model, confirming the importance of Musashi-2 for lung cancer survival, and finding several genes controlled by Musashi-2 that are important for lung cancer growth.

      Weaknesses:

      The findings that Musashi-2 controls mouse and human lung cancer growth are not that novel, as a prior publication in 2016 already showed this in both human and mouse models (Kudinov et al PNAS, PMID: 27274057). The authors also missed the point of that paper, which used both mouse and human models to show an impact on invasion and metastasis, both in vitro and in vivo. Additionally, another publication, currently under revision, also recently generated a new Musashi-2 transgenic mouse model, which confirmed that Musashi-2 supports lung cancer growth (Bychkov I et al, PMID: 37398283; https://www.biorxiv.org/content/10.1101/2023.06.13.544756v1). Another weakness is that Musashi-2 cannot be effectively targeted, and the new genes that the authors found to be regulated by Musashi-2 are also likely to be difficult therapeutic targets. Therefore, the impact of this new investigation on the field is relatively modest.

      Major suggestions:

      (1) Figure 3: it is unclear what the efficiency of Msi2 deletion/shRNA is. Could you demonstrate it by at least two independent methods (QPCR, Western, or IHC)? Please quantitate the data.

      (2) In Figure 4, similarly, it is unclear whether Msi2 depletion was effective and what the shRNA efficiency is. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data.

      (3) The reason for the impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets, since Msi2 is first and foremost a translational, not transcriptional, regulator. RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies are available) to test the four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers, including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995, used proteomics/RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR and others.

    3. Reviewer #2 (Public Review):

      Summary:

      Alison G. Barber et al. report the function of Msi2 in mouse models of non-small cell lung cancer. The expression of Msi2 in normal lung was evaluated using a knockin reporter allele. Msi2 expressing cells were found to be around 30-40% in normal lung epithelium without a strong bias in subsets of lung cells. Knocking out Msi2 in a KrasG12D and P53 KO model reduced lung cancer initiation. Knocking down Msi2 in established lung cancer cells reduced in vitro sphere formation and in vivo xenograft. Finally, the authors identified several genes whose expression was downregulated by Msi2 knockdown. Knocking down four of these genes, including Ptgds, Arl2bp, hRnf157, and Syt11, each with a single shRNA, reduced lung sphere formation in vitro, suggesting their involvement in lung cancer.

      Strengths:

      This manuscript represents an interesting advance on the role of Msi2 in lung cancer. While some of the data (for example the knockdown effect of Msi2 in established lung cancer cells) corroborated previous findings, the study of Msi2 expression in normal lung and the characterization of the KO phenotype in lung cancer initiation are new and interesting.

      Weaknesses:

      Two areas can be further strengthened. Several conclusions are not fully supported by the existing data. The stable/dynamic nature of Msi2 expressing cells in lung would benefit from more detailed investigations for proper data interpretation.

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset, or whether Msi2 positivity in the lung is a dynamic state that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      (3) The finding that Msi2 KO reduces tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It would be helpful if such data were shown and the effect of KO on overall lung mass or cellularity were clarified. Second, the phenotype may also be due to a difference in the efficiencies of cre on Kras and p53 in the Msi2 WT and KO mice.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Barber and colleagues propose a dual role for the RNA-binding protein Musashi-2 (Msi2) in lung adenocarcinoma initial transformation and subsequent tumor propagation. First, the authors show that Msi2 is expressed in a subset of Club/BASC (37%) and AT2 (26%) cells in the normal lung, and that these cells display a transcriptional profile distinct from that of cells not expressing Msi2. Furthermore, Msi2 is broadly expressed/activated in vivo in genetically induced lung adenocarcinoma tumors (Kras/p53 mouse model), and Msi2+ cells displayed a significantly higher ability to form tumor spheres in vitro. The authors demonstrated by in vivo and in vitro assays that Msi2 loss of function significantly impairs tumor growth and progression in lung adenocarcinoma. The data showed that Msi2 function is conserved in human adenocarcinoma tumor growth in patient-derived xenografts. Lastly, novel genes regulated by Msi2 and involved in lung adenocarcinoma tumor growth were identified.

      Strengths:

      The authors provided convincing data for a key role of Msi2 in lung adenocarcinoma tumor progression and growth. Multiple lines of evidence, using an Msi2 knock-out genetic mouse model and shRNA knock-down in tumor sphere formation assays, are clearly presented. The conservation and importance of Msi2 were further shown in human patient-derived xenografts. Although specific cell types (Club/BASC, AT2) were not isolated, the authors further delved into the transcriptional differences between Msi2+ and Msi2- cells in the normal lung. Furthermore, novel genes and pathways regulated by Msi2 in lung adenocarcinoma were identified and tested for their ability to inhibit tumor growth in vitro. These 2 RNA-Seq datasets will be useful in the future and provide a basis to explore 1) the potential propensity of a given cell to initiate oncogenic transformation, and 2) potential novel regulators of lung adenocarcinoma.

      Weaknesses:

      Although this work strongly demonstrated the importance of Msi2 in lung adenocarcinoma tumor progression and growth, the following points remain to be clarified or addressed.

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundance in relevant cell types. It will also be interesting to know whether or not Msi2 is expressed in other abundant lung cell types such as ciliated and AT1 cells.

      - While this set of experiments provides strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

    5. Author response:

      Reviewer #1 (Public Review):

      (1) Figure 3: it is unclear what is the efficiency of Msi2 deletion shRNA - could you demonstrate it by at least two independent methods? (QPCR, Western, or IHC?) please quantitate the data.

      In Figure 3, we did not delete Msi2 via shRNA. Instead, we utilized a genetic model in which the Msi2 gene was disrupted via gene trap mutagenesis. We have also used this model in previous publications to define the impact of Msi2 loss in other systems (ref. 1).

      (2) In Figure 4, similarly, it is unclear if Msi2 depletion was effective- and what is shRNA efficiency. Please test this by at least two independent methods (QPCR, Western, or IHC) and also please quantitate the data

      We demonstrated that the efficiency of Msi2 depletion was ~83% (Figures 4A and 4C) via qPCR analysis for our in vitro and in vivo experiments, respectively, and verified the knockdown via bulk RNA-seq analysis. The shRNA hairpin used was previously validated and published by our lab (ref. 2).

      (3) the reason for impairment of cell growth demonstrated in Figs 3 and 4 is not clear: is it apoptosis? Necrosis? Cell cycle defects? Autophagy? Senescence? Please probe 2-3 possibilities and provide the data.

      The basis of the cell growth impairment after Msi2 deletion/knockdown in this paper is certainly an important question, and future experiments will be performed to better delineate this. In previous publications, loss of Msi2 in leukemia cells has been shown to inhibit growth via arrested cell cycle progression by increasing the expression of p21 (ref. 3). Further, loss of Msi2 was also shown to promote apoptosis in part by upregulating Bax (ref. 3). These data suggest that Msi2 can have an impact via multiple distinct mechanisms, including by mediating cell cycle arrest and blocking apoptosis. While these specific genes were not detectably changed after loss of Msi2 in lung cancer cells, other genes in these and other pathways will be important to study in the future.

      (4) Since Musashi-1 is a Musashi-2 paralogue that could compensate for Musashi-2 loss, please test Msi1 expression levels in matching Fig 3 and Fig 4 sections (in cells/ tumors with Msi2 deletion and in KP cells with Msi2 shRNA). One method could suffice here.

      In our RNA-seq of cells following Msi2 knockdown, Msi1 expression was undetectable. The TPM values for Msi1 in control and knockdown cells were less than 0.01, suggesting that it did not compensate for the loss of Msi2.

      (5) It is not exactly clear why RNA-seq (as opposed to proteomics) was done to investigate downstream Msi2 targets (since Msi2 is, in the first place, a translational and not a transcriptional regulator). RNA effects in Fig 5J are quite modest, 2-fold or so. It would be useful (if antibodies are available) to test four targets in Fig 5J by Western blot, to see any impact of Musashi-2 depletion on those target protein levels. Indeed, several papers - including Kudinov et al PNAS, PMID: 27274057, Makhov P et al PMID: 33723247 and PMID: 37173995 - used proteomics/ RIP approaches and found direct Musashi-2 targets in lung cancer, including EGFR, and others.

      Previously published work from the lab showed that expression of Msi2 in the context of myeloid leukemia (ref. 1) can repress not only NUMB protein (as has been previously demonstrated in the nervous system) but also Numb RNA. This indicated that, as an RNA-binding protein, Msi2 can also bind and destabilize direct binding targets such as Numb; this was the reason for pursuing transcriptomic analysis. However, as the reviewer suggests, proteomic studies are certainly very important to develop a complete picture of the impact of Musashi and to determine which targets are controlled by Msi2 at the protein level.

      Reviewer #2 (Public Review):

      (1) It will be interesting to determine whether Msi2+ cells are a relatively stable subset or rather the Msi2+ cells in lung is a dynamic concept that is transient or interconvertible. This is relevant to the interpretation of what Msi2 positivity really means.

      In previous unpublished work from our lab, we have found that Msi2+ cells from a GFP reporter KPf/fC mouse are readily able to become GFP negative (Msi2-), but the inverse is not true. Specifically, when Msi2+ KPf/fC pancreatic cells were transplanted into the flanks of NSG mice, Msi2+ cells formed tumors in all recipients; these tumors contained both GFP+ and GFP- cells (over 80%), recapitulating the original heterogeneity and suggesting GFP+ cells can give rise to both GFP+ and GFP- cells (Lytle and Reya, unpublished observations). In contrast, only a small subset of GFP- transplanted mice formed tumors. One of the rare GFP- derived tumors was isolated and found to contain largely GFP- cells, with ~0.1% GFP+ cells. The small frequency of GFP expression could be from contaminating cells or may suggest that GFP- cells retain some ability to switch on Msi under selective pressure, and that although they pose a lower risk of driving tumorigenesis than Msi+ cells, they may nonetheless bear latent potential to become higher risk. These data may offer a possible model for projecting the potential of Msi2+ cells in the lung, but this needs to be further studied in this tissue.

      (2) Does Kras mutation and/or p53 loss upregulate Msi2? This point and the point above are related to whether Msi2+ cells are truly more susceptible to tumorigenesis, as the authors suggested.

      In unpublished work from our lab, we have found that Kras mutation upregulates Msi2 over baseline and subsequent p53 loss upregulates Msi2 further in the context of pancreatic cells (Lytle and Reya, unpublished results); it is therefore possible that the same is true for the lung. Specifically, we have observed that Msi2 increased from normal acinar cells to Kras-mutated acinar cells (e.g. pancreatic intraepithelial neoplasia (PanIN)).

      To address whether Msi2+ cells are more susceptible to tumorigenesis, we have recently published data showing that the stabilization of the oncogenic MYC protein in lung Msi2+ cells drives the formation of small-cell lung cancer in a new inducible Msi2-CreERT2; CAG-LSL-MycT58A (Msi2-Myc) mouse model (ref. 4). More importantly, these data provide the first evidence that normal Msi2+ cells are primed and highly sensitive to MYC-driven transformation across many organs and not just the lung (ref. 4).

      (3) The KO of Msi2 reducing tumor number and burden in the lung cancer initiation model is interesting. However, there are two alternative interpretations. First, it is possible that the Msi2 KO mice (without Kras activation and p53 loss) have reduced total lung cell numbers or an altered percentage of stem cells. There is currently only one sentence citing data not shown on line 125, commenting that there is no difference in BASC and AT2 cell populations. It will be helpful if such data are shown and the effect of KO on overall lung mass or cellularity is clarified. Second, the phenotype may also be due to a difference in the efficiencies of cre on Kras and p53 in the Msi2 WT and KO mice.

      We isolated the lungs of three Msi2 WT and three Msi2 KO mice and used immunofluorescence staining for CC10 (BASC) and SPC (AT2) to determine if these cell populations were reduced after Msi2 loss alone. Below are representative images showing that the Msi2 KO mice did not have lower numbers of either BASC or AT2 cells.

      Author response image 1.

      (4) All shRNA experiments (for both Msi2 KD and the KD of candidate genes) utilized a single shRNA. This approach cannot exclude off-target effects of the shRNA.

      The shRNA hairpin used for Msi2 was previously validated and published by our lab (ref. 2). Additionally, in this work we did develop and use a Msi2 genetic knockout mouse model that validates our shRNA knockdown data showing the specific impact of Msi2 on lung tumor growth.

      (5) The technical details of the PDX experiment (Figure 4F) are not fully explained.

      Due to space considerations, we were unable to put the specifics in the legend, but the details are in the methods section (Flank Transplant Assays). In brief, 500,000 cells/well were plated in a 6-well plate coated with Matrigel and 83,000 cells/well were plated in a 24-well plate coated with Matrigel for subsequent determination of transduction efficiency via FACS. 24 hours after transduction, media from the cells was collected and placed on ice. 1 mL of 2 mg/mL collagenase/dispase was then added to the well and incubated for 45 minutes at 37 °C to dissociate the remaining cells from Matrigel, followed by subsequent washes. Cells were pelleted by centrifugation and an equivalent number of shControl and shMsi2 transduced cells were resuspended in full media, mixed at a 1:1 ratio with growth factor reduced Matrigel at a final volume of 100 μL, and transplanted subcutaneously into the flanks of NSG recipient mice.

      Reviewer #3 (Public Review):

      - In Figure 1, characterization of Msi2 expression in the normal mouse lung was carried out by using a Msi2-GFP Knock-in reporter and analyzed by flow cytometry followed by cytospins and immunostaining. Additional characterization of Msi2 expression by co-immunostaining with well-known markers of airway and alveolar cell types in intact lung tissue will strengthen the existing data and provide more specific information about Msi2 expression and abundance in relevant cell types. It will also be interesting to know whether or not Msi2 is expressed in other abundant lung cell types such as ciliated and AT1 cells.

      We performed co-staining of Msi2 and CC10 as well as Msi2 and SPC in Figure 1C. In the future we can include additional markers as well as markers for airway and other alveolar cell types.

      - While this set of experiments provides strong evidence that Msi2 is required for tumor progression and growth in lung adenocarcinoma, it is unclear whether normal Msi2+ lung cells are more responsive to transformation or whether Msi2 is upregulated early during the process of tumorigenesis. Future lineage tracing experiments using Msi2-CreER and mouse models of chemically-induced lung carcinogenesis will provide additional data that will fully support this claim.

      Recently, we published data showing that Msi2 is expressed in Clara cells at the bronchoalveolar junction in the lung of our new Msi2-CreERT2 knock-in mouse model (ref. 4). Furthermore, stabilization of the oncogenic MYC protein in these specific cells to model Myc amplification was sufficient to drive the formation of small-cell lung cancer (ref. 4). These data excitingly demonstrate that Msi2+ cells are more responsive to transformation after Myc stabilization.

      - In Figure 4F, Patient-derived xenograft (PDX) assays were conducted in 2 patients only and the percentage of cells infected by shRNA-Msi2 is low in both PDX (30% and 10% for patient 1 and 2 respectively). It is surprising that Msi2 downregulation in a small percentage of tumor cells has such a dramatic effect on tumor growth and expansion. Confirmation of this finding with additional patient samples would suggest an important non-cell autonomous role for Msi2 in lung adenocarcinoma.

      In the future we hope to collect more patient samples to further validate the data presented with the first 2 patients shown here. We are not certain about the reason behind the large impact of Msi2 inhibition, but as cancer stem cells drive the formation of the rest of the tumor and also drive the stromal microenvironment, it is possible that when Msi2 is deleted, Msi2- cells no longer form tumors and the ability to build the stromal microenvironment is also impacted. This possibility needs to be further tested in future experiments.

      References

      (1) Ito, T., Kwon, H. Y., Zimdahl, B., Congdon, K. L., Blum, J., Lento, W. E., Zhao, C., Lagoo, A., Gerrard, G., Foroni, L., Goldman, J., Goh, H., Kim, S. H., Kim, D. W., Chuah, C., Oehler, V. G., Radich, J. P., Jordan, C. T., & Reya, T. Regulation of myeloid leukaemia by the cell-fate determinant Musashi. Nature 466, 765–768 (2010).

      (2) Fox, R. G., Lytle, N. K., Jaquish, D. V., Park, F. D., Ito, T., Bajaj, J., Koechlein, C. S., Zimdahl, B., Yano, M., Kopp, J. L., Kritzik, M., Sicklick, J. K., Sander, M., Grandgenett, P. M., Hollingsworth, M. A., Shibata, S., Pizzo, D., Valasek, M. A., Sasik, R., Scadeng, M., Okano, H., Kim, Y., MacLeod, A. R., Lowy, A. M., & Reya, T. Image-based detection and targeting of therapy resistance in pancreatic adenocarcinoma. Nature 534, 407–411 (2016).

      (3) Zhang, H., Tan, S., Wang, J., Chen, S., Quan, J., Xian, J., Zhang, Ss., He, J., & Zhang, L. Musashi2 modulates K562 leukemic cell proliferation and apoptosis involving the MAPK pathway. Exp Cell Res 320, 119–127 (2014).

      (4) Rajbhandari, N., Hamilton, M., Quintero, C. M., Ferguson, L. P., Fox, R., Schürch, C. M., Wang, J., Nakamura, M., Lytle, N. K., McDermott, M., Diaz, E., Pettit, H., Kritzik, M., Han, H., Cridebring, D., Wen, K. W., Tsai, S., Goggins, M. G., Lowy, A. M., Wechsler-Reya, R. J., Von Hoff, D. D., Newman, A. M., & Reya, T. Single-cell mapping identifies MSI+ cells as a common origin for diverse subtypes of pancreatic cancer. Cancer Cell 41, 1989–2005.e9 (2023).

    1. eLife assessment

      This fundamental study provides compelling evidence for dysgranular insular involvement in top-down and bottom-up interoceptive processing by building on previous evidence using state-of-the-art methods. Its translational application in ADE patients corroborates the assumption that the mid-insula may indeed be a locus of 'interoceptive disruption' in psychiatric disorders, which underscores the study's high relevance for both body-brain as well as clinical research.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors have conducted an exceptionally informative series of studies investigating the neural basis of interoception in transdiagnostic psychiatric symptoms. By comparing differential and overlapping neural activation during 'top-down' and 'bottom-up' interoceptive tasks, they reveal convergent activation largely localised to the ventral dysgranular subregion ('mid-insula'), which differs in extent between patients and controls, replicating and extending previous suggestions of this region as a central locus of disruption in psychiatric disorders. Their work also reveals different extents of divergent activation in the anterior insula during anticipation of interoceptive disruption. This substantially advances our previous knowledge of the anatomy of interoception, and confirms theoretical predictions of the roles of different cytoarchitectural subregions of the insula in interoceptive dysfunction in mental health conditions.

      Strengths:

      The work is exceptional in terms of breadth and depth, making use of multiple imaging and analysis techniques which are non-standard and go well beyond what is known today. The study is statistically well-powered and the tasks are well-validated in the literature. To my knowledge, these functions of the insula in interoception and mental health have never been compared directly before, so the results are novel and informative for both basic science and psychiatry. The work is strongly theory-driven, building on and directly testing results from influential theories and previous studies. It is likely that the results will strengthen our theoretical models of interoception and advance psychiatric studies of the insula.

      Weaknesses:

      The study has three limitations. (1) The interpretation of the resting-state isoproterenol data could potentially represent fluctuations over time rather than following interoception specifically; future studies should investigate test-retest reliability of this measure. Note this does not preclude the strong conclusions which can be drawn from the authors' task-based data. (2) The transdiagnostic patient sample was almost entirely female, and many were currently taking psychotropic medications; future studies should replicate these effects in unmedicated, sex-balanced samples. (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example due to physiological correction in one but not both tasks; however, there are also merits to this analysis approach, such as comparability with previous studies.

    3. Reviewer #3 (Public Review):

      Summary:

      Adamic and colleagues present fMRI data from ADE patients and a healthy control group acquired during two interoceptive tasks (attention and perturbation) from the same session. They report convergent activity within the granular and dysgranular insular cortex during both tasks, with a patient group-specific lateralisation effect. Furthermore, insular functional connectivity was found to be linked to disease severity.

      Strengths:

      The study is well-designed and - despite some limitations noted by the authors - provides much-needed insight into the functional pathways of interoceptive processing in health and disease. The manuscript is clear, concise, and well-written.

      Weaknesses:

      None remain after the authors' revision.

    4. Reviewer #4 (Public Review):

      Summary:

      In the manuscript titled "Hemispheric Divergence of Interoceptive Processing Across Psychiatric Disorders", the authors analyzed a subset of data collected for a larger project investigating interoception in anorexia nervosa and generalized anxiety disorder (ClinicalTrials.gov Identifier: NCT02615119). This study utilized fMRI and various analyses with a special focus on the insula and its connectivity to map the neural commonalities and differences in both top-down and bottom-up interoceptive processing.

      The primary aim was to compare whether these neural activations were quantitatively and qualitatively different in a sample of healthy controls (HC) versus patients diagnosed with anxiety, depression, and/or eating disorders (ADE).

      The study initially recruited 70 patients with primary diagnoses of ADE and 57 HC. After applying exclusion criteria, the final sample consisted of 46 ADE patients and 46 matched HC. Participants underwent task-related and resting-state fMRI scan sessions.

      Specifically, participants performed 2 tasks in fMRI: i) a bottom-up interoceptive (ISO) task involving intravenous infusions of isoproterenol (a peripherally-acting beta-adrenergic receptor agonist) administered in a double-blind, placebo-controlled fashion to alter cardiovascular activity where participants were asked about their visceral awareness; and ii) a top-down interoceptive attention (VIA) task where participants were asked to focus on their visceral sensations triggered by words indicating specific body parts (e.g., STOMACH, HEART, LUNGS) or to pay attention to color changes of the word TARGET during an exteroceptive control task.

      The main results show overlapping patterns of neural activation within the dysgranular mid-insula during top-down and bottom-up interoceptive processing, with hemispheric differences. The patterns of dysgranular activation distinguished individuals with ADE from HC. Also, differences in the activation of the anterior agranular insula during periods of interoceptive uncertainty differentiate ADE patients from HC.

      Strengths:

      - This is a very nice study that aligns with modern Clinical Neuroscience approaches, as recommended by NIH policy (i.e. RDoC initiative), which puts emphasis on describing clinical conditions via transdiagnostic dimensions measured on psychological processes, behaviors, and neural processes rather than merely identifying a series of symptoms.

      I very much appreciated the different analyses that the authors performed to characterize differences, at the qualitative and quantitative levels, in insular activity and its connectivity during bottom-up and top-down interoceptive processes.

      These findings may open avenues for new studies that will explain the mechanisms underlying these phenomena and provide useful insights for developing novel interventions.

      Weaknesses:

      Weaknesses/Requests for additional clarification

      (1) The sample

      (1.1) The authors describe the patient group as having a primary diagnosis of anxiety, depression, and/or eating disorders. However, Table 1 shows that the majority had Anxiety disorders, some Major Depression (it is not clear what percentage of patients had concurrent major depression at the time of the study, please clarify), and very few had a diagnosis of Anorexia Nervosa. The leftward activation asymmetry and distinct activation patterns in the left dysgranular mid-insula across both the ISO and VIA tasks found in ADE did not correlate with symptoms measured by the SCOFF questionnaire, but correlated with anxiety and depressive symptoms. It would be nice if the authors could comment on these results in relation to eating disorders.

      (1.2) Furthermore, the sample consisted of 5 males and 41 females in the HC group and 1 male and 45 females in the ADE group. In order to generalize these findings, the authors should acknowledge this gender imbalance and discuss whether they expect similar results in a predominantly male sample.

      (2) The procedure

      While the fixed order of tasks reflects the primary emphasis on acquiring data from the infusion (ISO) task, this could introduce confounding order effects. The authors should acknowledge this as a limitation of this study.

      (3) The rationale behind the study

      - The authors recognized that there was a broader aim behind this data collection. It would be important to clarify a little bit more how the differences in insular areas mapping both (or specifically) bottom-up and top-down interoceptive processes and insular connectivity, recorded in ADE patients compared to healthy controls (HC), contribute to psychiatric diagnoses (hypothesis 3).

      For example, they should explain the psychopathological dimensions common to the three patient groups. Are disturbances in bottom-up and top-down interoceptive processing common traits in these patients, reflected in the asymmetric interhemispheric dysgranular mid-insular activation? The link between these disturbances and anatomical evidence of convergence/divergence of top-down vs. bottom-up interoceptive processes should be clearly stated.

      (4) Operationalization of Convergence / Divergence maps underlying top-down and bottom-up interoceptive processes in HC vs ADE patients

      The concept of Convergence / Divergence maps underlying top-down and bottom-up interoceptive processes is not clear to me. The authors want to compare, in HCs and ADE patients, the neural structures that are co-activated (convergence maps) vs those that are uniquely involved (divergence maps) in top-down and bottom-up interoceptive processes in the two groups. Thus, I would expect these two different analyses to have been performed on similar portions of data; instead, different moments of the tasks (= different bottom-up / top-down interoceptive processes) have been analyzed.

      Specifically, the convergence maps have been identified by comparing active voxels recorded when participants were focusing on the heart and the lungs (compared to when they were focused on the exteroceptive features of the target) in the VIA task, and during infusions (Peak period) of 2mcg isoproterenol (compared to baseline) in the ISO task. The divergence maps have been identified by comparing voxels uniquely active during the anticipatory phases of both isoproterenol and saline infusions (compared to baseline) and during the peak period of saline dose of the ISO task with respect to when participants focused their attention on the heart and the lungs (compared to when they were focused on the exteroceptive features) in the VIA task.

      I understand the idea of mapping interoceptive uncertainty, however I think that these two analyses do not show commonalities and differences in the neural structures involved in bottom up vs top down processes (in ADE vs HC), but also neural correlates underlying different types of interoceptive processes involving or not top-down expectations.

      According to the authors, which is the most important neural marker that differentiates the ADE group: the difference in hemispheric activations within the left and right dysgranular insula or the less granular anterior insular activation during periods of interoceptive uncertainty? Also, do they reflect different transdiagnostic dimensions?

      (5) Collected physiological measures

      The authors speak about cardiorespiratory interoceptive processes, but they only included cardiac measures. Including respiratory changes could provide a more comprehensive comparison between bottom-up signals and top-down attentional processes. Also, I guess that the "STOMACH" trials of the VIA task were not analyzed in this study since those are used in the bigger study and since no gastric measures were collected? Please clarify this point.

      (6) ISO task instructions

      To better understand the task and participants' expectations, could the authors clarify the instructions given to participants regarding the isoproterenol and saline infusions? Did the participants have two types of expectations?

      (7) Title of the study

      I understand that the term "divergence" in the title refers to the different hemispheric activations characterizing ADE patients compared to HC. However, it also suggests an analysis based on convergence/divergence maps, which might be ambiguous. Could the authors make some small modifications to the title to make it clearer?

      (8) Caption of Figure 7

      The caption of Fig.7 notes that no difference in HR was found during the Saline infusion between the HC and ADE groups. However, it would be fair to mention the significant difference in dial ratings observed during the Saline infusion. How do the authors explain this difference?

      Typos

      In Figure 3, "Hemispheric divergence" should, I think, be corrected to "Hemispheric convergence."

      I believe that by addressing these points, the manuscript will provide a clearer and more comprehensive understanding of the rationale, methods, and findings underlying this study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One concern is regarding the experimental task design. Currently, only subjective reports of interoceptive intensity are taken into account, the addition of objective behavioural measures would have given additional value to the study and its impact. 

      To address this comment, we calculated interoceptive accuracy during the cardiorespiratory perturbation (isoproterenol) task according to our previous methods (e.g., Khalsa et al 2009 Int J Psychophys, Khalsa et al, 2015 IJED, Khalsa et al 2020 Psychophys, Hassanpour et al, 2018 NPP, Teed et al 2022 JAMA Psych). We quantified interoceptive accuracy as the cross-correlation between heart rate and real-time cardiorespiratory perception; specifically, the zero-lag cross-correlation between the heart rate and dial rating time series, and the maximum cross-correlation between these time series while allowing for different temporal delays (or lags). As expected, we found a dose-related increase in interoceptive accuracy from the 0.5mcg moderate perturbation dose (for which neuroimaging maps were not included in the current study) to the 2.0mcg high perturbation dose: zero-lag cross-correlations of 0.25 and 0.61, maximum cross-correlations of 0.41 and 0.73, for 0.5mcg and 2.0mcg doses, respectively, when averaged across all participants in the current study. Looking more closely at just the 2.0mcg dose, there were no group differences in zero-lag cross-correlation (t(89) = -0.68, p = 0.50) or maximum cross-correlation (t(87) = -1.0, p = 0.32) (depicted below, panel A). Furthermore, there were no associations between either of these interoceptive accuracy measures and the magnitude of activation within bilateral dysgranular convergent regions (F(1) = 0.27 and 0.01, p = 0.61 and 0.91, for the main effect of percent signal change on max and zero-lag cross-correlations, respectively; depicted below, panel B). When considering the significant correlation between the right insula signal intensity and subjective dial ratings, this lack of association with interoceptive accuracy suggests that the right dysgranular convergent insula was preferentially tracking the magnitude estimation rather than accuracy facet of interoceptive awareness during cardiorespiratory perturbation. Notably, during the saline placebo infusion, there were no systematic changes in heart rate and thus no systematic change in dial rating, precluding the calculation of the cross-correlation as a measure of interoceptive accuracy.
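
      As an illustration of the accuracy measure described in this response, the sketch below computes a zero-lag correlation and a maximum lagged correlation between two equally sampled time series. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the Pearson normalization, the lag window (`max_lag`, in samples), and the convention that a positive lag means the dial rating trails the heart rate are all hypothetical choices that the response does not specify.

```python
import numpy as np


def lagged_correlations(heart_rate, dial_rating, max_lag):
    """Pearson r between the two series at every lag in [-max_lag, max_lag].

    Assumed convention (not from the source): a positive lag pairs
    heart_rate[t] with dial_rating[t + lag], i.e. the rating trails
    the heart rate by `lag` samples.
    """
    hr = np.asarray(heart_rate, dtype=float)
    dial = np.asarray(dial_rating, dtype=float)
    if hr.ndim != 1 or hr.shape != dial.shape:
        raise ValueError("both series must be 1-D and equally long")
    n = hr.size
    if not 0 <= max_lag < n - 1:
        raise ValueError("max_lag must leave at least two overlapping samples")

    corrs = {}
    for lag in range(-max_lag, max_lag + 1):
        # Keep only the samples where the shifted series overlap.
        if lag >= 0:
            a, b = hr[: n - lag], dial[lag:]
        else:
            a, b = hr[-lag:], dial[: n + lag]
        corrs[lag] = float(np.corrcoef(a, b)[0, 1])
    return corrs


def interoceptive_accuracy(heart_rate, dial_rating, max_lag):
    """Return (zero-lag r, maximum r over lags, lag of the maximum)."""
    corrs = lagged_correlations(heart_rate, dial_rating, max_lag)
    best_lag = max(corrs, key=corrs.get)
    return corrs[0], corrs[best_lag], best_lag


# Synthetic example: a transient heart rate rise and a dial rating that
# follows the same shape five samples later, rescaled to a 0-10 dial.
t = np.arange(200)
hr = 60 + 20 * np.exp(-((t - 80) / 20.0) ** 2)
delay = 5
dial = (np.concatenate([np.full(delay, hr[0]), hr[:-delay]]) - 60) / 2

zero_lag_r, max_r, best_lag = interoceptive_accuracy(hr, dial, max_lag=20)
print(f"zero-lag r = {zero_lag_r:.3f}, max r = {max_r:.3f} at lag {best_lag}")
```

      In this synthetic case the maximum falls at the 5-sample delay built into the rating, and the zero-lag value is lower. If either series is flat, the correlation is undefined (NumPy returns NaN), which is consistent with the point above that the measure could not be computed for the saline infusion.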

      In reviewing these findings, we did not feel that the results add meaningful information to our interpretation of convergence, and accordingly we have chosen not to include it in the manuscript.

      Author response image 1.

      (A) Interoceptive accuracy during 2.0mcg isoproterenol perturbation, as measured by the maximum (left panel) and zero-lag (right panel) cross-correlation between the time series of heart rate and perceptual dial rating. There were no differences between groups. (B) There were no associations between interoceptive accuracy ratings and signal intensity within the convergence dysgranular insula during the Peak period of 2.0mcg perturbation. 

      This brings me to my second concern. The authors mostly refer to their own previous work, without highlighting other methods used in the field. Some tasks measure interoceptive accuracy or other behavioural outcomes, instead of merely subjective intensity. Expanding the scientific context would aid the understanding and integration of this study with the rest of the field. 

      Given our focus on the neural basis of bottom-up perturbations of interoception, we found it relevant to reference previous studies from our lab, as we built directly upon these previous findings to inform the hypotheses and design of the current experiment, but we appreciate the value of providing a broader view of the literature. To expand the contextual frame, we have cited two fMRI meta-analyses of cardiac and gastrointestinal interoception (line 101). There are few studies that have used comparable perturbation approaches during neuroimaging in clinical populations, although we have referenced an exemplar study from the respiratory domain by Harrison et al (2021) in the discussion (line 612). In considering this comment more carefully, we felt that expanding the context further to other task-based methods or behavioral outcomes would shift the focus beyond our emphasis on the insular cortex and top-down/bottom-up convergence, though we have previously discussed and integrated such approaches (e.g., Khalsa & Lapidus, 2016 Front Psych, Khalsa et al, 2018 Biol Psychiatry CNNI, Khalsa et al 2022, Curr Psych Rep).

      Lastly, the suggestions for future research lack substance compared to the richness of the discussion. I recommend a slight revision of the introduction/discussion. There is text in the discussion (explanatory or illuminating) which is better suited to the introduction. 

      When discussing our study limitations (beginning line 732), we offer numerous areas for future research, including different preprocessing pipelines and more sophisticated analysis techniques (such as multivariate pattern analysis) that would allow for individual-level inferences regarding convergent patterns of activation within the insula. In addition, we have revised the last sentence of our limitations paragraph (line 757), and have added more specificity regarding future approaches examining insular and whole-brain interoceptive signal flow.

      Reviewer 2:

      (1) The interpretation of the resting-state data is not quite as clear-cut as the task-based data - as presented currently, changes could potentially represent fluctuations over time rather than following interoception specifically. In contrast, much stronger conclusions can be drawn from the authors' task-based data. …I was also unsure about the interpretation of the resting state analysis (Figure 5), as there was no control condition without interoceptive tasks, meaning any change could represent a change over time that differed between groups and not necessarily a change from pre- to post-interoception. Relatedly I wondered if the authors had calculated the test-retest reliability of the resting state data (e.g. intraclass correlation coefficients for the whole-brain functional connectivity of convergent dysgranular insula subregions and left middle frontal gyrus before vs. after the tasks), as it would be generally useful for the field to know its stability.

      We have acknowledged the lack of a control condition in the isoproterenol task (note that the VIA task contained an exteroceptive trial that was included in the brain image contrast analysis). We have also provided further justification for our approach in both the Methods (see the first paragraph of the “fMRI resting state analysis” subsection) and Results (see the last paragraph of the “Convergence analysis” subsection). We cannot estimate test-retest reliability from the current dataset, given that we do not have resting state scans separated by a similar time frame without the performance of the interoceptive tasks in between (this is now clarified in line 346).

      (2) The transdiagnostic sample could be better characterised in terms of diagnostic information, and was almost entirely female; it is also unclear what the effect of psychotropic medications may have been on the results given the effects of (e.g.) serotonergic medication on the BOLD signal. …Table 1 would be substantially improved by a fuller clinical characterisation of the specific sample included in the analysis - the diagnostic acronyms included in the table caption are not used in the table itself at present and would be an excellent addition, describing, for example, the demographics and symptom scores of patients meeting criteria for MDD, GAD, and AN (and perhaps those meeting criteria for more than 1). Similarly, additional information about the specific medications patients (or controls?) were taking in this study would be welcome (given the potential influences of common medications (e.g. antidepressants) on neurovascular coupling). 

      We have expanded Table 1 to include more specific diagnostic information for the transdiagnostic ADE group (GAD, MDD, and/or AN, as well as other psychiatric diagnoses). We have also included medication use.  

      Finally, Figures 7c and 7d would be greatly improved by showing individual data points if possible, and there may be a typo in the caption 'The cardiac group reported higher cardiac intensity ratings in the ADE group'.

      We have adjusted Figure 7c and 7d to include individual data points, as we agree that this provides greater transparency to the data itself. We have also fixed the typo in the figure caption.

      (3) As the authors point out, there may have been task-specific preprocessing/analysis differences that influenced results, for example, due to physiological correction in one but not both tasks. Although I note this is mentioned in the limitations, it was not clear to me why physiological noise was removed from the ISO task and whether it would be possible to do the same in the VIA task, which could be important for the most robust comparison of the two. 

      In this study, we intentionally chose different task-specific preprocessing pipelines so we could ensure that our results were not simply due to new ways of handling the data. This would allow us to evaluate evidence of replicating the previous group-level findings of insular activation that informed the current approach and hypotheses. We agree that a harmonized approach is also merited, and in a subsequent project using this dataset, we have matched preprocessing pipelines for a connectivity-based analysis, to best facilitate comparison across tasks. We look forward to sharing those results with the scientific community in due time.

      Reviewer 3:

      Maybe I missed it (and my apologies in case I did), but there were a few instances where it was not entirely clear whether differential effects (say between groups or conditions) were compared directly, as would be required. One example is l. 459 ff: The authors report the interesting lateralisation effect for the two interoception tasks and say it was absent in the exteroceptive VIA task. As a reader, it would be great to know whether that finding (effect in one condition but not in the other) is meaningful, i.e. whether the direct comparison becomes statistically significant. … The same applies to later comparisons, for example, the correlations reported in l. 465 ff (do these differ from one another?) as well as the FC patterns reported in l. 476 ff - again, there is a specific increase in the ADE group (but not in the HC), but is this between-group difference statistically meaningful?

      Thank you for these questions. We have added greater detail in the Results section in order to increase clarity regarding which statistical comparisons support which conclusions. Generally, we limited our comparisons to the effect of group, as comparing ADE vs. HC individuals was of primary interest, and in some cases also the effect of hemisphere and epoch. However, we did not perform exhaustive comparisons for all measures, in the interest of keeping the focus of our multi-level multi-task analysis on the hypothesis-driven questions specifically related to convergence of top-down and bottom-up processing.

      Regarding the comment asking if we could compare the lateralization effect directly across task conditions (i.e., is there a greater difference between hemispheres in the ISO task compared to VIA?): unfortunately, directly comparing signal intensity across tasks is not possible because the isoproterenol infusion induces physiological changes that can cause some dose-related signal reduction (we have attempted to address this in the past, e.g., Hassanpour et al, 2018 HumBrMapp). Consequently, our conclusions about spatial localization of top-down and bottom-up convergence are limited to group-level comparisons based on binary activation.

      (2) A second 'major' relates to the intensity ratings (l. 530 ff). I found it very interesting that the ADE group reported higher cardiac, but lower exteroceptive intensity ratings during the VIA task. I understand the authors' approach to collapse within the ADE group, but it would be great to know which subgroup of patients drives this differential effect. It could be the case that the cardiac effect is predominantly present in the anxiety group, while the lower exteroceptive ratings are driven by the depression patients. Even if that were not the case, it would be highly instructive to understand the rating pattern within the anxiety group in greater detail. Do these patients 'just' selectively upregulate interoception, or is there even a perceived downregulation of exteroceptive signalling? 

      We have depicted these data below for reviewers’ reference, showing individual responses for each group (HC and ADE; panel A), as well as the ADE individuals separated by primary diagnosis (GAD = generalized anxiety disorder, n=24; AN = anorexia nervosa, n=16; MDD = major depressive disorder, n=6; panel B). When tested via linear regression, we found no differences in ratings across ADE subgroups (rating ~ subgroup * condition, F3 = 1.71, p=0.16 for main effect of subgroup). However, several factors should be considered in interpreting this result: first, all subgroups are small, particularly the MDD sample. Second, while these diagnostic labels refer to the most prominent symptom expression of each patient, every clinical participant in the study had a co-morbid disorder. Therefore, it is not possible to isolate disorder-specific pathology from our multi-diagnostic sample, and for this reason we refrained from including the subgroup-specific data in the manuscript.

      Author response image 2.

      (A) Post-trial ratings during the Visceral Interoceptive attention task, for reference. This is also shown in Figure 7D. (B) The same post-trial ratings in (A), but with the ADE group separated by primary diagnoses. Importantly, although assigned to one diagnostic category on the basis of most prominent symptom expression, most patients had one or more comorbidities across disorders. GAD = Generalized Anxiety Disorder. MDD = major depressive disorder. AN = anorexia nervosa. HC = healthy comparison.

      l. 86: 'Conscious experience' of what, precisely? During the first round of reading, I was wondering about the extent to which consciousness as a general concept will play a role, which could be misleading. 

      We have changed it to “conscious experience of the inner body” in the text. The current study is limited in scope to the neurobiology of conscious perceptions of the inner body, not consciousness as a general phenomenon. We hope this distinction is now clear.

      l.115: Particularly given the focus on predictive processing, I was wondering whether the (slightly outdated) spotlight metaphor is really needed here. 

      While not perfect, we believe it is still valid to metaphorically reference goal-directed attention towards the body as an “attentional spotlight”. Given the concern, we have minimized the focus on this metaphor, and the sentence now reads as follows:

      “Extending beyond these model-based influences are goal-directed activities (also described previously as the ‘attentional spotlight’ effect (Brefczynski and DeYoe 1999)), whereby focusing voluntary attention towards certain environmental signals not only alters their conscious experience but selectively enhances neural activity in the responsive area of cortex.”

      l. 129 ff: The sentence has three instances of 'and' in it, most likely a typo. 

      We have fixed this in the text.

      l. 245: What do these ratings correspond to, i.e. what was the precise question/instruction? 

      The instructions for subjective ratings in each task are mentioned in the Methods (line 223 for ISO task, line 249 for the VIA task), and we have added more detail regarding the scale used to collect subjective intensity ratings.

      l. 322: Could you provide the equation of the LMEM in the main text? It would be interesting to know e.g. whether participants/patients were included as a random effect. 

      We have provided this equation in the Methods (line 326).

      l. 418 ff: I was confused about the statistical approach here. Why use separate t-tests instead of e.g. another LMEM which would adequately model task and condition factors? 

      We did not use t-tests, but instead used linear regression to look at differences in agranular PSC across groups, hemispheres, and epochs, as well as potential associations between PSC and trait measures. We have adjusted the wording in this Methods paragraph (line 418) to improve clarity.

      l. 425: As a general comment, it would be great to provide the underlying scripts openly through GitHub, OSF, ... 

      We agree with this comment, and our main analysis scripts have been posted on our OSF as an addition to the original preregistration of this work (https://osf.io/6nxa3/).

      l. 443: For consistency, please report the degrees of freedom for the X² test.

      l. 454: ... and the F statistic would require two degrees of freedom (only the second is reported).

      l. 523: The t value is reported without degrees of freedom here (but has them in other instances).

      l. 540: Typo ('were showed').

      We have reported degrees of freedom for all statistics.

    1. eLife assessment

      This is a potentially valuable contribution, reporting a deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila. However, the framework that the MSL complex mediates dosage compensation is outdated and has flaws, and the evidence is currently considered inadequate to support the claims. Because there are many ways to alter viability, sex-specific viability is insufficient to make claims regarding dosage compensation.

    2. Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      Weaknesses:

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila. Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors state that an inverse dosage effect results from a titration of the histone acetylase MOF between the NSL and MSL complexes. This is a misunderstanding of the inverse effect, which is an imbalance of regulatory molecules as described in the citation below. The inverse effect operates in triple X metafemales to produce dosage compensation of the three X chromosomes and a reduced expression of the autosomes (Sun et al 2013 PNAS 110: 7383-7388). There is no MSL complex in metafemales.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550). The relevant portions of that article that pertain to Drosophila are quoted below. The cited references can be found in that publication.

      "In Drosophila, the sex chromosomes consist of an X and a Y. The Y in this species contains only a few genes required for male fertility (Zhang et al., 2020). The X consists of approximately 20% of the genome. Thus, females have two X chromosomes and males have one. Muller (1932) found that the expression of genes between the two sexes was similar but when individual genes on the X were varied in dosage they exhibited a proportional dosage effect. Each copy in a male was expressed at about twice the level as each copy in a female. Females with three X chromosomes are highly inviable but when they do survive to the adult stage, Stern (1960) found that they too exhibited dosage compensation in that the expression in the triple X genotype was similar to normal females and males. Studies in triploid flies found that dosage compensation also occurred among X; AAA, XX;AAA, and XXX; AAA genotypes via upregulation of the Xs, where X indicates the dosage of the X and A indicates the triploid nature of the autosomes (see Birchler, 2016 for further discussion). Diploid and triploid females have a similar per gene expression but the other five genotypes each must modulate gene expression by different amounts equivalent to an inverse relationship between the X versus autosomal dosage to achieve a balanced expression between the X and the A (Birchler, 1996).

      Some years ago, mutations were sought in Drosophila that were lethal to males but viable in females. A number of such mutations were found and termed Male Specific Lethal (MSL) loci (Belote and Lucchesi, 1980). Once the products of these genes were identified, they were found to be at high concentrations on the male X chromosome (Kuroda et al., 1991). One of these genes encodes a histone acetyl transferase that acetylates Lysine16 of Histone H4 (Bone et al., 1994; Hilfiker et al., 1997). The recognition of the MSL complex and its association with the male X was an important set of contributions to an understanding of sex chromosome evolution in Drosophila (Kuroda et al., 2016). Thus, the hypothesis arose that the MSL complex accumulated this chromatin modifier on the male X to activate the expression about two-fold to bring about dosage compensation. Other data that contributed to this hypothesis were that when autoradiography of nascent transcription on salivary gland polytene chromosomes was examined in the MSL maleless mutation, the ratio of the number of grains over the X versus an autosomal region was reduced compared to the normal ratio (Belote and Lucchesi, 1980).

      It has been pointed out (Hiebert and Birchler, 1994; Bhadra et al., 1999; Pal Bhadra et al., 2005; Sun et al., 2013a; Birchler, 2016), however, that the grain counts over the X and the autosomes when considered in absolute terms rather than as a ratio show that the X more or less retained dosage compensation and the autosomal numbers are about doubled, i.e. exhibit an inverse dosage effect. The same situation occurs with the msl3 mutation (Okuno et al., 1984), another MSL gene, in that the autoradiographic grain numbers as an absolute measure show retention of X dosage compensation and an autosomal increase. The data treatment to produce an X to A ratio seemed reasonable in the context of the time when all regulation in eukaryotes was considered positive. However, when studies were conducted in such a manner as to assay the absolute effect on gene expression in the maleless mutation, in adults (Hiebert and Birchler, 1994), larvae (Hiebert and Birchler, 1994; Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005), and embryos (Pal Bhadra et al., 2005), the trend was for retention of dosage compensation of X linked genes and an increase in expression of autosomal genes.

      In global studies, if the X to autosomal expression does not change between mutant and normal, one can conclude that dosage compensation is operating. However, a lower X to A ratio could be a loss of compensation or an increased transcriptome size from the increase of the autosomes, as suggested by the absolute data of Belote and Lucchesi (1980) and Okuno et al (1984) and that was visualized directly in embryos (Pal Bhadra et al., 2005). The transcriptome size in aneuploids can change, which cannot be detected in RNA-seq analyses alone (Yang et al., 2021), so it is an important consideration for studies of dosage compensation. It was recently acknowledged that in MSL2 knockdowns the relative X expression is decreased and a moderate autosomal increase is found (Valsecchi et al., 2021b). A similar trend is evident in the microarray data on MSL2 knockdown in SL2 tissue culture cells (Hamada et al., 2005) and in the roX RNA (noncoding RNAs essential for MSL localization on the male X) mutants (Deng and Meller, 2006). This trend is in fact consistent with the absolute data that suggest an increase in the transcriptome size (Figure 7). A global change in transcriptome size can cause a generalized dosage compensation of a single chromosome to appear as a proportional dosage effect (loss of compensation) to some degree (Figure 7).

      Examination of expression in triple X metafemales, where there is no MSL complex, found that X-linked genes generally show dosage compensation but there is a generalized inverse effect on the autosomes, which could account for the detrimental effects of metafemales (Birchler et al., 1989; Sun et al., 2013b). An examination in metafemales of alleles of the white eye color gene that do or do not exhibit dosage compensation in males, showed the same response, namely, increased expression if there was no dosage compensation in males and no difference from normal females for the male dosage-compensated alleles (Birchler, 1992). This experiment demonstrated a relationship between the mechanism of dosage compensation in males and metafemales and implicated the inverse dosage effect in both. An involvement of the inverse effect in Drosophila dosage compensation provides an explanation for how the five levels of gene expression can be explained (Birchler, 1996), whereas an all-or-none presence of a complex on the X does not. The stoichiometric relationship of regulatory gene products provides a means to read the relative dosage at multiple doses to produce the appropriate inverse level.

      What then is the function of the MSL complex? It was discovered that the MSL complex will actually constrain the effect of H4 lysine16 acetylation to prevent it from causing an overexpression of genes (Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005; Sun and Birchler 2009; Sun et al., 2013a). Indeed, in the chromatin remodeling Imitation Switch (ISWI) mutants, the male X chromosome was specifically overexpressed suggesting that its normal function is needed for the constraint to occur (Pal Bhadra et al., 2005). Independently, the Mtor nuclear pore component shows a similar specific male X upregulation when Mtor is knocked down and this effect was shown to operate on the transcriptional level (Aleman et al., 2021). Interestingly, the increased expression of the X in the Mtor knockdown is accompanied by an inverse modulation of a substantial subset of autosomal genes, illustrating why the constraining process evolved to counteract male X overexpression. The constraining effect might involve a number of gene products (Birchler, 2016) and is an interesting direction for further study.

      Furthermore, when the H4Lys16 acetylase was individually targeted to reporter genes, there was an increase in expression (Sun et al., 2013a). However, when other members of the MSL complex were present in normal males or ectopically expressed, this increase did not occur (Sun et al., 2013a). It thus appears that the function of the MSL complex is to sequester the acetylase from the autosomes and constrain it on the X (Bhadra et al., 1999; 2000; Pal Bhadra et al., 2005; Sun and Birchler, 2009; Sun et al., 2013a). Indeed, in the Mtor knockdowns, the X linked genes with the greatest upregulation were those with the greatest association with the acetylase and the H4K16ac histone mark (Aleman et al 2021), supporting the idea of a constraining activity that becomes released in the Mtor knockdown. When the MSL complex is disrupted, there is an inverse effect on the autosomes that occurs but in normal circumstances the sequestration mutes this effect. The MSL complex disruption releases the acetylase to be uniformly distributed across all chromosomes as determined cytologically (Bhadra et al., 1999) or via ChIPseq for H4Lys16ac (Valsecchi et al., 2021a). Indeed, the quantity of the H4Lys16ac mark only has a proportional effect on gene expression when the constraining activity is disrupted (Aleman et al., 2021) or when the MSL complex is not present (Sun et al., 2013a). Thus, in normal flies there is a more or less equalized expression of the X and autosomes despite the monosomy for 20% of the genome.

      The component of the complex that is expressed in males and thought to organize the complex to the male X, MSL2, was recently found to also be associated with autosomal dosage sensitive regulatory genes (Valsecchi et al., 2018). MSL2 was found to modulate these autosomal dosage sensitive genes in various directions, which illustrates that MSL2 has a role in dosage balance that goes beyond the X chromosome. This finding is consistent with the evolutionary scenario that the initial attraction of the complex to the X chromosome was to upregulate dosage sensitive genes in hemizygous regions as the progenitor Y became deleted for them, with the constraining activity evolving to prevent an overexpression as the amount of acetylase on the male X increased with time (Birchler, 2016).

      The MSL hypothesis takes an X-centric view that does not accommodate what is now known about dosage effects across the whole genome. The idea that dissolution of the MSL complex would cause reduction in expression of the male X linked genes without any consequences for the autosomes is not consistent with current knowledge of gene regulatory networks and their dosage sensitivity. Indeed, the finding of dosage compensation in large autosomal aneuploids that operates on the transcriptional level (Devlin et al., 1982; 1984; Birchler et al., 1990; Sun et al., 2013c) as well as a predominant inverse effect by the same (Devlin, et al., 1988; Birchler et al., 1990) argues that one must consider the inverse effect for an understanding of the evolution of dosage compensation in Drosophila (and other species). Further discussion of models of Drosophila compensation has been published (Birchler, 2016).

      What is likely to be the most critical issue with sex chromosome evolution is the consequences for dosage sensitive regulatory genes. This fact is nicely illustrated by the retention of these types of genes in different independent vertebrate sex chromosome evolutions (Bellott and Page, 2021). In Drosophila, by contrast, dosage compensation is more of a blanket effect on most but not all X linked genes despite the fact that many genes on the X are unlikely to have dosage detrimental effects, although dosage sensitive genes might have played a role as noted above. The particularly large size of the X in Drosophila compared to the whole genome is potentially a contributing factor because such large genomic imbalance is likely to modulate most genes across the genome. Also, there is no evidence of a WGD in Drosophila as there is in other species for which the inverse effect has been documented (maize, Arabidopsis, yeast, mice, human). These other species have various numbers of retained duplicate dosage sensitive regulatory genes from WGDs. Thus, the relative change of regulatory genes in aneuploids in these species will not be as great compared to some of their interactors in the remainder of the genome, which could result in lesser magnitudes of some trans-acting effects, similarly to how aneuploids in ascending ploidies have fewer effects as described above. The absence of duplicate regulatory genes in Drosophila would predict a stronger inverse effect in general and that could have been capitalized upon to produce dosage compensation of most genes on the X chromosome despite many of them not being dosage critical. While sex chromosome evolution must accommodate dosage sensitive genes for proper development and viability, it could also be capitalized upon to evolve sexual dimorphisms in expression (Sun et al., 2013c)."
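
      The point in the quoted passage about X to autosome ratios can be made concrete with a small numerical sketch. The expression values below are hypothetical round numbers chosen for illustration, not measurements from any of the cited studies. They show that a ratio alone cannot separate a loss of X compensation from a retained X combined with an autosomal increase; only the absolute values distinguish the two.

```python
def x_to_a_ratio(state):
    """Ratio of per-gene X expression to per-gene autosomal expression."""
    return state["X"] / state["A"]


# Hypothetical per-gene expression in arbitrary units (illustrative only).
normal_male = {"X": 1.0, "A": 1.0}

# Scenario 1: X expression halves, autosomes unchanged (loss of compensation).
loss_of_compensation = {"X": 0.5, "A": 1.0}

# Scenario 2: X expression retained, autosomes doubled (inverse effect).
inverse_effect = {"X": 1.0, "A": 2.0}

for name, state in [
    ("normal male", normal_male),
    ("loss of compensation", loss_of_compensation),
    ("inverse effect on autosomes", inverse_effect),
]:
    print(f"{name}: X={state['X']}, A={state['A']}, X/A={x_to_a_ratio(state)}")
```

      Both mutant scenarios give the same X to A ratio, which is why the passage stresses absolute measurements and changes in transcriptome size.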

      Comments on revised submission:

      The authors did make an effort to address the issue previously raised.

      The authors state that an inverse dosage effect results from a titration of the histone acetylase MOF between the NSL and MSL complexes (lines 87-89). This is a misunderstanding of the inverse effect, which is an imbalance of regulatory molecules. Single regulatory gene dosage series can produce this effect. The inverse effect operates in triple X metafemales to produce dosage compensation of the three X chromosomes and a reduced expression of the autosomes (Sun et al 2013 PNAS 110: 7383-7388). There is no MSL complex in metafemales.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for taking the time to review our manuscript. We are grateful to reviewer #1 for the positive evaluation of our work and for providing valuable comments that will significantly enhance the presentation of our results. We understand reviewer #2's negative assessment because we did not discuss an alternative model of dosage compensation in Drosophila. We will address this omission in the Introduction section of the revised manuscript and remove any controversial statements from other parts of the text. However, it is important to clarify that our study does not focus on the mechanisms of dosage compensation. The main goal of the manuscript was to investigate the assembly of the MSL complex and its specific binding to the Drosophila X chromosome. We utilized male survival data to demonstrate the efficacy of MSL complex binding to the X chromosome, a relationship that has been supported by numerous independent studies. We understand that Reviewer #2 agrees that disruption of the MSL complex binding results in male lethality. As far as we understand, Reviewer #2 suggests that the MSL complex does not activate transcription of X chromosome genes, but instead facilitates the recruitment of MOF protein and potentially other general transcription factors to the X chromosome. This could explain the decrease in autosomal gene expression due to a reduction in activating factors like MOF at autosomal promoters. In the upcoming revision, we aim to strike a balance between the two models that elucidate dosage compensation in Drosophila. We appreciate your feedback and look forward to enhancing the clarity and coherence of our manuscript based on your insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      A deletion analysis of the MSL1 gene to assess how different parts of the protein product interact with the MSL2 protein and roX RNA to affect the association of the MSL complex with the male X chromosome of Drosophila was performed.

      Strengths:

      The deletion analysis of the MSL1 protein and the tests of interaction with MSL2 are adequate.

      We thank the reviewer for the positive assessment of the experimental work done.

      This reviewer does not adhere to the basic premise of the authors that the MSL complex is the primary mediator of dosage compensation of the X chromosome of Drosophila.

      We completely agree with this reviewer's claim. In the Introduction section we attempted to make clear that there are two models for the functional role of specific recruitment of the MSL complex to the X chromosome in males.

      Several lines of evidence from various laboratories indicate that it is involved in sequestering the MOF histone acetyltransferase to the X chromosome but there is a constraint on its action there. When the MSL complex is disrupted, there is no overall loss of compensation but there is an increase in autosomal expression. Sun et al (2013, PNAS 110: E808-817) showed that ectopic expression of MSL2 does not increase expression of the X and indeed inhibits the effect of acetylation of H4Lys16 on gene expression. Aleman et al (2021, Cell Reports 35: 109236) showed that dosage compensation of the X chromosome can be robust in the absence of the MSL complex. Together, these results indicate that the MSL complex is not the primary mediator of X chromosome dosage compensation. The authors use sex-specific lethality as a measure of disruption of dosage compensation, but other modulations of gene expression are the likely cause of these viability effects.

      Sun et al (2013, PNAS 110: E808-817) showed that recruitment of the MSL complex-specific subunit MSL2 or the MOF protein to the UAS promoter resulted in recruitment of the entire MSL complex in males but not transcriptional activation. This important result argues that the MSL complex does not activate transcription. However, it must be taken into account that the GAL4 DNA binding region used to recruit the chimeric MSL2 protein to the UAS promoter was directly fused to the MSL2 RING domain, which is critical for interaction of MSL2 with MSL1 and its ubiquitination activity (this activity could potentially be involved in transcription activation). It also remains poorly understood what happens to the MSL complex after recruitment to the promoters or HAS on the X chromosome. The MSL1/MSL3/MOF subcomplex can acetylate TF and H4K16 during RNA polymerase II elongation, resulting in increased transcription. Separate roles for MSL2 and MSL1 in the activation of transcription at gene promoters have also been shown. Sun et al. showed that in females, recruitment of MOF to the UAS promoter leads to a strong increase in transcription, which is associated with the inclusion of MOF in the non-specific lethal (NSL) complex, which is bound to promoters and is required for strong transcription activation. In males, MOF is preferentially recruited to the UAS promoter in the full MSL complex or perhaps in the MSL1/MSL3/MOF subcomplex, which stimulates transcription during RNA polymerase II elongation much less strongly than the NSL complex. The same result was obtained by Prestel et al. (2010, Mol Cell 38:815-26). In that study, GAL4 binding sites were inserted upstream of the lacZ and mini-white genes. Activation of transcription after recruitment of GAL4-MOF to the GAL4 sites was studied in males and females. As in Sun et al. 2013, strong activation of the reporter was observed in females. In males, weak transcriptional activation of the reporter gene was observed, and the MOF protein was detected not only on the promoter, but also in the coding and 3’ regions of the reporter.

      We do not understand how the paper by Aleman et al (Cell Reports 35: 109236, 2021) is consistent with the hypothesis that the MSL complex is not involved in the transcriptional activation of X chromosomal genes. The main conclusions of this paper are: 1) Inactivation of Mtor leads to selective activation of the male X chromosome; 2) Mtor-driven attenuation of male X occurs in broad domains linked by the MSL complex; 3) Mtor genetically interacts with MSL components and reduces male mortality; 4) Mtor restrains dose-compensated expression at the level of nascent transcription. Thus, the paper shows that the MSL complex has an activator activity that is partially inhibited by Mtor. Accordingly, inactivation of Mtor only partially restored the survival of males in which dosage compensation was not completely inactivated.

      A detailed explanation was provided by Birchler and Veitia (2021, One Hundred Years of Gene Balance: How stoichiometric issues affect gene expression, genome evolution, and quantitative traits. Cytogenetics and Genome Research 161: 529-550).

      We agree that an alternative model of the dosage compensation mechanism is reasonable. We can assume that both mechanisms function jointly to provide effective dosage compensation in Drosophila males. Following the reviewer's suggestion to reconsider the entire context of the article, we will make many small changes throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Overall, I found the text well written and the figures logically organized (especially Figure 5, which had the potential to confuse). The authors especially excelled in bringing together the decades of literature in the Discussion.

      I offer several suggestions to improve the readability:

      Consider presenting the coiled-coil domain homology in Figure 1A as a contrast for the N-terminal region, which the authors claim is poorly conserved.

      We added the coiled-coil domain homology to Figure 1A in the new version of the manuscript.

      It is difficult to visualize the red MSL2 in Figure 2; the green and red panels should be presented separately in the main text, as they are in the Supplemental Figure 2.

      We prepared Figure 2 with separate green and red panels.

      The ChIP-seq experiments for MSL proteins are well presented, but in my opinion, add little to the overall conclusions:

      Figure 6 mostly recapitulates what has already been published and utilized by several groups, most recently the authors themselves (Tikhonova 2019): that MSL expressed in females targets the X/HAS, similar to in males. While these are nice supporting data for the female transgenic system, I do not believe this figure should be prominently featured as if this is a novelty of the current study.

      We fully agree with the reviewer's comment about the limited scientific novelty of Figure 6. It serves a supporting role. Therefore, we moved this figure to the Supplementary material (as a supplement to Figure 5).

      The ChIP experiments in Figure 7 agree with the conclusions in Figures 2 and 3 (polytene chromosome immunostaining) when it comes to X/autosome localization. I believe it would help with the flow of the paper if these experiments were combined or at least placed closer together in the narrative, rather than falling at the end.

      We moved Figure 7 (Figure 5 in the new version) closer to the polytene chromosome immunostaining. We agree with the reviewer that this placement will make the article as a whole easier to follow.

      I find Figure 8 difficult to understand, especially since the "clusters" are not annotated in the figure, but are described in the text. I struggled to follow the authors' conclusions based on these data. The authors could clarify the figure with annotations, although to be honest I do not currently see the value of this analysis/figure.

      In the new version of the article, we changed this part: we removed the clusters for autosomes, as they were difficult to understand and not important for this manuscript. We also tried to emphasize more clearly in the text clusters 1 and 2, which characterize HAS.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript. Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript. Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the stressosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure. The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data has inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility supporting a high confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

    2. eLife assessment

      This important study combines genetic analysis, biochemistry, and structural modeling to reveal new insights into how changes in protein-protein structure activate signal transduction as part of the bacterial general stress response. The data, collected using validated and standard methods, and the interpretations are solid, although additional experimental structural evidence would strengthen the proposed model and its potential application to other systems. This manuscript, which provides multiple avenues for follow-up studies, will be of broad interest to microbiologists, structural biologists, and cell biologists.

    3. Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

    4. Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the stressosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

    5. Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

    1. eLife assessment

      This study provides potentially highly valuable new insight into the role of Fgf signalling in SUFU mutation-linked cerebellar tumors and indicates novel therapeutic interventions via inhibition of Fgf signalling. The evidence supporting the major claims, however, is currently incomplete. A more robust analysis of gene expression patterns and deeper mechanistic insight would significantly enhance this study, which could have wide-ranging implications for the treatment of specific cerebellar tumors.

    2. Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was found primarily in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and it was indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. The p53 protein, which acts downstream of the double-strand break pathway, was reduced in the Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and a secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice; alternatively, moving this figure to later in the manuscript, as support for their mouse investigations, would be more convincing.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graphs represent a single section from independent mice or multiple sections from the same mice, for example in Figures 2B-D. This also applies to Figures 3C-D.

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      (5) The data presented in Figure 3 are not sufficiently convincing. The cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in the sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

    3. Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

    4. Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided link a region-specific interaction between aberrant FGF5 signaling and the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006; Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      SUFU modulates Sonic hedgehog (SHH) signaling and is frequently mutated in the B-subtype of SHH-driven medulloblastoma. The B-subtype occurs mostly in infants, is often metastatic, and lacks specific treatment. Yabut et al. found that Fgf5 was highly expressed in the B-subtype of SHH-driven medulloblastoma by examining a published microarray expression dataset. They then investigated how Fgf5 functions in the cerebellum of mice that have embryonic Sufu loss of function. This loss was induced using the hGFAP-cre transgene, which is expressed in multiple cell types in the developing cerebellum, including granule neuron precursors (GNPs) derived from the rhombic lip. By measuring the area of Pax6+ cells in the external granule cell layer (EGL) of Sufu-cKO mice at postnatal day 0, they find Pax6+ cells occupy a larger area in the posterior lobe adjacent to the secondary fissure, which is poorly defined. They show that Fgf5 RNA and phosphoErk1/2 immunostaining are also higher in the same disrupted region. Some of the phosphoErk1/2+ cells are proliferative in the Sufu-cKO. Western blot analysis of Gli proteins that modulate SHH signaling found reduced expression and absence of Gli1 activity in the region of cerebellar dysgenesis in Sufu-cKO mice. This suggests the GNP expansion in this region is independent of SHH signaling. Amazingly, intraventricular injection of the FGFR1-2 antagonist AZD4547 from P0-4, followed by histological examination at P7, restored cytoarchitecture in the cerebella of Sufu-cKO mice. This is further supported by NeuN immunostaining in the internal granule cell layer, which labels mature, non-dividing neurons, and by KI67 immunostaining, which indicates dividing cells and was found primarily in the EGL. The mice were treated beginning at a timepoint when cerebellar cytoarchitecture was shown to be disrupted, and it was indistinguishable from control following treatment. Figure 3 presents the most convincing and exciting data in this manuscript.

      Sufu-cKO mice do not readily develop cerebellar tumors. The authors detected phosphorylated H2AX immunostaining, which labels double-strand breaks, in some cells in the EGL in regions of cerebellar dysgenesis in the Sufu-cKO, as well as cleaved Caspase 3, a marker of apoptosis. The p53 protein, which acts downstream of the double-strand break pathway, was reduced in Sufu-cKO cerebellum. Genetically removing p53 from the Sufu-cKO cerebellum resulted in cerebellar tumors in 2-month-old mice. The Sufu;p53-dKO cerebella at P0 lacked clear foliation and the secondary fissure, even more so than the Sufu-cKO. Fgf5 RNA and signaling (pERK1/2) were also expressed ectopically.

      The conclusions of the paper are largely supported by the data, but some data analyses need to be clarified and extended.

      (1) The rationale for examining Fgf5 in medulloblastoma is not sufficiently convincing. The authors previously reported that Fgf15 was upregulated in neocortical progenitors of mice with conditional loss of Sufu (PMID: 32737167). In Figure 1, the authors report FGF5 expression is higher in SHH-type medulloblastoma, especially the beta and gamma subtypes mostly found in infants. These data were derived from a genome-wide dataset and are shown without correction for multiple testing, including other Fgfs. Showing the expression of other Fgfs with FDR correction would better substantiate their choice or moving this figure to later in the manuscript as support for their mouse investigations would be more convincing.

      To assess FGF5 (ENSG00000138675) expression in MB tissues, we used Geo2R (Barrett et al., 2013) to analyze published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM).
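
      The two multiple-testing corrections named in this response (Benjamini & Hochberg FDR adjustment and the Holm-Sidak step-down procedure) can be illustrated with a minimal sketch. This is not the GEO2R/limma or Prism implementation; the function names and the p-values used in the usage note are illustrative assumptions only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def holm_sidak(pvalues):
    """Return Holm-Sidak step-down adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    # Walk from the smallest p-value up; the k-th smallest is corrected
    # for the (m - k) comparisons that remain, counting itself.
    for rank, idx in enumerate(order):
        candidate = 1.0 - (1.0 - pvalues[idx]) ** (m - rank)
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted
```

      For example, calling benjamini_hochberg([0.01, 0.04, 0.03, 0.005]) adjusts four hypothetical p-values at once; an adjusted value at or below 0.05 would pass the cut-off described above.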

      Author response image 1.

      Comparative expression of FGF ligands, FGF5, FGF10, FGF12, and FGF19, across all MB subgroups. FGF12 expression is not significantly different, while FGF5, FGF10, and FGF19, show distinct upregulation in MBSHH subgroup (MBWNT n=70, MBSHH n=224, MBGR3 n=143, MBGR4 n=326).

      Expression of the 21 known FGF ligands was also analyzed. Many FGFs did not exhibit differential expression levels in MBSHH compared to other MB subgroups, such as with FGF12 in Figure 1. FGF5, FGF10, and FGF19 (the human orthologue of mouse FGF15) all showed specific upregulation in MBSHH compared to other MB subgroups (Author response image 1), supporting our previous observations that FGF15 is a downstream target of SHH signaling (Yabut et al., 2020), as the reviewer pointed out. However, further stratification of MBSHH patient data revealed that only FGF5 specifically showed upregulation in infants with MBSHH (MBSHHb and MBSHHg; Author response image 2), indicating a more prominent role for FGF5 in the developing cerebellum and as a driver of MBSHH tumorigenesis in this dynamic environment.

      Author response image 2.

      Comparative expression of FGF5, FGF10, and FGF19 in different MBSHH subtypes. FGF5 specifically shows relative mRNA levels above 6 in 81% of MBSHH infant patient tumors (n=80 MBSHHb and MBSHHg tumors), unlike 35% of MBSHHa (n=65) or 0% of MBSHHd (n=75) tumors.

      (2) The Sufu-cKO cerebellum lacks a clear anchor point at the secondary fissure and foliation is disrupted in the central and posterior lobes. It would be helpful for the authors to review Sudarov & Joyner (PMID: 18053187) for nomenclature specific to the developing cerebellum.

      The reviewers are correct that cerebellar foliation is severely disrupted in the central and posterior lobes, as per Sudarov and Joyner (Neural Development 2007). This nomenclature may be used to describe the regions referred to in this manuscript.

      (3) The metrics used to quantify cerebellar perimeter and immunostaining are not sufficiently described. It is unclear whether the individual points in the bar graph represent a single section from independent mice, or multiple sections from the same mice. For example, in Figures 2B-D. This also applies to Figure 3C-D.

      All quantifications were performed on 2-3 20 um cerebellar sections from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse. Figure 2B shows data points from n=4 mice per genotype. Figure 2C shows data from n=3 mice per genotype. Figure 2D shows data from n=6 mice per genotype. Figure 3C-D show data from n=3 mice per genotype.
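
      The averaging described here, where each plotted point is one mouse rather than one section, can be sketched as follows. The function name, mouse identifiers, and counts are hypothetical and are not taken from the study.

```python
from statistics import mean


def per_mouse_means(section_counts):
    """Collapse section-level cell counts into one mean value per mouse.

    section_counts maps a mouse ID to the list of counts from its
    individual sections (2-3 sections per mouse in the response above).
    """
    for mouse, counts in section_counts.items():
        if not counts:
            raise ValueError(f"no sections recorded for mouse {mouse!r}")
    return {mouse: mean(counts) for mouse, counts in section_counts.items()}


# Hypothetical counts for two mice of one genotype (illustrative only).
example_counts = {"mouse_1": [10, 12], "mouse_2": [8, 9, 10]}
points_for_plot = per_mouse_means(example_counts)
```

      Treating the per-mouse mean as the unit of analysis keeps n equal to the number of animals, which is the distinction the reviewer asked about.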

      (4) The data on Fgf5 RNA expression presented in Figure 2E are not sufficiently convincing. The perimeter and cytoarchitecture of the cerebellum are difficult to see and the higher magnification shown in 2F should be indicated in 2E.

      The lack of foliation in Sufu-cKO cerebellum is clear particularly when visualizing the perimeter via DAPI labeling (Figure 2E). The expression area of FGF5 is also visibly larger, given that all images in Figure 2E are presented in the same scale (scale bars = 500 um). 

      (5) The data presented in Figure 3 are not sufficiently convincing. The number of cells double positive for pErk and KI67 (Figure 3B) are difficult to see and appear to be few, suggesting the quantification may be unreliable.

      We used KI67+ expression to provide a molecular marker of regions to be quantified in both WT and Sufu-cKO sections. Quantification of labeled cells was performed on images obtained by confocal microscopy, enabling imaging of 1-2 um optical slices, since Ki67 and pERK expression might not localize within the same cellular compartments. We relied on continuous DAPI nuclear staining to distinguish individual cells in each optical slice and to assess the colocalization of Ki67 and pERK. All quantifications were performed on 2-3 20 um cerebellar sections from each of 3-6 independent mice per genotype. Individual points in the bar graphs represent the average cell number (quantified from 2-3 sections) for each mouse.

      (6) The data presented in Figure 4F-J would be more convincing with quantification. The Sufu;p53-dKO appears to have a thickened EGL across the entire vermis perimeter, and very little foliation, relative to control and single cKO cerebella. This is a more widespread effect than the more localized foliation disruption in the Sufu-cKO. 

      We agree with the reviewers that quantification of these phenotypes would provide a solid measure of the defects. The phenotypes of the Sufu;p53-dKO cerebellum are so profound that they require in-depth characterization, which will be the focus of future studies.

      (7) Figure 5 does not convincingly summarize the results. Blue and purple cells in sagittal cartoon are not defined. Which cells express Fgf5 (or other Fgfs) has not been determined. The yellow cells are not defined in relation to the initial cartoon on the left.

      The revised manuscript will address this confusion by clearly labeling the cells and their roles in the schematic diagram.

      Reviewer #2 (Public Review):

      Summary:

      Mutations in SUFU are implicated in SHH medulloblastoma (MB). SUFU modulates Shh signaling in a context-dependent manner, making its role in MB pathology complex and not fully understood. This study reports that elevated FGF5 levels are associated with a specific subtype of SHH MB, particularly in pediatric cases. The authors demonstrate that Sufu deletion in a mouse model leads to abnormal proliferation of granule cell precursors (GCPs) at the secondary fissure (region B), correlating with increased Fgf5 expression. Notably, pharmacological inhibition of FGFR restores normal cerebellar development in Sufu mutant mice.

      Strengths:

      The identification of increased FGF5 in subsets of MB is novel and a key strength of the paper.

      Weaknesses:

      The study appears incomplete despite the potential significance of these findings. The current paper does not fully establish the causal relationship between Fgf5 and abnormal cerebellar development, nor does it clarify its connection to SUFU-related MB. Some conclusions seem overstated, and the central question of whether FGFR inhibition can prevent tumor formation remains untested.

      Reviewer #3 (Public Review):

      Summary:

      The interaction between FGF signaling and SHH-mediated GNP expansion in MB, particularly in the context of Sufu LoF, has just begun to be understood. The manuscript by Yabut et al. establishes a connection between ectopic FGF5 expression and GNP over-expansion in a late-stage embryonic Sufu LoF model. The data provided links region-specific interaction between aberrant FGF5 signaling with the SHH subtype of medulloblastoma. New data from Yabut et al. suggest that ectopic FGF5 expression correlates with GNP expansion near the secondary fissure in Sufu LoF cerebella. Furthermore, pharmacological blockade of FGF signaling inhibits GNP proliferation. Interestingly, the data indicate that the timing of conditional Sufu deletion (E13.5 using the hGFAP-Cre line) results in different outcomes compared to later deletion (using Math1-cre line, Jiwani et al., 2020). This study provides significant insights into the molecular mechanisms driving GNP expansion in SHH subgroup MB, particularly in the context of Sufu LoF. It highlights the potential of targeting FGF5 signaling as a therapeutic strategy. Additionally, the research offers a model for better understanding MB subtypes and developing targeted treatments.

      Strengths:

      One notable strength of this study is the extraction and analysis of ectopic FGF5 expression from a subset of MB patient tumor samples. This translational aspect of the study enhances its relevance to human disease. By correlating findings from mouse models with patient data, the authors strengthen the validity of their conclusions and highlight the potential clinical implications of targeting FGF5 in MB therapy.

      The data convincingly show that FGFR signaling activation drives GNP proliferation in Sufu conditional knockout models. This finding is supported by robust experimental evidence, including pharmacological blockade of FGF signaling, which effectively inhibits GNP proliferation. The clear demonstration of a functional link between FGFR signaling and GNP expansion underscores the potential of FGFR as a therapeutic target in SHH subgroup medulloblastoma.

      Previous studies have demonstrated the inhibitory effect of FGF2 on tumor cell proliferation in certain MB types, such as the ptc mutant (Fogarty et al., 2006; Yaguchi et al., 2009). Findings in this manuscript provide additional support suggesting multiple roles for FGF signaling in cerebellar patterning and development.

      Weaknesses:

      In the GEO dataset analysis, where FGF5 expression is extracted, the reporting of the P-value lacks detail on the statistical methods used, such as whether an ANOVA or t-test was employed. Providing comprehensive statistical methodologies is crucial for assessing the rigor and reproducibility of the results. The absence of this information raises concerns about the robustness of the statistical analysis.

      The revised manuscript will include the following detailed explanation of the statistical analyses of the GEO dataset:

      For the analysis of expression values of FGF5 (ENSG00000138675), we obtained these values using GEO2R (Barrett et al., 2013), which directly analyzes published human MB subtype expression arrays from accession no. GSE85217 (Cavalli et al., 2017). GEO2R is an interactive web tool that compares expression levels of genes of interest (GOI) between sample groups in the GEO series using original submitter-supplied processed data tables. We simply entered the GOI Ensembl ID and organized data sets according to age and MB subgroup or MBSHH subtype classifications. GEO2R results presented gene expression levels as a table ordered by FDR-adjusted (Benjamini & Hochberg) p-values, with significance level cut-off at 0.05, processed by GEO2R’s built-in limma statistical test. Resulting data were subsequently exported into Prism (GraphPad). We generated scatter plots presenting FGF5 expression levels across all MB subgroups (Figure 1A) and MBSHH subtypes (Figure 1D). We performed additional statistical analyses to compare FGF5 expression levels between MB subgroups and MBSHH subtypes and graphed these data as violin plots (Figure 1B, 1C, and 1E). For these analyses, we used one-way ANOVA with Holm-Sidak’s multiple comparisons test, single pooled variance. P value ≤0.05 was considered statistically significant. Graphs display the mean ± standard error of the mean (SEM). Sample sizes were:

      Author response table 1.

      Another concern is related to the controls used in the study. Cre recombinase induces double-strand DNA breaks within the loxP sites, and the control mice did not carry the Cre transgene (as stated in the Method section), while Sufu-cKO mice did. This discrepancy necessitates an additional control group to evaluate the effects of Cre-induced double-strand breaks on phosphorylated H2AX-DSB signaling. Including this control would strengthen the validity of the findings by ensuring that observed effects are not artifacts of Cre recombinase activity.

      The breeding scheme we used to generate homozygous SUFU conditional mutants will not generate pups carrying only hGFAP-Cre. Thus, we are unable to compare gH2AX expression in littermates that do not carry loxP sites. The reviewer is correct in pointing out the possibility of Cre recombinase activity inducing double-strand breaks on its own. However, it is likely that any hGFAP-Cre-induced double-strand breaks are not sufficient to cause the phenotypes we observed in homozygous mutant (Sufu-cKO) mice, because the cerebella of mice carrying heterozygous SUFU mutations (hGFAP-Cre;Sufu-fl/+) do not display the profound cerebellar phenotypes observed in Sufu-cKO mice. We cannot rule out, however, undetected abnormalities, which may require further analyses to identify.

      Although the use of the hGFAP-Cre line allows genetic access to the late embryonic stage, this also targets multiple cell types, including both GNPs and cerebellar glial cells. However, the authors focus primarily on GNPs without fully addressing the potential contributions of neuron-glial interaction. This oversight could limit the understanding of the broader cellular context in which FGF signaling influences tumor development.

      The reviewer is correct in that hGFAP-Cre also targets other cell types, such as cerebellar glial cells, which are generated when Cre-expression has begun. It is possible that cerebellar glial cell development is also compromised in Sufu-cKO mice and may disrupt neuron-glial interaction, due to or independently of FGF signaling. In-depth studies are required to interrogate how loss of SUFU specifically affects the development of cerebellar glial cells and influences their cellular interactions in the developing cerebellum.

    1. eLife assessment

      This manuscript presents a valuable new quantitative crosslinking mass spectrometry approach using novel isobaric crosslinkers. The data are solid and the method has potential for a broad application in structural biology if more isobaric crosslinking channels are available and the quantitative information of the approach is exploited in more depth.

    2. Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rbp4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ - or stable-isotope-labeling based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.
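
      The comparison suggested in the second point, checking measured channel ratios against ratios predicted from a known binding constant, could be sketched as follows. This assumes a simple 1:1 binding model with ligand in large excess (free ligand approximately equal to total ligand) and a crosslink whose signal tracks the unbound population; the function names and numeric values are hypothetical and are not taken from the manuscript.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of protein bound to ligand under a 1:1 binding model.

    Assumes free ligand is approximately equal to total ligand.
    ligand_conc and kd must be in the same concentration units.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)


def predicted_unbound_ratio(ligand_a, ligand_b, kd):
    """Predicted ratio (condition A / condition B) of the unbound population.

    This is the ratio one would expect between two reporter channels for a
    crosslink that forms only in the unbound state (an assumption).
    """
    unbound_a = 1.0 - fraction_bound(ligand_a, kd)
    unbound_b = 1.0 - fraction_bound(ligand_b, kd)
    return unbound_a / unbound_b


# Illustrative only: no ligand versus ligand at a concentration equal to Kd.
ratio = predicted_unbound_ratio(0.0, 1.0, 1.0)
```

      A titration across several ligand concentrations would give a series of predicted ratios against which the measured reporter ratios could be compared.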

    3. Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together, the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      (2) Of all the potential lysines that can be modified - how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate in a denatured sample the modification efficiency of the isobaric crosslinker (as an upper limit as here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits to known structural changes. What is missing here is a bit of what changes were expected overall and which ones the authors would have expected to pick up with their method and why have they not been picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that - for example when they mention the missed site that they expected to see in the initial polII experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

    1. eLife assessment

      The authors present a valuable study exploring the interaction between JNK signaling and high sucrose feeding. The strength of evidence supporting these observations is solid, including multi-tissue transcriptomic and metabolic analyses, followed by network modeling approaches to define the organs and pathways involved. Reviewers provided several suggestions to improve the manuscript including clarifications of model and analyses, as well as explanations for within-group variations and confirming RNA-seq results at the level of metabolite processes highlighted.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigated the effects of JNK inhibition on sucrose-induced metabolic dysfunction in rats. They used multi-tissue network analysis to study the effects of the JNK inhibitor JNK-IN-5A on metabolic dysfunction associated with excessive sucrose consumption. Their results show that JNK inhibition reduces triglyceride accumulation and inflammation in the liver and adipose tissues while promoting metabolic adaptations in skeletal muscle. The study provides new insights into how JNK inhibition can potentially treat metabolic dysfunction-associated fatty liver disease (MAFLD) by modulating inter-tissue communication and metabolic processes.

      Strengths:

      The study has several notable strengths:

      Comprehensive Multi-Tissue Analysis: The research provides a thorough multi-tissue evaluation, examining the effects of JNK inhibition across key metabolically active tissues, including the liver, visceral white adipose tissue, skeletal muscle, and brain. This comprehensive approach offers valuable insights into the systemic effects of JNK inhibition and its potential in treating MAFLD.

      Robust Use of Systems Biology: The study employs advanced systems biology techniques, including transcriptomic analysis and genome-scale metabolic modeling, to uncover the molecular mechanisms underlying JNK inhibition. This integrative approach strengthens the evidence supporting the role of JNK inhibitors in modulating metabolic pathways linked to MAFLD.

      Potential Therapeutic Insights: By demonstrating the effects of JNK inhibition on both hepatic and extrahepatic tissues, the study offers promising therapeutic insights into how JNK inhibitors could be used to mitigate metabolic dysfunction associated with excessive sucrose consumption, a key contributor to MAFLD.

      Behavioral and Metabolic Correlation: The inclusion of behavioral tests alongside metabolic assessments provides a more holistic view of the treatment's effects, allowing for a better understanding of the broader physiological implications of JNK inhibition.

      Weaknesses:

      While the study provides a comprehensive evaluation of JNK inhibitors in mitigating MAFLD conditions, addressing the following points will enhance the manuscript's quality:

      The authors should explicitly mention and provide a detailed list of metabolites affected by sucrose and JNK inhibition treatment that have been previously associated with MAFLD conditions. This will better contextualize the findings within the broader field of metabolic disease research.

      The limitations of the study should be clearly stated, particularly the lack of evidence on the effects of chronic JNK inhibitor treatment and potential off-target effects. Addressing these concerns will offer a more balanced perspective on the therapeutic potential of JNK inhibition.

      The potential risks of using JNK inhibitors in non-MAFLD conditions should be highlighted, with a clear distinction made between the preventive and curative effects of these therapies in mitigating MAFLD conditions. This will ensure the therapeutic implications are properly framed.

      The statistical analysis section could be strengthened by providing a justification for the chosen statistical tests and discussing the study's power. Additionally, a more detailed breakdown of the behavioral test results and their implications would be beneficial for the overall conclusions of the study.

    3. Reviewer #2 (Public review):

      Summary:

      Excessive sucrose is a possible initial factor for the development of metabolic dysfunction-associated fatty liver disease (MAFLD). To investigate whether intervention with a JNK inhibitor could treat the metabolic dysfunction caused by excessive sucrose intake, the authors performed multi-organ transcriptomic analysis (liver, visceral fat (vWAT), skeletal muscle, and brain) in a rat model of MAFLD induced by excessive sucrose intake, with or without treatment with a selective JNK2 and JNK3 inhibitor (JNK-IN-5A). Their data suggested that changes in gene expression in the vWAT as well as in the liver contribute to the pathogenesis of their MAFLD model and revealed that the JNK inhibitor has a cross-organ therapeutic effect on it.

      Strengths:

      (1) It has been previously reported that inhibition of JNK signalling can contribute to the prevention of hepatic steatosis (HS) and related metabolic syndrome in other models, but the role of JNK signalling in the metabolic disruption caused by excessive intake of sucrose, a possible initial factor for the development of MAFLD, has not been well understood, and the authors have addressed this point.

      (2) This study is also important because pharmacological therapy for MAFLD has not yet been established.

      (3) By obtaining transcriptomic data in multiple organs and comprehensively analyzing the data using gene co-expression network (GCN) analysis and genome-scale metabolic models (GEM), the authors showed multi-organ interaction not only in the pathology of MAFLD caused by excessive sucrose intake but also in the treatment effects of JNK-IN-5A.

      (4) Since JNK signalling has diverse physiological functions in many organs, the authors effectively assessed possible side effects with a view to the clinical application of JNK-IN-5A.

      Weaknesses:

      (1) The metabolic process activities were evaluated using RNA-seq results in Figure 7, but direct data such as metabolite measurements are lacking.

      (2) There is a lack of consistency in the data between JNK-IN-5A_D1 and _D2, and there is no sufficient data-based explanation for why the effects observed in D1 were inconsistent in the D2 samples.

      (3) Although it is valuable that the authors were able to suggest the possibility of JNK inhibitor as a therapeutic strategy for MAFLD, the evaluation of the therapeutic effect was limited to the evaluation of plasma TG, LDH, and gene expression changes. As there was no evaluation of liver tissue images, it is unclear what changes were brought about in the liver by the excessive sucrose intake and the treatment with JNK-IN-5A.

    1. eLife assessment

      Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function resulting in changes in gene expression and splicing of target mRNAs. This study developed a sensitive and robust sensor for TDP-43 activity that should impact the field's ability to monitor whether TDP-43 is functional or not. Though limited to cell culture, the evidence presented is convincing and is the first demonstration that a GFP on/off system can be used to assess TDP-43 mutants as well as loss of soluble TDP-43. The findings are valuable and may represent a novel tool to investigate TDP-43-associated disease mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor lies primarily in its eventual use in high-throughput screening and in vivo models. The TDP-43 loss-of-function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function and knockdown, and in models of TDP-43 mislocalization and aggregation. The sensor is sensitive to a minimal decrease of TDP-43 and can be used at the protein level, unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the following two points would assist users in making informed choices: (1) testing the sensor in other cell lines; and (2) establishing a correlation between the sensor's readout and the loss of function (LOF) in physiological genes, which would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine whether a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP-43.

      Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and that in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss-of-function sensor as a switch to produce TDP-43, and its eventual use as gene therapy, would have to contend with the fact that the protein produced may also become non-functional. This would be easy to test in one of the aggregation models that were used to test the sensor. However, as the authors suggest, this is a very interesting system for delivering other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

    3. Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with those methods is their sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede current TDP-43 activity assays: it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes (i.e., PCR and GFP fluorescence) was viewed as complementary, adding additional rigor. The final major strength I'll add is the very clever approach of tethering TDP-43 to the loss-of-function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate, homogeneous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented more clearly, possibly split into additional graphs, since there are too many outputs. The Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. The same applies to Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

    4. Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor lies primarily in its eventual use in high-throughput screening and in vivo models. The TDP-43 loss-of-function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing, the sensor was tested in models of TDP-43 loss of function, knockdown and models of TDP-43 mislocalization and aggregation. The sensor is susceptible to a minimal decrease of TDP-43 and can be used at the protein level unlike most of the tests currently employed.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the following two points would assist users in making informed choices: (1) testing the sensor in other cell lines; and (2) establishing a correlation between the sensor's readout and the loss of function (LOF) in physiological genes, which would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine whether a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP-43.

      Considering that most TDP-LOF pathologically occurs due to aggregation and/or mislocalization, and that in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss-of-function sensor as a switch to produce TDP-43, and its eventual use as gene therapy, would have to contend with the fact that the protein produced may also become non-functional. This would be easy to test in one of the aggregation models that were used to test the sensor. However, as the authors suggest, this is a very interesting system for delivering other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank Reviewer #1 for their detailed feedback. In response, we will investigate the function of CUTS in neuronal cells and evaluate how a modest reduction in TDP-43 levels affects the splicing of physiologically relevant TDP-43-regulated cryptic exons within these cells (e.g., STMN2 and UNC13A).

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with those methods is their sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede current TDP-43 activity assays: it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes (i.e., PCR and GFP fluorescence) was viewed as complementary, adding additional rigor. The final major strength I'll add is the very clever approach of tethering TDP-43 to the loss-of-function cassette such that when TDP-43 is inactive it would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented more clearly, possibly split into additional graphs, since there are too many outputs. The Sup Figure 2A image panels would benefit from being labeled; it's difficult to tell what antibodies or fluorophores were used. The same applies to Figure 4B.

      Figure 3 is an important addition to this manuscript and in general is convincing showing that TDP-43 loss of function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      We thank Reviewer #2 for their constructive evaluation of our study. In response, we will assess CUTS in human neuronal cells, as also recommended by Reviewer #1. Additionally, we will incorporate an analysis of CUTS using flow cytometry to provide quantitative measurements of GFP signal. We agree that investigating how CUTS responds to stressors affecting TDP-43 function (e.g., MG132) would be a valuable addition, and we will include these data in the revisions to the study.

      We also appreciate the feedback on our figures and will work to enhance their clarity, incorporating the Reviewer’s suggestions. Specifically, we will split Figure 2D and 2G into multiple plots and ensure clearer labeling of the image panels in Figures 2A and 4B.

      Regarding the comment on the 5FL data, we believe this occurrence can be explained by existing literature, and we will address this directly in the discussion section of the manuscript.

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well described unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      We thank Reviewer #3 for their time and thoughtful assessment of our manuscript. We will address all their recommendations, including expanding the discussion on the CE sequences utilized in the CUTS sensor and exploring the potential utility of the CUTS sensor in alternative disease-relevant systems.

    1. eLife assessment

      This work describes how a toxin-antitoxin (TA) system, which uses cyclic di-GMP as an antitoxin, controls both biofilm-associated antibiotic persistence and the integrity of the bacterial genome. The authors present solid evidence linking cyclic di-GMP and the toxin HipH. The work is valuable because it establishes the relationship between bacterial persistence and biofilm resilience, which lays a strong basis for future research on the formation of bacterial biofilms and antibiotic resistance.

    2. Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased number of persisters in the biofilm state. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a large amount of data to show that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Comments on the revised version:

      All my concerns have been addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This preprint explores the involvement of cyclic di-GMP in genome stability and antibiotic persistence regulation in bacterial biofilms. The authors proposed a novel mechanism that, due to bacterial adhesion, increases c-di-GMP levels and influences persister formation through interaction with HipH. While the work may provide useful insights that could attract researchers in biofilm studies and persistence mechanisms, the main findings are inadequately supported and require further validation and refinement in experimental design.

      We sincerely thank eLife for the thorough assessment of our manuscript. We appreciate the constructive criticism and see it as an opportunity to strengthen our research. In response to the reviewers' comments and suggestions, we have made significant improvements to our study. We have refined our experimental design and conducted additional experiments to provide more robust evidence supporting our findings. We believe that with these additional experiments and refinements, our study provides robust evidence for this novel mechanism, contributing significantly to the fields of biofilm research and bacterial persistence.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors propose a UPEC TA system in which a metabolite, c-di-GMP, acts as the AT with the toxin HipH. The idea is novel, but several key ideas are missing in regard to the relevant literature, and the experimental design is flawed. Moreover, they are absolutely not studying persister cells as Figure 1b clearly shows they are merely studying dying cells since no plateau in killing (or anything close to a plateau) was reached. So in no way has persistence been linked to c-di-GMP. Moreover, I do not think the authors have shown how the c-di-GMP sensor works. Also, there is no evidence that c-di-GMP is an antitoxin as no binding to HipH has been shown. So at best, this is an indirect effect, not a new toxin/antitoxin system as for all 7 TAs, a direct link to the toxin has been demonstrated for antitoxins.

      Thank you for your constructive comments on our manuscript. Your insights have prompted us to revisit our data and experimental design, leading to significant improvements in our study.

      (1) Clarification on Persister Cell Detection: We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have implemented two key modifications: shortening the sampling interval and extending the antibiotic treatment duration. These adjustments have resulted in an updated killing curve that now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing, as illustrated in Response Figure 1. This refined pattern aligns with established characteristics of persister cell behavior in antibiotic tolerance studies, providing a more accurate representation of the persister population dynamics in our experimental system. We believe these methodological enhancements significantly improve the reliability and interpretability of our results, offering a clearer insight into the persister cell phenomenon under investigation.

      (2) Validation of c-di-GMP Sensor: We appreciate your point about the c-di-GMP sensor. The c-di-GMP sensor, developed by Howard C. Berg's team, is specifically designed to detect relative intracellular concentrations of c-di-GMP in Escherichia coli cells. This capability is crucial for understanding the dynamic regulation of c-di-GMP during bacterial responses to environmental stimuli. We have expanded our explanation of the sensor's detection mechanism in lines 138-146 of the manuscript, detailing how it accurately reflects changes in c-di-GMP levels within the cells. The mechanism operates through a series of signaling events that are initiated when c-di-GMP binds to the sensor, leading to measurable outputs that correlate with intracellular concentrations. Additionally, we have provided a schematic chart in Figure S1B to visually support our description of the sensor. This figure demonstrates the sensor's responsiveness and specificity in detecting fluctuations in c-di-GMP levels, effectively linking the signaling molecule to cellular behavior. We hope these additions clarify the role of the c-di-GMP sensor in our research and address your concerns regarding its functionality.

      (3) HipH and c-di-GMP Interaction: Our pull-down experiments presented in Figure 5A-E provide robust and compelling evidence for a direct physical interaction between HipH and c-di-GMP, and the effects of their interaction are reminiscent of toxin-antitoxin systems. Yet we acknowledge that c-di-GMP is not a traditional antitoxin, since it is not genetically linked to HipH. We have revised our terminology to "TA-like system" to reflect this difference more accurately.

      Weaknesses:

      (1) L 53: biofilm persisters are no different than any other persisters (there is no credible evidence of any different persister cells) so this reviewer suggests changing 'biofilm persisters' to 'persisters' throughout the text.

      Thank you for your thoughtful consideration. Upon careful consideration of the current scientific literature, we agree that there is no substantial evidence supporting a distinct category of persister cells specific to biofilms. We have systematically replaced 'biofilm persisters' with 'persisters' throughout the manuscript.

      (2) L 51: persister cells do not mutate and, once resuscitated, mutate like any other growing cell so this sentence should be deleted as it promotes an unnecessary myth about persistence.

      We sincerely appreciate your astute observation regarding the inaccuracy in line 51. We have removed the sentence in question from line 51. We have also thoroughly reviewed the entire manuscript to ensure no similar misconceptions are present elsewhere in the text.

      (3) L 69: please include the only metabolic model for persister cell formation and resuscitation here based on single cells (e.g., doi.org/10.1016/j.bbrc.2020.01.102 , https://doi.org/10.1016/j.isci.2019.100792 ); otherwise, you write as if there are no molecular mechanisms for persistence/resuscitation.

      Thank you for your valuable suggestion. We appreciate the opportunity to enhance the scientific context of our manuscript. We have added a brief explanation of how ppGpp mediates ribosome dimerization, leading to persistence, and how its degradation triggers resuscitation [1-3] (lines 68-74). We have described the role of cAMP-CRP in regulating persistence through its effects on metabolism and stress responses [4, 5] (lines 74-78). We also explore potential interactions or synergies between our proposed mechanisms and these established metabolic models [6] (lines 383-409). We believe this revision significantly enhances our manuscript by providing a more accurate representation of the current state of knowledge in the field and demonstrating how our work builds upon and contributes to existing models of bacterial persistence.

      (4) The authors should cite in the Intro or Discussion that others have proposed similar novel TAs including a ppGpp metabolic toxin paired with an enzymatic antitoxin SpoT that hydrolyzes the toxin (http://dx.doi.org/10.1016/j.molcel.2013.04.002).

      We are grateful for your expertise in pointing out this crucial reference. We sincerely appreciate your suggestion to include the reference to previously proposed novel toxin-antitoxin (TA) systems, particularly the ppGpp-SpoT system [6]. In light of this reference, we have expanded our discussion to include: 1) A brief overview of the ppGpp-SpoT system as a novel TA-like mechanism. 2) Comparisons between the ppGpp-SpoT system and our findings on the HipH-c-di-GMP interaction. 3) Reflections on how these systems challenge and expand traditional definitions of TA systems (lines 383-409). We believe this addition significantly enhances the context and strengthens the rationale for considering the HipH-c-di-GMP interaction as a TA-like system. Thank you for your valuable input in helping us situate our research within the broader landscape of TA system biology.

      (5) Figure 1b: there are no results in this paper related to persister cells. Figure 1b simply shows dying cells were enumerated. Hence, the population of stressed cells increased, not 'persister cells' (Figure 1f), in the course of these experiments.

      We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have made two changes: 1) Shortened sampling interval: we have reduced the interval between measurements to one hour. 2) Extended sampling duration: the total duration of sampling has been increased to 6 hours (Response Figure 1). The updated killing curve now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing: 1) Initial rapid decline: from 0-1 hours, we observe a steep decrease in bacterial survival (slope ≈ -3 to -1.8); 2) Slower decline phase: from 4.5-6 hours, the rate of decline is markedly reduced (slope ≈ -0.17 to -0.06). This pattern aligns more closely with established characteristics of persister cell behavior in antibiotic tolerance studies.
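      To illustrate how slopes of this kind can be obtained, the following is a minimal sketch that fits a least-squares line to log10-transformed survival over a chosen time window. It assumes the slopes are expressed in log10(survival fraction) per hour, which the response does not state explicitly; the function name `log10_slope` and all data values are hypothetical and are not taken from Response Figure 1.

```python
import math


def log10_slope(times_h, survival_fractions):
    """Least-squares slope of log10(survival) versus time, in log10 units per hour."""
    ys = [math.log10(s) for s in survival_fractions]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    return sxy / sxx


# Hypothetical killing-curve data (fraction of the starting CFU that survives).
times_h = [0, 1, 2, 3, 4.5, 6]
survival = [1.0, 1e-2, 2e-3, 8e-4, 5e-4, 3e-4]

# Slope of the initial rapid-decline window (0-1 h).
early_slope = log10_slope(times_h[:2], survival[:2])
# Slope of the late slow-decline window (4.5-6 h).
late_slope = log10_slope(times_h[-2:], survival[-2:])

# A biphasic curve shows a much shallower late slope than early slope.
is_biphasic = abs(late_slope) < abs(early_slope) / 5
```

      With these illustrative numbers the early slope is steep and the late slope is shallow, which is the qualitative signature the response describes; the factor of 5 used for `is_biphasic` is an arbitrary threshold chosen for the sketch.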

      (6) Figure S1: I see no evidence that the authors have shown this c-di-GMP detects different c-di-GMP levels since there appears to be no data related to varying c-di-GMP concentrations with a consistent decrease. Instead, there is a maximum. What are the concentration of c-di-GMP on the X-axis for panels C, D, and E? How were c-di-GMP levels varied such that you know the c-di-GMP concentration?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein includes a Fluorescent Protein Base and a c-di-GMP Binding Domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change. The amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain upon c-di-GMP binding. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein. mScarletI serves as a reference for ratiometric measurements. This design allows ratiometric, real-time monitoring of c-di-GMP levels in individual cells and controls for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.
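      The ratiometric readout described above can be sketched in a few lines. This is only an illustration under stated assumptions: the per-cell intensities are invented, background subtraction is assumed to have been done already, and the helper names `ratio` and `inverse_ratio` are hypothetical rather than part of any published analysis pipeline.

```python
def ratio(venus_intensity, scarlet_intensity):
    """R = mVenusNB / mScarletI emission for one cell (background-subtracted values assumed)."""
    if scarlet_intensity <= 0:
        raise ValueError("reference (mScarletI) intensity must be positive")
    return venus_intensity / scarlet_intensity


def inverse_ratio(venus_intensity, scarlet_intensity):
    """R^-1, which rises as c-di-GMP rises because binding lowers mVenusNB fluorescence."""
    return 1.0 / ratio(venus_intensity, scarlet_intensity)


# Hypothetical single-cell intensities (arbitrary units).
low_cdg_cell = {"venus": 800.0, "scarlet": 400.0}   # e.g. a PdeH-expressing cell
high_cdg_cell = {"venus": 400.0, "scarlet": 400.0}  # e.g. a WspR D70E-expressing cell

r_low = ratio(low_cdg_cell["venus"], low_cdg_cell["scarlet"])
r_high = ratio(high_cdg_cell["venus"], high_cdg_cell["scarlet"])

# Dynamic range of the sensor between the two conditions.
dynamic_range = r_low / r_high

inv_low = inverse_ratio(low_cdg_cell["venus"], low_cdg_cell["scarlet"])
inv_high = inverse_ratio(high_cdg_cell["venus"], high_cdg_cell["scarlet"])
```

      Dividing by the mScarletI signal is what cancels cell-to-cell differences in expression level: doubling both intensities for a cell leaves R unchanged.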

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (7) The viable portion of the VBNC population are persister cells so there is no reason to use VBNC as a separate term. Please see the reported errors often made with nucleic acid staining dyes in regard to VBNCs.

      We appreciate the opportunity to clarify the distinction between VBNC cells and persister cells in our manuscript. It is essential to recognize that VBNC cells and persister cells represent two fundamentally different states of bacterial dormancy. While both may exhibit viability under certain conditions, persister cells are characterized by their ability to resuscitate and grow when environmental conditions become favorable [8]. In contrast, VBNC cells are in a deep dormant state where they cannot be revived through normal culture conditions [9, 10]. This distinction is critical for accurately representing bacterial survival strategies and population dynamics, which is why we maintain the use of the term VBNC separately from persister cells. We have added related references in line 259.

      Regarding the reported errors associated with nucleic acid staining dyes for identifying VBNC cells, we acknowledge that these methods can exhibit limitations. Specifically, nucleic acid stains may fail to reliably differentiate between metabolically active and inactive cells, leading to inaccuracies in quantifying the true VBNC population [11]. In our study, we have opted to utilize propidium iodide (PI) staining to assess cell viability more accurately, as it effectively distinguishes dead cells from viable cells based on membrane integrity [12]. By employing this methodology, we ensure a more precise estimation of the VBNC proportion without conflating it with persister cell dynamics.

      Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased persisters in the biofilm state. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Thank you for your insightful summary of our research. We greatly appreciate your thoughtful consideration of our work.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Weaknesses:

      (1) Is HipH the only target identified by screening the E. coli Keio Knockout Collection?

      We appreciate your inquiry about our screening process and the identification of HipH. We did not screen the entire E. coli Keio Knockout Collection. Our approach was more targeted, focusing on mutants relevant to enzyme activity regulation. We selected specific mutants based on their potential involvement in c-di-GMP-mediated regulatory pathways. This focused approach allowed us to efficiently identify candidates likely to be involved in persister formation. Among the screened mutants, HipH emerged as a significant hit. Its identification was particularly noteworthy due to its known role in persister formation and its potential as a regulatory target of c-di-GMP. We acknowledge that our targeted approach may have overlooked other potential candidates. We are considering a more comprehensive screening approach in future studies to identify additional targets.

      (2) Since the story is complicated, a diagrammatic picture might be needed to illustrate the whole story. And the title does not accurately summarize the novelty of this study.

      Thank you for your valuable feedback. We fully agree with your assessment that a visual representation would greatly enhance the clarity of our complex findings. In response to your suggestion, we have added Response Figure 2 (Fig. 6 in revised manuscript, lines 976-981) to our manuscript. This new figure provides a comprehensive visual summary of the key processes and mechanisms uncovered in our study. This graphic summary provides a clear overview of the interconnected nature of surface adhesion, c-di-GMP signaling, and HipH regulation. It also highlights the complex role of c-di-GMP in persister formation and offers readers a visual aid to better understand the molecular mechanisms underlying our findings.

      We sincerely appreciate your thoughtful comment regarding the title and its reflection of the study's novelty. After careful consideration, we believe that our original title adequately captures the essence and significance of our research. We have strived to ensure that it accurately represents the scope and novelty of our work while maintaining clarity and conciseness. Nevertheless, we value your input and thank you for taking the time to provide this feedback, as it encourages us to critically evaluate our presentation.

      (3) The ratio of mVenusNB to mScarlet-I (R) negatively correlates with the concentration of c-di-GMP. Therefore, R-1 demonstrates a positive correlation with the concentration of c-di-GMP. Is this method validated with other methods to quantify c-di-GMP, or used in other studies?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein includes a Fluorescent Protein Base and a c-di-GMP Binding Domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change. The amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain upon c-di-GMP binding. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein. mScarletI serves as a reference for ratiometric measurements. This design allows ratiometric, real-time monitoring of c-di-GMP levels in individual cells and controls for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (4) References are missing throughout the manuscript. Please add enough references for every conclusion.

      We appreciate your feedback regarding the references in our manuscript. We acknowledge the importance of proper citation to support our conclusions and provide context for our work. In response to your comment, we have conducted a comprehensive review of our manuscript and have significantly enhanced our referencing throughout. We have added appropriate citations to support each key statement and conclusion presented in our study. These additional references provide a robust foundation for our findings and place our work within the broader context of the field. The complete list of all references, including the newly added ones, can be found at the end of this response letter as well as in the revised manuscript.

      (5) The novelty of this study should be clearly written and compared with previous references. For example, is it the first study to report the mechanism that the adhesion of bacteria to the surface leads to increased persister formation?

      We sincerely appreciate the opportunity to highlight and elaborate the novelty of our research. This study provides novel insights into the relationship between bacterial adhesion to surfaces and the subsequent increase in persister cell formation, which has not been explicitly detailed in previous literature. While existing research has established that biofilms typically harbor higher numbers of persister cells, this investigation not only corroborates that finding but also elucidates the mechanisms through which surface adhesion contributes to this phenomenon.

      Past studies have predominantly focused on the general characteristics of persister cells and their role in biofilm resilience and antibiotic tolerance without specifically addressing the mechanistic link between adhesion and persister formation [13, 14]. For instance, previous work has shown that surface attachment leads to changes in metabolic activity and signaling pathways within bacterial cells, which could promote persistence, but it has not definitively established a causal relationship between adhesion and increased persister formation. Our study highlights that the elevation of cyclic di-GMP levels after surface adhesion triggers a cascade of physiological changes that significantly enhance the formation of persister cells. In particular, we report that adhesion-induced signaling pathways promote dormancy and tolerance to antibiotics, marking an important advancement from the previous understanding that treated persister cells might arise from random phenotypic variation during biofilm development. We have expanded our discussion in lines 366-381.

      In summary, we believe this study stands as one of the first to clearly delineate the mechanism by which bacterial adhesion leads to increased persister formation, providing a valuable contribution to the current understanding of bacterial persistence and biofilm ecology. Thus, we can assert that our findings are not only novel but also essential for informing future research and therapeutic strategies aimed at managing bacterial infections.

      (6) in vitro DNA cleavage assay. Why not use bacterial genomic DNA to test the cleaving of HipH on the bacterial genome?

      Thank you for your feedback regarding our experimental approach. The decision not to use genomic DNA directly in our experiments was made after careful consideration of three issues. First, the high molecular weight of genomic DNA presents significant challenges in handling and analysis. Second, intact genomic DNA is difficult to extract, which could compromise the integrity of our results. Third, electrophoretic separation of such large DNA molecules is challenging, which could limit our ability to interpret the data accurately.

      Instead, following established practices in molecular biology research and drawing from similar studies in the field [15-17], we opted to use plasmids as model DNA for our experiments. This approach offers several advantages: plasmids are smaller and more manageable, making them easier to manipulate in laboratory conditions; they can be more readily extracted in intact form, ensuring the quality of our experimental material; and plasmid DNA is more amenable to electrophoretic separation, allowing for clearer and more precise analysis. Despite their smaller size, plasmids retain many of the key characteristics of genomic DNA that are relevant to our study. We believe this approach provides a robust and reliable model for our research while overcoming the practical limitations associated with genomic DNA. It allows us to investigate the fundamental principles we're interested in, while maintaining experimental feasibility and data integrity. We have added related references in lines 314 and 599.

      (7) C-di-GMP -HipH is not a TA, it does not fit in the definition of the TA systems. You can say C-di-gmp is an antitoxin based on your study, but C-di-gmp -HipH is not a TA pair.

      We appreciate your insightful feedback regarding the classification of the c-di-GMP-HipH interaction. We acknowledge that while our study suggests c-di-GMP may function as an antitoxin to HipH, the c-di-GMP-HipH pair does not constitute a classical TA system due to the lack of genetic linkage. We have replaced the term "TA system" with "TA-like system" when referring to the c-di-GMP-HipH interaction. This more accurately reflects the nature of their relationship while acknowledging that it differs from traditional TA systems.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Either indent or skip a line to indicate a new paragraph; there is no need to do both.

      Thank you for your feedback regarding the formatting of our manuscript. We have revised the formatting throughout the main text by using a single blank line to separate paragraphs, without indentation.

      (2) L 77: need to define 'c-di-GMP' without using another abbreviation; please write '3,5-cyclic diguanylic acid', etc.

      Thank you for your valuable feedback regarding the proper introduction of abbreviations in our manuscript. We have revised line 86 to introduce the full name of c-di-GMP as "3,5-cyclic diguanylic acid". Following this initial introduction, we consistently use the abbreviation "c-di-GMP" throughout the rest of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      This is a fascinating story, but the title and the manuscript need careful revision to make it more clear. The novelty and logic are not very easy to follow.

      (1) Figure 1B, " h" is missing

      We sincerely thank you for your attentive review and for pointing out the missing "h" in Figure 1B. We have carefully reviewed and revised the figure legend in Figure 1B. The unit of time has been corrected to include "h" (hours) where appropriate, ensuring consistency and accuracy throughout the figure.

      (2) Line 222, the in vivo mice model should be cited with the reference.

      Thank you for the reminder. We have cited the following reference related to the mouse model (line 231).

      Pang Y, et al., (2022) Bladder epithelial cell phosphate transporter inhibition protects mice against uropathogenic Escherichia coli infection. Cell reports 39: 110698

      References

      (1) Wood, T.K. and S. Song, Forming and waking dormant cells: The ppGpp ribosome dimerization persister model. Biofilm, 2020. 2: p. 100018.

      (2) Song, S. and T.K. Wood, ppGpp ribosome dimerization model for bacterial persister formation and resuscitation. Biochem Biophys Res Commun, 2020. 523(2): p. 281-286.

      (3) Wood, T.K., S. Song, and R. Yamasaki, Ribosome dependence of persister cell formation and resuscitation. J Microbiol, 2019. 57(3): p. 213-219.

      (4) Niu, H., J. Gu, and Y. Zhang, Bacterial persisters: molecular mechanisms and therapeutic development. Signal Transduct Target Ther, 2024. 9(1): p. 174.

      (5) Mok, W.W., M.A. Orman, and M.P. Brynildsen, Impacts of global transcriptional regulators on persister metabolism. Antimicrob Agents Chemother, 2015. 59(5): p. 2713-9.

      (6) Amato, S.M., M.A. Orman, and M.P. Brynildsen, Metabolic control of persister formation in Escherichia coli. Mol Cell, 2013. 50(4): p. 475-87.

      (7) Vrabioiu, A.M. and H.C. Berg, Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proc Natl Acad Sci U S A, 2022. 119(6).

      (8) Liu, J., et al., Viable but nonculturable (VBNC) state, an underestimated and controversial microbial survival strategy. Trends Microbiol, 2023. 31(10): p. 1013-1023.

      (9) Pan, H. and Q. Ren, Wake Up! Resuscitation of Viable but Nonculturable Bacteria: Mechanism and Potential Application. Foods, 2022. 12(1).

      (10) Ayrapetyan, M., T. Williams, and J.D. Oliver, Relationship between the Viable but Nonculturable State and Antibiotic Persister Cells. J Bacteriol, 2018. 200(20).

      (11) Zhao, S., et al., Absolute Quantification of Viable but Nonculturable Vibrio cholerae Using Droplet Digital PCR with Oil-Enveloped Bacterial Cells. Microbiol Spectr, 2022. 10(4): p. e0070422.

      (12) Zhao, S., et al., Enumeration of Viable Non-Culturable Vibrio cholerae Using Droplet Digital PCR Combined With Propidium Monoazide Treatment. Front Cell Infect Microbiol, 2021. 11: p. 753078.

      (13) Pan, X., et al., Recent Advances in Bacterial Persistence Mechanisms. Int J Mol Sci, 2023. 24(18).

      (14) Patel, H., H. Buchad, and D. Gajjar, Pseudomonas aeruginosa persister cell formation upon antibiotic exposure in planktonic and biofilm state. Sci Rep, 2022. 12(1): p. 16151.

      (15) Maki, S., et al., Partner switching mechanisms in inactivation and rejuvenation of Escherichia coli DNA gyrase by F plasmid proteins LetD (CcdB) and LetA (CcdA). J Mol Biol, 1996. 256(3): p. 473-82.

      (16) Hockings, S.C. and A. Maxwell, Identification of four GyrA residues involved in the DNA breakage-reunion reaction of DNA gyrase. J Mol Biol, 2002. 318(2): p. 351-9.

      (17) Chan, P.F., et al., Structural basis of DNA gyrase inhibition by antibacterial QPT-1, anticancer drug etoposide and moxifloxacin. Nat Commun, 2015. 6: p. 10048.

    1. eLife assessment

      This important study evaluates the outcomes of a single-institution pilot program designed to provide graduate students and postdoctoral fellows with internship opportunities in areas representing diverse career paths in the life sciences. The data convincingly show the benefit of internships to students and postdocs, their research advisors, and potential employers, without adverse impacts on scientific productivity. This work will be of interest to multiple stakeholders in graduate and postgraduate life sciences education and should stimulate further research into how such programs can best be broadly implemented.

    2. Reviewer #2 (Public review):

      Summary:

      The authors describe five-year outcomes of an internship program for graduate students and postdoctoral fellows at their institution spurred by pilot funding from an NIH BEST grant. They hypothesized that such a program would be beneficial to interns, internship hosts, and research advisors. The mixed methods study used surveys and focus groups to gather qualitative and quantitative data from the stakeholder groups, and the authors acknowledge the limitation that the study subjects were self-selected and also had research advisors who agreed to allow them to participate. Thus the generally favorable outcomes may not be applicable to students such as those who are struggling in the lab and/or lack career focus or supportive research advisors. Nonetheless, the overall findings support the hypothesis and also suggest additional benefits, including in some cases positive impact for the lab, improved communication between the intern and their research advisor, and an advantage for recruitment of students to the institution. The data refute one of the principal concerns of research advisors: that by taking students out of the lab, internships reduce individual and overall lab productivity. Students who did internships were significantly less likely to pursue postdoctoral fellowships before entering the biomedical workforce and were more likely to have science-related careers versus research careers than control students who did not do internships, although the study design cannot determine whether this is a causal relationship.

      Strengths:

      (1) Sample size is good (123 internships).

      (2) Response rate is high, minimizing potential bias.

      (3) The internship program is well described. Outcomes are clearly defined.

      (4) Methods and statistical analyses appear to be appropriate (although I am not expert in mixed methods).

      (5) "Take-home" lessons for institutions considering implementing internship programs are clearly stated.

      Appraisal:

      Overall the authors achieve their aims of describing outcomes of an internship program for graduate career development and offering lessons learned for other institutions seeking to create their own internship programs.

      Impact:

      The paper will be very useful for other institutions to dispel some of the concerns of research advisers about internships for PhD students (although not necessarily for postdoctoral fellows). In the long run, wider adoption of internships as part of PhD training will depend not only on faculty buy-in but also on availability of resources and changes to the graduate school funding model so that such programs are not viewed as another "unfunded mandate" in graduate education. Perhaps industry will be motivated to support internships by the positive outcomes for hosts reported in this paper. Additionally, NIH could allow a certain amount of F, T, or even RPG funds to be used to support internships for purposes of career development.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study evaluates the outcomes of a single-institution pilot program designed to provide graduate students and postdoctoral fellows with internship opportunities in areas representing diverse career paths in the life sciences. The data convincingly show the benefit of internships to students and postdocs, their research advisors, and potential employers, without adverse impacts on scientific productivity. This work will be of interest to multiple stakeholders in graduate and postgraduate life sciences education and should stimulate further research into how such programs can best be broadly implemented.

      Thank you for your assessment. We agree that sharing our process for creating this internship program with the wider higher education community is important and we hope it will spur establishment of new programs at other institutions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study was to determine whether short (1 month) internships for biomedical science trainees (mostly graduate students but some post-docs) were beneficial for the trainees, their mentors, and internship hosts. Over a 5 year period, the outcomes of trainees who completed internships were compared with peers who did not. Both quantitative results in terms of survey responses and qualitative results obtained from discussion groups were provided. Overall, the data suggest that internships aid graduate students in multiple ways and do not harm progress on dissertation projects. 'Buy-in' from mentors and prospective mentors appeared to increase over time, and hosts also gained from the contributions of the interns even in a short time period. While the program also appeared valuable for post-doctoral trainees, it was less favorably considered by post-doc mentors.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      The internship program that was examined here appears to have been very well designed in terms of availability to students, range of internship offerings, length of time away from PhD lab, and assessments.

      Having a built-in peer control group of graduate students who did not do internships was valuable for much of the quantitative analyses. However, as the authors acknowledge, those who did opt for internships are a self-selected group who may have character traits that would help them overcome the potential negative impacts of the internship.

      The quantitative data is convincing and addresses important considerations for all stakeholders.

      The manuscript is well-constructed to individually address the impact of the program on each set of stakeholders, while also showcasing areas of mutual benefit.

      The discussion of challenges and limitations, from the perspectives of participating stakeholders, program leaders, and also institutions, is comprehensive and very thoughtful.

      Thank you for noting these strengths in experimental design, control group, and manuscript format.

      Weaknesses:

      The qualitative data that resulted from the 'focus groups' of faculty mentors was somewhat difficult to evaluate given the very limited number of participants (n=7).

      Thank you for pointing out the potential limitations of a small sample size. One reason we selected a qualitative approach to focus group data analysis in our experimental design was to supplement our larger quantitative analyses with faculty advisors. A benefit of relying on qualitative methods is that saturation of a representative set of themes can be reached even with a limited number of participants. This is particularly true when a homogenous sample is used, such as faculty members in the biomedical sciences (Guest, et al. 2006). We have added the following sentences at line 188 in the text to expand on the faculty focus groups:

      “A group of faculty advisors in a range of disciplines and demographics, all of whom were active mentors with extensive training experience were invited to participate in the focus groups. Seven faculty advisors participated in the Year 1 focus group and 5 of those same 7 participated in Year 5. Saturation can occur with as little as six interviews in homogeneous samples (Guest et al. 2006) such as our biomedical faculty research advisors at a single institution.”

      In the original analysis, we increased the generalizability of our findings by gathering faculty opinions and feedback using multiple methods. For example, faculty post-internship survey responses were returned by 75 faculty members over a 5-year period, which represents a 61% response rate. (Faculty post-internship survey results are shown in Figure 1, panels v-x and Figure 4, panels i-t.) In addition, the survey gauging general faculty advisor support for the program (Figure 3), which was administered twice, 4 years apart, gathered the opinions of 115 advisors in year 1 and 122 advisors in year 4. Thus, the faculty focus groups were only one of 3 ways that faculty input was gathered. In sum, while the small number of faculty mentors who participated in the focus groups has the potential to introduce bias, we made a conscious decision to use a mixed methods approach to expand beyond one sample to increase the generalizability of our results. However, to acknowledge the complexity of faculty advisor views on internships, we have noted the need to further study faculty advisor support for internships in broader samples as a future direction. This is the new wording we included at line 788:

      “Other future studies could probe faculty advisor support for internships at institutions beyond our own since training culture and faculty perspectives are influenced by many factors and vary from institution to institution.”

      Overall, the data support the authors' conclusions with respect to the utility of internship programs for all stakeholders. As the authors note, the data relate to a specific program where internship length was defined, costs were covered by a grant or institutional funding, and there were multiple off-site internship hosts available. Thus, the results here may not replicate for other programs with different criteria.

      Thank you for noting these advantages that contributed to the success of this program. We agree that other institutions will encounter unique challenges when implementing their own internship program and have addressed some of these limitations in our discussion section. In the Discussion section of the paper, we outline considerations and review lessons learned in an effort to help others know what aspects of the program might or might not work in distinct situations or locations. We also point the reader to distinct internship models at other institutions in the hope that any university hoping to provide their trainees with internship opportunities can benefit from the collective experience of the relatively few programs that have found sustainable ways to accomplish this.  

      This work provides a valuable assessment of how relatively short internships can impact graduate students, both in terms of their graduate tenure and in their decision-making for careers post-graduation. As more graduate programs are heeding calls from funding agencies and professional societies to increase knowledge about, and familiarity with, multiple career paths beyond academia for PhD students, there is a need to evaluate the best ways to accomplish that goal. Hands-on internships are valuable across many spheres so it makes sense that they would be for life science graduates too. However, the fear that time-to-degree and/or productivity would be negatively impacted is important to acknowledge. By providing clear data that this is not the case, these investigators have increased the likelihood that internships could be considered by more institutions. The one big drawback, and one that the authors discuss at some length, is the funding model that could enable internship programs to be used more widely.

      Thank you for providing suggestions to improve the generalizability of our results. We agree that finding a sustainable source of funding for internship programs, and the staff who direct them, is a primary obstacle to implementing these programs more widely. We provide some ideas and funding models for other institutions to consider, and future directions could examine internships that are un-funded or funded primarily by fellowships from supportive granting agencies. Accordingly, we have added the following text to future directions at Line 755:

      “We acknowledge the need for future studies to evaluate the feasibility and outcomes of internship programs funded via different models to see if faculty support and student outcomes would be comparable under different models.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe five-year outcomes of an internship program for graduate students and postdoctoral fellows at their institution spurred by pilot funding from an NIH BEST grant. They hypothesized that such a program would be beneficial to interns, internship hosts, and research advisors. The mixed methods study used surveys and focus groups to gather qualitative and quantitative data from the stakeholder groups, and the authors acknowledge the limitation that the study subjects were self-selected and also had research advisors who agreed to allow them to participate. Thus the generally favorable outcomes may not be applicable to students such as those who are struggling in the lab and/or lack career focus or supportive research advisors. Nonetheless, the overall findings support the hypothesis and also suggest additional benefits, including in some cases positive impact for the lab, improved communication between the intern and their research advisor, and an advantage for recruitment of students to the institution. The data refute one of the principal concerns of research advisors: that by taking students out of the lab, internships reduce individual and overall lab productivity. Students who did internships were significantly less likely to pursue postdoctoral fellowships before entering the biomedical workforce and were more likely to have science-related careers versus research careers than control students who did not do internships, although the study design cannot determine whether this was due to selection bias or to the internship.

      Thank you for such a positive and concise overview of this paper.

      Strengths:

      (1) The sample size is good (123 internships).

      (2) The internship program is well described. Outcomes are clearly defined.

      (3) Methods and statistical analyses appear to be appropriate (although I am not an expert in mixed methods).

      (4) "Take-home" lessons for institutions considering implementing internship programs are clearly stated.

      Thank you for enumerating these strengths. We also hope that the sample size, positive outcomes, and take-home lessons will be of benefit to other institutions.

      Weaknesses:

      (1) It is possible that interns, hosts, and research advisers with positive experiences were more likely to respond to surveys than those with negative experiences. The response rate and potential bias in responses should be discussed in the Results, not just given in a table legend in Methods.

      Thank you for noting this oversight. We were pleased that throughout our study, the majority of interns, faculty advisors and internship hosts responded to the surveys. As suggested, we have included the following text at line 132 in the first paragraph of the results section:

      “The response rate for the 123 survey invitations sent to interns and their current research advisors and internship hosts ranged from 61% for research advisors to 73% for hosts, and about 66% for interns (averaging pre and post survey responses). In addition to quantitative surveys, qualitative themes and exemplars were collected from focus groups.”

      (2) With regard to the biased selection of participants, do the authors know how many subjects requested but were not permitted to do internships?

      We too were concerned about trainees who would not be able to secure their PI’s support to participate in an internship. Accordingly, as part of our program design and evaluation, in the inaugural year of the program our external evaluator, Strategic Evaluations, Inc., administered a survey to graduate students and postdocs who registered for an internship information session or who started, but did not complete, the application. Registrants were asked about their decision to complete an application, their experience completing the application if they chose to do so, and the likelihood that they would apply to the program next year. Of the respondents, only 9% indicated that lack of PI support prevented them from participating (n=53 respondents). Hence, while we cannot completely rule out PI support as a barrier, only a small percentage of trainees reported this as a barrier despite a robust response rate (43%). A second line of evidence that there was not a large number of students who were prevented from doing an internship by their research advisor is the high faculty approval rating of the program, which was gathered in both year 1 and year 4 of the program (see Figure 3). These two independent lines of evidence diminish our concern that faculty advisor resistance was a significant barrier to participation.

      (3) While the authors mention internships in professional degree programs in fields such as law and business, some mention of internship practices in non-biomedical STEM PhD programs such as engineering or computer science would be helpful. Is biomedical science rediscovering lessons learned when it comes to internships?

      Excellent point. We noted that internships are common in non-biomedical STEM master's and PhD programs, but we did not list experiential rotations and internships that are common in nursing, engineering, computer science and other such programs. We agree that many lessons learned from internships in all fields are transferable to the biomedical fields, and we also strongly believe that findings there need to be replicated in the biomedical sciences because of the unique funding model, incentive structure, and apprentice structure of biomedical training. In response to this critique, we added the following text to the manuscript at line 724:

      “Internships are ubiquitous in many other professional training programs such as law, business, nursing, computer science, and engineering programs (Van Wart, O’Brien et al, 2020).”

      (4) Figure 1 k, l - internships did not appear to change career goals, but are the 76% who agreed pre-internship the same individuals as the 75% who agreed post-internship? What percentage gave discordant responses?

      While our data, as collected, cannot directly address this question, we were not surprised to see that the internship does not radically alter many trainees’ career plans: internships in this program usually occur in the final 12-18 months of training, and the emphasis is on the internship being a skill-building initiative and not necessarily a career exploration initiative. One limitation of our study is that career goals were defined by pre-surveys at different timepoints, depending on what stage of training an individual (whether control or internship participant) happened to be at when the baseline survey was administered. We know from previous work that career goals often shift during training (see Roach and Sauermann, 2017, PLOS One, https://doi.org/10.1371/journal.pone.0184130, and Gibbs et al., 2014, PLOS One, https://doi.org/10.1371/journal.pone.0114736), so the point at which career interests are gathered makes a difference in this kind of analysis. Hence, we have expanded our discussion of this limitation to better acknowledge this critique, beginning at Line 319:

      “Because of the variable timing between pre-internship career interest surveys among interns and control trainees and securing the first job, future studies could more rigorously evaluate changes in career preferences between pre and post internship with an analysis that considers the time that has elapsed between career interest noted pre-internship vs post internship career placement.”

      Appraisal:

      Overall the authors achieve their aims of describing outcomes of an internship program for graduate career development and offering lessons learned for other institutions seeking to create their own internship programs.

      We thank you for your thorough reading and review of the manuscript.

      Impact:

      The paper will be very useful for other institutions to dispel some of the concerns of research advisers about internships for PhD students (although not necessarily for postdoctoral fellows). In the long run, wider adoption of internships as part of PhD training will depend not only on faculty buy-in but also on the availability of resources and changes to the graduate school funding model so that such programs are not viewed as another "unfunded mandate" in graduate education. Perhaps the industry will be motivated to support internships by the positive outcomes for hosts reported in this paper. Additionally, NIH could allow a certain amount of F, T, or even RPG funds to be used to support internships for purposes of career development. 

      Thank you. We share your hope that the information and data resulting from this study will be valuable to other institutions. Your point about NIH (and other funders, for that matter) allowing trainees to participate in internship experiences while funded by the granting agency is an excellent one. We have found that communication with program officers often garners their support for the intern remaining on a fellowship or training grant during the internship. This allows the internship program to fund additional interns, especially those that are supported by the faculty advisor’s grants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Two minor points about the comments used from focus groups.

      (i) In figure 5, there is a specific quote about being a reward that is used twice;

      (ii) It seems that there should be some consistency in how these quotes are relayed with respect to gender identification of the trainee. In some cases 's/he' is used, in others 'he' or 'she' is used, and in others 'they' is used.

      We appreciate this suggestion and agree that a non-gendered convention would be clearer. Accordingly, we have revised all quotes to use “they” for consistency. In addition, we have removed the duplicated quote from Figure 5, which was originally inserted in two sections because of its applicability to both “Persisting Challenges” and “Trainees’ abilities and skills were primary drivers of the success of the internship”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is somewhat lengthy. Some redundant material can be eliminated - Lines 366-371 simply restate the data in Table 5. Lines 393-396 restate the data in Figure 3. The text should be reserved for interpreting rather than restating the data in tables and figures.

      Thank you for this feedback; we agree that these sections can be condensed. We have removed some of the redundancy while retaining enough for the figures and the text each to stand alone, for accessibility to readers.

    1. eLife assessment

      Understanding how genomic regulatory elements control spatiotemporal gene expression is essential for explaining cell type diversification, function, and the impact of genetic variation on disease. This important study provides solid evidence that enhancers generally combine additively to influence gene expression. Moreover, promoters, particularly weaker ones, can exhibit supra-additivity when integrating enhancer effects. These findings highlight the context-dependent nature of enhancer-promoter interactions in gene regulation, and contribute to ongoing discussions about the selectivity and combination of regulatory elements.

    2. Reviewer #1 (Public review):

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study do generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Bergman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength. Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) has small effects.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Comments on revised version:

      The revised manuscript satisfactorily addresses the points I raised in the review. With the addition of the new graphs there is enough data for readers to decide whether the supra-additivity depends only on the strength of the promoter or on some other (undefined) feature of E-P pairs. This manuscript is a solid contribution to the ongoing debate about enhancer-promoter selectivity.

    3. Reviewer #2 (Public review):

      Summary

      This work investigates how multiple DNA elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. There are two main findings: (1) enhancer pairs generally combine additively on reporter output, and (2) the extent to which enhancers increase reporter output over the promoter (individually and as enhancer-enhancer pairs) is inversely related to the intrinsic strength of the promoter. Both of these findings are interesting and are well supported by the data.

      This study extends previous results on enhancer-promoter combinations to enhancer-enhancer-promoter triples. For example, the near equivalence of Fig. 5b and Fig. S7b is intriguing. This experimental design also provides the ability to investigate the notion of selectivity (also commonly referred to as compatibility) between enhancer-enhancer pairs and promoters.

      The authors note many limitations, including the selection of the elements and the size and spacing of the tested elements. Some of the enhancer-enhancer-promoter triples they test were also investigated by a different experimental design in Brosh et al 2023. Brosh et al observed non-additivity between these elements while this study did not. Ultimately we do not know which mechanisms produce the non-additivity that has been observed in native loci and which experimental designs would preserve such mechanisms.

      Overall this is a nice experimental design and a great dataset for probing how enhancers and promoters combine to regulate gene expression. I have no major concerns, but I will try to clarify some methodological points I found confusing.

      Methodology

      The following two comments are meant to help the reader understand the methodology/terminology used in this paper and how it relates to other similar studies.

      The interpretation that "promoters scale enhancer signals in a non-linear manner" is potentially confusing. I believe that the authors use "non-linear" to refer to the slopes (represented by the letter 'b' in Fig. 5b) being not equal to 1. Given how the boost index is defined, this implies the relationship

      Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b

      One potential source of confusion is that the Average Linear Boost term itself depends on the set of promoters that are assayed. Averaging across (many) promoters may alleviate this concern, in which case Average Linear Boost may be considered some form of intrinsic enhancer strength. If so, there is a correspondence between this terminology and the terminology presented in Bergman et al 2022. If b not equal to 1 refers to a non-linear scaling, then the reader may think that b=1 refers to a linear scaling. But if b=1, and the Average Linear Boost term is interpreted as intrinsic enhancer strength, then the equation above implies that the activity of EEP is equal to an intrinsic promoter strength times an intrinsic enhancer strength. This is essentially the relationship that is considered in Bergman et al 2022 and which is referred to in that paper as 'multiplicative'. The purpose of this comment is not to argue for what is the relationship that best explains the data, it is just to clarify the terminology.
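
      The terminology point above can be made concrete with a small numerical sketch. It is not taken from the study: all numbers are synthetic, and the names `ccp_activity`, `avg_linear_boost`, `b_true` and `b_hat` are hypothetical. Assuming the relation Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b holds with some noise, the exponent b appears as the slope of a straight line on a log-log scale:

```python
import math
import random

random.seed(0)

# Hypothetical values chosen only for illustration (not data from the study).
ccp_activity = 2.0   # reporter activity with the promoter alone (CCP)
b_true = 0.6         # promoter-specific exponent; b = 1 would be the 'multiplicative' case

# One Average Linear Boost value per enhancer-enhancer pair (synthetic).
avg_linear_boost = [random.uniform(1.5, 20.0) for _ in range(200)]

# Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b, with log-normal noise.
eep_activity = [
    ccp_activity * x ** b_true * math.exp(random.gauss(0.0, 0.05))
    for x in avg_linear_boost
]

# Boost index = EEP / CCP. Taking logs turns the power relation into a line with slope b.
xs = [math.log2(x) for x in avg_linear_boost]
ys = [math.log2(y / ccp_activity) for y in eep_activity]

# Ordinary least-squares fit of ys on xs.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - b_hat * x_bar

print(f"fitted slope b = {b_hat:.2f}, intercept = {intercept:.2f}")
```

      In this sketch a fitted slope close to 1 would correspond to the 'multiplicative' relationship of Bergman et al 2022, whereas a slope different from 1 is what the authors call non-linear scaling.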

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. I found the methodology hard to follow, so this section of the review is meant to guide the reader in understanding how the authors define 'selectivity'. The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity. The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4b of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Bergman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      We now discuss the parallel with the Hong 2022 study (Discussion, lines 307-310).

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Agreed, see below.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength...

      Thank you for the suggestion; we have now repeated the analysis of Figure 5 for EP pairs instead of EEP triplets, and included it as new Supplementary Figure S7. Despite the lower statistical power, the trends are very similar.

      ...Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) has small effects.

      We have now added colour coding for each of the promoters in Figure S4. We agree that this clarifies the contribution of each promoter to the selectivity of each enhancer, and it further confirms the responsiveness trends observed in Figure 5.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Unfortunately, even with the number of enhancers that we tested, we lack statistical power to identify sequence motifs that may favour supra-additivity.

      Reviewer #2 (Public Review):

      We thank the reviewer for the supportive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      Summary

      This work investigates how multiple regulatory elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. The main finding is that enhancer pairs generally combine additively on reporter output. The authors also find that the extent to which enhancers increase reporter output is inversely related to the intrinsic strength of the promoter.

      This manuscript presents a compact experiment that investigates an important open question in gene regulation. The results and data will be of interest to researchers studying enhancers. Given that my expertise is in modeling and computation, I will take the experimental results at face value and focus my review on the interpretation of the results and the computational methodology. I find the result of additivity between enhancers to be well supported. The findings on differential responsiveness between promoters are very interesting but the interpretation of such responses as 'non-linear' or 'following a power-law' may be misleading. More broadly, I think a more rigorous description of the mathematical methodology would increase the clarity and accessibility of this manuscript. A major unanswered question is whether the findings in this study apply to enhancers in their native genomic context. Regardless, investigating such questions in an episomal reporter assay is valuable.

      Main comments

      Applicability to native genomic context: The applicability of the results in this paper to enhancers in their native genomic context is unclear. As the authors state in the discussion section, the reporter gene is not integrated into the genome, the spacing between enhancers does not match their native context etc. It is thus unclear whether this experimental design is able to detect the non-additivity between enhancers which is known to be present in the genome. This could be investigated by testing the enhancer-enhancer-promoter tuples for which non-additivity has been observed in the genome (references are given in the introduction) in this assay.

      We appreciate the suggestion, but we chose not to go back to the lab to generate additional data to address this point. Of the cited previous studies, two are comparable to our study because they also used mESCs and included loci that we also studied: Thomas et al. (2021) and Brosh et al. (2023). We now discuss how the findings of these two studies relate to our observations in the Discussion, lines 336-345.

      Interpretation of promoter responses as non-linear and following a power-law: In Fig 5, the authors demonstrate that enhancer-enhancer pairs boost reporter output more for weak promoters as opposed to strong promoters. I agree the data supports this finding, but I find the interpretation of such data as promoters scaling enhancers according to a power-law (as stated in the abstract) to be misleading. As mentioned on line 297, it is not possible to define an intrinsic measure of enhancer strength, thus the authors assign the base of the power-law to be the average boost index of the enhancer-enhancer pair across the 8 promoters. But this measure incorporates some aspect of a promoter and is not solely a property of enhancers...

      We agree that the power-law conclusion in the abstract was too strong; we have rephrased it as "non-linear".

      ...It would also be useful to know whether the results in Fig 5 apply to only enhancer-enhancer-promoter triples or also to enhancer-promoter pairs.

      We have now added this EP analysis as new Supplemental Figure S7. Although the statistical power is much lower, it shows trends very similar to those of the EEP analysis. We briefly report this in lines 275-278.

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. The authors mention that both studies use the same statistical methodology and the data in this study is consistent with the data from the 2022 paper. However, I think the statistical methodology in both studies needs further exposition. This section of the review is thus meant to ensure that I understand the authors' methodology, to guide the reader in understanding how the authors define 'selectivity', and to probe certain assumptions underlying the methodology.

      My understanding of the approach is as follows: The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity.
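
      As a minimal sketch of the boost index described above, the ratio can be computed as below. The promoter labels and reporter values are hypothetical numbers chosen for illustration and are not taken from the study; the function name `boost_index` is also an assumption.

```python
def boost_index(activity_with_enhancer: float, activity_promoter_only: float) -> float:
    """Reporter output with enhancer + promoter divided by output with the promoter alone."""
    if activity_promoter_only <= 0:
        raise ValueError("promoter-only activity must be positive")
    return activity_with_enhancer / activity_promoter_only


# Hypothetical reporter outputs (arbitrary units) for one enhancer on three promoters.
promoter_only = {"weak": 0.5, "medium": 2.0, "strong": 8.0}
with_enhancer = {"weak": 4.0, "medium": 8.0, "strong": 16.0}

# One boost index per promoter; a non-selective enhancer would give equal values.
boosts = {
    name: boost_index(with_enhancer[name], promoter_only[name])
    for name in promoter_only
}
```

      In this made-up example the boost index differs across promoters, so the enhancer would count as selective under the definition used in the review.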

      The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4 of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test. This is a parametric test that assumes that the observations within each vertical distribution in Fig 2C are normally distributed (this test does allow for heteroskedasticity - which means that the variance may differ within each vertical distribution). Does the normality assumption hold? This analysis should be reported. If this assumption does not hold, is the Welch test well calibrated?
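
      For readers unfamiliar with the test, the sketch below computes the Welch F statistic and its degrees of freedom from the textbook formula, using only the Python standard library. It is an illustration of the general technique, not the authors' analysis code; the function name `welch_f` and the example data are assumptions.

```python
from statistics import mean, variance


def welch_f(groups):
    """Return (F, df1, df2) for Welch's one-way test, which allows unequal variances."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with at least two observations each")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2 (requires non-zero sample variance).
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical boost indices of one enhancer measured on two promoters.
f_stat, df1, df2 = welch_f([[1, 2, 3], [2, 4, 6]])
```

      With two groups the statistic reduces to the square of Welch's t statistic, which is a convenient sanity check. A p-value would additionally require the F distribution, which is omitted here.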

      We have tested the normality of all of the single enhancer + promoter combinations that were tested using the Welch F-test. 94.1% of the 439 single enhancer + promoter combinations show normal distributions (at a 1% FDR). We have added this to the methods section of the revised manuscript. Apart from this, non-normality has little to no influence on the Welch F-test performance (https://rips-irsp.com/articles/10.5334/irsp.198). Therefore, the use of the Welch F-test to score enhancer selectivity on these data is valid. In addition, we agree that a simple binary classification of selective vs non-selective is not descriptive enough for these kinds of data. We addressed this in our previous publication by exploring the relationship between selectivity and enhancer strength. However, the objective in this publication was solely to show that this new dataset follows similar selectivity patterns to our previous publication. Furthermore, our analysis of the non-linearity of promoter response is a more quantitative continuation of the analysis of selectivity, as this is probably one of the major contributors to enhancer selectivity. This was probably present in our previous paper but could not be analyzed as there were fewer combinations per promoter.
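
      The response does not state which normality test or which FDR procedure was used. Purely as an illustration, and assuming the Benjamini-Hochberg procedure applied to per-combination normality p-values, the fraction of combinations for which normality is not rejected could be computed as sketched below. The p-values are hypothetical and the function name is an assumption.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return booleans, True where the null hypothesis is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value lies under the step-up threshold rank * fdr / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical normality-test p-values, one per enhancer + promoter combination.
p_values = [0.001, 0.2, 0.004, 0.9]
rejected = benjamini_hochberg(p_values, fdr=0.01)

# Combinations where normality is NOT rejected are treated as consistent with normality.
fraction_normal = rejected.count(False) / len(rejected)
```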

      For further clarity, we have now highlighted the individual promoters in Figure S4 by colors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I found this to be an interesting manuscript and am glad this experiment was conducted. As I wrote in my public review, I think that clarifying the computational methods/ideas would really help. I also think it would be helpful to properly define the terms that are being used. For example, this manuscript uses the terminology cooperativity and synergy. Are these meant to be synonymous with supra-additivity?

      Thank you for this point. The revised manuscript no longer uses the word “cooperativity”. We now use “supra-additivity” when describing our data, and “synergy” as biological interpretation. In the Introduction we now clarify this distinction.

      Comments on enhancer selectivity:

      In the public review, I have given comments on the statistical methodology employed to assess enhancer selectivity. On a more subjective note, I'm not convinced that a frequentist approach to a binary classification of 'selective' vs 'not selective' is that useful here. I think it would be more useful to report an 'effect size' of the extent to which an enhancer is selective and to study the sources of this effect size. I think you've tried to do this in lines 329-339 of the discussion but I think the exposition could be clearer.

      Figure S4B may suggest how to do this. It appears that the distribution of boost indices for a given enhancer is trimodal (this is most obvious for the stronger enhancers on the top of the plot). Is it the case that each mode (for each enhancer) consists of the same set of promoters? I think what is implied by Figure 5B is that the stronger promoters are not boosted as much as the weaker promoters. So does the leftmost mode consist of Ap1m1, the middle mode consist of Klf2/Otx2/Nanog, and the rightmost mode of Sox2/Fgf5/Lefty1/Tbx3? If so, I would recommend emphasizing this in the text/figure and clarifying how this relates to selectivity. It seems that the chain of logic is as follows: (1) We define an enhancer to be selective if its boost indices across a set of promoters are not the same. (2) We generally observe that stronger promoters get boosted less than weaker promoters. (3) Thus selectivity arises due to differences in intrinsic strengths of the promoter. I think this is what is being implied in lines 329-339 of the discussion, but it took me multiple readings to understand this and I'm not convinced the power-law explanation is justified (see public review).

      We have modified this paragraph of the Discussion (now lines 350-359).

      Regarding the power-law: in the Results we state “roughly a power-law function”. We have removed the power-law claim from the abstract; that conclusion as phrased was indeed too firm.

      Reference to Zuin et al

      Lines 323 - 325: A reference is made to the data from Zuin et al "following approximately a power-law". What data in Zuin et al does this statement refer to? I do not believe the authors in Zuin et al claim that the relationship between GFP intensity and enhancer-promoter distance (Figure 1h,i from Zuin et al) follows a power law. It is certainly non-linear, but I have taken a look at this data myself and do not find it follows a power-law. Please either explain this further and rigorously justify the claim or adjust the wording accordingly.

      Good point. In the discussion of Zuin et al we have replaced “power law” with “non-linear decay function”.

    1. eLife assessment

      This is a valuable contribution to our understanding of how different cell stressors (ethanol or heat-shock) elicit unique responses at the genomic and topographical level under the regulation of yeast transcription factor Hsf1, providing solid evidence documenting the temporal coupling (or lack thereof) between Hsf1 aggregation and long-range communication among co-regulated heat-shock loci versus chromatin remodeling and transcriptional activation. A particular strength is the combination of genomic and imaging-based experimental approaches applied to genetically engineered in vivo systems.

    2. Reviewer #2 (Public Review):

      Rubio et al. study the behavior of the transcription factor Hsf1 under ethanol stress, examining its distribution within the nucleus and the coalescence of heat shock response genes in budding yeast. In comparison to the heat shock response, the response to ethanol stress shows similar gene coalescence and Hsf1 binding. However, there is a notable delay in the transcriptional response to ethanol, and a disconnect between it and the appearance of irreversible Hsf1 condensates/puncta, highlighting important differences in how Hsf1 responds to these two related but distinct environmental stresses.

      The authors have addressed the majority of my previous comments effectively. The Sis1 experiment provides a clear illustration of a distinctive response to ethanol and heat. This work offers a comprehensive perspective on Hsf1 in stress response from multiple angles.

    3. Reviewer #3 (Public Review):

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions. The main findings are that both HS and ES result in Hsf1/Pol II-dependent intergenic interactions, along with formation of Hsf1 condensates. Yet, while HS results in rapid and strong induction of HSP gene expression and Hsf1 condensate resolution, ES results in slow and weak induction of HSP gene expression without Hsf1 condensate resolution. Thus, the conclusion is somewhat phenomenological - that the same transcription factor can drive distinct transcription, topologic, and phase-separation behavior in response to different types of stress.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      The authors have addressed the majority of my comments effectively. The new Sis1 experiment provides a clear illustration of a distinctive response to ethanol and heat. This work offers a comprehensive perspective on Hsf1 in stress response from multiple angles. I have two additional comments to improve the paper without re-review:

      (Original point #3) Could the authors clarify the differences between DPY1561 and the original strain used? There appears to be missing statistical analysis for Figure 1E at the bottom.

      DPY1561 is a haploid version of the original heterozygous diploid strain (LRY033). We opted for this strain in the analysis depicted in Figures 1D and 1E since 100% of Hsp104 is BFP-tagged; thus, the signal above background is stronger and the scoring of Hsp104 foci cleaner. The statistical analysis (Mann Whitney test) for the lower graphs in Fig. 1E has been added. We thank the reviewer for pointing this out.

      (Original point #4) In the new Figure 7F, '% transcription' and '% coalescence' are presented. My understanding is that Figures 7D and 7E aim to demonstrate the correlation between HSP104 transcription (a continuous variable) and HSP104-HSP12 coalescence (a binary variable) at the single-cell level. However, averaging the data across cells masks individual variations and potential anti-correlations. The authors could explore statistical methods that handle correlations between a continuous variable and a binary variable. Alternatively, consider converting 'HSP104 transcription' to a binary variable and then performing a chi-square test to assess the association.

      We thank the reviewer for this suggestion. In response, we have made the following changes:

      (1)  Clarified that the data used in this analysis were derived from Fig. 7 – figure supplement 1 in which ‘HSP104 transcription’ was converted to a binary variable.

      (2)  Indicated that the theoretical ceiling for coalescence of these tagged alleles is 25% given their heterozygous state (Figure 7–figure supplement 1D legend).  In the other 75% of cells scored, HSP104-HSP12 coalescence might also be taking place but is not detectable using this strategy. Therefore, it is not possible to elucidate any anti-correlation between HSR transcription and HSR coalescence in this experiment.

      In addition, we attempted to buttress the argument suggested by the Pearson correlation coefficient analysis (Fig. 7F) that a stronger association exists between transcription and gene coalescence in heat-shocked (HS) vs. ethanol stressed (ES) cells. To do so, we used the chi-square test as suggested by the reviewer. However, the results of this test were ambiguous, and we therefore did not include it in the manuscript.
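
      For reference, a chi-square test of association between two binary variables (for example, transcription scored as on/off and coalescence scored as yes/no per cell) can be sketched as follows. This is the standard Pearson statistic for a 2x2 table without continuity correction, written with the standard library only; the counts are hypothetical and are not the data from Fig. 7, and the function name is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (chi2, p) for a 2x2 contingency table [[a, b], [c, d]] with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical cell counts: rows = transcription on/off, columns = coalesced yes/no.
chi2, p_value = chi_square_2x2([[20, 5], [5, 20]])
```

      With small expected counts per cell of the table, Fisher's exact test is usually preferred over this approximation.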

    1. eLife assessment

      This manuscript investigates how chloroplasts are broken down during light-limiting conditions as plants reorganize their energy-producing organelles during carbon limitation. The authors provide compelling live-cell imaging data of plastids and solid quantification of events, documenting that buds form on the surface of chloroplasts and pinch away, then associate with the vacuole via a mechanism that depends on autophagy machinery, but not plastid division machinery. This manuscript provides valuable groundwork for other scientists studying the regulation and breakdown of energy-producing organelles, including chloroplasts and mitochondria.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco Containing Bodies, which contain chloroplast stroma material, but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths:

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow.

      Weaknesses:

      The study adds more valuable descriptive information about the previously published phenomenon of RCB formation under carbon starvation but does not reveal the putative mechanisms governing formation of RCBs and their release to the vacuole.

      Comments on revised version:

      The authors have done an impressive job revising the manuscript and addressed my comments. The authors clarified previous ambiguities and the new version of the manuscript greatly benefits from the provided quantifications and adjusted discussion.

    3. Reviewer #2 (Public review):

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b.

      Overall, the findings are interesting, and in general, the experiments are very well executed.

      Comments on revised version:

      The authors have generally addressed all of my concerns (and the other reviewer's) and adapted the manuscript where necessary. The revised version has significantly improved the manuscript. From my perspective there are no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions.

      The authors present very nice time lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with discussion of an internally-consistent model that summarizes the results.

      Strengths:

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution timelapse imaging to track chloroplast dynamics under light limiting conditions.

      Weaknesses:

      The main weakness of the manuscript is the limited quantitative data. While it can be challenging to quantify dynamic intracellular events, quantification of these processes is important to appreciate the significance of these findings.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco Containing Bodies, which contain chloroplast stroma material, but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of the previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some included results are not of particular relevance to the study's aim. For example, it is unclear what is the importance of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remained elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion, maybe suggesting the involved mechanism. Finally, some conclusions are not fully supported by the data: the suggested timing of events poorly aligns between and even within experiments, mostly due to high variation and low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy, does not propose the significance of piecemeal vs complete organelle sequestration into the vacuole under the conditions used, and does not dwell on the early localization of ATG8 to the future budding place on the chloroplast.

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends.

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time of one observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Figure 1 for Response to reviewers) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter do not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection more frequently occurs during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report. To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCBs number in the drp5b mutant background

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the condition under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different because the leaf ages and conditions are largely different.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein–labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1-figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. The descriptions of this additional experiment are in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked in the trafficking analysis to reach the conclusion that the vesicle was released into the vacuole at around 73.8 s?

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time scales of each still frame represent the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. Therefore, we expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we therefore turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We could therefore not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP-expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al, 2023, Elife; Wan et al, 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here. 

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legends, it would be better to provide details of the experiments, such as the leaf stages (Figure 1, Figure 5...) and the number of chloroplasts analyzed (Figure 7...). This would help readers to follow the work.

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. The descriptions of this additional experiment are in lines 163–174.
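
      As a reading aid only (this is not an analysis from the manuscript), the two reported percentages can be turned into a rough volume budget. The sketch below assumes that the loss seen in atg7 leaves reflects autophagy-independent processes, that those processes act equally in WT leaves, and that the two contributions are additive; the function name and dictionary keys are hypothetical.

```python
def stroma_volume_budget(wt_remaining, atg7_remaining):
    """Split the WT stroma volume loss into two parts.

    Assumption (not from the manuscript): the loss measured in atg7
    leaves is autophagy-independent and occurs to the same extent in
    WT leaves, so the difference is attributed to autophagy.
    """
    wt_loss = 1.0 - wt_remaining        # fraction lost in WT
    atg7_loss = 1.0 - atg7_remaining    # fraction lost without autophagy
    if wt_loss <= 0:
        raise ValueError("WT leaves show no volume loss to partition")
    dependent = wt_loss - atg7_loss     # loss attributed to autophagy
    return {
        "wt_loss": wt_loss,
        "autophagy_independent_loss": atg7_loss,
        "autophagy_dependent_loss": dependent,
        "autophagy_dependent_share": dependent / wt_loss,
    }


# Values reported in the response: 70% (WT) and 86% (atg7) of the
# pre-treatment stroma volume remain after 1 d of darkness.
budget = stroma_volume_budget(wt_remaining=0.70, atg7_remaining=0.86)
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

      Under these assumptions, roughly half of the WT volume loss would be attributable to the autophagy-dependent route; the estimate is only as good as the additivity assumption.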

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescence intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures produced TOC64-mRFP and KEA1-mRFP signals without ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes.

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During time-lapse imaging, puncta sometimes move from the plane of focus toward the deeper side (abaxial side) or near side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Alternatively, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants. 

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa.

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. The descriptions of these experiments are in lines 284–300 in the Results section and in lines 426–444 in the Discussion section.

      - Quantification is also needed for the claims about which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), and about the colocalization between ATG8 and the plastid stroma buds and the dynamics of this association (Fig 6).

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in NahG transgenic lines and sid2 mutants increases seed yield in the annual plant Arabidopsis thaliana. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in Arabidopsis. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. eLife assessment

      This study provides valuable insights into the role of actin dynamics in regulating the transition of fusion models during homotypic fusion between late endosomes. The evidence supporting the authors' claims is convincing. However, while the observations are significant, the study could benefit from further exploration of the mechanistic details and physiological relevance.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript employs yolk sac visceral endoderm cells as a novel model for studying endosomal fusion, observing two distinct fusion behaviors: quick homotypic fusion between late endosomes, and slower heterotypic fusion between late endosomes and lysosomes. The mathematical modeling suggests that vesicle size critically influences the mode of fusion. Further investigations reveal that actin filaments are dynamically associated with late endosomal membranes, and are oriented in the x-y plane and along the apical-basal axis. Actin and Arp2/3 were shown to appear at the rear end of the endosomes relative to the direction of movement, suggesting that actin polymerization may provide force for the movement of endosomes. Additionally, the authors found that actin dynamics regulate homotypic and heterotypic fusion events in different ways. The authors also provide evidence suggesting that Cofilin-dependent actin dynamics are involved in late endosome fusion.

      Strengths:

      The unique feature of this study is that the authors use yolk sac visceral endoderm cells to study endosomal fusion. Yolk sac visceral endoderm cells have huge endocytic vesicles, endosomes and lysosomes, offering an excellent system to explore endosomal fusion dynamics and the assembly of cellular factors on membranes. The manuscript provides valuable and convincing observations of the modes of endosomal fusion and the roles of actin dynamics in this process, and the conclusions of the study are justified by the data.

      Weaknesses:

      While the study offers compelling observations, it falls short in delivering clear mechanistic insights. Key questions remain unaddressed, such as the functional significance of actin filaments that extend apically in positioning late endosomes, the ways in which actin dynamics influence fusion events, and the functional implications of the slower bridge fusion process.

    3. Reviewer #3 (Public review):

      Summary:

      The authors found two endosomal fusion modes by live cell imaging of endosomes in yolk sac lateral endoderm cells of 8.5-day-old embryonic mice and described the fusion modes by mathematical models and simulations. They also showed that actin polymerization is involved in the regulation of one of the fusion modes.

      Strengths:

      The strength of this study is that the authors' claims are well supported by beautiful live cell images and theoretical models. By using specialized cells, yolk sac visceral endoderm cells, the live images of endosomal fusion, localization of actin-related molecules, and validation data from multiple inhibitor experiments are clear.

      Weaknesses:

      Although it would be out of scope of this study, there is no experimental verification of whether the mechanism of endosome fusion claimed by the authors occurs in general cells, so the article is limited to showing a phenomenon specific to yolk sac lateral endoderm cells. The methods used were very basic and solid. Most of the image analysis was performed manually, but the results were statistically tested.

      Summary:

      Seiichi Koike et al. studied two fusion models, explosive fusion, and bridge fusion, utilizing yolk sac visceral endoderm cells. They elucidated these two fusion models in vivo by employing mathematical modeling and incorporating fluctuations derived from actin dynamics as a key regulator for rapid homotypic fusion between late endosomes.

      Strengths:

      This study uncovered the role of actin dynamics in regulating the transition of fusion models in homotypic fusion between late endosomes and introduced a method for observing the fusion of single vesicles with two different targets.

      Weaknesses:

      The physiological significance of different fusion models is lacking.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript provides an interesting observation of the modes of endosomal fusion and roles of actin dynamics in this process and the conclusions of the study are justified by the data, there are concerns regarding the lack of important descriptions or quantification in some of the analyses and additional analyses are needed to strengthen this study. The major issues are outlined below:

      (1) The authors indicate that Zone 1 is within approximately 1 μm of the apical surface. What are the distances of Zone 2 and Zone 3 from this surface? It would be better if the authors could provide an explanation or hypothesis for why the early endosomes, late endosomes, and lysosomes are not intermixed but separated along the z-axis.

      Thank you for pointing out this important issue. Following the comments, we have added an explanation about the depth of early endosomes, late endosomes, and lysosomes to the text (lines 123-124, 127-128, and 130-131). We have also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1B).

      Because endosomes go deeper and mature with repeated fusion and enlargement after endocytosis, early endosomes, late endosomes, and lysosomes are aligned along the z-axis, though the separation is not complete. In confocal microscopic observation, endolysosomal vesicles in VE cells are largely separated into different layers because they are huge and occupy a large space, and as a result, do not exist with much overlap. We have added the explanation to the text (lines 121-122).

      (2) The authors compared the size distribution of the late endosomes that underwent fusion with that of the total late endosomes in the observed area 5 min after labeling (Figure 2C). A similar quantification should also be performed 15 min after labeling (Figure 3G).

      Thank you for the appropriate request. We have added the data showing the size distribution of the late endosomes that underwent fusion at 15 min after labeling, to Figure 3G.

      (3) While 3D reconstructions of actin filament patterns under normal conditions are presented (Figures 4 E-F), comparable analyses using cells treated with Cytochalasin D, Jasplakinolide, or S3 peptide need to be performed.

      As requested by the referee, we have performed additional experiments to show 3D reconstructions of actin filaments on late endosomes after pretreatment with cytochalasin D, jasplakinolide, and S3 peptide. We show the data in new figures: Figure 7–figure supplement 1A, Figure 7–figure supplement 2, and Figure 9–figure supplement 1.

      (4) The authors should provide a clear description of how they quantified the fusion frequency. Why does the fusion frequency appear very low? Why do Cytochalasin D and jasplakinolide show different effects on heterotypic fusion?

      Thank you for pointing out this important issue. We have added the description of how the fusion frequency was quantified to the Materials and Methods (lines 643-645). Briefly, we counted the number of membrane fusion events and the number of late endosomes in the 400-s time-lapse images, and then calculated how many times a single late endosome underwent fusion per minute. The apparent fusion frequency is low because it is expressed in terms of frequency per vesicle per minute.
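
      The calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name is hypothetical, the counts in the usage example are invented for demonstration, and only the 400-s movie duration and the "per vesicle per minute" normalization come from the response.

```python
def fusion_frequency_per_min(n_fusion_events, n_late_endosomes, duration_s=400.0):
    """Return fusion events per late endosome per minute.

    n_fusion_events  : fusion events counted in one time-lapse movie
    n_late_endosomes : late endosomes counted in the same field
    duration_s       : movie length in seconds (400 s in the response)
    """
    if n_late_endosomes <= 0 or duration_s <= 0:
        raise ValueError("vesicle count and duration must be positive")
    duration_min = duration_s / 60.0
    return n_fusion_events / n_late_endosomes / duration_min


# Hypothetical counts, for illustration only (not data from the study):
# 4 fusion events among 30 late endosomes in a 400-s movie.
freq = fusion_frequency_per_min(n_fusion_events=4, n_late_endosomes=30)
print(f"{freq:.3f} fusions per late endosome per minute")
```

      Because the event count is divided by both the number of vesicles and the observation time, even a field with several visible fusion events yields a small number, which is consistent with the explanation that the frequency only appears low.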

      As for the different effects of cytochalasin D and jasplakinolide on heterotypic fusion, we already discussed this in the manuscript (lines 537-558). In short, actin filaments extending in the apical-to-basal direction are relatively static and late endosomes receive sliding forces along the apical-basal axis by means of myosins (e.g., myosin V and myosin II) in heterotypic fusion. Thus, depolymerization of actin filaments by cytochalasin D treatment reduces heterotypic fusion, and conversely stabilization of actin filaments by jasplakinolide increases heterotypic fusion.

      (5) The authors need to analyze the distribution of actin filaments during homotypic fusion post-Cytochalasin D treatment.

      As requested by the referee, we have performed additional experiments to show the distribution of actin filaments during homotypic fusion of late endosomes after pretreatment with cytochalasin D. We show the data in a new figure: Figure 7–figure supplement 3.

      (6) Clarification is needed on whether overexpressing YFP-Cofilin led to the deterioration of cell functions.

      Thank you for the comments. As the reviewer pointed out, overexpression of cofilin can change cellular functions and actin architectures in cells (Aizawa et al., 1997; Popow-Wozniak et al., Histochem. Cell Biol., 2012, (138) 725-36). Although we did not observe apparent morphological changes of VE cells after YFP-cofilin expression, we cannot exclude the possibility that YFP-cofilin overexpression affected the distribution of actin filaments. Therefore, we have described this possibility in the text (lines 425-426).

      (7) Although the authors report that the S3 peptide does not affect heterotypic fusion, a reduction in average heterotypic fusion frequency post-treatment was detected (Figure 9G). The authors need to perform a statistical analysis of the quantification performed in Figure 9G.

      We apologize for this misleading graph representation. Because S3 peptide treatment did not change the fusion frequency significantly, we simply did not mark statistical significance in the previous graph. To clarify this point, we have added the label “n.s.” (not significant) to Figure 9G.

      (8) The authors need to provide the potential functional significance of apically extended actin filaments in positioning late endosomes in the discussion.

      We observed 3 different types of actin filaments in the apical region of VE cells (Figure 5). First, the actin mesh in zone 1, which does not interact directly with late endosomes, may function as a barrier preventing enlarged late endosomes from flowing backward from zone 2 to zone 1. Second, actin filaments extending from the apical to the basal direction on the surface of late endosomes are necessary for the movement of late endosomes toward lysosomes in a myosin-dependent manner. Third, the radial branched filaments on the surface of late endosomes temporarily polymerize in an Arp2/3-dependent manner and regulate the lateral movement of late endosomes. This actin organization coordinately regulates the position of late endosomes. We have added this explanation to the Discussion (lines 483-491).

      Reviewer #2 (Recommendations For The Authors):

      (1) What is the effect or physiological significance of the transition in fusion models?

      In material transport in cells, explosive fusion that completes membrane fusion quickly is more efficient and physiologically advantageous than slow bridge fusion. On the other hand, larger vesicle size is more effective in membrane trafficking than smaller size because large vesicles can transport a large amount of cargo molecules. However, as our mathematical modeling predicts, an increase in vesicle size leads to bridge fusion and decreases the transportation rate. Actin forces can resolve these conflicting effects because they convert the fusion mode from bridge to explosive in late endosomes in VE cells.

      (2) I am confused about how to study heterotypic fusion between late endosomes and lysosomes using only transferrin labeling.

      We are sorry for any confusion this may have caused. Indeed, at first, we discovered that late endosomes shrank and disappeared after labeling of endocytic vesicles with transferrin only (Figure 3A). However, subsequently, we speculated that this disappearance was the result of heterotypic fusion with lysosomes, and to prove this possibility, we developed a double-labeling method in which late endosomes and lysosomes were labeled with 2 different colors (Figure 3B). In short, VE cells were incubated with dextran rhodamine for 20 min and subsequently pulse-labeled with Alexa Fluor 488-labeled transferrin for 5 min: when VE cells were observed, dextran rhodamine was already transported to lysosomes, whereas Alexa Fluor 488-labeled transferrin was still present in late endosomes, enabling the two vesicles to be observed separately.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is concerning that there are several points that are not fully explained regarding microscopic image analysis.

      (a) How were zones 1, 2, and 3 defined and how were the zones determined at each observation? Did the authors determine the zones subjectively based on the approximate size of the vesicles and the passage of time, or statistically by measuring endosomes from images of whole cells? The authors should describe this and also provide the approximate z-directional thickness of each of zones 1, 2, and 3.

      Thank you for pointing out this important issue, which is also raised by Reviewer #1. We initially analyzed the distribution and size of early endosomes, late endosomes, and lysosomes in VE cells by use of vesicle-specific markers (Figure 1B). Thereafter, at each observation, we determined the zones based on the characteristic size of the vesicles and time after labeling of endocytic vesicles. Especially, images of zone 2 and zone 3 were taken by focusing on the z-axis where late endosomes and lysosomes occupied the largest area in the optical slice images, respectively (lines 636-639). As for the z-directional thickness of each zone, we have added a description to the text (lines 123-124, 127-128, and 130-131) and also created a new figure showing their positions in VE cells (Figure 1–figure supplement 1A).

      (b) Regarding "vesicle size" measured from confocal microscopy images: Does "vesicle size" mean surface area or maximum cross-sectional area? In any case, the authors should describe how and what area of the vesicles was measured from the images. The mathematical model used the surface area of the vesicle as a parameter. Better to be consistent.

      Thank you for the important questions. As the reviewer pointed out, the cross-sectional area of endosomes varies depending on the focal plane. To ensure uniformity of the focal plane across different images, we took the images by focusing on the z-axis where late endosomes (zone 2) or lysosomes (zone 3) occupied the largest area in the image. In the focal plane, we measured the size of all intact, unfragmented endosomes. We have now added this explanation to the Methods section (lines 636-639).

      (c) The authors showed several time-lapse imaging data without a description of what "0 s" is the starting time of. For example, "0 s" in Figures 2A, B, 3A, and B, may have different meanings. Other data should be carefully examined and described.

      We apologize for the inadequate description. As the reviewer pointed out, each panel has a different meaning of "0 s." Therefore, we have added an explanation of the meaning of "0 s" to the relevant figure legends (Figure 2A, B; Figure 3A, B; Figure 6A, F; Figure 7A, E, F; Figure 8A; Figure 6–figure supplement 1C; Figure 7–figure supplement 1B; Figure 7–figure supplement 3; Figure 7–figure supplement 4).

      (d) The meaning of "fusion time" in Figures 2D and 3F is unclear. Although it was speculated that the authors estimated it from the change in shape of the vesicles, how it was measured should be described.

      We apologize for the inadequate description. To clarify this point, we have added an explanation of the "fusion time" to the legends of Figures 2D and 3F (lines 898-899 and line 923, respectively).

      (2) The structure of the paragraph starting on line 158 is inappropriate. The authors state in line 159 that "this disappearance appeared to result from fusion of late endosomes with the underlying lysosomes". However, no hetero-fusion was observed here, only the disappearance of vesicles. The authors should mention that hetero-fusion occurred only after analysis of Figure 3CD.

      This reviewer thinks it is natural to state in this order: first, the disappearance of transferrin-positive vesicles was observed (Figure 3A). Such vesicles became dextran-positive as the transferrin signal began to disappear (Figures 3B, C, D). Thus, this is thought to indicate that hetero-fusion has occurred.

      We agree with the reviewer's comment and have rewritten the text following the reviewer's suggestion (lines 163-165, 176-180).

      (3) The mathematical model estimated that the vesicle size of 0.22-1.0 μm² is the size to switch the fusion mode. Since this is close to the size of endosomes in general cells, the authors may be able to discuss the generality of the fusion mode theory. It is up to the author to respond to this suggestion or not.

      Thank you for the comments. As our mathematical model depends on the assumption that the osmotic pressure is constant, late endosomes in VE cells, exhibiting a swollen morphology, may have higher osmotic pressure compared with endosomes in other cells and if so, the predicted vesicle size when the fusion mode switches may differ. Thus, we have decided not to mention the relationship between the vesicle size and fusion mode switching.

      (4) In Line 302 the authors mentioned "These results indicated that actin spots on the surface of late endosomes were dynamically regulated, especially in the apical area." However, the t-halves of 11.5s and 18.9s are only slightly different and of the same order, so it would be too much to say that dynamic regulation of actin occurs specifically in the apical region from a difference of this magnitude. The authors should weaken their arguments. It would be good to do a statistical test for significance between the FRAP data.

      Thank you for pointing out this important issue. To highlight the significant difference in the FRAP assay, we have added a new panel showing the statistical analysis of the halftime of recovery of each region of VE cells (Figure 6E). These data indicate a significant difference in the halftime of recovery (t1/2) between actin spots in the apical and basal regions of zone 2. However, following the reviewer’s comment, we have weakened the description of the FRAP assay results (lines 310-312).

      (5) The discussion section is rather redundant. It could be shortened to be more concise instead of repeating the results.

      Thank you for the comments. We have shortened the Discussion section.

      Minor comments

      In Figure 2C, the statistical test method was not described in the legend.

      Thank you for the comments. We have added the data of the statistical test to the figure legend of Figure 2C (lines 895-896).

      Figure 3G does not look like a normal distribution, so the t-test is inappropriate.

      Thank you for the comments. We have changed the statistical analysis method and used the Mann-Whitney U test. For the same reason, we have changed the analysis method shown in Figure 2C.
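      For readers unfamiliar with the rank-based test named above, a minimal pure-Python sketch follows. It is an illustration only, not the authors' analysis: the function names and sample values are hypothetical, and the two-sided p-value uses a plain normal approximation without continuity or tie correction, so a statistics package should be preferred for real data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) with U the smaller of the two U statistics."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Normal approximation to the null distribution of U (assumption:
    # no tie or continuity correction is applied in this sketch).
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Hypothetical fusion times (s) for two conditions.
control = [12.0, 15.5, 9.8, 30.2, 11.1, 14.0]
treated = [25.3, 40.1, 33.7, 28.9, 51.0, 22.4]
u_stat, p_value = mann_whitney_u(control, treated)
print(u_stat, p_value)
```

      Because the test compares ranks rather than raw values, it does not assume that the measurements are normally distributed, which is the concern the reviewer raised.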

      Is Figure 5D the image of zone 1 because it is close to the apical plane? If so, are the IgG-positive structures early endosomes rather than late endosomes? This seems inconsistent with the data in Figure 1.

      Thank you for the comments. The round vesicles observed in this panel are the late endosomes in zone 2. Because most of the internalized fluorescence marker has moved to the late endosomes in zone 2 at this time point (5 min after chasing), early endosomes are not labeled in this image. We have added a dotted line to the x-z axis image (the second top panel) to indicate the depth of the x-y axis image (top panel) in Figure 5D.

      Figure 6B appears to have little or no fluorescence recovery. Is this a typical example? It is also unclear if this is an apical or basal example.

      Thank you for the comments. This image is a typical example. We focused on the dot structures on the surface of late endosomes rather than the fluorescence intensity over the entire photobleached area. To prevent misunderstanding, we have added arrowheads to highlight the actin dot structures that we were analyzing. The FRAP data shown in Figure 6B were obtained at the apical region of zone 2. We have also added this information to the figure legend.

    1. eLife assessment

      This is an important behavioral, pharmacological intervention study of the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort (n=100). The design used drug dosing after learning, allowing the convincing interpretation of catecholamines being involved in the decision process, an effect dependent on baseline working memory capacity. The results also challenge the view that catecholamines operate by modulating behavioural invigoration alone.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision-making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      2) Analytic clarity: what's c^2?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision-making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

    4. Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision-making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetitive and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings ask for a revision of theories on how catecholamines modulate the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three neuroanatomical routes through which methylphenidate might have acted to elicit these effects, each of which modifies current theory to incorporate our findings. Briefly, these routes are (i) modulation by catecholamines of a striatal ‘origin’ of Pavlovian biases, (ii) catecholaminergic modulation of Pavlovian biases through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, in which catecholamines regulate the balance between these frontal and striatal processes. It is important to note, however, that the current study does not provide evidence that can disentangle these hypotheses: given the systemic nature of the pharmacological manipulation, we cannot dissociate between the three accounts. Accordingly, they raise questions for future research.

      We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we will make this line of reasoning clearer.

      (2) Analytic clarity: what's c^2?

      The "c^2" appears to be a PDF conversion error: all chi-square symbols (χ²) were converted to C2. This will be corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision-making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design, they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However, the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision-making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized, placebo-controlled, cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding the neural implementation of Pavlovian bias generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. As we discuss, we agree with the reviewer that neuroimaging (in combination with, for example, brain stimulation) would be a valuable next step to identify the neural correlates of these pharmacological intervention effects and to dissociate between frontal and striatal drivers of the effects. In our planned revisions, we will try to clarify this point, as per our reply to Reviewer 1.

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, in our planned revisions, we will bring forward the statements that clarify the originality of the current experiment.

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We will also clarify that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in Figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision will follow up on the suggestion of the reviewer to include discussion about the effects of MPH on noradrenaline and behavioural flexibility (and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and will incorporate these in our planned revision of the manuscript.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity we first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an e-life research advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al, 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement in BIC for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r(98) = -0.16, p = 0.88), which rules out multicollinearity.
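
      As a rough illustration of why a weak correlation between two covariates is unlikely to distort coefficient estimates, the sketch below computes the variance inflation factor (VIF) for the simplified case of exactly two predictors, where VIF = 1 / (1 - r^2). The function name `vif_two_predictors` is hypothetical, and the calculation is an assumption-laden simplification: the GLMM described here contains further terms and interactions, so this is not the authors' analysis.

```python
def vif_two_predictors(r):
    """Variance inflation factor for a model with exactly two predictors
    whose Pearson correlation is r: VIF = 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


# Correlation between impulsivity and working memory as quoted in the response.
r_reported = -0.16

vif = vif_two_predictors(r_reported)
print(f"VIF = {vif:.3f}")  # a value near 1 means almost no variance inflation
```

      A VIF close to 1 indicates that the standard errors of the two coefficients are barely inflated by their shared variance; commonly used warning thresholds are far higher.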

      Most importantly, results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction from the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ² = 9.5, p = 0.002). We will report these findings in the revised manuscript.

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [18F]-FDOPA PET imaging, lends support to the proposed hypothesis, which incorporates a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [18F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

    1. eLife assessment

      This important study offers a powerful empirical test of a highly influential hypothesis in population genetics. It incorporates a large number of animal genomes spanning a broad phylogenetic spectrum and treats them in a rigorous unified pipeline, providing the convincing negative result that effective population size scales neither with the content of transposable elements nor with overall genome size. These observations demonstrate that there is still no simple, global hypothesis that can explain the observed variation in transposable element content and genome size in animals.

    2. Reviewer #1 (Public Review):

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

    3. Reviewer #2 (Public Review):

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects, many of which have large genomes, by and large have large Ne regardless of their genome size, again suggesting that the coincidence of this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing out these clear patterns will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters' and moreover explains what is wrong with the MHH's supporters' arguments.

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

    4. Reviewer #3 (Public Review):

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18%->38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on Coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and Coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

    5. Author response:

      Reviewer #1:

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of synonymous to nonsynonymous mutations in coding sequence as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrates that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. 

      Several times in the text we use the expression “effect of dN/dS on…”, which might indeed suggest a causal relationship. The phrasing refers to dN/dS being used in the regression as an independent variable that may predict the variation of the dependent variables, genome size and TE content. We are going to rephrase these expressions so that correlation is not mistaken for causation.

      In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

      The analysis focuses on metazoans for two reasons: one practical and one fundamental. The practical reason is computational. Our analysis included TE annotation, phylogenetic estimation and dN/dS estimation, which would have been very difficult with the hundreds, if not thousands, of plant genomes available. If we had included plants, it would have been natural to include fungi as well, to have a complete set of multicellular eukaryotic genomes, adding to the computational burden. The second, fundamental reason is that plants show important genome size differences due to more frequent whole genome duplications (polyploidization) than in animals. It is therefore possible that the effect of selection on genome size is different in these two groups, which would have led us to treat them separately, decreasing the interest of this comparison. For these reasons we chose to focus on animals, which still provide very wide ranges of genome size and population size well suited to testing the impact of drift.

      Reviewer #2:

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      The MHH is arguably the most structured and influential theoretical framework proposed to date based on the null assumption (i); setting the paper up with the MHH is therefore somewhat inevitable. Because of this, in the manuscript, we mostly discuss the peculiarities of TE biology that can drive the genome away from the MHH expectations, focusing on the mutational aspect. We agree, however, that the hazard posed by extra DNA is not limited to the gain of function via the mutation process, but can be linked to many other molecular processes as mentioned above. In a revised manuscript, we will make the concept of hazard more comprehensive and further stress that this applies not only to TEs but to any nearly-neutral mutation affecting non-coding DNA.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects, many of which have large genomes, by and large have large Ne regardless of their genome size, again suggesting that the coincidence of this lineage of generally large Ne and smaller genomes is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing out these clear patterns will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters' and moreover explains what is wrong with the MHH's supporters' arguments.

      We agree that comparing the dispersion of the points in the non-phylogenetically corrected correlation with the results of the phylogenetic contrasts intuitively emphasizes the importance of accounting for species relatedness. Simply looking at the clade colors in Figure 2 makes it immediately apparent that a simple regression hides phylogenetic structure. We will stress this in the discussion to make the point clear.
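
      The intuition that a pooled regression can hide clade structure can be sketched with purely synthetic data. In the example below, two hypothetical clades ("mammal-like" and "insect-like") differ in their mean Ne proxy and mean genome size, but within each clade the two variables are drawn independently. All names and values are illustrative assumptions, not data from the study, and centering each clade on its own mean is only a crude stand-in for phylogenetically independent contrasts, not the method used in the manuscript.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_clade(rng, n, mean_ne, mean_size):
    """Draw (Ne proxy, genome size) pairs independently of each other,
    so there is no relationship between the two within the clade."""
    return [(rng.gauss(mean_ne, 1.0), rng.gauss(mean_size, 1.0)) for _ in range(n)]


def center(points):
    """Subtract the clade mean from both variables."""
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    return [(x - mx, y - my) for x, y in points]


rng = random.Random(0)
mammal_like = simulate_clade(rng, 200, mean_ne=-3.0, mean_size=3.0)  # small Ne, large genomes
insect_like = simulate_clade(rng, 200, mean_ne=3.0, mean_size=-3.0)  # large Ne, small genomes

# Pooled correlation ignores clade membership entirely.
pooled_r = pearson(*zip(*(mammal_like + insect_like)))

# Correlation after removing each clade's mean keeps only within-clade variation.
within_r = pearson(*zip(*(center(mammal_like) + center(insect_like))))

print(f"pooled r = {pooled_r:.2f}, within-clade r = {within_r:.2f}")
```

      Under these assumptions the pooled correlation is strongly negative even though no relationship exists within either clade, which is the pattern that phylogenetically corrected methods are designed to guard against.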

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

      We respectfully disagree with the reviewer that there is currently no evidence for an effect of Ne on genome size evolution. While it is accurate that our large dataset allows us to reject the universality of Ne as the major contributor to genome size variation, this does not exclude the possibility of such an effect in certain contexts. Notably, several studies find support for Ne determining genome size variation and entailing nearly-neutral TE dynamics under certain circumstances, e.g. with particularly strongly contrasted Ne and moderate divergence times (Lefébure et al. 2017; Mérel et al. 2024; Tollis and Boissinot 2013; Ruggiero et al. 2017). The strength of such works is that they analyze the short-term dynamics of TEs in response to Ne within groups of species/populations, where the cost posed by extra DNA is likely to be similar. Indeed, the MHH predicts genome size to vary according to the combination of drift and mutation under the nearly-neutral theory of molecular evolution. Our work demonstrates that this is not true universally but does not exclude that it could hold locally. Moreover, defense mechanisms against TE proliferation are often complex molecular machineries that might or might not evolve according to different constraints among clades. We have detailed these points in the discussion.

      Reviewer #3:

      Summary

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      Strengths

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      Weaknesses

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18%->38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on Coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      There are indeed some discrepancies between our estimates of low complexity repeats and those from the literature due to the approach used. Hence, occasional underestimates or overestimates of repeat content are possible. As noted, the contribution of “Other” repeats to the overall repeat content is generally very low, which is consistent with an underestimation bias. We thank the reviewer for providing this interesting review of the literature. We will emphasize this point in the discussion of our revised manuscript.

      Not being able to correctly estimate the quantity of satellites might pose a problem for quantifying the total content of junk DNA. However, the overall repeat content, which is mostly composed of TEs, correlates very well with genome size, both in the overall dataset and within clades (with the notable exception of birds), so we are confident that this limitation does not explain our negative results. Moreover, while satellite information might be missing, this is not problematic for testing our a priori hypothesis, since we focus our attention on TEs, whose proliferation mechanism is very different from that of tandem repeats.

      Finally, divergence from the consensus can be estimated only for TEs. Therefore, recently active elements do not include simple and tandem repeats; yet the results based on recent TE content are very similar to those based on the overall repeat content.

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and Coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

      Indeed, dnaPipeTE is not good at detecting old TE copies due to its read-based approach, which biases the outcome towards new elements. We agree that TE content is underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. However, the sum of old TEs and recent TEs is extremely well correlated with genome size (Pearson’s correlation: r = 0.87, p-value < 2.2e-16; PIC: slope = 0.22, adj-R2 = 0.42, p-value < 2.2e-16). Our main result therefore does not rely on an accurate estimation of old TEs. In contrast, we hypothesized that recent TEs could be interesting if selection acted on TE insertion and dynamics rather than on non-coding DNA. Our results demonstrate that this is not the case. It should be noted that, in spite of its limits for old TEs, dnaPipeTE is especially well suited to this specific analysis, as it is not biased by very repetitive new TE families that are problematic to assemble. We will clearly emphasize the limitations of dnaPipeTE and discuss the consequences for our results in the discussion of the revised manuscript.

      Finally, in a preliminary analysis on the dipteran species, we show that the TE content estimated with dnaPipeTE is generally similar to that estimated from the assembly with EarlGrey (Baril et al. 2024) across a good range of genome sizes going from drosophilid-like to mosquito-like (Pearson’s correlation: r = 0.88, p-value = 3.22e-10; see also the corrected Supplementary Figure S2 below). While for these species TE content is probably dominated by recent to moderately recent TEs, Aedes albopictus is an outlier for its genome size, and the estimates from the two methods are largely consistent. However, estimating TE content with EarlGrey took substantially longer (a ~300% increase in computation time), making it a very costly option (a similar issue applies to other assembly-based annotation pipelines). Given the rationale presented above, we decided to use dnaPipeTE instead of EarlGrey.

    1. eLife assessment

      This valuable study investigates the immune system's role in pre-eclampsia. The authors map the immune cell landscape of the human placenta and find an increase in macrophages and Th17 cells in patients with pre-eclampsia. Following mouse studies, the authors suggest that the IGF1-IGF1R pathway might play a role in how macrophages influence T cells, potentially driving the pathology of pre-eclampsia. There is solid evidence in this study that will be of interest to immunologists and developmental biologists; however, some of the conclusions require additional detail and/or more appropriate statistical tests.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors utilized human placental samples together with multiple mouse models to explore the mechanisms whereby inflammatory macrophages and T cells are linked to preeclampsia (PE). The authors first undertook CyTOF of placental samples from women with normal pregnancies, PE, gestational diabetes mellitus (GDM), and GDM with superimposed PE (GDM+PE). The authors report an increase of memory-like Th17 cells, memory-like CD8+ T cells, and pro-inflammatory macrophages in PE cases, but not GDM or GDM+PE, together with diminished γδT cells, anti-inflammatory macrophages, and granulocyte myeloid-derived suppressor cells (gMDSC). The authors then undertook several experiments using scRNA-seq, bulk RNA-seq, and flow cytometry in a RUPP model to first show that the transfer of pro-inflammatory macrophages from RUPP mice into normal pregnant mice with depleted macrophages resulted in increased embryo resorption and diminished fetal weight and size. Moreover, pro-inflammatory macrophages induced memory-like Th17 cells in mice. Similarly, injection of T-cells from RUPP mice resulted in increased embryo resorption and diminished fetal weight and size. Such mice that received RUPP-derived T cells displayed similarly worsened outcomes in their second pregnancy in the absence of any additional T cell transfer. The authors identified the IGF1-IGF1R ligand-receptor pair as a factor involved in the macrophage-mediated induction of memory-like Th17 cells, as confirmed by experiments using an IGF1R inhibitor. Finally, the authors transferred IGF1R inhibitor-treated T cells to a pregnant mouse that was administered LPS and depleted of T cells and observed improved outcomes compared to mice that received non-treated T cells. The authors conclude that their study identifies a PE-specific immune cell network regulated by pro-inflammatory macrophages and T cells.

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

    3. Reviewer #2 (Public review):

      Summary:

      Fei, Lu, Shi, et al. present a thorough evaluation of the immune cell landscape in pre-eclamptic human placentas by single-cell multi-omics methodologies compared to normal control placentas. Based on their findings of elevated frequencies of inflammatory macrophages and memory-like Th17 cells, they employ adoptive cell transfer mouse models to interrogate the coordination and function of these cell types in pre-eclampsia immunopathology. They demonstrate the putative role of the IGF1-IGF1R axis as the key pathway by which inflammatory macrophages in the placenta skew CD4+ T cells towards an inflammatory IL-17A-secreting phenotype that may drive tissue damage, vascular dysfunction, and elevated blood pressure in pre-eclampsia, leaving researchers with potential translational opportunities to pursue this pathway in this indication.

      They present a major advance to the field in their profiling of human placental immune cells from pre-eclampsia patients where most extant single-cell atlases focus on term versus preterm placenta, or largely examine trophoblast biology with a much rarer subset of immune cells. While the authors present vast amounts of data at both the protein and RNA transcript level, we, the reviewers, feel this manuscript is still in need of much more clarity in its main messaging, and more discretion in including only key data that supports this main message most effectively.

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      (1) Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      (2) The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      (3) Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      (4) There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      (5) The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      (6) The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      (7) There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      (8) In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      (9) Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      (10) There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      (11) There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      (12) Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.

    4. Author response:

      Reviewer #1:

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Comment 1. Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Response 1: We thank the reviewer for this comment. We have performed an experiment with green fluorescent protein (GFP) pregnant mice, which was not shown in this manuscript. Wild-type (WT) female mice were mated with either transgenic male mice genetically modified to express GFP or WT male mice, to generate GFP-expressing pups (GFP-pups) or their genetically unmodified counterparts (WT-pups), respectively. Mice were euthanized on day 18.5 of gestation, and the uteri of the pregnant females and the placentas of the offspring were analyzed using flow cytometry. The majority of macrophages in the uterus and placenta were of maternal origin, defined as GFP-negative. In contrast, fetal-derived macrophages, distinguished by their expression of GFP, represented only a small fraction of the total macrophage population. We will add the GFP pregnant mice-related data to Figure 4-figure supplement 1 to explain the different macrophage populations in the uterine and placental cells.

      Comment 2. Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Response 2: We thank the reviewer for this comment. In our experiments, PLX3397 or clodronate liposomes were used to deplete the macrophages of pregnant mice, and we then injected RUPP-derived pro-inflammatory macrophages and anti-inflammatory macrophages back into the PLX3397- or clodronate liposome-treated pregnant mice. We found that RUPP-derived F480+CD206- pro-inflammatory macrophages induced immune imbalance at the maternal-fetal interface and PE-like symptoms (Figure 4E-4H and Figure 4-figure supplement 1 A-C).

      Comment 3. Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Response 3: We thank the reviewer for this comment. PLX3397 is an inhibitor of CSF1R, which is needed for macrophage development (Nature. 2023, PMID: 36890231; Cell Mol Immunol. 2022, PMID: 36220994), as stated on lines 189-191. However, PLX3397 is a small-molecule compound that has the potential to cross the placental barrier and affect fetal macrophages. We will discuss the impact of this factor on the experiment in the Discussion section.

      Comment 4. Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Response 4: We thank the reviewer for this comment. Upon analysis, we observed a notable elevation in the frequency of total macrophages within the CD45+ cell population. We subsequently performed macrophage clustering and uncovered a marked increase in the frequency of Cluster 0, implying a potential correlation between Cluster 0 and donor-derived cells. RNA sequencing revealed that the F480+CD206- pro-inflammatory donor macrophages exhibited a Folr2+Ccl7+Ccl8+C1qa+C1qb+C1qc+ phenotype, which is consistent with the phenotype of Cluster 0 in macrophages observed in single-cell RNA sequencing (Figure 4D and Figure 5E). Therefore, we believe that the donor cells correspond to Cluster 0 in macrophages.

      Comment 5. Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

      Response 5: We thank the reviewer for this comment. First, the other animal experiments in this manuscript used the Reduction in Uterine Perfusion Pressure (RUPP) mouse model to simulate the pathology of PE. However, the RUPP model requires ligation of the uterine arteries in pregnant mice on day 12.5 of gestation, which hinders T cells injected through the tail vein from reaching the maternal-fetal interface. In addition, this experiment aims to show that CD4+ T cells differentiate into memory-like Th17 cells through IGF-1R signalling and thereby affect pregnancy; to do so, we depleted CD4+ T cells in vivo with an anti-CD4 antibody and then injected IGF-1R inhibitor-treated CD4+ T cells. We also showed that injection of RUPP-derived memory-like CD4+ T cells into pregnant rats induces PE-like symptoms (Figure 6). In summary, the use of the LPS model in Figure 8 does not affect the conclusions.

      Reviewer #2:

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      Comment 1. Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      Response 1: We thank the reviewer for this comment. We will make the necessary revisions as suggested. First, we will modify the title of the article to be more specific. Second, we will introduce the RUPP mouse model when interpreting Figure 4. Third, we plan to simplify or consolidate the images from Figure 5 to Figure 7 to make them easier to follow. Finally, we will carefully correct the grammatical and spelling errors in the article. As for the figure comparing pro- and anti-inflammatory macrophages, the Editor requested a more comprehensive description of the macrophage phenotype during the initial submission. As a result, we profiled the transcriptomes of both uterine-derived pro-inflammatory and anti-inflammatory macrophages and conducted a detailed analysis of macrophages in the single-cell data.

      Comment 2. The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      Response 2: We thank the reviewer for this comment. Placental villi, rather than fetal membranes or decidua, were used for CyTOF in this study. This detail about how the human placenta samples were processed will be added to the Materials and Methods section.

      Comment 3. Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      Response 3: We thank the reviewer for this comment. Details of the analysis of the CyTOF and scRNA-seq data will be added to the Materials and Methods section.

      Comment 4. There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      Response 4: We thank the reviewer for this comment. In Figure 1, CD45+ immune cells were clustered into 10 subpopulations, which included gdT cells. Figure 2 displays the further clustering analysis of CD4+ T, CD8+ T, and gdT cells, with gdT cells subdivided into 22 clusters (Figure 2-figure supplement 1C). The number of biological replicates (samples) is consistent with Figure 1.

      Comment 5. The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      Response 5: We thank the reviewer for this comment. The t-SNE distributions of the 15 clusters of CD4+ T cells, 18 clusters of CD8+ T cells, and 22 clusters of gdT cells are shown separately in Figure 2A, F, and I. The heatmaps displaying the expression levels of markers in these clusters of CD4+ T cells, CD8+ T cells, and gdT cells are presented in Figure 2-figure supplement 1A, B, and C, respectively. The t-SNE distributions of the 29 clusters of CD11b+ cells are shown in Figure 3A, and the heatmap displaying the expression levels of markers in these clusters is presented in Figure 3B. As for scRNA-seq, the heatmap and UMAP distributions of the 15 clusters of macrophages are shown separately in Figure 5C and 5D. The UMAP distributions and heatmap of the 12 clusters of T/NK cells are shown in Figure 6A and 6B. The UMAP distributions and heatmap of the 9 clusters of T/NK cells are shown in Figure 7A and 7B.

      Comment 6. The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      Response 6: We thank the reviewer for the careful check. During our verification, we found that one sample in the NP group had pregnancy complications other than PE and GDM. The data in Figure 2H-2K were not updated in a timely manner. We will promptly update these data and reanalyze them.

      Comment 7. There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      Response 7: We thank the reviewer for this comment. We have performed a Treg-related animal experiment, which was not shown in this manuscript. We will add the Treg-related data to Figure 6. The injection of CD4+ T cells derived from RUPP mice, characterized by a reduced frequency of Tregs, could induce PE-like symptoms in pregnant mice. Additionally, we will add the necessary discussion about Tregs.

      Comment 8. In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      Response 8: We thank the reviewer for this comment. Although we do not have additional tissues or cells available to conduct FACS or CyTOF staining, including for CD66b, we plan to use CD15 and CD66b antibodies for immunofluorescence staining of placental tissue. Suppressing effector T cells is a signature feature of MDSCs, and T cells may also influence the functions of MDSCs. We will refer to this review and discuss it in the Discussion section of the article.

      Comment 9. Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      Response 9: We thank the reviewer for this comment. We already have additional data on the efficiency of macrophage depletion with PLX3397 and clodronate liposomes, which were not included in this manuscript, and we will add them to the manuscript. The clodronate experiment is mentioned in the main text (lines 197-201), but only briefly described, because the results we obtained using clodronate were similar to those using PLX3397.

      Comment 10. There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      Response 10: We thank the reviewer for this comment. We plan to simplify or consolidate the images from Figure 5 to Figure 7 to make them easier to follow.

      Comment 11. There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      Response 11: We thank the reviewer for this comment. We will search the literature further and reference additional studies that have conducted similar analyses.

      Comment 12. Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.

      Response 12: We thank the reviewer for this comment. As stated in the Statistical Analysis section (lines 601-604), the Kruskal-Wallis test was used to compare the results of experiments with multiple groups. Comparisons between the two groups in Figures 6-7 were conducted using Student's t-test. The aforementioned statistical methods will be included in the figure legends.
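
      To make the distinction raised in Comment 12 concrete: the Shapiro-Wilk test asks whether a single sample is compatible with a normal distribution, whereas the Kruskal-Wallis test compares the rank distributions of several unpaired groups. The following is a minimal sketch of the Kruskal-Wallis H statistic in plain Python. It is not the authors' analysis code; the function names and the example values are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied runs of size t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical cell-frequency values for three unpaired groups.
group_a = [1, 2, 3]
group_b = [4, 5, 6]
group_c = [7, 8, 9]
h = kruskal_wallis_h(group_a, group_b, group_c)
# With three groups, H is referred to a chi-square distribution with
# 2 degrees of freedom, whose survival function is exp(-x / 2).
# This is a large-sample approximation and is rough for groups this small.
p_approx = math.exp(-h / 2)
print(h, p_approx)
```

      For more than three groups the chi-square reference distribution has k - 1 degrees of freedom, and in practice an established statistics package would be used rather than a hand-written routine.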

    1. eLife assessment

      This important work advances our understanding of how mechanical forces transmitted by blood flow contribute to cardiac development by identifying id2b as a flow-responsive factor that is required for valve development and calcium-mediated cardiac contractility and its downstream mechanism of action. However, the evidence supporting the conclusions is incomplete and would benefit from more rigorous approaches. With additional support of the main conclusions, the work will be of interest to those working on developmental biology, heart development, and congenital heart disease.

    2. Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9 generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

    3. Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene that is regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) valve formation and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however, aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

    4. Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al. make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow-induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and the artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9-generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in the AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

      We thank the reviewer for the constructive comments. To further elucidate the molecular mechanisms underlying the observed phenotypes, we will conduct the following experiments: (1) perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos and in id2b-deleted embryos; (2) use RNAscope to detect the expression of id2b in developing embryos; (3) validate the interaction between Id2b and Tcf3b in vivo; and (4) conduct CUT&Tag experiments in developing zebrafish embryos to further validate the Tcf3b binding sites upstream of nrg1.

      Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene that is regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish, they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      As suggested by the reviewer, we will perform qRT-PCR to analyze the expression of id2a in hearts isolated from tricaine-treated embryos, as well as in id2b-deleted embryos.

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however, aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      We have examined the expression levels of id2b in both klf2a and klf2b mutants. The whole-mount in situ results clearly demonstrate a decrease in id2b signal in both mutants. As noted by the reviewer, klf2 is a transcriptional regulator, suggesting that the regulation of id2b may occur at the transcriptional level. However, dissecting the molecular mechanisms underlying the crosstalk between klf2 and id2b is beyond the scope of the present study.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

      We agree with the reviewer and will perform additional experiments to validate the interaction between Id2b and Tcf3b in vivo. Due to the lack of antibodies targeting these proteins, we will overexpress FLAG-Id2b and HA-Tcf3b in zebrafish embryos and conduct a co-IP analysis.

      Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al. make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow-induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and the artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

      We thank the reviewer for the constructive assessments. Our specific responses are as follows:

      (1) As all the morpholinos used in this study, including those targeting bmp ligands, the cilia gene ift88, and tcf3b, have been published and validated using genetic mutants in previous studies, we believe these loss-of-function analyses are sufficient to delineate their role in regulating id2b expression or function.

      (2) To assess the role of BMP versus blood flow in regulating endocardial id2b expression, we plan to perform live imaging in the id2b:GFP knockin line prior to the initiation of the heartbeat, with or without BMP inhibitors.

      (3) We will revise the data presentation and use bar graphs with individual data points.

      (4) We plan to perform an additional co-IP experiment in zebrafish embryos to assess the interaction between Tcf3b and Id2b.

      (5) To further validate the tcf3b binding sites upstream of nrg1, we will conduct CUT&Tag experiments in developing zebrafish embryos.

    1. eLife assessment

      This valuable work analyzes how specialized cells in the auditory system, known as octopus cells, can detect coincidences in their inputs at the submillisecond time scale. While previous work indicated that these cells receive no inhibitory inputs, the present study unambiguously demonstrates that these cells receive inhibitory glycinergic inputs. The physiologic impact of these inputs needs to be studied further. It remains incomplete at present but could be improved by addressing caveats related to similar sizes of excitatory postsynaptic potentials and spikes in the octopus neurons.

    1. eLife assessment

      This important study investigates the relationship between transcription factor condensate formation, transcription, and 3D gene clustering of the MET regulon in the model organism S. cerevisiae. The authors provide solid experimental evidence that transcription factor condensates enhance transcription of MET-regulated genes, but evidence for the role of Met4 IDRs and Met4-containing condensates in mediating target gene clustering in the MET regulon is not as strong. This paper will be of interest to molecular biologists working on chromatin and transcription, although its impact would be strengthened by further investigation.

    1. eLife assessment

      In this important study, the findings have theoretical and practical implications beyond a single subfield; the work supports the role of breast carcinoma amplified sequence 2 (Bcas2) in positively regulating primitive wave hematopoiesis through amplification of beta-catenin-dependent (canonical) Wnt signaling. The study is convincing, using appropriate and validated methodology in line with the current state-of-the-art; there is a first-rate analysis of a strong phenotype with highly supportive mechanistic data. The findings shed light on the controversial question of whether, when, and how canonical Wnt signaling may be involved in hematopoietic development. The work will be of interest to hematologists but also to developmental biologists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ning et al. reported that Bcas2 played an indispensable role in zebrafish primitive hematopoiesis via sequestering β-catenin in the nucleus. The authors showed that loss of Bcas2 caused primitive hematopoietic defects in zebrafish. They unraveled that Bcas2 deficiency promoted β-catenin nuclear export in a CRM1-dependent manner in vivo and in vitro. They further validated that BCAS2 directly interacted with β-catenin in the nucleus and enhanced β-catenin accumulation through its CC domains. They unveil a novel insight into Bcas2, which is critical for zebrafish primitive hematopoiesis via regulating nuclear β-catenin stabilization rather than its canonical pre-mRNA splicing functions. Overall, the study is impressive and well-performed, although there are also some issues to address.

      Strengths:

      The study unveils a novel function of Bcas2, which is critical for zebrafish primitive hematopoiesis by sequestering β-catenin. The authors validated the results in vivo and in vitro. Most of the figures are clear and convincing. This study nicely complements the function of Bcas2 in primitive hematopoiesis.

      Weaknesses:

      A portion of the figures were over-exposed.

    3. Reviewer #2 (Public Review):

      Summary:

      Ning and colleagues present studies supporting a role for breast carcinoma amplified sequence 2 (Bcas2) in positively regulating primitive wave hematopoiesis through amplification of beta-catenin-dependent (canonical) Wnt signaling. The authors present compelling evidence that zebrafish bcas2 is expressed at the right time and place to be involved in primitive hematopoiesis, that there are primitive hematopoietic defects in hetero- and homozygous mutant and knockdown embryos, that Bcas2 mechanistically positively regulates canonical Wnt signaling, and that Bcas2 is required for nuclear retention of B-cat through physical interaction involving armadillo repeats 9-12 of B-cat and the coiled-coil domains of Bcas2. Overall, the data and writing are clean, clear, and compelling. This study is a first-rate analysis of a strong phenotype with highly supportive mechanistic data. The findings shed light on the controversial question of whether, when, and how canonical Wnt signaling may be involved in hematopoietic development. We detail some minor concerns and questions below, which if answered, we believe would strengthen the overall story and resolve some puzzling features of the phenotype. Notwithstanding these minor concerns, we believe this is an exceptionally well-executed and interesting manuscript.

      Strengths:

      (1) The study features clear and compelling phenotypes and results.

      (2) The manuscript narrative exposition and writing are clear and compelling.

      (3) The authors have attended to important technical nuances sometimes overlooked, for example, focusing on different pools of cytosolic or nuclear b-catenin.

      (4) The study sheds light on a controversial subject: regulation of hematopoietic development by canonical Wnt signaling and presents clear evidence of a role.

      (5) The authors present evidence of phylogenetic conservation of the pathway.

      Weaknesses:

      (1) The authors present compelling data that Bcas2 regulates nuclear retention of B-cat through physical association involving binding between the Bcas2 CC domains and B-cat arm repeats 9-12. Transcriptional activation of Wnt target genes by B-cat requires physical association between B-cat and Tcf/Lef family DNA binding factors involving key interactions in Arm repeats 2-9 (Graham et al., Cell 2000). Mutually exclusive binding by B-cat regulatory factors, such as ICAT, that prevent Tcf binding is a documented mechanism (e.g. Graham et al., Mol Cell 2002). It would appear, based on the arm repeat usage by Bcas2 (repeats 9-12), that Bcas2 and Tcf binding might not be mutually exclusive, which would support their model that Bcas2 physical association with B-cat to retain it in the nucleus would be compatible with co-activation of genes by allowing association with Tcf. It might be nice to attempt a three-way co-IP of these factors showing that B-cat can still bind Tcf in the presence of Bcas2, or at least speculate on the plausibility of the three-way interaction.

      (2) A major way that canonical Wnt signaling regulates hematopoietic development is through regulation of the LPM hematopoietic competence territories by activating expression of cdx1a, cdx4, and their downstream targets hoxb5a and hoxa9a (Davidson et al., Nature 2003; Davidson et al., Dev Biol 2006; Pilon et al., Dev Biol 2006; Wang et al., PNAS 2008). Could the authors assess (in situ) the expression of cdx1a, cdx4, hoxb5a, and hoxa9a in the bcas2 mutants?

      (3) The authors show compellingly that even heterozygous loss of bcas2 has strong Wnt-inhibitory effects. If Bcas2 is required for canonical Wnt signaling and bcas2 is expressed ubiquitously from the 1-cell stage through at least the beginning of gastrulation, why do bcas2 KO embryos not have morphological axis specification defects consistent with loss of early Wnt signaling, like loss of head (early), or brain anteriorization (later)? Could the authors provide some comments on this puzzle? Or if they do see any canonical Wnt signaling patterning defects in het- or homozygous embryos, could they describe and/or present them?

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript utilized zebrafish bcas2 mutants to study the role of bcas2 in primitive hematopoiesis and further confirms that it has a similar function in mice. Moreover, they showed that bcas2 regulates the transition of hematopoietic differentiation from angioblasts via activating Wnt signaling. By performing a series of biochemical experiments, they also showed that bcas2 accomplishes this by sequestering b-catenin within the nucleus, rather than through its known function in pre-mRNA splicing.

      Strengths:

      The work is well-performed, and the manuscript is well-written.

      Weaknesses:

      Several issues need to be clarified.

      (1) Is wnt signaling also required during hematopoietic differentiation from angioblasts? Can the authors test angioblast and endothelial markers in embryos with wnt inhibition? Also, can the authors add export inhibitor LMB to the mouse mutants to test if sequestering of b-catenin by bcas2 is conserved during primitive hematopoiesis in mice?

      (2) Bcas2 is required for primitive myelopoiesis in ALM. Does bcas2 play a similar function in primitive myelopoiesis, or is bcas2/b-catenin interaction more important for hematopoietic differentiation in PLM?

      (3) Is it possible that the CC1-2 fragment sequesters b-catenin? The different phenotypes between this manuscript and the previous article (Yu, 2019) may be due to different mutations in bcas2. Is it possible that the bcas2 mutation in Yu's article produces a complete CC1-2 fragment, which might sequester b-catenin?

      (4) Can the author clarify what embryos the arrows point to in SI Figure 2D? In SI Figure 6B and B', can the author clarify how the nucleus and cytoplasm are bleached? In B, the nucleus also appears to be bleached.

    5. Author Response:

      Thank you very much for your consideration and assessment. We really appreciate the generous comments from the reviewers on our manuscript entitled “BCAS2 promotes primitive hematopoiesis by sequestering β-catenin within the nucleus”. The comments are very helpful for the improvement of our work. We would like to provide the following provisional revision plan to address the public reviews:

      1. To clarify if Bcas2 also promotes primitive myelopoiesis by enhancing nuclear accumulation of β-catenin, bcas2 morpholino will be injected into the Tg(coro1a:EGFP) zebrafish embryos at 1-cell stage, and subsequently the β-catenin distribution in the myeloid cells will be examined. Tg(coro1a:EGFP) is commonly used to track both macrophages and neutrophils.

      2. According to the reviewers’ comments, we will quantify the fluorescence intensity in the cell nucleus and cytoplasm in Figure 3H. Meanwhile, we will adjust the exposure of Figure 5C and Figure 7E, or replace the figures with high-resolution ones.

      3. Previous studies have reported that β-catenin can bind directly to CRM1 through its central armadillo (ARM) repeats region. The β-catenin region containing ARM repeat 10 and the C terminus is essential for its nuclear export (Koike M, et al., The Journal of Biological Chemistry, 2004). In our research, BCAS2 has been demonstrated to bind to ARM repeats 9-12 of β-catenin. Therefore, it is highly likely that Bcas2 competes with CRM1 for binding to the nuclear export signal peptide on β-catenin. To further test this possibility, we will transfect HEK293T cells with constructs expressing full-length or truncated forms of β-catenin, and then examine their nuclear distribution.

      4. To validate if BCAS2 affects CRM1-dependent nuclear export of other classical factors, we plan to knock down or overexpress BCAS2 in HeLa cells, and detect the distribution of ATG1 and CDC37L, which have been identified as CRM1 cargoes.

      5. Considering that the ARM repeats bound by Bcas2 (repeats 9-12) and Tcf (repeats 2-9) might not be mutually exclusive, it is indeed appealing to investigate whether β-catenin can simultaneously interact with Tcf and Bcas2. We will follow the reviewer’s suggestion to perform a three-way co-immunoprecipitation assay. Plasmids encoding these three proteins will be co-transfected into cells. Cell lysates will be immunoprecipitated using an antibody specific to the bait protein (e.g., β-catenin) and eluted proteins will be analyzed using antibodies specific to the other two proteins.

      6. To elucidate that canonical Wnt signaling regulates hematopoietic development by activating expression of cdx1a, cdx4, and their downstream targets hoxb5a and hoxa9a as previously reported, we intend to examine the expression of cdx4 and hoxa9a in bcas2+/- embryos at 10 ss by performing in situ hybridization.

      7. To further validate whether Wnt signaling is required during endothelial differentiation from angioblasts, wild-type embryos will be subjected to treatment with the Wnt inhibitor CCT036477, and the expression of the hemangioblast markers npas4l, scl, and gata2 and the endothelial marker fli1 will be analyzed using in situ hybridization.

      8. In order to clarify whether coiled-coil (CC) domain 1-2 of Bcas2 is sufficient to interact with β-catenin and restore the primitive hematopoietic defect, we will overexpress CC1-2 in Tg(gata1:GFP) embryos injected with bcas2 morpholino, and then investigate the distribution of β-catenin, as well as gata1 expression at 10 ss in these embryos.

    1. eLife assessment

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. The authors provide solid anatomical and phylogenetic evidence in support of a new interpretation of the homology of dorsal sutures in trilobites and their relatives.

    2. Reviewer #1 (Public Review):

      Summary:

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum, and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda.

      Strengths:

      The new specimens are of high quality and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenetic analyses.

    3. Reviewer #3 (Public Review):

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are proposed to be associated with ventral plates that the authors homologise with the free cheeks of trilobites (although also testing alternative homologies). An update of a published phylogenetic dataset permits reconsideration of whether dorsal ecdysial sutures have a single or multiple origins in trilobites and their relatives.

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variation within a single species. New microtomographic data shed light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum, and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda.

      Strengths:

      The new specimens are of high quality and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenetic analyses.

      Weaknesses:

      The weaknesses of this work are obvious in a number of aspects. Technically, the ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might be helpful to resolve the taxonomic and ontogenetic uncertainties. I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. the hypostome and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed either morphologically or functionally. Finally, the authors claimed that phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can be explained by a deep homology of the cephalic suture at the molecular level and multiple co-options within the Artiopoda.

      Comments on the revised version:

      I have seen the extensive revision of the manuscript. The main point "Multiple origins of dorsal ecdysial sutures in artiopodans" is now partially supported by results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors to make further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in a single short sentence.

      Some more details of the changing article shape and overall length of the antennae have been added to the description.

      (2) There are also imprecise descriptions of features.

      Measurements, dimensions and multiple figures are provided for many features in the text and the supplement includes more figures. In total, 11 figures are provided with details (photographs or measurements) of the material.

      (3) Ontogeny of the cephalon was not described.

      A sentence has been added to the description to note the changing width:length of the cephalon during ontogeny, with a reference to Figure 6.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig. 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      As noted in the diagnosis for the genus, these notches are interpreted to accommodate the eye stalks. The homology of the ventral plates is discussed at length in the manuscript, and is the focus of the three sets of phylogenetic analyses performed.

      Reviewer #3 (Public Review):

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are proposed to be associated with ventral plates that the authors homologise with the free cheeks of trilobites (although also testing alternative homologies). An update of a published phylogenetic dataset permits reconsideration of whether dorsal ecdysial sutures have a single or multiple origins in trilobites and their relatives.

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variation within a single species. New microtomographic data shed light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites.

      I think the revision does a satisfactory job of reconciling the data and analyses with the conclusions drawn from them. Referee 1's valid concerns about whether a synonymy of Acanthomeridion anacanthus is justified have been addressed by the addition of a length/width scatterplot in Figure 6. Referee 2's doubts about homology between the librigenae of trilobites and ventral plates of Acanthomeridion have been taken on board by re-running the phylogenetic analyses with a coding for possible homology between the ventral plates and the doublure of olenelloid trilobites. The authors sensibly added more trilobite terminals to the matrix (including Olenellus) and did analyses with and without constraints for olenelloids being a grade at the base of Trilobita. My concerns about counting how many times dorsal sutures evolved on a consensus tree have been addressed (the authors now play it safe and say "multiple" rather than attempting to count them on a bushy topology). The treespace visualisation (Figure 9) is a really good addition to the revised paper.

      Weaknesses:

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion and Trilobita. The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). This paper is not a game-changer because these questions have been asked several times over the past seven years, but there are solid, worthy advances made here.

      I'd like to see some of the most significant figures from the Supplementary Information included in the main paper so they will be maximally accessed. The "stick-like" exopods are not best illustrated in the main paper; their best imagery is in Figure S1. Why not move that figure (or at least its non-redundant panels) as well as the reconstruction (Figure S7) to the main paper? The latter summarises the authors' interpretation that a large axe-shaped hypostome appears to be contiguous with ventral plates.

      We have moved these figures from the supplementary information to the main text, and renumbered figures accordingly. Fig S1 has now been split – panels a and b are in the main text (new Fig. 4), with the remainder staying as Fig S1. Fig S7 is now Fig. 8 in the main text.

      The specimens depict evidence for three pairs of post-antennal cephalic appendages but it's a bit hard to picture how they functioned if there's no room between the hypostome and ventral plates. Also, a comment is required on the reconstruction involving all cephalic appendages originating against/under the hypostome rather than the first pair being paroral near the posterior end of the hypostome and the rest being post-hypostomal as in trilobites.

      A short comment has been added to the caption.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have seen the extensive revision of the manuscript. The main point "Multiple origins of dorsal ecdysial sutures in artiopodans" is now partially supported by results presented by the authors. I am still unsatisfied with the descriptions and interpretations of critical features newly revealed by the authors. The following points might be useful for the authors in making further revisions.

      (1) The antennae were well illustrated in a couple of specimens, but they were described in only a short sentence.

      (2) There are also imprecise descriptions of features (see my annotations in submitted ms).

      (3) Ontogeny of the cephalon was not described.

      (4) The critical head element is the so-called "ventral plate". How this element connects with the cephalic shield is not adequately revealed. The authors claimed that the suture is along the cephalic margin. However, the lateral margin of the cephalon is not rounded but exhibits two notches (e.g. Fig 3C). This gives an indication that the supposed ventral plates have a dorsal extension to fit the notches. Alternatively, the "ventral plate" can be interpreted as a small free cheek with a large ventral extension, providing evidence for the librigenal hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      The references swap back and forth between journal titles being abbreviated or written out in full. Please standardise this to journal format rather than alternating between two different styles.

      Line 145: Perez-Peris et al. (2021) should be cited as the source for the Anacheirurus appendages.

      Added, thank you.

      Line 310: The El Albani et al (2024) paper on ellipsocephaloid appendages should be noted in connection with an A+4 (rather than A+3) head in trilobites.

      Added.

      Minor or trivial corrections:

      Line 51: move the three citations to follow "arthropods" rather than following "artiopodans", as none of these papers are specifically about Artiopoda.

      Changed, thank you.

      Caption to Figure 1 and line 100: Acanthomeridion appears in Figure 1 and in the text with no context. Please weave it into the text appropriately.

      Line 136: The data were...

      Corrected

      Line 164: upper case for Morphobank.

      Corrected

      Line 183: spelling of "Village" (not "Vallige").

      Corrected

      Line 197: I suggest using "articles" rather than "podomeres" for the antenna (as you did in line 232).

      Changed, thank you.

      Line 269: "gnathobasal spine" (rather than "spin").

      Changed, thank you.

      Line 272: "Exopods" is used here but elsewhere "exopodites" is used.

      Exopodites is now used throughout

      Line 359: "can been seen" is awkward and, as evolutionary patterns are inferred rather than "seen", could be reworded as "... loss of the eye slit has been inferred...".

      Reworded as suggested

      Line 422 and 423: As two referees asked in the first round of review, delete "iconic" and "symbolic".

      Deleted as suggested

      Line 467: "librigena-like".

      Corrected

    1. eLife assessment

      This important computational study provides new insights into how neural dynamics may lead to time-evolving behavioral errors as observed in certain working-memory tasks. By combining ideas from efficient coding and attractor neural networks, the authors construct a two-module network model to capture the sensory-memory interactions and the distributed nature of working memory representations. They provide convincing evidence that their two-module network, but none of the alternative circuit structures they considered, can account for error patterns reported in orientation-estimation tasks with delays.

    2. Reviewer #1 (Public Review):

      Summary:

      Working memory is imperfect - memories accrue error over time and are biased towards certain identities. For example, previous work has shown memory for orientation is more accurate near the cardinal directions (i.e., variance in responses is smaller for horizontal and vertical stimuli) while being biased towards diagonal orientations (i.e., there is a repulsive bias away from horizontal and vertical stimuli). The magnitude of errors and biases increases the longer an item is held in working memory and when more items are held in working memory (i.e., working memory load is higher). Previous work has argued that biases and errors could be explained by increased perceptual acuity at cardinal directions. However, these models are constrained to sensory perception and do not explain how biases and errors increase over time in memory. The current manuscript builds on this work to show how a two-layer neural network could integrate errors and biases over a memory delay. In brief, the model includes a 'sensory' layer with heterogeneous connections that lead to the repulsive bias and decreased error at the cardinal directions. This layer is then reciprocally connected with a classic ring attractor layer. Through their reciprocal interactions, the biases in the sensory layer are constantly integrated into the representation in memory. In this way, the model captures the distribution of biases and errors for different orientations that has been seen in behavior and their increasing magnitude with time. The authors compare the two-layer network to a simpler one-network model, showing that the one-network model is harder to tune and shows an attractive bias for memories that have lower error (which is incompatible with empirical results).

      Strengths:

      The manuscript provides a nice review of the dynamics of items in working memory, showing how errors and biases differ across stimulus space. The two-layer neural network model is able to capture the behavioral effects as well as relate to neurophysiological observations that memory representations are distributed across sensory cortex and prefrontal cortex.

      The authors use multiple approaches to understand how the network produces the observed results. For example, analyzing the dynamics of memories in the low-dimensional representational space of the networks provides the reader with an intuition for the observed effects.

      As a point of comparison with the two-layer network, the authors construct a heterogeneous one-layer network (analogous to a single memory network with embedded biases). They argue that such a network is incapable of capturing the observed behavioral effects but could potentially explain biases and noise levels in other sensory domains where attractive biases have lower errors (e.g., color).

      The authors show how changes in the strength of Hebbian learning of excitatory and inhibitory synapses can change network behavior. This argues for relatively stronger learning in inhibitory synapses, an interesting prediction.

      The manuscript is well-written. In particular, the figures are well done and nicely schematize the model and the results.

      Weaknesses:

      Despite its strengths, the manuscript does have some weaknesses. These weaknesses are adequately discussed in the manuscript and motivate future research.

      One weakness is that the model is not directly fit to behavioral data, but rather compared to a schematic of behavioral data. As noted above, the model provides insight into the general phenomenon of biases in working memory. However, because the models are not fit directly to data, they may miss some aspects of the data.

      In addition, directly fitting the models to behavioral data could allow for a broader exploration of parameter space for both the one-layer and two-layer models (and their alternatives). Such an approach would provide stronger support for the paper's claims (such as "...these evolving errors...require network interaction between two distinct modules."). That being said, the manuscript does explore several alternative models and also acknowledges the limitation of not directly fitting behavior, due to difficulties in fitting complex neural network models to data.

      One important behavioral observation is that both diffusive noise and biases increase with the number of items in working memory. The current model does not capture these effects and it isn't clear how the model architecture could be extended to capture these effects. That being said, the authors note this limitation in the Discussion and present it as a future direction.

      Overall:

      Overall, the manuscript was successful in building a model that captured the biases and noise observed in working memory. This work complements previous studies that have viewed these effects through the lens of optimal coding, extending these models to explain the effects of time in memory. In addition, the two-layer network architecture extends previous work with similar architectures, adding further support to the distributed nature of working memory representations.

    3. Reviewer #2 (Public Review):

      In this manuscript, Yang et al. present a modeling framework to understand the pattern of response biases and variance observed in delayed-response orientation estimation tasks. They combine a series of modeling approaches to show that coupled sensory-memory networks are in a better position than single-area models to support experimentally observed delay-dependent response bias and variance in cardinal compared to oblique orientations. These errors can emerge from a population-code approach that implements efficient coding and Bayesian inference principles and is coupled to a memory module that introduces random maintenance errors. A biological implementation of such operation is found when coupling two neural network modules, a sensory module with connectivity inhomogeneities that reflect environment priors, and a memory module with strong homogeneous connectivity that sustains continuous ring attractor function. Comparison with single-network solutions that combine both connectivity inhomogeneities and memory attractors shows that two-area models can more easily reproduce the patterns of errors observed experimentally.

      Strengths:

      The model provides an integration of two modeling approaches to the computational bases of behavioral biases: one based on Bayesian and efficient coding principles, and one based on attractor dynamics. These two perspectives are not usually integrated consistently in existing studies, which this manuscript beautifully achieves. This is a conceptual advancement, especially because it brings together the perceptual and memory components of common laboratory tasks.

      The proposed two-area model provides a biologically plausible implementation of efficient coding and Bayesian inference principles, which interact seamlessly with a memory buffer to produce a complex pattern of delay-dependent response errors. No previous model had achieved this.

      Weaknesses:

      The correspondence between the various computational models is not clearly shown. It is not easy to see this correspondence clearly because network function is illustrated with different representations for different models. In particular, the Bayesian model of Figure 2 is illustrated with population responses for different stimuli and delays, while the attractor models of Figures 3 and 4 are illustrated with neuronal tuning curves but not population activity.

      The proposed model has stronger feedback than feedforward connections between the sensory and memory modules (J_f = 0.1 and J_b = 0.25). This is not the common assumption when thinking about hierarchical processing in the brain. The manuscript argues that error patterns remain similar as long as the product of J_f and J_b is constant, so it is unclear why the authors preferred this network example as opposed to one with J_b = 0.1 and J_f = 0.25.

    4. Reviewer #3 (Public Review):

      Summary:

      The present study proposes a neural circuit model consisting of coupled sensory and memory networks to explain the circuit mechanism of the cardinal effect in orientation perception which is characterized by the bias towards the oblique orientation and the largest variance at the oblique orientation.

      Strengths:

      The authors have done numerical simulations and preliminary analysis of the neural circuit model to show the model successfully reproduces the cardinal effect. And the paper is well-written overall. As far as I know, most of the studies on the cardinal effect are at the level of statistical models, and the current study provides one possibility of how neural circuit models reproduce such an effect.

      Weaknesses:

      There are no major weaknesses and flaws in the present study, although I suggest the author conduct further analysis to deepen our understanding of the circuit mechanism of the cardinal effects.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3:

      I appreciate the revisions made by the author which address all of my concerns.

      Nevertheless, I have some new questions when I read the paper again. These questions are not necessarily criticisms of the paper, which may reflect the gap in my understanding. Meanwhile, it also reflects the writing might be improved further.

      - Fig. 1:

      I understand that a critical assumption for generating the required result is that the oblique orientation has lower "energy" than the cardinal orientation (Fig. 1G). Meanwhile, I always have a concept that typically the energy is defined as the negative of log probability. If we take the log probability plotted in Fig. 1A, that will generate an energy landscape that is upside down compared with current Fig. 1G. How should I understand this discrepancy?

      As the reviewer pointed out, a higher prior distribution near cardinal orientations causes cardinal attraction in typical Bayesian models, which can correspond to lower energy around these orientations. Additionally, in the context of learning natural statistics, Hebbian plasticity in excitatory connections strengthens recurrent connections and drives attraction toward more prevalent stimuli within neural circuits.

      However, as demonstrated by Wei and Stocker (2015), a Bayesian inference model can also produce cardinal repulsion when optimizing encoding efficiency. In our network, this efficient encoding is achieved through heterogeneous lateral connections and inhibitory Hebbian plasticity in the sensory module, resulting in lower energy near oblique orientations. Thus, the shape of the prior distribution does not have a direct one-to-one correspondence with the bias pattern or the dynamic energy landscape.
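
      The convention the reviewer describes (energy as the negative log of a probability) can be made concrete with a minimal sketch. The functional form of the prior below is an assumption chosen only for illustration, and the function names are hypothetical; nothing here reproduces the network's actual energy landscape. It only shows that, under this convention, a prior peaked at the cardinal orientations gives energy minima at the cardinals, which is the opposite of the landscape described for the sensory module.

```python
import math

def prior(theta_deg: float) -> float:
    """Illustrative, unnormalised orientation prior peaked at 0 and 90 degrees.
    The functional form is an assumption made for this sketch."""
    return 2.0 - abs(math.sin(math.radians(2.0 * theta_deg)))

def energy_from_prior(theta_deg: float) -> float:
    """Conventional 'energy': the negative log of the prior."""
    return -math.log(prior(theta_deg))

# Evaluate on a coarse grid of orientations (degrees, period 180).
angles = list(range(0, 180, 15))
energies = {a: energy_from_prior(a) for a in angles}

lowest = min(energies, key=energies.get)    # a cardinal orientation
highest = max(energies, key=energies.get)   # an oblique orientation
```

      Under these assumptions the minimum falls on a cardinal orientation and the maximum on an oblique one, so a landscape with lower energy near the obliques cannot be read off the prior alone.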

      - Fig. 3 and its corresponding text.

      I understand and agree with Fig. 3B&C that neurons near cardinal orientations are sharper and denser. But why is the stimulus representation around cardinal orientations sparser compared with the oblique orientation? Doesn't having more neurons around the cardinal orientations imply a less sparse representation?

      Indeed, with sharper tuning curves, having more neurons can result in a sparser representation. Consider an extreme case where each orientation, discretized by 1°, is represented by only one active neuron with a tuning width of 1°. While this would require more neurons to represent overall stimuli compared to cases with wider tuning curves, each stimulus would be represented by fewer neurons, aligning with the traditional concept of sparse coding.
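
      The extreme case above can be checked with a small counting sketch. It assumes box-shaped tuning curves and evenly spaced preferred orientations, neither of which is taken from the manuscript, and the helper name is hypothetical.

```python
def active_neurons(stimulus_deg, centers_deg, width_deg):
    """Count neurons whose box-shaped tuning curve covers the stimulus.
    Orientation is treated as circular with a period of 180 degrees."""
    count = 0
    for center in centers_deg:
        d = abs(stimulus_deg - center) % 180.0
        d = min(d, 180.0 - d)            # circular distance in degrees
        if d < width_deg / 2.0:
            count += 1
    return count

# Narrow code: 180 neurons, one per degree, each with a 1-degree tuning width.
narrow_centers = [float(i) for i in range(180)]
# Broad code: 18 neurons, one per 10 degrees, each with a 30-degree tuning width.
broad_centers = [float(i) for i in range(0, 180, 10)]

stimulus = 42.0
narrow_active = active_neurons(stimulus, narrow_centers, 1.0)
broad_active = active_neurons(stimulus, broad_centers, 30.0)
```

      In this toy setting the narrow code uses ten times as many neurons overall, yet a single stimulus activates one neuron rather than three.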

      However, in Fig. 3 and corresponding text, we did not measure the sparseness of active neurons for each orientation. Instead, we used the term ‘sparser representation’ to describe the increased distance between representations of different stimuli near the cardinal orientations. Although this increased distance can be consistent with the traditional concept of sparse coding, to avoid any confusion, we have revised the term ‘sparser representation’ to ‘more dispersed representation’ in the 3rd paragraph in pg. 5 and the 3rd paragraph in pg. 6.

    1. eLife assessment

      The study presents a potentially valuable approach by combining two measurements (pHLA binding and pHLA-TCR binding) to improve predictions of which mutations in colorectal cancer are likely to be presented to and recognised by the immune system. While this approach is promising, the evidence supporting the primary claim remains somewhat incomplete. The experimental validation of the computational predictions with actual immune responses is still limited, despite the increase in sample size from 4 to 8 in this revision.

    2. Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen prediction in CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      (4) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses.

      Weakness:

      The authors have made comprehensive revisions to the original version of the article, and this version has now addressed my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper reports a number of somewhat disparate findings on a set of colorectal tumours and infiltrating T cells. The main finding is a machine-learning tool which combines two previous state-of-the-art tools, MHC prediction and T-cell binding prediction, to predict immunogenicity. This is then applied to a small set of neoantigens and there is a small-scale validation of the prediction at the end.

      Strengths:

      The prediction of immunogenic neoepitopes is an important and unresolved question.

      Weaknesses:

      The paper contains a lot of extraneous material not relevant to the main claim. Conversely, it lacks important detail on the major claim.

      (1) The analysis of T cell repertoire in Figure 2 seems irrelevant to the rest of the paper. As far as I could ascertain, this data is not used further.

      We appreciate the reviewer for their valuable feedback. We concur with the reviewer's observation that the analysis of the TCR repertoire in Figure 2 should be moved to the supplementary section. We have moved Figures 2B to 2F to Supplementary Figure 2.

      However, the analysis of TCR profiles is still presented in Figure 2, as it plays a pivotal role in the process of neoantigen selection. This is because the TCR profiles of eight (out of 28) patients were used for neoantigen prediction. We have added the following sentences to the results section to explain the importance of TCR profiling: “Furthermore, characterizing T cell receptors (TCRs) can complement efforts to predict immunogenicity.” (Results, Lines 311-312, Page 11)

      (2) The key claim of the paper rests on the performance of the ML algorithm combining NETMHC and pmtNET. In turn, this depends on the selection of peptides for training. I am unclear about how the negative peptides were selected. Are they peptides from the same databases as immunogenic peptides but randomised for MHC? It seems as though there will be a lot of overlap between the peptides used for testing the combined algorithm, and the peptides used for training MHCNet and pmtMHC. If this is so, and depending on the choice of negative peptides, it is surely expected that the tools perform better on immunogenic than on non-immunogenic peptides in Figure 3. I don't fully understand panel G, but there seems very little difference between the TCR ranking and the combined. Why does including the TCR ranking have such a deleterious effect on sensitivity?

      We thank the reviewer for their valuable feedback. We understand the reviewer's 'MHCNet' and 'pmtMHC' to refer to the NetMHCpan and pMTNet tools, respectively. First, the negative peptides, which have been excluded from PRIME (1), were not randomized with MHC (HLA-I) but were randomized with TCR only. Secondly, the positive peptides selected for our combined algorithms are chosen from many databases such as 10X Genomics, McPAS, VDJdb, IEDB, and TBAdb, while MHCNet uses peptides from the IEDB database and pMTNet uses a totally different dataset from ours for training. Therefore, there is not much overlap between our training data and the training datasets for MHCNet and pMTNet. Thus, the better performance of our tool is not due to overlapping training datasets with these tools or the selection of negative peptides.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: "To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
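
      A 70/30 split into mutually exclusive subsets, as described above, could look like the following minimal sketch. The peptide identifiers, function name, and fixed seed are hypothetical, and deduplicating before the split is an assumption added here so that the same peptide cannot land in both subsets.

```python
import random

def split_discovery_validation(records, discovery_fraction=0.7, seed=0):
    """Randomly split records into disjoint discovery and validation subsets."""
    unique = sorted(set(records))          # drop duplicates before splitting
    random.Random(seed).shuffle(unique)    # seeded shuffle for reproducibility
    cut = round(len(unique) * discovery_fraction)
    return unique[:cut], unique[cut:]

# Hypothetical peptide identifiers standing in for the collected dataset.
peptides = [f"PEP{i:03d}" for i in range(100)]
discovery, validation = split_discovery_validation(peptides)
```

      With 100 unique records this yields 70 discovery and 30 validation entries that share no members.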

      Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively. The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
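
      Why a plain "AND" rule loses sensitivity can be seen in a toy example. The labels and pass/fail calls below are invented for illustration and are not study data; the function name is hypothetical. A peptide passes the AND rule only if it passes both criteria, so every positive missed by either criterion is also missed by the combination.

```python
def sensitivity_specificity(calls, labels):
    """Return (sensitivity, specificity) for binary calls against binary labels."""
    tp = sum(1 for c, y in zip(calls, labels) if c and y)
    fn = sum(1 for c, y in zip(calls, labels) if not c and y)
    tn = sum(1 for c, y in zip(calls, labels) if not c and not y)
    fp = sum(1 for c, y in zip(calls, labels) if c and not y)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (illustrative only): 4 immunogenic and 4 non-immunogenic peptides.
labels   = [1, 1, 1, 1, 0, 0, 0, 0]
hla_pass = [1, 1, 1, 0, 1, 1, 0, 0]   # passes a pHLA percentile cutoff
tcr_pass = [1, 1, 0, 1, 0, 1, 0, 0]   # passes a pHLA-TCR ranking cutoff

# Basic AND rule: keep a peptide only if both criteria are met.
and_pass = [h and t for h, t in zip(hla_pass, tcr_pass)]

hla_sens, hla_spec = sensitivity_specificity(hla_pass, labels)
tcr_sens, tcr_spec = sensitivity_specificity(tcr_pass, labels)
and_sens, and_spec = sensitivity_specificity(and_pass, labels)
```

      In this toy case each criterion alone recovers three of the four positives, while the AND rule recovers two. Its specificity can never fall below that of either criterion, which is the trade-off a learned combination is meant to soften.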

      (3) The key validation of the model is Figure 5. In 4 patients, the authors report that 6 out of 21 neo-antigen peptides give interferon responses > 2 fold above background. Using NETMHC alone (I presume the tool was used to rank peptides according to binding to the respective HLAs in each individual, but this is not clear), identified 2; using the combined tool identified 4. I don't think this is significant by any measure. I don't understand the score shown in panel E but I don't think it alters the underlying statistic.

      Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      In conclusion, the paper demonstrates that combining MHCNET and pmtMHC results in a modest increase in the ability to discriminate 'immunogenic' from 'non-immunogenic' peptides; however, the strength of this claim is difficult to evaluate without more knowledge about the negative peptides. The experimental validation of this approach in the context of CRC is not convincing.

      Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen prediction in CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      (4) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses.

      Weaknesses:

      (1) While multiple advanced tools and algorithms are used, the study could benefit from a more detailed explanation of the rationale behind algorithm choice and parameter settings, ensuring reproducibility and transparency.

      We thank the reviewer for their comment. We have revised the explanation regarding the rationale behind algorithm choice and parameter settings as follows: “We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with 'k' set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using 9 folds for training and 1 fold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model's ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model's performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features.” (Method, lines 225-234, page 8).
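
      The 10-fold cross-validation procedure quoted above can be sketched with the Python standard library alone. This is an illustration under stated assumptions and not the authors' pipeline: the data are synthetic, the folds are stratified so that every fold contains both classes, and a toy mean-difference scorer stands in for LR, RF and XGB. All function names are hypothetical.

```python
import random
from statistics import mean

def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=10, seed=0):
    """Split indices into k disjoint folds, keeping both classes in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds

def fit_mean_difference(X, y):
    """Toy linear scorer (a stand-in, not LR, RF or XGB): weight each
    feature by the difference between the two class means."""
    weights = []
    for j in range(len(X[0])):
        pos_mean = mean(x[j] for x, label in zip(X, y) if label == 1)
        neg_mean = mean(x[j] for x, label in zip(X, y) if label == 0)
        weights.append(pos_mean - neg_mean)
    return lambda x: sum(w * v for w, v in zip(weights, x))

def cross_val_auc(X, y, columns, k=10):
    """Mean held-out ROC AUC over k folds, using only the given feature columns."""
    aucs = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        scorer = fit_mean_difference([[X[i][c] for c in columns] for i in train],
                                     [y[i] for i in train])
        scores = [scorer([X[i][c] for c in columns]) for i in held_out]
        aucs.append(roc_auc(scores, [y[i] for i in held_out]))
    return mean(aucs)

# Synthetic data: column 0 plays the role of a pHLA feature,
# column 1 the role of a pHLA-TCR feature.
rng = random.Random(1)
y = [1] * 100 + [0] * 100
X = [[rng.gauss(label, 1.0), rng.gauss(label, 1.0)] for label in y]

results = {
    "pHLA only": cross_val_auc(X, y, [0]),
    "pHLA-TCR only": cross_val_auc(X, y, [1]),
    "combined": cross_val_auc(X, y, [0, 1]),
}
best = max(results, key=results.get)   # feature set with the highest mean AUC
```

      Each fold serves once as the held-out set while the other nine are used for fitting, and the candidate with the highest average held-out AUC is retained, mirroring the selection rule described in the response.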

      (2) While pHLA-TCR binding displayed higher specificity, its lower sensitivity compared to pHLA binding suggests a trade-off between the two measures. Optimizing the balance between sensitivity and specificity could be crucial for the practical application of these predictions in clinical settings.

      We appreciate the reviewer's suggestion. Due to the limited availability of patient blood samples and time constraints for validation, we have chosen to prioritize high specificity and positive predictive value to enhance the selection of neoantigens.

      (3) The experimental validation was performed on a limited number of patients (four), which might affect the generalizability of the findings. Increasing the number of patients for validation could provide a more comprehensive assessment of the model's performance.

      This has been addressed earlier. Here, we restate it as follows: Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      This study presents a new approach of combining two measurements (pHLA binding and pHLA-TCR binding) in order to refine predictions of which patient mutations are likely presented to and recognized by the immune system. Improving such predictions would play an important role in making personalized anti-cancer vaccinations more effective.

      Strengths:

      The study combines data from pre-existing tools pVACseq and pMTNet and applies them to a CRC patient population, which the authors show may improve the chance of identifying immunogenic, cancer-derived neoepitopes. Making the datasets collected publicly available would expand beyond the current datasets that typically describe Caucasian patients.

      Weaknesses:

      It is unclear whether the NetMHCpan and pMTNet tools used by the authors are entirely independent, as they appear to have been trained on overlapping datasets, which may explain their similar scores. The pHLA-TCR score seems to be driving the effects, but this is not discussed in detail.

      The HLA percentile from NetMHCpan and the TCR ranking from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. Additionally, we partitioned the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%), ensuring no overlap between the training and testing datasets.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: “To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
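
      A minimal sketch of a random, mutually exclusive 70/30 split is shown below for illustration. The function name, the de-duplication step, and the fixed seed are assumptions and do not describe the authors' actual pipeline.

```python
import random


def split_discovery_validation(records, discovery_fraction=0.7, seed=0):
    """Randomly split unique records into disjoint discovery and validation sets."""
    unique = sorted(set(records))  # de-duplicate so no record can land in both subsets
    random.Random(seed).shuffle(unique)
    cut = round(len(unique) * discovery_fraction)
    return unique[:cut], unique[cut:]


# Hypothetical identifiers standing in for pHLA-TCR complexes.
records = [f"complex_{i}" for i in range(100)]
discovery, validation = split_discovery_validation(records)
```

      De-duplicating before the split matters: if the same complex appeared twice in the input, a purely index-based split could place one copy in each subset and inflate validation performance.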

      Due to sample constraints, the authors were only able to do a limited amount of experimental validation to support their model; this raises questions as to how generalizable the presented results are. It would be desirable to use statistical thresholds to justify cutoffs in ELISPOT data.

      We chose a cutoff of 2 for ELISPOT, following the recommendation of the study by Moodie et al. (2). The study provides standardized cutoffs for defining positive responses in ELISPOT assays. It presents revised criteria based on a comprehensive analysis of data from multiple studies, aiming to improve the precision and consistency of immune response measurements across various applications.

      Some of the TCR repertoire metrics presented in Figure 2 are incorrectly described as independent variables and do not meaningfully contribute to the paper. The TCR repertoires may have benefitted from deeper sequencing coverage, as many TCRs appear to be supported only by a single read.

      We appreciate the reviewer’s feedback. We have moved Figures 2B through 2F to Supplementary Figure 2. We agree with the reviewer that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. The TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite the variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Please open source the raw and processed data, code, and software output (NetMHCpan, pMTnet), which are important to verify the results.

      NetMHCpan and pMTNet are publicly available software tools (3, 4). In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model).

      (2) Comparison with more state-of-the-art neoantigen prediction models could provide a more comprehensive view of the combined model's performance relative to the current field.

      To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (5), NeoPeptide (6), dbPepNeo (7), TANTIGEN (8), and TSNAdb (9), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10X Genomics Chromium Single Cell Immune Profiling (10-13). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (14). To address the reviewer's inquiry, we have incorporated these results in Supplementary Table 6.

      (3) While the combined model shows a positive overall rank coverage score, indicating improved ranking accuracy, the scores are relatively low. Further refinement of the model or the inclusion of additional predictive features might enhance the ranking accuracy.

      We appreciate the reviewer’s suggestion. The RankCoverageScore provides an objective evaluation of the rank results derived from the final peptide list generated by the two tools. The combined model achieved a higher RankCoverageScore than pMTNet, indicating its superior ability to identify immunogenic peptides compared to existing in silico tools. In order to provide a more comprehensive assessment, we included an additional four validated samples to recalculate the rank coverage score. The results demonstrate a notable difference between NetMHCpan and the Combined model (-0.37 and 0.04, respectively). We have incorporated these findings into Supplementary Figure 6 to address the reviewer's question. Additionally, we have modified Figure 5E to present a simplified demonstration of the superior performance of the combined model compared to NetMHCpan.

      (4) Collect more public data and fine-tune the model. Then you will get a SOTA model for neoantigen selection. I strongly recommend you write Python scripts and open source.

      We thank the reviewer for their feedback. We have made the raw and processed data, as well as the model, available on GitHub. Additionally, we have gathered more public data and conducted evaluations to assess its efficiency compared to other methods. You can find the repository here: https://github.com/QuynhPham1220/Combined-model.

      Reviewer #3 (Recommendations For The Authors):

      The Methods section seems good, though HLA calling is more accurate using arcasHLA than OptiType. This would be difficult to correct as OptiType is integrated into pVACtools.

      We chose OptiType for its exceptional accuracy, surpassing 99%, in identifying HLA-I alleles from RNA-Seq data. This decision was informed by a recent extensive benchmarking study that evaluated its performance against "gold-standard" HLA genotyping data, as described in the study by Li et al. (15). Furthermore, we have tested the two tools using the same RNA-Seq data from FFPE samples. The allele calling accuracy of OptiType was found to be superior to that of arcasHLA. To address the reviewer's question, we have included these results in Supplementary Table 2, along with the reference to this decision (Method, line 200, page 7).

      I am not sufficiently expert in machine learning to assess this part of the methods.

      TCR beta repertoire analysis of biopsy is highly variable; though my expertise lies largely in sequencing using the 10X genomics platform, typically one sees multiple RNAs per cell. Seeing the majority of TCRs supported by only a single read suggests either problems with RNA capture (particularly in this case where the recovered RNA was split to allow both RNAseq and targeted TCR seq) or that the TCR library was not sequenced deeply enough. I'd like to have seen rarefaction plots of TCR repertoire diversity vs the number of reads to ensure that sufficiently deep sequencing was performed.

      We appreciate the suggestions provided by the reviewer. We agree that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. In addition, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them. We have already added the rarefaction plots of TCR repertoire diversity versus the number of reads in Figure 2C. These have been added to the main text (lines 329-335).
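
      For illustration, a rarefaction curve of the kind requested can be sketched as follows: reads are shuffled once and the number of distinct clonotypes is counted in nested subsamples of increasing depth. The function name, the nested-prefix subsampling, and the toy clone counts are assumptions and are not drawn from the study.

```python
import random


def rarefaction_curve(clone_counts, depths, seed=0):
    """Number of distinct clonotypes seen in nested random subsamples of the reads.

    clone_counts maps a clonotype identifier to its read (or UMI) count.
    """
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    random.Random(seed).shuffle(reads)
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of available reads")
        curve.append(len(set(reads[:depth])))  # prefixes are nested, so the curve never decreases
    return curve


# Hypothetical repertoire: one expanded clone and a few rare ones.
toy_counts = {"clone_A": 5, "clone_B": 3, "clone_C": 1, "clone_D": 1}
toy_curve = rarefaction_curve(toy_counts, depths=[1, 5, 10])
```

      A curve that flattens before the full depth is reached suggests saturation; a curve still rising at the final depth suggests that deeper sequencing would reveal further clonotypes.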

      In order to support the authors' conclusions that MSI-H tumors have fewer TCR clonotypes than MSS tumors (Figure S2a) I would have liked to see Figure 2a annotated so that it was easy to distinguish which patient was in which group, as well as the rarefaction plots suggested above, to be sure that the difference represented a real difference between samples and not technical variance (which might occur due to only 4 samples being in the MSI-H group).

      We thank the reviewer for their recommendation. Indeed, MSI-H tumors are fewer than MSS tumors in our cohort, which is consistent with the proportion of MSI-H cases observed in colorectal cancer, typically around 15% (16, 17). To provide further clarification on this point, we have included rarefaction plots illustrating TCR repertoire diversity versus the number of reads in Supplementary Figure 3 (line 339). Additionally, MSI-H and MSS samples have been appropriately labeled for clarity.

      The authors write: "in accordance with prior investigations, we identified an inverse relationship between TCR clonality and the Shannon index (Supplementary Figure S1)" >> Shannon index is a measure of TCR clonality, not an independent variable. The authors may have meant TCR repertoire richness (the absolute number of TCRs), and the Shannon index (a measure of how many unique TCRs are present in the index).

      We thank the reviewer for their comment regarding the correlation between the number of TCRs and the Shannon index. We have revised the figure to illustrate the relationship between the number of TCRs and the Shannon index, and we have relocated it to Figure 2B.
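
      To make the distinction raised by the reviewer concrete, the sketch below computes richness, the Shannon index, and a clonality score from clonotype counts. Defining clonality as one minus the normalized Shannon entropy is a common convention adopted here as an assumption; the manuscript's exact definition is not stated in this response.

```python
import math


def richness(counts):
    """Number of distinct clonotypes with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon entropy H = -sum(p * ln p) over clonotype frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts):
    """1 - H / ln(richness): 0 for a perfectly even repertoire, 1 for a single clone."""
    r = richness(counts)
    if r <= 1:
        return 1.0
    return 1.0 - shannon_index(counts) / math.log(r)
```

      Because clonality under this definition is a direct function of the Shannon index and richness, an inverse relationship between clonality and the Shannon index is expected by construction, which is the reviewer's point.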

      The authors continue: "As anticipated, we identified only 58 distinct V (Figure 2C) and 13 distinct J segments (Figure 2D), that collectively generated 184,396 clones across the 27 tumor tissue samples, underscoring the conservation of these segments (Figure 2C & D)" >> it is not clear to me what point the authors are making: it is well known that TCR V and J genes are largely shared between Caucasian populations (https://pubmed.ncbi.nlm.nih.gov/10810226/), and though IMGT lists additional forms of these genes, many are quite rare and are typically not included in the reference sequences used by repertoire analysis software. I would clarify the language in this section to avoid the impression that patient repertoires are only using a restricted set of J genes.

      We thank the reviewer for their feedback. We have revised the sentence as follows: “As anticipated, we identified 59 distinct V segments (Supplementary Figure 2C) and 13 distinct J segments (Supplementary Figure 2D), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Supplementary Figure 2C & D)” (Result, lines 354-356, page 12).

      As a result I would suggest moving Figure 2 with the exception of 2A into the supplementals - I would have been more interested in a plot showing the distribution of TCRs by frequency, i.e. what proportion of clones are hyperexpanded, moderately expanded etc. This would be a better measure of the likely immune responses.

      We thank the reviewer for their comment. With the exception of Figure 2A, we have relocated Figures 2B through 2F to Supplementary Figure 2.

      The authors write "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic peptides (Supplementary Table 3)" >> The authors mean to refer to Table S4.

      We appreciate the reviewer's feedback. Here's the revised sentence: "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary Table 5)” (lines 368-370).

      The authors write "As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank < 2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p< 0.00001)" >> this is not surprising, as tools such as NetMHCpan are trained on databases of immunogenic peptides, and thus it is likely that these aren't independent measures (in https://academic.oup.com/nar/article/48/W1/W449/5837056 the authors state that "The training data have been vastly extended by accumulating MHC BA and EL data from the public domain. In particular, EL data were extended to include MA data"). In the pMTNet paper it is stated that pMTNet encoded pMHC information using "the exact data that were used to train the netMHCpan model" >> While I am not sufficiently expert to review details on machine learning training models, it would seem that the pHLA scores from NetMHCpan and pMTNet may not be independent, which would explain the concordance in scores that the authors describe in Figures 3B and 3D. I would invite the authors to comment on this.

      The HLA percentiles from NetMHCpan and TCR rankings from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. NetMHCpan is trained to predict peptide-MHC class I interactions by integrating binding affinity and MS eluted ligand data, using a second output neuron in the NNAlign approach. This setup produces scores for both binding affinity and ligand elution. In contrast, pMTNet predicts TCR binding specificity of class I pMHCs through three steps:

      (1) Training a numeric embedding of pMHCs (class I only) to numerically represent protein sequences of antigens and MHCs.

      (2) Training an embedding of TCR sequences using stacked auto-encoders to numerically encode TCR sequence text strings.

      (3) Creating a deep neural network combining these two embeddings to integrate knowledge from TCRs, antigenic peptide sequences, and MHC alleles. Fine-tuning is employed to finalize the prediction model for TCR-pMHC pairing.

      Therefore, pHLA scores from NetMHCpan and pMTNet are independent. Furthermore, Figures 3B and 3D do not show concordance in scores, as there was no equivalence in the percentage of immunogenic and non-immunogenic peptides in the two groups (≥2 HLA percentile and ≥2 TCR percentile).

      Many of the authors of this paper were also authors of the epiTCR paper, would this not have been a better choice of tool for assessing pHLA-TCR binding than pMTNet?

      When we started this project, EpiTCR had not been completed. Therefore, we chose pMTNet, which had demonstrated good performance and high accuracy at that time. The validated performance of EpiTCR is an ongoing project that will implement immunogenic assays (ELISpot and single-cell sequencing) to assess the prediction and ranking of neoantigens. This study is also mentioned in the discussion: "Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation." (lines 532-535).

      In Figure 3G it would appear that the pHLA-TCR score is driving the interaction, could the authors comment on this?

      The authors sincerely appreciate the reviewer for their valuable feedback. Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively.

      The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
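
      The basic "AND" rule described above can be sketched as follows, purely as an illustration. The 2% cutoffs follow the percentile thresholds mentioned in this response; the function name, the assumption that lower values indicate stronger predicted binding for both scores, and the toy candidates are hypothetical.

```python
def passes_and_rule(hla_percentile, tcr_rank_percentile,
                    hla_cutoff=2.0, tcr_cutoff=2.0):
    """Keep a peptide only if BOTH criteria are met (lower value = stronger prediction)."""
    return hla_percentile < hla_cutoff and tcr_rank_percentile < tcr_cutoff


# Hypothetical candidates: (peptide id, HLA percentile, TCR rank percentile).
candidates = [
    ("pep_1", 0.5, 1.0),   # passes both criteria
    ("pep_2", 0.5, 10.0),  # strong HLA binder, fails the TCR criterion
    ("pep_3", 5.0, 1.0),   # fails the HLA criterion
]
selected = [name for name, hla, tcr in candidates if passes_and_rule(hla, tcr)]
```

      Because the rule can only remove peptides that either single criterion would have kept, it cannot raise sensitivity above that of the stricter criterion, which is consistent with the lower efficacy reported for this approach.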

      In Figure 4A I would invite the authors to comment on how they chose the sample sizes they did for the discovery and validation datasets: the numbers seem rather random. I would question whether a training dataset in which 20% of the peptides are immunogenic accurately represents the case in patients, where I believe immunogenic peptides are less frequent (as in Figure 5).

      We aimed to maximize the number of experimentally validated immunogenic peptides, including those from viruses, with only a small percentage from tumors available for training. This limitation is inherent in the field. However, our ultimate objective is to develop a tool capable of accurately predicting peptide immunogenicity irrespective of their source. Therefore, the current percentage of immunogenic peptides may not accurately reflect real-world patient cases, but this is not crucial to our development goals.

      For Figure 5C I would invite the authors to consider adding a statistical test to justify the cutoff at 2fold enrichments.

      Thank you for your feedback. Instead of conducting a statistical test, we have implemented standardized cutoffs as defined in the cited study (2). This research introduces refined criteria for identifying positive responses in ELISPOT assays through a comprehensive analysis of data from multiple studies. These criteria aim to improve the accuracy and consistency of immune response measurements across various applications. The reference to this study has been properly incorporated into the manuscript (Method, line 281, page 10).
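
      As a simplified illustration of a fixed fold-change cutoff, the sketch below flags a response as positive when the spot count in the stimulated well exceeds twice the count in the control well. This is an assumption-laden reduction: the function names are hypothetical, the strict inequality is a choice, and the criteria in the cited study (2) are more detailed than a single ratio.

```python
def fold_change(test_spots, control_spots):
    """Ratio of spot counts in the peptide-stimulated well to the control well."""
    if control_spots <= 0:
        raise ValueError("control spot count must be positive to compute a ratio")
    return test_spots / control_spots


def is_positive(test_spots, control_spots, cutoff=2.0):
    """Flag a response as positive when the fold change exceeds the cutoff."""
    return fold_change(test_spots, control_spots) > cutoff
```

      How wells with zero background are handled is left open here on purpose; a real analysis would need an explicit rule for that case.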

      Minor points:

      "paired white blood cells" >> use "paired Peripheral Blood Mononuclear Cells".

      We appreciate the reviewer for the feedback. We agree with the reviewer's observation. The sentence has been revised as follows: "Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient's HLA-I allele profile and the gene expression levels of mutated genes." (Introduction, lines 55-58, page 2).

      "while RNA sequencing determines the patient's HLA-I allele profile and gene expression levels of mutated genes." >> RNA sequencing covers both the mutant and reference form of the gene, allowing assessment of variant allele frequency.

      "the current approach's impact on patient outcomes remains limited due to the scarcity of effective immunogenic neoantigens identified for each patient" >> Some clearer language here would have been preferred as different tumor types have different mutational loads

      We thank the reviewer for their valuable feedback. We agree with the reviewer's observation. The passage has been revised accordingly: “The current approach's impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens.” (Introduction, lines 62-64, page 3).

      References

      (1) J. Schmidt et al., Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting. Cell Rep Med 2, 100194 (2021).

      (2) Z. Moodie et al., Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 59, 1489-1501 (2010).

      (3) V. Jurtz et al., NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360-3368 (2017).

      (4) T. Lu et al., Deep learning-based prediction of the T cell receptor-antigen binding specificity. Nat Mach Intell 3, 864-875 (2021).

      (5) J. Xia et al., NEPdb: A Database of T-Cell Experimentally-Validated Neoantigens and Pan-Cancer Predicted Neoepitopes for Cancer Immunotherapy. Front Immunol 12, 644637 (2021).

      (6) W. J. Zhou et al., NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. Database (Oxford) 2019 (2019).

      (7) X. Tan et al., dbPepNeo: a manually curated database for human tumor neoantigen peptides. Database (Oxford) 2020 (2020).

      (8) G. Zhang, L. Chitkushev, L. R. Olsen, D. B. Keskin, V. Brusic, TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC Bioinformatics 22, 40 (2021).

      (9) J. Wu et al., TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis. Genomics Proteomics Bioinformatics 16, 276-282 (2018).

      (10) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2.

      (11) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2.

      (12) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2.

      (13) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2.

      (14) A. Montemurro et al., NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRalpha and beta sequence data. Commun Biol 4, 1060 (2021).

      (15) G. Li et al., Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 16, eade2886 (2024).

      (16) Z. Gatalica, S. Vranic, J. Xiu, J. Swensen, S. Reddy, High microsatellite instability (MSI-H) colorectal carcinoma: a brief review of predictive biomarkers in the era of personalized medicine. Fam Cancer 15, 405-412 (2016).

      (17) N. Mulet-Margalef et al., Challenges and Therapeutic Opportunities in the dMMR/MSI-H Colorectal Cancer Landscape. Cancers (Basel) 15 (2023).

    1. eLife assessment

      This important manuscript demonstrates that UGGT1 is involved in preventing the premature degradation of endoplasmic reticulum (ER) glycoproteins through the re-glucosylation of their N-linked glycans following release from the calnexin/calreticulin lectins. The authors include a wealth of convincing data in support of their findings, although extending these findings to other types of substrates, such as secreted proteins, could further demonstrate the global importance of this mechanism for protein trafficking through the secretory pathway. This work will be of interest to scientists interested in ER protein quality control, proteostasis, and protein trafficking.

    2. Reviewer #1 (Public review):

      Summary:

      Using UGGT1-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They propose a concept in which the fate of glycoproteins is determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:

      The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses:

      NA

    3. Reviewer #2 (Public review):

      In this study, Ninagawa et al. shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha, whose premature degradation hampers the cell's ability to mount an ER stress response.

      This study convincingly demonstrates that many unstable misfolded glycoproteins undergo accelerated degradation without UGGTs. Also, this study provides evidence of a "tug of war" model involving UGGTs (pulling glycoproteins to being refolded) and EDEMs (pulling glycoproteins to ERAD).

      The study explores the physiological role of UGGT, particularly examining the impact of ATF6α in UGGT knockout cells' stress response. The authors further investigate the physiological consequences of accelerated ATF6α degradation, convincingly demonstrating that cells are sensitive to ER stress in the absence of UGGTs and unable to mount an adequate ER stress response.

      These findings offer significant new insights into the ERAD field, highlighting UGGT1 as a crucial component in maintaining ER protein homeostasis. This represents a major advancement in our understanding of the field.

    4. Reviewer #3 (Public review):

      This valuable manuscript demonstrates the long-held prediction that the glycosyltransferase UGGT slows degradation of endoplasmic reticulum (ER)-associated degradation substrates through a mechanism involving re-glucosylation of asparagine-linked glycans following release from the calnexin/calreticulin lectins. The evidence supporting this conclusion is solid, using genetically deficient cell models and well-established biochemical methods to monitor the degradation of trafficking-incompetent ER-associated degradation substrates, although this could be improved by better defining the importance of UGGT in the secretion of trafficking-competent substrates. This work will be of specific interest to those interested in mechanistic aspects of ER protein quality control and protein secretion.

      The authors have attempted to address my comments from the previous round of review, although some issues still remain. For example, the authors indicate that it is difficult to assess how UGGT1 influences degradation of secretion competent proteins, but this is not the case. This can be easily followed using metabolic labeling experiments, where you would get both the population of protein secreted and degraded under different conditions. Thus, I still feel that addressing the impact of UGGT1 depletion on the ER quality control for secretion competent protein remains an important point that could be better addressed in this work.

      Further, in the previous submission, the authors showed that UGGT2 depletion demonstrates a similar reduction of ATF6 activation to that observed for UGGT1 depletion, although UGGT2 depletion does not reduce ATF6 protein levels like what is observed upon UGGT1 depletion. In the revised manuscript, they largely remove the UGGT2 data and only highlight the UGGT1 depletion data. While they are somewhat careful in their discussion, the implication is that UGGT1 regulates ATF6 activity by controlling its stability. The fact that UGGT2 has a similar effect on activity, but not stability, indicates that these enzymes may have other roles not directly linked to ATF6 stability. It is important to include the UGGT2 data and explicitly highlight this point in the discussion. It's fine to state that figuring out this other function is outside the scope of this work, but removing it does not seem appropriate.

      As I mentioned in my previous review, I think that this work is interesting and addresses an important gap in experimental evidence supporting a previously asserted dogma in the field. I do think that the authors would be better suited for highlighting the limitations of the study, as discussed above. Ultimately, though, this is an important addition to the literature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using UGGT-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They propose a concept in which the fate of glycoproteins is determined by a tug-of-war between UGGTs and EDEMs.

      Strengths:

      The authors provided a wealth of data to indicate that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses:

      Less clear, though, is the involvement of UGGT2 in the process. Also, to this reviewer, some data do not necessarily support the conclusion.

      Major criticisms:

      (1) One of the biggest problems I had on reading through this manuscript is that, while the authors appeared to generate UGGTs-KO cells from HCT116 and HeLa cells, it was not clearly indicated which cell line was used for each experiment. I assume that it was HCT116 cells in most cases, but I did not see that it was clearly mentioned. As the expression level of UGGT2 relative to UGGT1 is quite different between the two cell lines, it would be critical to know which cells were used for each experiment.

      Thank you for this comment. We have clarified this point, especially in the figure legends.

      (2) While most of the authors' conclusions are sound, some claims, to this reviewer, were not fully supported by the data. In particular, I cannot help being puzzled by the authors' claim about the involvement of UGGT2 in the ERAD process. In most cases, KO of UGGT2 does not seem to affect the stability of ERAD substrates (e.g. Fig. 1C, 2A, 3D). Where the authors suggest that UGGT2 is also involved in ERAD, the evidence is far from convincing (e.g. Fig. 2D/E). Especially because it has now been suggested that the main role of UGGT2 may be distinct from that of UGGT1, playing a role in lipid quality control (Hung, et al., PNAS 2022), it is imperative to provide convincing evidence if the authors want to claim the involvement of UGGT2 in a protein quality control system. In fact, it was not clear at all whether even UGGT1 is involved in the process in Fig. 2D/E, as the difference, if any, is so subtle. How can the authors be sure that this is significant enough? While the authors claim that the difference is statistically significant (n=3), this may turn out to be an experimental artifact. At the very least, I would urge the authors to try rescue experiments with UGGT1 or 2, to clarify that the defect in UGGT-DKO cells can be reversed. It may also be interesting to see whether the subtle difference the authors observed is indeed N-glycan-dependent by testing a non-glycosylated version of the protein (just like NHK-QQQ mutants in Fig. 2C).

      We appreciate this comment. According to this comment, we reevaluated the importance of UGGT2 for ER-protein quality control. As this reviewer mentioned, KO of UGGT2 does not affect the stability of ATF6a, NHK, rRI332-Flag or EMC1-△PQQ-Flag (Fig. 1E, 2A, and 3DE). Furthermore, we tested whether overexpression of UGGT2 reverses the phenotype of UGGT-DKO regarding the degradation rate of NHK, and we found that it did not affect the degradation rate of NHK, whereas overexpression of UGGT1 restored the degradation rate to that in WT cells.

      Author response image 1.

      Collectively, these facts suggest that the role of UGGT2 in ER protein quality control is rather limited in HCT116 cells. Therefore, we have decided not to mention UGGT2 in the title, and weakened the overall claim that UGGT2 contributes to ER protein quality control. Tissues with high expression of UGGT2 or cultured cells other than HCT116 would be appropriate for revealing the detailed function of UGGT2.

      To this reviewer, it is still possible that the involvement of UGGT1 (or 2, if any) could be entirely substrate-dependent, and that the substrates used in Fig. 2D or E happen not to depend on the action of UGGTs. To this reviewer, even without the data of Fig. 2D and E, the authors provide enough evidence to demonstrate the involvement of UGGT1 in preventing premature degradation of glycoprotein ERAD substrates. I am just afraid that the authors may have overinterpreted the data, as if the UGGTs were involved in the stabilization of all glycoproteins destined for ERAD.

      Based on the point this reviewer raised, we decided to delete the previous Fig. 2D and 2E. The efficacy of UGGT1 in preventing early degradation may differ among substrates.

      (3) I am a bit puzzled by the DNJ treatment experiments. First, I do not see the detailed conditions of the DNJ treatment (concentration? time?). Second, I was a bit surprised to see that so few G3M9 glycans were formed, and that about the same amount of G2M9 was also formed (Figure 1-Figure supplement 4B-D), despite the fact that glucose trimming of newly synthesized glycoproteins is expected to be completely impaired (unless the authors used a DNJ concentration that does not completely impair the trimming of the first Glc). Even considering the involvement of Golgi endo-alpha-mannosidase, a similar amount of G3M9 and G2M9 may suggest that the experimental conditions used for this experiment (i.e., concentration of DNJ, duration of treatment, etc.) are not properly optimized.

      We think that our experimental conditions for DNJ treatment are appropriate for evaluating the effect of DNJ. Based on other papers (Ali and Field, 2000; Karlsson et al., 1993; Lomako et al., 2010; Pearse et al., 2010; Tannous et al., 2015), 0.5 mM DNJ is appropriate. In our previously reported experiment, 16 h of treatment with the mannosidase inhibitor kifunensine prior to cell collection was sufficient for N-glycan composition analysis (Ninagawa et al., 2014), and we treated cells for a similar time in Figure 1-Figure Supplement 4 and 5 (and Figure 1-Figure Supplement 6). We could see a clear effect of DNJ in inhibiting the degradation of ATF6a with 2 hours of pretreatment (Fig. 1G). Furthermore, our results are consistent with previous findings that DNJ increased GM9 the most (Cheatham et al., 2023; Gross et al., 1983; Gross et al., 1986; Romero et al., 1985). In addition to DNJ, we used CST for further experiments in new figures (Fig. 1H and Figure 1-Figure Supplement 6). DNJ and CST are both glucosidase inhibitors; DNJ is a stronger inhibitor of glucosidase II, while CST is a stronger inhibitor of glucosidase I (Asano, 2000; Saunier et al., 1982; Szumilo et al., 1987; Zeng et al., 1997). An increase in G3M9 and G2M9 was detected using CST (Figure 1-Figure Supplement 6). Like DNJ, CST also inhibited ATF6a degradation in UGGT-DKO cells (Fig. 1H). These findings show that our experimental conditions using glucosidase inhibitors are appropriate and strongly support our model (Fig. 5). Differences between the effects of DNJ and CST are now described on pages 8 to 10 of our manuscript.

      Reviewer #2 (Public Review):

      In this study, Ninagawa et al., shed light on UGGT's role in ER quality control of glycoproteins. By utilizing UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha where its premature degradation hampers the cell's ability to mount an ER stress response.

      While this study convincingly demonstrates early degradation of misfolded glycoproteins in the absence of UGGTs, my major concern is the need for additional experiments to support the "tug of war" model involving UGGTs and EDEMs in influencing the substrate's fate - whether misfolded glycoproteins are pulled into the folding or degradation route. Specifically, it would be valuable to investigate how overexpression of UGGTs and EDEMs in WT cells affects the choice between folding and degradation for misfolded glycoproteins. Considering previous studies indicating that monoglucosylation influences glycoprotein solubility and stability, an essential question is: what is the nature of glycoproteins in UGGTKO/EDEMKO and potentially UGGT/EDEM overexpression cells? Understanding whether these substrates become more soluble/stable when GM9 versus mannose-only translation modification accumulates would provide valuable insights.

      In the new figure 2DE, we conducted overexpression experiments of structure formation factors UGGT1 and/or CNX, and degradation factors EDEMs. While overexpression of structure formation factors (Fig. 2DE) and KO of degradation factors (Ninagawa et al., 2015; Ninagawa et al., 2014) increased stability of substrates, KO of UGGT1 (Fig. 1E, 2A and 3DF) and overexpression of degradation factors (Fig. 2DE) (Hirao et al., 2006; Hosokawa et al., 2001; Mast et al., 2005; Olivari et al., 2005) accelerated degradation of substrates. A comparison of the properties of N-glycan with the normal type and the type without glucoses was already reported (Tannous et al., 2015). The rate of degradation of substrate was unchanged, but efficiency of secretion of substrates was affected.

      The study delves into the physiological role of UGGT, but is limited in scope, focusing solely on the effect of ATF6alpha in UGGT KO cells' stress response. It is crucial for the authors to investigate the broader impact of UGGT KO, including the assessment of basal ER proteotoxicity levels, examination of the general efflux of glycoproteins from ER, and the exploration of the physiological consequences due to UGGT KO. This broader perspective would be valuable for the wider audience. Additionally, the marked increase in ATF4 activity in UGGTKO requires discussion, which the authors currently omit.

      We evaluated the sensitivity of WT and UGGT1-KO cells to ER stress (Figure 4G). KO of UGGT1 increased the sensitivity to ER stress inducer Tg, indicating the importance of UGGT1 for resisting ER stress.

      We added the following description of ATF4 activity in UGGT1-KO cells to the manuscript: “In addition to this, UGGT1 is necessary for proper functioning of ER resident proteins such as ATF6a (Fig. 4B-F). It is highly possible that ATF6a undergoes structural maintenance by UGGT1, which could be necessary to avoid degradation and maintain proper function, because ATF6a with a more rigid structure tended to remain in UGGT1-KO cells (Fig. 4C). Responses of ERSE and UPRE to ER stress, which require ATF6a, were decreased in UGGT1-KO cells (Fig. 4DE). In contrast, ATF4 reporter activity was increased in UGGT1-KO cells (Fig. 4F), while the basal level of ATF4 in UGGT1-KO cells was comparable with that in WT (Figure 1-Figure supplement 2B). The ATF4 pathway might partially compensate for the function of the ERSE and UPRE pathways in UGGT1-KO cells in acute ER stress.” This is now described on page 17 of our manuscript.

      The discussion section is brief and could benefit from being a separate section. It is advisable for the authors to explore and suggest other model systems or disease contexts to test UGGT's role in the future. This expansion would help the broader scientific community appreciate the potential applications and implications of this work beyond its current scope.

      Thank you for making this point. The DISCUSSION has now been separated into its own section in our manuscript. We added some points about other model organisms and diseases to the DISCUSSION as follows: “Our work focusing on the function of mammalian UGGT1 greatly advances the understanding of how ER homeostasis is maintained in higher animals. Considering that Saccharomyces cerevisiae does not have a functional orthologue of UGGT1 (Ninagawa et al., 2020a) and that KO of UGGT1 causes embryonic lethality in mice (Molinari et al., 2005), it would be interesting to know at what point the function of UGGT1 became evolutionarily necessary for life. Related to its importance in animals, it would also be of interest to know what kind of diseases UGGT1 is associated with. Recently, it has been reported that UGGT1 is involved in ER retention of Trop-2 mutant proteins, which are encoded by a causative gene of gelatinous drop-like corneal dystrophy (Tax et al., 2024). Not only this, but since the ER is known to be involved in over 60 diseases (Guerriero and Brodsky, 2012), we must investigate how UGGT1 and other ER molecules are involved in diseases.”

      Reviewer #3 (Public Review):

      This manuscript focuses on defining the importance of UGGT1/2 in the process of protein degradation within the ER. The authors prepared HCT116 cells lacking UGGT1, UGGT2, or both UGGT1 and UGGT2 (DKO) and then monitored the degradation of specific ERAD substrates. Initially, they focused on the ER stress sensor ATF6 and showed that loss of UGGT1 increased the degradation of this protein. ATF6 was stabilized by deletion of ERAD-specific factors (e.g., SEL1L, EDEM) or treatment with mannosidase inhibitors such as kifunensine, indicating that the degradation is mediated through a process involving increased mannose trimming of the ATF6 N-glycan. This increased degradation of ATF6 impaired the function of this ER stress sensor, as expected, reducing the activation of downstream reporters of ER stress-induced ATF6 activation. The authors extended this analysis to monitor the degradation of other well-established ERAD substrates including A1AT-NHK and CD3d, demonstrating similar increases in the degradation of destabilized, misfolding protein substrates in cells deficient in UGGT. Importantly, they did experiments to suggest that re-overexpression of wild-type, but not catalytically deficient, UGGT rescues the increased degradation observed in UGGT1 knockout cells. Further, they demonstrated the dependence of this sensitivity to UGGT depletion on N-glycans using ERAD substrates that lack any glycans. Ultimately, these results suggest a model whereby depletion of UGGT (especially UGGT1, which is the most highly expressed in these cells) increases degradation of ERAD substrates through a mechanism involving impaired re-glucosylation and subsequent re-entry into the calnexin/calreticulin folding pathway.

      I must say that I was under the impression that the main conclusions of this paper (i.e., UGGT1 functions to slow the degradation of ERAD substrates by allowing re-entry into the lectin folding pathway) were well-established in the literature. However, I was not able to find papers explicitly demonstrating this point. Because of this, I do think that this manuscript is valuable, as it supports a previously assumed assertion of the role of UGGT in ER quality control. However, there are a number of issues in the manuscript that should be addressed.

      Notably, the focus on well-established, trafficking-deficient ERAD substrates, while a traditional approach to studying these types of processes, limits our understanding of global ER quality control of proteins that are trafficked to downstream secretory environments where proteins can be degraded through multiple mechanisms. For example, in Figure 1-Figure Supplement 2, UGGT1/2 knockout does not seem to increase the degradation of secretion-competent proteins such as A1AT or EPO, instead appearing to stabilize these proteins against degradation. They do show reductions in secretion, but it isn't clear exactly how UGGT loss is impacting ER Quality Control of these more relevant types of ER-targeted secretory proteins.

      We appreciate your comment. It is certainly difficult to assess in detail how UGGT1 functions against secretion-competent proteins, but we think that the folding state of these proteins is improved, which avoids their degradation and increases their secretion. In Figure 1-Figure supplement 2E, there is a clear decrease in secretion of EPO in UGGT1-KO cells, suggesting that UGGT1 also inhibits degradation of such substrates. Note that, as shown in Fig. 3A-C, once a protein forms a solid structure, it is rarely degraded in the ER.

      Lastly, I don't understand the link between UGGT, ATF6 degradation, and ATF6 activation. I understand that the idea is that increased ATF6 degradation afforded by UGGT depletion will impair activation of this ER stress sensor, but if that is the case, how does UGGT2 depletion, which only minimally impacts ATF6 degradation (Fig. 1), impact activation to levels similar to the UGGT1 knockout (Fig 4)? This suggests UGGT1/2 may serve different functions beyond just regulating the degradation of this ER stress sensor. Also, the authors should quantify the impaired ATF6 processing shown in Fig 4B-D across multiple replicates.

      According to this valuable comment, we reevaluated our manuscript. As this reviewer mentioned, involvement of UGGT2 in the activation of ATF6a cannot be explained only by the folding state of ATF6a. Thus, the part about whether UGGT2 is effective in activating ATF6 is outside the scope of this paper. The main focus of this paper is the contribution of UGGT1 to the ER protein quality control mechanism.

      Ultimately, I do think the data support a role for UGGT (especially UGGT1) in regulating the degradation of ERAD substrates, which provides experimental support for a role long-predicted in the field. However, there are a number of ways this manuscript could be strengthened to further support this role, some of which can be done with data they have in hand (e.g., the stats) or additional new experiments.

      During this revision period, to further elucidate the function of UGGT, we performed several additional experiments (new Fig. 1H, 2DE, 4G, and Figure 1-Figure Supplement 6). We hope that these will bring our paper up to the level you have requested.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Abbreviations: GlcNAc, N-acetylglucosamines -> why plural?

      Corrected.

      (2) Abstract: to this reviewer, it may not be so common to cite references in the abstract.

      We submitted this manuscript to eLife as “Research Advances”. The eLife instructions for “Research Advances” state: “A reference to the original eLife article should be included in the abstract, e.g. in the format ‘Previously we showed that XXXX (author, year). Here we show that YYYY.’” We followed this instruction.

      (3) Introduction: "as the site of biosynthesis of approximately one-third of all proteins." Probably this statement needs a citation?

      We added the reference there. You can also confirm this in “The Human Protein Atlas” website. https://www.proteinatlas.org/humanproteome/tissue/secretome

      (4) Figure 1F - the authors claimed that maturation of HA was delayed also in UGGT2 cells, but it was not at all clear to me. Rescue experiments with UGGT2 would be desired.

      We agree with this reviewer, but there was a statistically significant difference at 80 min in the UGGT2-KO strain. It was previously reported that the HA maturation rate was not affected by UGGT2 (Hung et al., 2022). We think that the difference is not large. A rescue experiment with UGGT2 on the degradation of NHK was conducted, and is shown in this response to referees.

      (5) Figure 4A, here also the authors claim that UGGT2 is "slightly" involved in folding of ATF6alpha(P) but it is far from convincing to this reviewer.

      Now we also think that involvement of UGGT2 in ER protein quality control should be examined in the future.

      (6) Page 11, line 7 from the bottom: "peak of activation was shifted from 1 hour to 4 hours after the treatment of Tg in UGGT-KO cells". I found this statement a bit awkward; how can the authors be sure that "the peak" is 4 hours when the longest timing tested is 4 hours (i.e. peak may be even later)?

      Corrected. We deleted the description.

      (7) Page 11, line 4 "a more rigid structure that averts degradation" Can the authors speculate what this "rigid" structure actually means? The reviewer has to wonder what kind of change can occur to this protein with or without UGGT1. Binding proteins? The difference in susceptibility against trypsin appears very subtle anyway (Figure 4 Figure Supplement 1).

      Let us add our thoughts here: poorly structured ATF6a is immediately routed for degradation in UGGT1-KO cells. As a result, ATF6a with a stable or rigid structure remains in the UGGT1-KO strain. ATF6a in a metastable state tends to be degraded without the assistance of UGGT1.

      (8) Figure 1 Figure supplement 2; based on the information provided, I calculate the relative ratio of UGGT2/UGGT1 in HCT116 which is 4.5%, and in HeLa 26%. Am I missing something? Also significant figure, at best, should be 2, not 3 (i.e. 30%, not 29.8%).

      Corrected. Thank you for this comment.

      Reviewer #2 (Recommendations For The Authors):

      (1) The effect in Fig. 2B with UGGT1-D1358A add-back is minimal. Testing the inactive and active add-back on other substrates, such as ATF6alpha, which undergoes a more rapid degradation, would provide a more comprehensive assessment.

      To examine the effect of full-length UGGT1 and its inactive mutant on the degradation rate of endogenous ATF6a in UGGT1-KO and UGGT2-KO cells, we screened more than 300 colonies for stable expression of full-length Myc-UGGT1/2, UGGT1/2-Flag, and UGGT1/2 (no tag), and of their point mutants. However, no cell lines expressing UGGT1/2 at levels close to or above the endogenous level were obtained. The expression level of UGGT1 seemed to be tightly regulated. A low-expressing stable cell line could not rescue the ATF6a degradation phenotype.

      We also tried to measure the degradation rate of exogenously expressed ATF6a. But overexpressed ATF6a is partially transported to the Golgi and cleaved by proteases, which makes it difficult to evaluate only the effect of degradation.

      (2) In reference to this statement on pg. 11:

      "This can be explained by the rigid structure of ATF6(P) lacking structural flexibility to respond to ER stress because the remaining ATF6(P) in UGGT1-KO cells tends to have a more rigid structure that averts degradation, which is supported by its slightly weaker sensitivity to trypsin (Figure 4-figure supplement 1A). "

      The rationale for testing ATF6(P) rigidity via trypsin digestion needs clarification. The authors should provide more background, especially if it relates to previous studies demonstrating UGGT's influence on substrate solubility. If trypsin digestion is indeed addressing this, it should be applied consistently to all tested misfolded glycoproteins, ensuring a comprehensive approach.

      We now provide more background with three references about trypsin digestion. Trypsin digestion allows us to compare the structures of proteins originating from the same gene, but it can be difficult to use it to compare the structures of proteins originating from different genes. For example, antitrypsin is resistant to trypsin by its nature, which does not necessarily mean that antitrypsin forms a more stable structure than other proteins. NHK, a truncated version of antitrypsin, is still resistant to trypsin compared with other substrates.

      (3) Many of the figures described in the manuscript weren't referred to a specific panel. For example, pg. 12 "Fig. 1E and Fig.5," the exact panel for Fig. 5 wasn't referenced.

      Thank you for this comment. Corrected.

      (4) For experiments measuring the composition of glycoproteins in different KO lines, it is necessary to do the experiment more than once for conducting statistical analysis and comparisons. Moreover, the authors did not include raw composition data for these experiments. Statistical analysis should also be done for Fig. 4E-F.

      Our N-glycan composition data (Figure 1-Figure supplement 5 and 6C) are consistent with our previous papers (George et al., 2021; George et al., 2020; Ninagawa et al., 2015; Ninagawa et al., 2014). We performed this analysis twice in a previous study; please refer to that study for the statistical analysis (George et al., 2020). We have added the raw N-glycan composition data (Figure 1-Figure supplement 4 and 6B). Statistical analysis is now included in Fig. 4D-F.

      Ali, B.R., and M.C. Field. 2000. Glycopeptide export from mammalian microsomes is independent of calcium and is distinct from oligosaccharide export. Glycobiology. 10:383-391.

      Asano, N. 2000. Glycosidase-Inhibiting Glycomimetic Alkaloids. Biological Activities and Therapeutic Perspectives. Journal of Synthetic Organic Chemistry, Japan. 58:666-675.

      Cheatham, A.M., N.R. Sharma, and P. Satpute-Krishnan. 2023. Competition for calnexin binding regulates secretion and turnover of misfolded GPI-anchored proteins. J Cell Biol. 222.

      George, G., S. Ninagawa, H. Yagi, J.I. Furukawa, N. Hashii, A. Ishii-Watabe, Y. Deng, K. Matsushita, T. Ishikawa, Y.P. Mamahit, Y. Maki, Y. Kajihara, K. Kato, T. Okada, and K. Mori. 2021. Purified EDEM3 or EDEM1 alone produces determinant oligosaccharide structures from M8B in mammalian glycoprotein ERAD. eLife. 10.

      George, G., S. Ninagawa, H. Yagi, T. Saito, T. Ishikawa, T. Sakuma, T. Yamamoto, K. Imami, Y. Ishihama, K. Kato, T. Okada, and K. Mori. 2020. EDEM2 stably disulfide-bonded to TXNDC11 catalyzes the first mannose trimming step in mammalian glycoprotein ERAD. eLife. 9:e53455.

      Gross, V., T. Andus, T.A. Tran-Thi, R.T. Schwarz, K. Decker, and P.C. Heinrich. 1983. 1-deoxynojirimycin impairs oligosaccharide processing of alpha 1-proteinase inhibitor and inhibits its secretion in primary cultures of rat hepatocytes. Journal of Biological Chemistry. 258:12203-12209.

      Gross, V., T.A. Tran-Thi, R.T. Schwarz, A.D. Elbein, K. Decker, and P.C. Heinrich. 1986. Different effects of the glucosidase inhibitors 1-deoxynojirimycin, N-methyl-1-deoxynojirimycin and castanospermine on the glycosylation of rat alpha 1-proteinase inhibitor and alpha 1-acid glycoprotein. Biochem J. 236:853-860.

      Hirao, K., Y. Natsuka, T. Tamura, I. Wada, D. Morito, S. Natsuka, P. Romero, B. Sleno, L.O. Tremblay, A. Herscovics, K. Nagata, and N. Hosokawa. 2006. EDEM3, a soluble EDEM homolog, enhances glycoprotein endoplasmic reticulum-associated degradation and mannose trimming. J Biol Chem. 281:9650-9658.

      Hosokawa, N., I. Wada, K. Hasegawa, T. Yorihuzi, L.O. Tremblay, A. Herscovics, and K. Nagata. 2001. A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO reports. 2:415-422.

      Hung, H.H., Y. Nagatsuka, T. Solda, V.K. Kodali, K. Iwabuchi, H. Kamiguchi, K. Kano, I. Matsuo, K. Ikeda, R.J. Kaufman, M. Molinari, P. Greimel, and Y. Hirabayashi. 2022. Selective involvement of UGGT variant: UGGT2 in protecting mouse embryonic fibroblasts from saturated lipid-induced ER stress. Proc Natl Acad Sci U S A. 119:e2214957119.

      Karlsson, G.B., T.D. Butters, R.A. Dwek, and F.M. Platt. 1993. Effects of the imino sugar N-butyldeoxynojirimycin on the N-glycosylation of recombinant gp120. Journal of Biological Chemistry. 268:570-576.

      Lomako, J., W.M. Lomako, C.A. Carothers Carraway, and K.L. Carraway. 2010. Regulation of the membrane mucin Muc4 in corneal epithelial cells by proteosomal degradation and TGF-beta. Journal of cellular physiology. 223:209-214.

      Mast, S.W., K. Diekman, K. Karaveg, A. Davis, R.N. Sifers, and K.W. Moremen. 2005. Human EDEM2, a novel homolog of family 47 glycosidases, is involved in ER-associated degradation of glycoproteins. Glycobiology. 15:421-436.

      Ninagawa, S., T. Okada, Y. Sumitomo, S. Horimoto, T. Sugimoto, T. Ishikawa, S. Takeda, T. Yamamoto, T. Suzuki, Y. Kamiya, K. Kato, and K. Mori. 2015. Forcible destruction of severely misfolded mammalian glycoproteins by the non-glycoprotein ERAD pathway. J Cell Biol. 211:775-784.

      Ninagawa, S., T. Okada, Y. Sumitomo, Y. Kamiya, K. Kato, S. Horimoto, T. Ishikawa, S. Takeda, T. Sakuma, T. Yamamoto, and K. Mori. 2014. EDEM2 initiates mammalian glycoprotein ERAD by catalyzing the first mannose trimming step. J Cell Biol. 206:347-356.

      Olivari, S., C. Galli, H. Alanen, L. Ruddock, and M. Molinari. 2005. A novel stress-induced EDEM variant regulating endoplasmic reticulum-associated glycoprotein degradation. J Biol Chem. 280:2424-2428.

      Pearse, B.R., T. Tamura, J.C. Sunryd, G.A. Grabowski, R.J. Kaufman, and D.N. Hebert. 2010. The role of UDP-Glc:glycoprotein glucosyltransferase 1 in the maturation of an obligate substrate prosaposin. J Cell Biol. 189:829-841.

      Romero, P.A., B. Saunier, and A. Herscovics. 1985. Comparison between 1-deoxynojirimycin and N-methyl-1-deoxynojirimycin as inhibitors of oligosaccharide processing in intestinal epithelial cells. Biochem J. 226:733-740.

      Saunier, B., R.D. Kilker, J.S. Tkacz, A. Quaroni, and A. Herscovics. 1982. Inhibition of N-linked complex oligosaccharide formation by 1-deoxynojirimycin, an inhibitor of processing glucosidases. Journal of Biological Chemistry. 257:14155-14161.

      Szumilo, T., G.P. Kaushal, and A.D. Elbein. 1987. Purification and properties of the glycoprotein processing N-acetylglucosaminyltransferase II from plants. Biochemistry. 26:5498-5505.

      Tannous, A., N. Patel, T. Tamura, and D.N. Hebert. 2015. Reglucosylation by UDP-glucose:glycoprotein glucosyltransferase 1 delays glycoprotein secretion but not degradation. Molecular biology of the cell. 26:390-405.

      Zeng, Y., Y.T. Pan, N. Asano, R.J. Nash, and A.D. Elbein. 1997. Homonojirimycin and N-methyl-homonojirimycin inhibit N-linked oligosaccharide processing. Glycobiology. 7:297-304.

    1. eLife assessment

      This fundamental study reports the most comprehensive neurotransmitter atlas of any organism to date, using fluorescent knock-in reporter lines. The work is comprehensive, rigorous, and compelling. The tool will be used by a broad audience of scientists interested in neuronal cell type differentiation and function, and could be a seminal reference in the field.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and colleagues conducted a study to determine the neurotransmitter identity of all neurons in C. elegans hermaphrodites and males. They used CRISPR technology to introduce fluorescent gene expression reporters into the genomic loci of NT pathway genes. This approach is expected to better reflect in vivo gene expression compared to other methods like promoter- or fosmid-based transgenes, or available scRNA datasets. The study presents several noteworthy findings, including sexual dimorphisms, patterns of NT co-transmission, neuronal classes that likely use NTs without direct synthesis, and potential identification of unconventional NTs (e.g. betaine releasing neurons). The data is well-described and critically discussed, including a comparison with alternative methods. Although many of the observations and proposals have been previously discussed by the Hobert lab, the current study is particularly valuable due to its comprehensiveness. This NT atlas is the most complete and comprehensive of any nervous system that I am aware of, making it an extremely important tool for the community.

      Strengths:

      Very compelling study presenting the most comprehensive neurotransmitter (NT) map of any model so far, using state-of-the art tools and validations. The work is very important not only as a resource but also for our understanding that (NT) function of neurons is best understood taking into consideration the full set of genes implicated in NT metabolism and transport.

      Weaknesses:

      None, all have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      Together with the known anatomical connectivity, molecular atlases pave the way toward functional maps of the nervous system of C. elegans. Along with the analysis of previous scRNA sequencing and reporter strains, new expression patterns are generated for hermaphrodites and males based on CRISPR-knocked-in GFP reporter strains and the use of the color-coded NeuroPAL strain to accurately identify neurons. Beyond a map of the known neurotransmitters (GABA, Acetylcholine, Glutamate, dopamine, serotonin, tyramine, octopamine), the atlas also identifies neurons likely using betaine and suggests sets of neurons employing new unknown monoaminergic transmission, or using exclusively peptidergic neurotransmission.

      Strengths:

      The use of CRISPR reporter alleles and of the NeuroPAL strain to assign neurotransmitter usage to each neuron is much more rigorous than previous analyses and reveals intriguing differences between the scRNA-seq, fosmid reporter and CRISPR knock-in approaches. The differences between approaches are discussed.

      Weaknesses:

      All have been addressed.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, Wang et al. provide the most comprehensive description and comparison of the expression of the different genes required to synthesize, transport and recycle the most common neurotransmitters (Glutamate, Acetylcholine, GABA, Serotonin, Dopamine, Octopamine and Tyramine) used by hermaphrodite and male C. elegans. This paper will be a seminal reference in the field. Building on and contrasting with observations from previous studies using fosmid reporters, multicopy reporters and single-cell sequencing, they now describe CRISPR/Cas9-engineered reporter strains that, in combination with the multicolor pan-neuronal labeling of all C. elegans neurons (NeuroPAL), allow rigorous elucidation of neurotransmitter expression patterns. These novel reporters also illuminate previously unappreciated aspects of neurotransmitter biology in C. elegans, including sexual dimorphism of expression patterns, co-transmission and the elucidation of cell-specific pathways that might represent new forms of neurotransmission.

      Strengths:

      The authors set out to establish neurotransmitter identities in C. elegans males and hermaphrodites via varying techniques, including integration of previous studies, examination of expression patterns and generation of endogenous CRISPR-labeled alleles. Their study is comprehensive, detailed and rigorous, and achieves its aims. It is an excellent reference for the field, particularly for those interested in biosynthetic pathways of neurotransmission and their distribution in vivo, in neuronal and non-neuronal cells.

      Weaknesses:

      No weaknesses noted. The authors do a great job linking their characterizations with other studies and techniques, lending credence to their findings. As the authors note, there are sexually dimorphic differences across animals, and varying expression patterns of enzymes. While it is unlikely there will be huge differences in the reported patterns across individual animals, it is possible that these expression patterns could vary developmentally, or based on physiological or environmental conditions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers and editor for their helpful comments and suggestions. In response, we have revised the manuscript in two main ways:

      (1) To address the comments about rearranging figures and tables, we added a new Figure 3 that summarizes neurotransmitter assignments across all neuron classes. Our rationale for this change is detailed below.

      (2) To address the comment on clarifying neurotransmitter synthesis versus uptake, we analyzed two additional reporter alleles that tag the monoamine uptake transporters for 5-HT and potentially tyramine. These results are now presented in a new Figure 8 and corresponding sections in the manuscript. Related tables have been updated to include this expression data. Two more authors have been added due to their contributions to these experiments.

      For more detailed changes, please see our responses to the specific reviewer's comments as well as the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Wang and colleagues conducted a study to determine the neurotransmitter identity of all neurons in C. elegans hermaphrodites and males. They used CRISPR technology to introduce fluorescent gene expression reporters into the genomic loci of NT pathway genes. This approach is expected to better reflect in vivo gene expression compared to other methods like promoter- or fosmid-based transgenes, or available scRNA datasets. The study presents several noteworthy findings, including sexual dimorphisms, patterns of NT co-transmission, neuronal classes that likely use NTs without direct synthesis, and potential identification of unconventional NTs (e.g. betaine releasing neurons). The data is well-described and critically discussed, including a comparison with alternative methods. Although many of the observations and proposals have been previously discussed by the Hobert lab, the current study is particularly valuable due to its comprehensiveness. This NT atlas is the most complete and comprehensive of any nervous system that I am aware of, making it an extremely useful tool for the community. 

      Reviewer #2 (Public Review):

      Summary: 

      Together with the known anatomical connectivity of C. elegans, a neurotransmitter atlas paves the way toward a functional connectivity map. This study refines the expression patterns of key genes for neurotransmission by analyzing the expression patterns from CRISPR-knocked-in GFP reporter strains using the color-coded NeuroPAL strain to identify neurons. Along with data from previous scRNA sequencing and other reporter strains, examining these expression patterns enhances our understanding of neurotransmitter identity for each neuron in the hermaphrodite and male nervous systems. Beyond the known neurotransmitters (GABA, acetylcholine, glutamate, dopamine, serotonin, tyramine, octopamine), the atlas also identifies neurons likely using betaine and suggests sets of neurons employing new, unknown monoaminergic transmission, or using exclusively peptidergic transmission. 

      Strengths: 

      The use of CRISPR reporter alleles and of the NeuroPAL strain to assign neurotransmitter usage to each neuron is much more rigorous than previous analyses and reveals intriguing differences between scRNA seq, fosmid reporter, and CRISPR knock-in approaches. Among other mechanisms, these differences between approaches could be attributed to 3'UTR regulatory mechanisms for scRNA vs. knock-in or titration of rate-limited negative regulatory mechanisms for fosmid vs. knock-in. It would be interesting to discuss this and highlight the occurrences of these potential phenomena for future studies.  

      We recognize that readers of this study may be interested in understanding the differences between the three approaches. Therefore, in the Introduction, we addressed the potential risk of overexpression artifacts associated with multicopy transgenes, such as fosmid-based reporters, which can affect rate-limiting negative regulatory mechanisms. Additionally, in the Discussion, we included a section titled 'Comparing approaches and caveats of expression pattern analysis' to further explore these comparative methods and their associated nuances.

      Weaknesses: 

      For GABAergic transmission, one shortcoming arises from the lack of an improved expression pattern from a knock-in reporter strain for the GABA recapture symporter snf-11. In its absence, it is difficult to reach a final conclusion on GABA recapture vs. GABA clearance for all neurons that express the vesicular GABA transporter (unc-47+) but do not express the GAD/UNC-25 gene, e.g. SIA or R2A neurons. At a minimum, a comparison of the scRNA seq predictions versus the snf-11 fosmid reporter strain expression pattern would help to better judge the proposed role of each neuron in GABA clearance or recycling. 

      The snf-11 fosmid-based reporter data shows very good overlap with scRNA seq predictions (now included in Supp. Table S1). 

      But there are two much stronger reasons why we did not pursue further analysis of the expression of the snf-11 GABA uptake transporter:

      (1) Due to available anti-GABA staining data, we do know which neurons have the potential to take up GABA (via SNF-11).

      (2) Focusing on SNF-11 function rather than expression, we can ask which neurons lose anti-GABA staining in snf-11 mutants.

      Both of these types of analyses have been done in an earlier study from our lab (Gendrel et al., 2016, PMID 27740909), which, among other things, investigated GABA uptake mechanisms via SNF-11. Apart from analyzing the expression of a fosmid-based snf-11 reporter, we immunostained worms for GABA in both snf-11 mutant and wild type backgrounds (results summarized in Tables 1 and 2 of Gendrel et al.). Of the neurons that typically stain for GABA (Table 1, Gendrel et al.), two neuron classes (ALA and AVF) lost the staining in snf-11 mutants, suggesting that these neurons likely take up GABA via SNF-11. Importantly, one of the neurons the reviewer mentioned, R2A, stains for GABA in both wild type and snf-11 mutants, indicating that it likely does not take up GABA via SNF-11. The other neuron mentioned, SIA, does not stain for GABA in wild type (Table 2, Gendrel et al.) and is hence not a GABA uptake neuron. In cases like SIA and other neurons, where a neuron does not express unc-25 but does express unc-47 reporters (either fosmid or CRISPR reporter alleles), we speculate that UNC-47 transports another neurotransmitter.

      Considering the complexities of different tagging approaches, like T2A-GFP and SL2-GFP cassettes, in capturing post-translational and 3'UTR regulation is important. The current formulation is simplistic. e.g. after SL2 trans-splicing the GFP RNA lacks the 5' regulatory elements, T2A-GFP self-cleavage has its own issues, and the his-44-GFP reporter protein does certainly have a different post-translational life than vesicular transporters or cytoplasmic enzymes. 

      Yes, agreed, these points are mentioned in the Introduction and discussed in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      Do all splicing variants of neurotransmitter-related genes translate into functional proteins? The possibility that some neurons express a non-functional splice variant, leading to his-74-GFP reporter expression without functional neurotransmitter-related protein production is not addressed. 

      We thank the reviewer for bringing up this really interesting point, which we had not considered. First and foremost, with the exception of unc-25 (discussed in the next point), for all other genes that produce multiple splice forms, we made sure to append our tag (at the 5’ or 3’ end) such that the expression of all splice forms is captured. The reviewer raises the interesting point that in an alternative splicing scenario, some of the cells that express the primary transcript may “switch” to an inactive form. While we cannot exclude this possibility, we have confirmed by sequence analysis in WormBase that in five of the six cases where there is alternative splicing, the alternatively spliced exon lies outside the conserved, functionally relevant (enzymatic or structural) domain. In one case, unc-25, a shorter isoform is produced that does cut into the functionally relevant domain; however, since all cells expressing the unc-25 reporter allele also stain positive for GABA, this may not be an issue. 

      Also, one tagged splice variant of unc-25 is expected to fail to produce a GFP reporter. Can this cause trouble? 

      Yes, there is indeed a third splice variant of unc-25 with an alternative C-terminus. To address potential expression of this isoform, we CRISPR-engineered another reporter, unc-25(ot1536[unc-25b.1::t2a::gfp::h2b]), in which the inserted t2a::gfp::h2b sequences are fused to the C-terminus of the alternative splice form, but we did not observe any expression of this reporter. This result is now included in the manuscript.

      Reviewer #3 (Public Review): 

      Summary: 

      In this paper, Wang et al. provide the most comprehensive description and comparison of the expression of the different genes required to synthesize, transport, and recycle the most common neurotransmitters (glutamate, acetylcholine, GABA, serotonin, dopamine, octopamine, and tyramine) used by hermaphrodite and male C. elegans. This paper will be a seminal reference in the field. Building on and contrasting observations from previous studies using fosmid, multicopy reporters, and single-cell sequencing, they now describe CRISPR/Cas9-engineered reporter strains that, in combination with the multicolor pan-neuronal labeling of all C. elegans neurons (NeuroPAL), allow rigorous elucidation of neurotransmitter expression patterns. These novel reporters also illuminate previously unappreciated aspects of neurotransmitter biology in C. elegans, including sexual dimorphism of expression patterns, cotransmission, and the elucidation of cell-specific pathways that might represent new forms of neurotransmission. 

      Strengths: 

      The authors set out to establish neurotransmitter identities in C. elegans males and hermaphrodites via varying techniques, including integration of previous studies, examination of expression patterns, and generation of endogenous CRISPR-labeled alleles. Their study is comprehensive, detailed, and rigorous, and achieves the aims. It is an excellent reference for the field, particularly those interested in biosynthetic pathways of neurotransmission and their distribution in vivo, in neuronal and non-neuronal cells. 

      Weaknesses: 

      No weaknesses were noted. The authors do a great job linking their characterizations with other studies and techniques, giving credence to their findings. As the authors note, there are sexually dimorphic differences across animals and varying expression patterns of enzymes. While it is unlikely there will be huge differences in the reported patterns across individual animals, it is possible that these expression patterns could vary developmentally, or based on physiological or environmental conditions. It is unclear from the study how many animals were imaged for each condition, and if the authors noted changes across individuals during development (could be further acknowledged in the discussion?)  

      We have updated the Methods section to specify the number of animals used for imaging. We agree with the reviewer that documenting the developmental dynamics of neurotransmitter expression would be interesting. However, except for one gene (tph-1, Fig. S2), we did not analyze the expression during different developmental stages for most genes in this study. Following the reviewer's suggestion, we have included this as a potential future direction in "Conclusions" at the end of the revised manuscript.

      Recommendations for the authors:

      After the consultation session, a common suggestion from the reviewers is to bring the tables more upfront, perhaps even in the form of legible main Figures and in alphabetical order of neurons, since we believe that in the long term the study will often be used for these data. The Figures with fluorescent expression patterns could be moved to the supplemental information. 

      We appreciate the reviewers' and editor's acknowledgment of the tables' possibly frequent usage by the field. We have considered carefully how to order the data presentation. We prefer to keep most of the fluorescent figures in the main text because they convey important subtleties that we want the reader to be aware of.

      To address the suggestions to bring key data more upfront, we have added an entirely new figure (Figure 3) before the ensuing data figures that show the expression patterns of the fluorescent reporters. This new figure (A) summarizes the neurotransmitter use for all neuron classes and (B) illustrates this information within worm schematics, showing the position of neurons in the whole worm. This figure serves as a good overview of neurotransmitter assignments but also specifically refers to the more extensive data and supplementary tables with detailed notes. We believe this solution effectively balances the need for comprehensive information and ease of reference.

      Reviewer #1 (Recommendations for The Authors):

      Suggestions: 

      (1) The study contains up to 10 Figures with gene expression patterns; however, I believe the community will use this paper mostly in the future for its summarizing tables. I wonder if it would be more useful to edit the tables and move them to the main figures while most fluorescent reporter images could be moved to the supplementary part. 

      Yes, as mentioned above, we added a new summary table and schematic upfront. We do prefer to keep the primary data in the main figures. Please see above (Public Review & Response).

      (2) In the section titled 'Neurotransmitter Synthesis versus Uptake', the authors' wording could be more careful. The data rather suggest functions for individual neuronal classes, such as clearance neurons or signaling neurons. However, these functions remain hypotheses until further detailed studies are conducted to test them. 

      These are fair points. We have made several improvements: 

      (1) In the referenced section, we added a sentence at the end of the paragraph on betaine to suggest the importance of future functional studies.

      (2) We analyzed reporter allele expression for two additional genes: the known uptake transporter for 5-HT (mod-5, reporter allele vlc47) and the predicted uptake transporter for tyramine (oct-1, reporter allele syb8870). The results from these experiments are presented in the new Figure 8 and discussed in Results and Discussion correspondingly. We also collaborated with Curtis Loer, who conducted anti-5-HT staining in wild type and mod-5 mutant animals (results shown in Figure 12). These experiments have enhanced our understanding of 5-HT uptake mechanisms and potential tyramine uptake mechanisms.

      (3) At the end of the Conclusions, we emphasized the need for future detailed studies to test the functions of neurotransmitter synthesis and uptake.

      (3) Page 21; add to the discussion: neurons could use mainly electrical synapses for communication. Especially for RMG neurons, this might be the case (in addition to neuropeptide communication). 

      “Main usage” is a difficult term to use. If there were neurons that are clearly devoid of any form of synaptic vesicle (small or DCV; note that RMG has plenty of DCVs), but show robust and reproducible electrical synapses, we would agree that such neurons could primarily be “coupling” neurons. But this call is very hard to make for any C. elegans neuron (RMG included) and hence we prefer to not add further to an already quite long Discussion section.

      (4) Page 23: I believe that multi-copy promoter-based transgenes (despite array suppression mechanisms) could be potentially more sensitive than single-copy insertion of fluorescent reporters. In our lab, we observed this a couple of times. This could be discussed. 

      We discuss this in "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

      We have also added a third possibility (i.e. technical issues related to neuron-ID) in the revised manuscript.   

      Reviewer #2 (Recommendations For The Authors): 

      Comment during consultation session: As for my feedback on the lack of an SNF-11 reporter strain, exercising more caution in their conclusions would suffice for me. Other comments are simple edits/discussion.  

      Please see above.  

      Several neurotransmitter symporters exist in the C. elegans genome. Are any expressed specifically in the "orphan" UNC-47+ neurons? 

      Yes, good point, we considered this possibility, but of the >10 SLC6-family neurotransmitter transporters, only the classic, de-orphanized ones that we discuss here in the paper show robust scRNA signals (as discussed in the paper) and none of those give clues about the orphan unc-47(+) neurons.

      Based on UNC-47+ expression, the article suggests a "Novel inhibitory neurotransmitter". Why would any new neurotransmitter using UNC-47 necessarily be inhibitory? The presence of one potential glycine-gated anion channel and one GPCR in the C. elegans genome seems poor evidence to suggest a sign of glycine or β-alanine transmission. 

      Yes, agreed, it does not need to be inhibitory. Fixed in Results and Discussion. 

      To help readers, the expression of the knocked-in GFP in neurons should not be reported as binary in Table S1, which leads to a feeling of strong discrepancy between scRNA seq and CRISPR GFP, which is not the case.  

      There might be some misunderstanding regarding the coloring in this table. To clarify, the green-filled Excel cells denote the expression of reporters utilized in prior studies, rather than the CRISPR reporter alleles. Expression of the CRISPR alleles is instead indicated on the left side of the neuron names, marked as "CRISPR+" in green font. For signifying absence of expression, we used "no CRISPR" in red font in the first submission. We have now changed it into "CRISPR-" for greater clarity.

      The variable expression of reporter GFP between individuals for the same neuron is intriguing. It is unclear if this is observed only for dim neurons or can be more of an ON/OFF expression. 

      Variability only occurs for dim expression. We have now clarified this point in Discussion, "Comparing approaches and caveats of expression pattern analysis".

      The multiple occurrences of co-transmission, especially in male neurons, are interesting. It will be interesting in the future to establish whether the neurotransmitters are synaptically segregated or coreleased. As the section on sexual dimorphism of neurotransmitter usage does not discuss novel information coming from this study, it is not very necessary. 

      Agreed. We added this perspective to the Discussion, "Co-transmission of multiple neurotransmitters".  

      In the abstract, dopamine is missing from the list of main known transmitters.  

      Fixed. Thanks for spotting this.

      Reviewer #3 (Recommendations For The Authors): 

      Great article. Minor suggestions to strengthen presentation: 

      Figure 1B is hard to interpret. There could be more intuitive ways of representing the data and the methodologies that support a given expression pattern. Neurons should also be ordered alphabetically rather than by expression level to facilitate finding them.  

      We considered alternative ways of presenting this data, but, regrettably, did not come up with a better approach. To clarify, the primary focus of Fig. 1B is to compare expression of previously reported reporters and scRNA data, which was quite literally the initial impetus for our analysis, i.e. we noted strong scRNA signals that had not previously been supported by transgenic reporter data. For a comprehensive version of the table that includes more details on the expression of CRISPR reporter alleles, please refer to Table S1, which we referenced in the figure legend.   

      GFP-only channel images in Figures 3, 4, 5, and 9 sometimes show dim signals that the authors are highlighting as new findings. We recommend using the inverted grayscale version of that channel since the contrast of dim signals is more noticeable to the human eye than when the image is colorized. 

      Good point, we implemented these suggestions in the figures the reviewer mentioned, now re-numbered Figures 4, 5, 6, and 12. For Figure 6 (tph-1, bas-1, and cat-1 expression in hermaphrodites), we used a new cat-1 head image to reflect the newly identified ASI and AVL expression that wasn’t readily visible in the original projection used in the earlier version of this manuscript. We also added grayscale images in Figure 13 to reflect dim tbh-1 expression in IL2 neurons more clearly.

      A plan to integrate this new information into WormAtlas would be valuable. The C. elegans community is characterized by the open sharing of information on platforms that are user-friendly and accessible. Ideally, the new information would not just 'erase' what was observed before but would describe the new observations and let the community reach their own conclusions, since there is no perfect method and even these CRISPR/Cas9 reporter strains are only a proxy for gene expression that is subject to post-transcriptional regulation, since they depend on T2A and SL2 sequences. 

      We completely agree with the reviewer’s suggestion. We will coordinate with WormAtlas on integrating this new information. 

      For neurons that are no longer assigned a specific neurotransmitter, like PVQ, what do the authors conclude overall? If PVQ does not use glutamate, are there any new hypotheses as to what it could be using?

      Since all neurons express multiple neuropeptides, we hypothesize neurons such as PVQ may be primarily peptidergic. This is included in Discussion, "Neurons devoid of canonical neurotransmitter pathway genes may define neuropeptide-only neurons".  

      In Table S5, the I4 neuron is listed as variable for eat-4 expression but in Table S1 it says that there was no CRISPR expression detected. Which one is correct? 

      Thanks for spotting this. Table S5 is correct: we saw very dim and variable expression of the eat-4 reporter allele in I4. Table S1 is fixed now.

      Additional discussion points that might be important for the community: 

      CRISPR strains used here should be deposited in the CGC. 

      Yes, all strains generated in this study have already been deposited to CGC. 

      It would be great to have an additional discussion point on how the neural clusters in CeNGEN were defined based on fosmid reporter expression; in a way, comparing against clusters that were themselves defined by those reporters might make the results confusing. 

      Neural cluster definition in CeNGEN did not rely on isolated data points but on the combination of many expression reagents, each with its own shortcomings, but in combination providing reliable identification. Since one feedback we have gotten from many readers of our manuscript is that it is already very long as is, we prefer not to dilute the discussion further.

      It would be important to discuss the rate of neurotransmitter genes that have variable expression patterns. Are any of those genes used in NeuroPAL to define specific neuronal classes? This is important to describe as NeuroPAL labeling is being used to define neuronal identity. 

      All the reporters used in NeuroPAL are promoter-based, very robust and do not include the full loci of genes, so they are not directly comparable with the CRISPR reporter alleles in this study. However, we recognize that some expression pattern variability could be confusing. We have discussed this more in the section "Comparing approaches and caveats of expression pattern analysis" in the Discussion.

    1. eLife assessment

      The study presents compelling evidence that the melanocortin system originating in the arcuate nucleus of the hypothalamus plays a crucial role in puberty onset, representing a significant advance in our understanding of reproductive biology. The work, which represents a fundamental advance, employs innovative approaches and benefits from the combined expertise of two respected laboratories, enhancing the robustness of the findings. Given the potential impact on human health and the strength of the evidence presented, this work will likely influence the field substantially and may inform future clinical applications.