    1. eLife Assessment

      This work, combining behavioural genetics and calcium imaging, provides evidence for a form of learning in Drosophila that derives solely from direct or (optogenetically induced) phantom experience of punishment or reward. Flies that experience foot-shock alone show a subsequent decrease in avoidance of all odorants, together with increased odor-evoked activation of reward-encoding dopaminergic neurons that innervate the mushroom body. Phantom reward, delivered via optogenetic activation of sweet-sensing neurons, increases subsequent odour avoidance. While the findings are valuable to the field, aspects of the work are incomplete, and some of the conclusions and terminology are not fully justified. Three major issues are: (a) the use of the term "priming" to describe this form of learning seems inappropriate and inconsistent with the accepted definition of the term; (b) a key 1998 publication that first described this behavioural phenomenon needs to be cited and discussed as context; and (c) the work on the reward-induced increase in odour aversion seems relatively preliminary.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present an investigation of associative learning in Drosophila in which prior exposure to an aversive stimulus leads to an increase in approach behaviors to a novel odor relative to a previously paired odor or no odor (air). Moreover, this relative increase is larger than that of a control group, i.e., one presented with a (different) odor only. Evidence was also presented for the opposite effect with an appetitive stimulus, delivered indirectly by optogenetically activating sugar sensory neurons, which reduces approach behavior to a novel odor. The olfactory memory circuits underpinning these responses, which the authors refer to as 'priming', are revealed and include a feedback loop mediated by dopaminergic neurons to the mushroom body.

      Strengths:

      (1) The study includes a solid demonstration of the effect of the valence of a previous stimulus on sensory preferences, with an increase or decrease in preference to novel over no odor following an aversive or appetitive stimulus, respectively.

      (2) The demonstration of bidirectional effects on odor preferences following aversive or rewarding stimuli is compelling.

      (3) The evidence for distinct neural circuits underpinning the odor preferences in each context appears to be robust.

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

    3. Reviewer #2 (Public review):

      The manuscript by Yang et al. investigates how a prior experience (notably the activation of sensory/reinforcing dopaminergic neurons) alters olfactory responses and memory expression in Drosophila. They refer to a priming effect with the definition: "Priming is a process by which exposure to a stimulus affects the response to a subsequent stimulus in Humans". The authors observed that exposing flies to a series of shocks (or the optogenetic activation of aversively reinforcing dopaminergic neurons) decreases subsequent odour avoidance. Conversely, optogenetic activation of sweet-sensing neurons increases subsequent odour avoidance. They proposed that the reduced odour avoidance was due to the recruitment of reward dopaminergic neurons during shock (or the optogenetic activation of aversively reinforcing dopaminergic neurons). They indeed show the involvement of reward dopaminergic neurons innervating the mushroom body (the fly learning and memory centre) during shock preexposure. Calcium imaging of reward dopaminergic neurons before and after shock preexposure shows that only a small subset of dopaminergic neurons, those innervating the mushroom body γ4 compartment, increases its response to odour after shock. They then showed that the γ4 reward dopaminergic neurons are required during shock preexposure for the change in ensuing odour avoidance. They also tested the role of the dopamine receptor in the mushroom body. They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      (2) The authors missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have stated that shocks induced attraction to the novel odour, whereas they should stick to describing an increase or decrease in preference/avoidance. As not all experiments follow a parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? Does the increased odour avoidance observed after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response)? The authors should always statistically test the fly behavioural performances against 0 to determine whether flies choose randomly or show a clear preference toward an odour. On the appetitive side, the internal hunger state would play an important role. The authors should test it or at least discuss it.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive. Different effects are found between the T-maze and the olfactory arena. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The authors should discuss it more clearly. They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which changes in the T-maze but not in the optogenetic arena.

    4. Author response:

      We thank both reviewers for their valuable comments. We have prepared a point-by-point response below.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      We thank the reviewer for these critical comments. After careful consideration of both reviews, we agree that “priming” may not be the most accurate term to describe the behavioral phenomenon. We plan to revise our terminology throughout the manuscript accordingly to better capture the generalized nature of the effect we observe.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

      We use the PI score as a standard metric to quantify odor preference in all behavioral assays because it allows robust comparison across different genetic or treatment groups under the same experimental setting. In the T-maze, real-time tracking of fly trajectories is technically difficult. For the olfactory arena, we showed examples of fly distribution across quadrants over the entire odor choice test period (Figure 2—figure supplement 2) for both pre-trained and post-trained groups, and discussed the trajectories in the Discussion. We will ensure this point is clarified in the revised text.

      Reviewer #2 (Public review):

      […] They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      We thank the reviewer for the summary. We would like to clarify that we did, in fact, observe plasticity in MBON-γ4γ5 following shock exposure, as shown in Figure 4B.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      As noted in our response to Reviewer #1, we plan to revise our use of the term “priming” in the manuscript to more accurately interpret the behavioral phenomenon.

      (2) The authors missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      We thank the reviewer for bringing this important reference to our attention. We will cite the Preat (1998) paper and discuss its relevant findings in relation to our own in the revised manuscript.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have mentioned that shocks induced attraction to the novel odour, while they should stick to the increase or decrease in preference/avoidance.

      The ΔPI is calculated either as (trained PI – mock PI) for different animals or as (post PI – pre PI) for the same animals, with the specific calculation clarified in each figure legend. A positive ΔPI signifies an increase in preference for the odor, which is equivalent to a relative attraction or a decrease in avoidance.
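      As a minimal sketch of this arithmetic, assume the conventional two-choice preference index PI = (N_odor - N_air) / (N_odor + N_air); the exact PI formula is not restated in this response, so treat it as an assumption. Both ΔPI variants then reduce to one subtraction whose reference is either the mock-trained group or the same flies' pre-training test. The function names and fly counts below are hypothetical.

```python
def preference_index(n_odor: int, n_air: int) -> float:
    """Two-choice preference index in [-1, 1] (assumed definition).

    Positive values mean more flies chose the odor side; negative values
    mean the odor was avoided relative to air.
    """
    total = n_odor + n_air
    if total == 0:
        raise ValueError("no flies made a choice")
    return (n_odor - n_air) / total


def delta_pi(pi_test: float, pi_reference: float) -> float:
    """Change in preference: test PI minus reference PI.

    The reference is either the mock-trained group (different animals,
    trained PI - mock PI) or the pre-training test of the same animals
    (post PI - pre PI).
    """
    return pi_test - pi_reference


# Hypothetical counts: both groups avoid the odor, the shocked group less so.
mock = preference_index(n_odor=30, n_air=70)     # -0.4
trained = preference_index(n_odor=45, n_air=55)  # -0.1
print(round(delta_pi(trained, mock), 2))         # 0.3
```

      Here ΔPI is positive although both PIs are negative, so it signals reduced avoidance rather than outright attraction.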

      As not all experiments follow a parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? Does the increased odour avoidance observed after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response)?

      We used different behavioral assays (T-maze or arena), stimuli (real shock or optogenetics), and protocols (different or same animal groups) to robustly demonstrate the phenomenon across platforms. We explained each protocol in the figures or text, and we will make them easier to follow in the revised version. We focused on activating a clean set of sugar-sensing neurons because this optogenetic stimulus is an effective and efficient substitute for real sugar. We agree that testing reward dopaminergic neuron activation is a logical extension and will consider adding these experiments in the revised work.

      The authors should always statistically test the fly behavioural performances against 0 to determine whether flies choose randomly or show a clear preference toward an odour.

      Our primary focus is on the change in preference induced by training, rather than the innate odor preference itself, which can be highly variable due to physiological and environmental factors. Statistical testing against 0 for innate preference scores is not standard practice in this specific paradigm, as the critical question is whether a treatment alters behavior relative to a control.

      On the appetitive side, the internal hunger state would play an important role. The authors should test it or at least discuss it.

      For appetitive experiments, we always starve the flies on 1% agar for two days prior to behavioral tests to standardize their hunger state. We will consider adding fed flies as control groups in the revised work.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive.

      We observed minor discrepancies in innate odor preferences across genetic backgrounds, which is a known and common occurrence. Different genotypes and temperatures can result in different baseline PI scores. However, the key finding is that the relative change in odor preference following an aversive stimulus is consistent: it increases the relative preference for an odor compared to air. This sometimes reverses valence (aversion to attraction) and other times simply reduces aversion. Our analysis focuses on this consistent, relative change.

      Different effects are found between the T-maze and the olfactory arena. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The authors should discuss it more clearly.

      The punishment effect with CS+ present was reliably detected in the T-maze (Figure 1A) but was not significant in the olfactory arena (Figure 2—figure supplement 1B-C). We hypothesize that the olfactory arena assay is less sensitive than the T-maze for detecting such subtle behavioral changes. This is evidenced by the fact that even classical odor-shock conditioning yields lower PI in the arena (typically ~0.4) than in the T-maze (~0.8), likely due to the greater distance flies must explore and travel. The higher variance in the arena may therefore mask more modest effects. Here the effect under investigation was induced by optogenetically activating only a small subset of aversive dopaminergic neurons, a stimulus that is likely weaker than full electric shock. This reduced stimulus strength may have contributed to the challenge of detecting a significant effect in the less sensitive arena paradigm.

      They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which changes in the T-maze but not in the optogenetic arena.

      While flies can be conditioned to context, during the optogenetic stimulation period in the arena, the light is delivered uniformly across all four quadrants. Therefore, any potential context conditioning would be equivalent across the entire chamber and should not bias the final distribution of flies between the odor and air quadrants during the test, nor affect the calculated PI score.

    1. eLife Assessment

      Liang et al. have conducted a small pilot study investigating the feasibility and tolerability of a regimen of neoadjuvant chemo-immunotherapy for non-small cell lung cancer, with lower cumulative dose of chemotherapy and with the immunotherapy delivered on D8 of each cycle. The clinical data are interesting and novel, and overall the findings of the study are valuable. However, the translational data and analyses are incomplete and do not support key claims in the title.

    2. Reviewer #1 (Public review):

      Liang et al. have conducted a small-scale pilot study focusing on the feasibility and tolerability of low-dose chemotherapy combined with delayed immunotherapy in the neoadjuvant treatment of non-small cell lung cancer. The design of delayed immunotherapy after chemotherapy is relatively novel, while the reduced chemotherapy, although somewhat lacking in innovation, still serves as an early clue for exploring future feasible strategies. Also, the dynamic ctDNA and TCR profiles could give some important hints about the intrinsic tumor response.

      However, as the authors noted in the limitations section, because of the small sample size and the lack of a control group, we cannot fully understand the advantages and disadvantages of this approach compared with standard treatment. Compared with standard immunotherapy, the treatment in this study differs in three ways: (1) reduced chemotherapy, (2) the use of cisplatin instead of the carboplatin commonly used in neoadjuvant therapy trials, and (3) delayed immunotherapy. In general, when exploring updated treatment strategies, the design should follow the principle of "controlling variables." If too many factors differ at once, it becomes difficult to determine which variable is responsible for the effects, which confuses the interpretation of the results. Moreover, the long treatment duration may limit the practical clinical feasibility of this strategy.

      Furthermore, in the exploration of biomarkers, the authors emphasized the whole RNA sequencing of tumor tissues in the Methods section, and this was also noted in the flowchart in Figure 1. However, I did not find any mention of RNA-related analyses in the Results section, which raises some concerns about the quality of this paper. If the authors have inadvertently omitted some results, they should supplement the RNA-related analyses so that I can re-evaluate the paper.

      To sum up, this article shows a certain degree of innovation. However, because of its intrinsic design defects and data omissions, the quality of the research warrants further improvement.

    3. Reviewer #2 (Public review):

      Summary:

      In this single-centre, single-arm, open-label, non-randomised study, the authors tested paclitaxel at 180-220mg/m2 plus cisplatin at 60mg/m2 in patients with squamous NSCLC, and pemetrexed at 500mg/m2 plus cisplatin at 60mg/m2 in patients with lung adenocarcinoma, in the neoadjuvant setting. The chemotherapy appears to have been given at a relatively standard dose: although the platinum dose of 60mg/m2 is somewhat lower than that used in the CheckMate 816 trial (75mg/m2/dose), it is a well-established dose for NSCLC.

      The key differences from currently approved neoadjuvant chemo-ICI treatment are that the anti-PD1 antibody sintilimab (200mg/dose) was given on day 5, and that only 2 cycles of chemotherapy were given before surgery, followed by 2 further cycles after surgery. Between May 2020 and November 2023, 50 patients were screened, 38 went on to receive this schedule of treatment, 31 (~82%) went on to have surgery and 27 received the adjuvant treatment. The rate of surgery is entirely consistent with the CheckMate 816 data.

      Question to the authors:

      It would be very helpful to understand why 7 patients (~18% of the population) did not proceed to surgery, and whether this was related to disease progression, toxicity or other reasons for withdrawal.

      The key clinical endpoints were pCR and mPR rates. 2/38 patients are reported to have achieved a radiological pCR, but only 31 patients underwent surgery with histological verification. Supplementary Table 2 suggests that 10/31 patients achieved a pCR, 6/31 additional patients achieved a major pathological response, and 13/31 did not achieve a major pathological response.

      It would be really helpful for understanding the clinical outcome to present the histopathological findings in the text in more detail and to relate them to the radiological findings. I note that the denominator for pathological responses is incorrectly given as 38 patients, as only 31 patients underwent surgery and were evaluated histologically.
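      The denominator point can be made concrete with the counts quoted above. This minimal sketch assumes, as is conventional, that major pathological response (MPR) includes pCR, so MPR = 10 + 6; the constant and function names are illustrative only.

```python
# Counts quoted in this review (Supplementary Table 2 and the enrolment figures).
N_TREATED = 38      # patients who started the neoadjuvant schedule
N_RESECTED = 31     # patients who underwent surgery and histological evaluation
N_PCR = 10          # pathological complete response
N_MPR = N_PCR + 6   # assumption: MPR counted as pCR plus the 6 additional responders


def percent(events: int, denominator: int) -> str:
    """Format events/denominator as a percentage with one decimal place."""
    return f"{100 * events / denominator:.1f}%"


print("surgery:", percent(N_RESECTED, N_TREATED))  # surgery: 81.6%
for label, events in (("pCR", N_PCR), ("MPR", N_MPR)):
    # The same event count gives a noticeably different rate per denominator.
    print(f"{label}: {percent(events, N_RESECTED)} of resected, "
          f"{percent(events, N_TREATED)} of treated")
# pCR: 32.3% of resected, 26.3% of treated
# MPR: 51.6% of resected, 42.1% of treated
```

      Reporting both denominators (all treated patients and resected patients) makes explicit which population each rate refers to.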

      The treatment was very well tolerated, with only 1 grade 3 AE reported. The longer-term outcome will need to be assessed over time, as the cohort is very 'young'. It is not clear what the adjuvant chemo-ICI treatment would add and how this extra treatment would be evaluated for benefit: if all the benefit comes from the neoadjuvant treatment, the extra post-operative treatment would only add toxicity.

      Please consider what the two post-operative chemo-ICI cycles might add to the outcome and how the value of these cycles would be assessed. Would there be a case for a randomised assessment in the patients who have NOT achieved a mPR histologically?

      While the clinical dataset identifies that the proposed reduced chemo-ICI therapy has clinical merit and should be assessed in a randomized study, the translational work is less informative.

      The authors suggest that the treatment has a positive impact on T lymphocytes. Blood sampling was done at day 0 and day 5 of each of the four cycles of chemotherapy, with an additional sample after cycle 4. The authors state that data were analysed at each stage.

      The data in Figure 3B are reported for three sets of pairs: baseline to pre-day 5 in cycle 1, day 5 to day 21 in cycle 1, and baseline of cycle two to day 5. It remains unclear whether the datasets contain the same top 20 clones, and it would be very helpful to show the kinetic changes of the individual 'top 20 clones' throughout treatment in individual patients; as it stands, the 'top 20 clones' may vary widely from timepoint to timepoint. Of note, the figures do not demonstrate that the top 20 TCR clones 'continuously increased'.

      Instead, the data suggest that there are fluctuations in the relative distributions over time, but these may simply reflect shifts in T cell populations following chemotherapy rather than immunological effects in the cancer tissue. Consistent with this, the authors conclude (lines 304-305): "No significant difference was observed in the diversity, evenness, and clonality of TCR clones across the whole treatment procedure", and this seems a more persuasive conclusion than the statement 'that a positive effect on T lymphocytes was observed', where it is also unclear what 'positive' means.
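      For reference, a common way to compute these three repertoire summaries from clonotype counts is sketched below. It assumes Shannon diversity H = -Σ p_i ln p_i, Pielou evenness J = H / ln S (S = number of distinct clonotypes) and clonality = 1 - J; the manuscript's exact definitions are not stated in this review, and the function name is hypothetical.

```python
import math
from typing import Dict


def repertoire_metrics(clone_counts: Dict[str, int]) -> Dict[str, float]:
    """Shannon diversity, Pielou evenness and clonality (1 - evenness) of one sample."""
    total = sum(clone_counts.values())
    if total <= 0:
        raise ValueError("sample contains no TCR reads")
    freqs = [count / total for count in clone_counts.values() if count > 0]
    richness = len(freqs)  # number of distinct clonotypes observed
    shannon = -sum(p * math.log(p) for p in freqs)
    # Evenness is undefined for a single clonotype; this sketch assumes 0 (fully clonal).
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {"shannon": shannon, "evenness": evenness, "clonality": 1.0 - evenness}


print(repertoire_metrics({"a": 5, "b": 5, "c": 5, "d": 5}))  # perfectly even repertoire
print(repertoire_metrics({"a": 97, "b": 1, "c": 1, "d": 1}))  # dominated by one clone
```

      Because these summaries depend only on the frequency distribution and not on clone identity, individual clones can rise or fall without changing them.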

      The text needs a more balanced representation of the data: only a small subset of four patients appears to have been evaluated to generate the data for Figure 3B, and only three patients (P5, P6, P7) can have contributed to Figure 3C if the sample collection is represented accurately in Figure 3A.

      The text refers to flow cytometric results in SF3. However, no information is given in M&M on the flow cytometry methodology, markers or gating strategy.

      Please consider changing the terminology of the 'phases' into something that is easier to understand. One option would be to use a reference to a more standard unit (cycle 1-4 of chemotherapy and then d0/d5/d21).

      Please make it explicit in the text that molecular analyses were undertaken for some patients only, and how many patients contribute to the data in Figures 3B-F. Figure 3A suggests paired mRNA data were obtained in 2 patients (P2 and P5), but I cannot find the results of these analyses; four individual blood samples to assess TCR changes in PH1, PH2, PH3 and PH4 were only available in four patients (P4, P5, P7, P9). Only three patients seem to have the right samples collected to allow the analysis for 'C3' in Figure 3C.

      Please display for each of the 'top 20 clones' at any one timepoint how these clones evolve throughout the study; I expect that a clone that is 'top 20' at a given timepoint may not be among the 'top twenty' at all timepoints.
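      The per-clone display requested here can be sketched as follows: take the union of the top-n clonotypes over all timepoints, then report each tracked clone's frequency at every timepoint, using 0 where it was not detected. Function names, timepoint labels and the toy counts are hypothetical; real input would be per-patient clonotype counts from the TCR sequencing.

```python
from typing import Dict, List

Counts = Dict[str, int]  # clonotype ID -> read (or cell) count in one sample


def top_clones(counts: Counts, n: int) -> List[str]:
    """Return the n most abundant clonotypes (ties broken by ID for determinism)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [clone for clone, _ in ranked[:n]]


def clone_trajectories(samples: Dict[str, Counts], n: int = 20) -> Dict[str, Dict[str, float]]:
    """Frequency of every clone that is top-n at ANY timepoint, at EVERY timepoint."""
    freqs = {}
    for timepoint, counts in samples.items():
        total = sum(counts.values())
        freqs[timepoint] = {clone: c / total for clone, c in counts.items()}
    tracked = set()
    for counts in samples.values():
        tracked.update(top_clones(counts, n))
    # Undetected clones get 0.0 so that clones leaving the top-n stay visible.
    return {
        clone: {tp: freqs[tp].get(clone, 0.0) for tp in samples}
        for clone in sorted(tracked)
    }


# Toy example for one patient with n=2 (a real analysis would use n=20).
toy = {
    "t0": {"A": 50, "B": 30, "C": 20},
    "t1": {"A": 10, "C": 60, "D": 30},
}
for clone, trajectory in clone_trajectories(toy, n=2).items():
    print(clone, trajectory)
```

      In the toy data, clone B is top-2 at t0 but absent at t1, which is the kind of change that a separate 'top 20' list per timepoint would not show.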

      Please also assess if the expanded clonotypes are present (and expanded) in the cancer tissue at resection, to link the effect in blood to the tumour. Given that tissue was collected for 31 patients, mRNA sequencing to generate TCR data should be possible to add to the blood analyses in the 12 patients in Figure 3A. Without this data no clear link can be made to events in the cancer.

      Please provide in M&M the missing information on the flow cytometry methodology (instrument, antibody clones, gating strategy) and what markers were used to define T cell subsets (naïve, memory, central memory, effector memory).

      The authors also describe that ctDNA decreases after chemo-ICI treatment. This is well documented in their data but ultimately irrelevant: if the cancer volume is reduced to the degree of a radiological or pathological response/complete response, then the quantity of circulating DNA from the cancer cells must decrease. More interesting would be whether early changes predict clinical outcome and whether recurrent ctDNA elevations herald recurrence.

      Please probe whether the molecular data identify good radiological or pathological outcomes before cycle 2 is started and whether the ctDNA levels identify patients who will have a poor response and/or who relapse early.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Liang et al. have conducted a small-scale pilot study focusing on the feasibility and tolerability of low-dose chemotherapy combined with delayed immunotherapy in the neoadjuvant treatment of non-small cell lung cancer. The design of delayed immunotherapy after chemotherapy is relatively novel, while the reduced chemotherapy, although somewhat lacking in innovation, still serves as an early clue for exploring future feasible strategies. Also, the dynamic ctDNA and TCR profiles could give some important hints about the intrinsic tumor response.

      However, as the author mentioned in the limitation part, due to the small sample size and lack of a control group, we cannot fully understand the advantages and disadvantages of this approach compared to standard treatment. Compared to standard immunotherapy, the treatment group in this study has three differences: (1) reduced chemotherapy, (2) the use of cisplatin instead of the commonly used carboplatin in neoadjuvant therapy trials, and (3) delayed immunotherapy. Generally, in the exploration of updated treatment strategies, the design should follow the principle of "controlling variables." If there are too many differences at once, it becomes difficult to determine which variable is responsible for the effects, leading to confusion in the interpretation of the results. Moreover, the therapeutic strategy may lack practical clinical operability due to the long treatment duration.

      Thank you for your advice. As you pointed out, incorporating too many variables can obscure research findings. Our study focuses on two primary objectives: (1) to demonstrate that our approach is less toxic than the standard regimen; and (2) to fully activate the immune system in order to achieve better therapeutic outcomes. Based on these two objectives, we reduced the chemotherapy dosage to alleviate toxicity and delayed immunotherapy administration to prevent chemotherapy from killing activated immune cells, so as to maximize the immune response. Therefore, the two variables of reduced chemotherapy and delayed immunotherapy share a unified rationale in this study. The reduction of cisplatin to 60 mg/m2 is supported by data in Chinese patients, and a retrospective study conducted by our center found that delayed immunotherapy also has good therapeutic effects. Considering the hematologic toxicity previously observed with carboplatin and albumin-bound paclitaxel, we replaced carboplatin with cisplatin to alleviate bone marrow suppression. Our patients are usually hospitalized for 4-7 days to receive treatment and to have potential side effects, including nausea, vomiting, diarrhea, and bone marrow suppression, observed and managed. Therefore, administering immunotherapy on day 5 is convenient and feasible.

      Furthermore, in the exploration of biomarkers, the authors emphasized the procedure of whole RNA sequencing in tumor tissues in the method section, and this was also noted in the flowchart in Figure 1. However, I didn't find any mention of RNA-related analyses in the Results section, which raises some concerns about the quality of this paper for me. If the authors have inadvertently omitted some results, they should supplement the RNA-related analyses so that I can re-evaluate the paper.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole transcriptome, ctDNA, and TCR sequencing to investigate the effects of a neoadjuvant treatment on NSCLC. The sequencing details are described in the Materials and Methods section. RNA-related analyses are presented in Figure S3. Given that our primary focus is on the impact of this modified treatment on immune cells, we estimated immune cell compositions from the RNA sequencing results using the xCell and ImmuCellAI algorithms. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      To sum up, this article exhibits a certain degree of innovation. However, due to its intrinsic design defects and data omissions, the quality of the research warrants further improvement.

      Thanks for your comment. We have provided a more detailed explanation of the administration for all patients. Additionally, we have clarified and supplemented the sequencing results to enhance the clarity and overall quality of the article.

      Reviewer #2 (Public review):

      Summary:

      In this single-center, single-arm, open-label, non-randomised study, the authors tested the use of paclitaxel at 180-220 mg/m2 and cisplatin at 60 mg/m2 in patients with squamous NSCLC, and pemetrexed at 500 mg/m2 and cisplatin at 60 mg/m2 in adenocarcinoma of lung origin, in the neoadjuvant setting. The chemotherapy appears to have been given at a relatively standard dose; though the platinum dose of 60 mg/m2 is somewhat lower than that used in the CheckMate 816 trial (75 mg/m2/dose), this is a well-established dose for NSCLC.

      The key differences from currently approved neoadjuvant chemo-ICI treatment are that the anti-PD1 antibody sintilimab (200 mg/dose) was given on day 5 and that only 2 cycles of chemotherapy were given before surgery, but were then repeated on two occasions after surgery. Between May 2020 and November 2023, 50 patients were screened, 38 went on to receive this treatment schedule, 31 (~82%) went on to have surgery, and 27 received the adjuvant treatment. The rate of surgery is entirely consistent with the CheckMate 816 data.

      Question to the authors:

      It would be very helpful to understand why 7 patients (~18% of the population) did not make it to surgery, and whether this was related to disease progression, toxicity, or other reasons for withdrawal.

      Thank you for your comment. No patients were denied surgery due to disease progression or side effects. Seven patients did not undergo surgery: three declined total pneumonectomy, two were unable to come to our hospital for treatment because of the COVID-19 pandemic, and two were ineligible for radical surgery due to tumor invasion of the arteries.

      The key clinical endpoints were pCR and mPR rates. 2/38 patients are reported to have achieved a radiological complete response, but only 31 patients underwent surgery with histological verification. Supplementary Table 2 suggests that 10/31 patients achieved a pCR, 6/31 additional patients achieved a major pathological response, and 13/31 did not achieve a major pathological response.

      It would be really helpful for understanding the clinical outcome to present the histopathological findings in the text in a bit more detail and to relate the outcome to the radiological findings. I note that the denominator for pathological responses is incorrectly given as 38 patients, as only 31 patients underwent surgery and were evaluated histologically.

      Thanks for your comment. The ITT population consisted of 38 individuals, of whom 31 underwent surgery. After surgery, 18 patients achieved MPR (including 12 who achieved pCR) and 13 did not (non-MPR). Thus, in the ITT population, the pCR and MPR rates were 12/38 (31.6%) and 18/38 (47.4%), respectively; among patients who completed surgery, the corresponding rates were higher, at 12/31 (38.7%) and 18/31 (58.1%), respectively (Results, lines 268 to 269).
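      The two sets of rates differ only in their denominator: the 38-patient ITT population versus the 31 resected patients. As a quick arithmetic check, the short Python sketch below recomputes the four percentages and the non-MPR count from the counts stated in this response; the helper name `rate` is our own and is not part of the authors' analysis.

```python
def rate(events: int, denominator: int) -> float:
    """Return events/denominator as a percentage rounded to one decimal place."""
    return round(100 * events / denominator, 1)


# Counts taken from the response above.
ITT_N = 38        # intention-to-treat population
SURGICAL_N = 31   # patients who underwent resection
PCR = 12          # pathological complete response
MPR = 18          # major pathological response (includes the pCR cases)

rates = {
    "pCR_ITT": rate(PCR, ITT_N),
    "MPR_ITT": rate(MPR, ITT_N),
    "pCR_surgical": rate(PCR, SURGICAL_N),
    "MPR_surgical": rate(MPR, SURGICAL_N),
}
# Non-MPR patients are the remainder of the surgical denominator.
non_mpr = SURGICAL_N - MPR

for name, value in rates.items():
    print(f"{name}: {value}%")
print(f"non-MPR: {non_mpr}/{SURGICAL_N}")
```

      Only the denominator changes between the two populations, which is why the rates among resected patients are necessarily higher than the ITT rates for the same number of responders.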

      Author response image 1.

      The treatment was very well tolerated with only 1 grade 3 AE reported. The longer term outcome will need to be assessed over time as the cohort is very 'young'. It is not clear what the adjuvant chemo-ICI treatment would add and how this extra treatment would be evaluated for benefit - if all the benefit is in the neoadjuvant treatment then the extra post-operative tx would only add toxicity.

      Please consider what the two post-operative chemo-ICI cycles might add to the outcome and how the value of these cycles would be assessed. Would there be a case for a randomised assessment in the patients who have NOT achieved a mPR histologically?

      Thanks for your comment. The purpose of postoperative adjuvant therapy is to prevent recurrence and metastasis. Both the KEYNOTE-091 and IMpower010 trials reported positive results. CheckMate-77T used neoadjuvant therapy followed by surgery and adjuvant therapy, and resulted in significantly longer event-free survival than chemotherapy in patients with resectable NSCLC. We therefore designed this perioperative treatment approach, which is now common, aiming to reduce tumor burden and improve the surgical remission rate through neoadjuvant therapy, and to kill residual tumor cells and prolong DFS through adjuvant therapy. Regarding DFS, follow-up currently shows 3 cases of recurrence, but the overall data are not yet mature (updated in Table S1). The safety analysis includes all patients who received neoadjuvant and adjuvant therapy, and the addition of immunotherapy showed no new safety signals.

      While the clinical dataset identifies that the proposed reduced chemo-ICI therapy has clinical merit and should be assessed in a randomized study, the translational work is less informative.

      Thanks for your comment. As mentioned in the limitations of the article, our research is preliminary and exploratory, and larger randomized studies will be needed in the future.

      The authors suggest that the treatment has a positive impact on T lymphocytes. Blood sampling was done at day 0 and day 5 of each of the four cycles of chemotherapy, with an additional sample after cycle 4. The authors state that data were analysed at each stage.

      The data in Figure 3B are reported for three sets of pairs: baseline to pre-day 5 in cycle 1, day 5 to day 21 in cycle 1, and baseline of cycle 2 to day 5. It remains unclear whether the datasets contain the same top 20 clones, and it would be very helpful to show the kinetic changes for the individual 'top 20 clones' throughout the events in individual patients; as it stands, the 'top 20 clones' may vary widely from timepoint to timepoint. Of note, the figures do not demonstrate that the top 20 TCR clones were 'continuously increased'.

      Thanks for your comment. The data in Fig. 3B do not represent the overlapping top 20 clones across all samples but rather illustrate the changes in the individual top 20 clones for each patient. The changes in the top 20 TCR clones during neoadjuvant treatment for specific samples are shown in Fig. S1. Due to tumor heterogeneity, both within and between samples, the top 20 clones for each patient at the same time point may differ. Additionally, since the top 20 TCR clones can vary between stages as a result of antigen exposure over time, the top 20 clones for the same patient may also differ across time points. Indeed, when analyzing the data, we measured the dynamic changes of the top 20 TCR clones across three stages in cycle 1, and describing these changes as "continuously increased" may not be entirely accurate. We have therefore revised this to "a phased increase" (Results, line 293).

      Instead, the data suggest that there are fluctuations in the relative distributions over time, but that may simply reflect shifts in T cell populations following chemotherapy rather than immunological effects in the cancer tissue.

      Consistent with this, the authors conclude (line 304/5): "No significant difference was observed in the diversity, evenness, and clonality of TCR clones across the whole treatment procedure", and this seems to be a more persuasive conclusion than the statement 'that a positive effect on T lymphocytes was observed', where it is also not clear what 'positive' means.

      Thanks for your comment. The scores for diversity, evenness, and clonality assess changes in the overall TCR repertoire. In our cohort, we did not observe significant changes in these three metrics throughout the treatment process, indicating the overall stability of the TCR repertoire. Despite this overall stability, we observed a significant increase in the top 20 and large clones, which represent the major TCR clone dynamics, during the treatment period. Additionally, integrating the RNA results (Tables S5-S6 and Fig. S3) from baseline and surgical samples, we found an increasing trend in the proportion of T cells following neoadjuvant therapy. Therefore, we suggest that the treatment has a positive effect on T lymphocytes.

      The text needs a more balanced representation of the data: only a small subset of four patients appear to have been evaluated to generate the data for figure 3B and only three patients (P5, P6, P7) can have contributed to figure 3C if the sample collection is represented accurately in Figure 3A.

      Thanks for your comment. In Fig. 3B, we utilized TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 1 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 1 of cycle 2, we used data from six patients (P1, P2, P5, P10, P11, P12). For the period from day 1 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C, we used TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) to generate the images for cycle 1, and data from two patients (P6, P7) to create the images for cycle 3. Therefore, the sampling illustration in Fig. 3A is accurate.

      The text refers to flow cytometric results in SF3. However, no information is given on the flow cytometry in M&M, markers or gating strategy.

      Thanks for your comment. In this study, we performed tissue sampling and whole transcriptome sequencing at both the baseline and surgical stages. Based on the sequencing results, we evaluated T cell populations using two algorithms, xCell and ImmuCellAI, and detailed the analysis procedures in the Methods and Materials section. Additionally, we have included the assessment results from both algorithms in Supplementary Tables 5 and 6.

      Please consider changing the terminology of the 'phases' into something that is easier to understand. One option would be to use a reference to a more standard unit (cycle 1-4 of chemotherapy and then d0/d5/d21).

      Thanks for your advice. Since each treatment cycle consists of both chemotherapy and immunotherapy, with chemotherapy administered on day 1 and immunotherapy on day 5 of each cycle, blood samples are collected at these two time points. Following your suggestion, we will use the notation d0/d5 within each treatment cycle to better clarify this process for the readers.

      Please make it explicit in the text that molecular analyses were undertaken for some patients only, and how many patients contribute to the data in Figures 3B-F. Figure 3A suggests paired mRNA data were obtained in 2 patients (P2 and P5), but I cannot find the results of these analyses; four individual blood samples to assess TCR changes in PH1/PH2/PH3 and PH4 were only available in four patients (P4, P5, P7, P9). Only three patients seem to have had the right samples collected to allow the analysis for 'C3' in Figure 3C.

      Thanks for your comment. In Fig. 3B and 3D, we used TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 0 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 0 of cycle 2, data from six patients (P1, P2, P5, P10, P11, P12) were used. For the period from day 0 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C and 3E, TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) were used to generate the images for cycle 1, while data from two patients (P6, P7) were used to create the images for cycle 3. In Fig. 3F, all patients who underwent sequencing are included in the analysis, with each patient's data represented by dots of different colors.

      For the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). The T cell assessments and comparisons based on the mRNA sequencing results are presented in Fig. S3 and Tables S5-S6.

      Please display for each of the 'top 20 clones' at any one timepoint how these clones evolve throughout the study; I expect that a clone that is 'top 20' at a given timepoint may not be among the 'top twenty' at all timepoints.

      Thanks for your comment. Yes, due to the heterogeneity of tumors, a variety of different antigens are exposed during the course of cancer treatment. As a result, the formation of TCR dominant clones is a dynamic process, with new dominant clones emerging at each stage. Therefore, the top 20 clones at each time point do not necessarily represent the overall top 20 clones across all time points. However, there is still some overlap in the dominant TCR clones. We have chosen to present the data from P2, which provides the most complete results throughout the entire treatment process.
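      The reviewer's request amounts to taking the union of each time point's top-N clones and following every clone in that union across all time points. The Python sketch below is a hypothetical illustration of that bookkeeping, not the authors' pipeline: the clone identifiers and counts are invented, N is set to 2 instead of 20 to keep the example small, ties are broken alphabetically, and a clone absent from a sample is assigned a frequency of 0. The time point labels follow the c1d0/c1d5/c2d0 notation adopted in this response.

```python
def top_clones(counts, n=20):
    """Return the n most frequent clone IDs; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [clone for clone, _ in ranked[:n]]


def track_top_clones(samples, n=20):
    """Follow the union of every time point's top-n clones across all time points.

    `samples` maps time point -> {clone_id: read_count}, in chronological order.
    Returns clone_id -> list of frequencies (0.0 where the clone was not detected).
    """
    tracked = []
    for counts in samples.values():
        for clone in top_clones(counts, n):
            if clone not in tracked:
                tracked.append(clone)
    totals = {t: sum(counts.values()) for t, counts in samples.items()}
    return {
        clone: [samples[t].get(clone, 0) / totals[t] for t in samples]
        for clone in tracked
    }


def consecutive_overlap(samples, n=20):
    """Count the top-n clones shared by each pair of consecutive time points."""
    points = list(samples)
    tops = {t: set(top_clones(samples[t], n)) for t in points}
    return {(a, b): len(tops[a] & tops[b]) for a, b in zip(points, points[1:])}


# Invented counts for one hypothetical patient (N = 2 keeps the output short).
patient = {
    "c1d0": {"A": 50, "B": 30, "C": 20},
    "c1d5": {"A": 40, "C": 45, "D": 15},
    "c2d0": {"C": 60, "D": 30, "E": 10},
}
for clone, trajectory in track_top_clones(patient, n=2).items():
    print(clone, trajectory)
print(consecutive_overlap(patient, n=2))
```

      Tracking the union rather than a single time point's list makes both behaviours visible: clones that drop out of the top N keep a trajectory, and newly dominant clones appear with zeros at earlier time points.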

      Author response image 2.

      Please also assess if the expanded clonotypes are present (and expanded) in the cancer tissue at resection, to link the effect in blood to the tumour. Given that tissue was collected for 31 patients, mRNA sequencing to generate TCR data should be possible to add to the blood analyses in the 12 patients in Figure 3A. Without this data no clear link can be made to events in the cancer.

      Thanks for your comment. Due to limitations in sampling conditions, we were unable to collect samples from all patients at every time point. As shown in Fig. 3A, we performed tissue sampling and RNA sequencing on five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and conducted RNA sequencing on three patients (P2, P5, P6). This study primarily focuses on TCR analysis in peripheral blood. The relationship between peripheral blood TCR and tissue TCR clones will be addressed in future research.

      Please provide in M&M the missing information on the flow cytometry methodology (instrument, antibody clones, gating strategy) and what markers were used to define T cell subsets (naïve, memory, central memory, effector memory).

      Thanks for your comment. In this study, we evaluated immune cells based on RNA sequencing results rather than flow cytometry. We then compared T cell subsets between the baseline and post-neoadjuvant treatment stages. The steps for RNA sequencing and the evaluation of immune cells using the xCell and ImmuCellAI algorithms are detailed in the Methods and Materials section. The comparison of T cell subsets is presented in Fig. S3. The estimated immune cell data have been added to Tables S5 and S6.

      The authors also describe that ctDNA decreases after chemo-ICI treatment. This is well documented in their data but ultimately irrelevant: if the cancer volume is reduced to the degree of a radiological or pathological response/complete response, then the quantity of circulating DNA from the cancer cells must decrease. More interesting would be the question of whether early changes predict clinical outcome and whether recurrent ctDNA elevations herald recurrence.

      Thanks for your comment. If the tumor responds to treatment, its volume will decrease. Over the long term, ctDNA levels in the blood are expected to decline. However, in the short term, as tumor cells are killed, there may be a surge of ctDNA released into the patient's bloodstream, potentially causing a rise in the maxVAF. Based on the current follow-up data, the ctDNA maxVAF for patient P8 has increased compared to baseline levels. However, given the relatively short follow-up period, no recurrence has been observed yet.

      Please probe whether the molecular data identify good radiological or pathological outcomes before cycle 2 is started and whether the ctDNA levels identify patients who will have a poor response and/or who relapse early.

      Thanks for your comment. Before initiating cycle 2 of treatment, we examined all patients who underwent ctDNA sequencing. Among them, patients P1 to P4 were classified as MPR, whereas patients P5 to P9 were categorized as non-MPR. Patients P7 and P8 showed a trend of increasing maximum variant allele frequency (maxVAF) in their ctDNA. Thus, 50% (2 out of 4) of the MPR patients could be identified as having potential issues through molecular testing before cycle 2. Additionally, only P3 experienced a recurrence, which was predicted by molecular testing prior to starting cycle 2.
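      maxVAF is the highest variant allele frequency (alternate reads divided by sequencing depth) among the somatic variants detected in a plasma sample. The Python sketch below shows one hypothetical way to flag a rising maxVAF before cycle 2; the read counts are invented, returning 0.0 when no variant is detected is a convention we assume, and the rule "maxVAF at c2d0 exceeds maxVAF at c1d0" is an illustrative assumption rather than the criterion used in the study.

```python
def max_vaf(variants):
    """Highest VAF among detected variants, given (alt_reads, depth) pairs.

    Returns 0.0 when no variant was detected (a convention assumed here).
    """
    return max((alt / depth for alt, depth in variants), default=0.0)


def rising_before_cycle2(series, start="c1d0", end="c2d0"):
    """Hypothetical flag: True if maxVAF at `end` exceeds maxVAF at `start`."""
    return max_vaf(series[end]) > max_vaf(series[start])


# Invented read counts for one hypothetical patient.
patient = {
    "c1d0": [(12, 1000), (30, 1500)],  # VAFs 0.012 and 0.02
    "c1d5": [(5, 1000)],               # VAF 0.005
    "c2d0": [(40, 1000)],              # VAF 0.04
}
print({t: max_vaf(v) for t, v in patient.items()})
print(rising_before_cycle2(patient))
```

      A single pairwise comparison like this cannot distinguish a transient ctDNA surge from tumor cell killing from genuine progression, which is the ambiguity the authors raise; it only makes the screening rule explicit.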

      Author response image 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have some detailed comments for the authors:

      (1) Please explain the reason for putting forward the opinion that "cytotoxic drugs with standard doses and anti-PD1 antibody were administrated on the same day (9), which may result in unsatisfactory eradication rates and relatively high incidence of severe treatment-related adverse events (TRAEs)" (Page 3 Line 76), especially "unsatisfactory eradication rates". Is this based on actual evidence, or is it purely theoretical speculation?

      Thanks for your comment. Our team has conducted related research on the impact of the combined timing of PD-1/PD-L1 inhibitors and chemotherapy on outcomes in patients with refractory lung cancer. Our findings suggest that administering PD-1/PD-L1 inhibitors 1-10 days (especially 3-5 days) after chemotherapy is superior to administering them before or concurrently with chemotherapy in patients with refractory lung cancer, although this result needs to be further explored in prospective studies. We therefore infer that administering standard-dose cytotoxic drugs and an anti-PD1 antibody on the same day may lead to unsatisfactory eradication rates and more side effects.

      Yao W, Zhao X, Gong Y, Zhang M, Zhang L, Wu Q, et al. Impact of the combined timing of PD-1/PD-L1 inhibitors and chemotherapy on the outcomes in patients with refractory lung cancer. ESMO Open. 2021;6(2):100094.

      (2) Due to the lack of a control group, we cannot assess the advantages and disadvantages of this treatment strategy compared to standardized neoadjuvant immuno-chemotherapy. We can refer to historical data. In the current clinical trials on neoadjuvant chemotherapy combined with immunotherapy (CheckMate-816, etc), what is the proportion of patients who had their chemotherapy reduced due to adverse reactions? Is there a difference in their efficacy? This could serve as a good historical reference.

      Thanks for your comment. In CheckMate 816, the rates of discontinuation of neoadjuvant treatment in the treatment and control groups were 5.7% and 6.8%, respectively, and no patients had their chemotherapy dosage reduced due to intolerable side effects. However, it is an excellent suggestion to find a historical reference, so we will check the details in other clinical trials.

      (3) Among the 38 patients, there are 21 cases of SCC and 17 cases of LUAD. From the protocol, it can be seen that SCC patients had both albumin-bound paclitaxel and cisplatin reduced, whereas LUAD patients did not have a reduction in pemetrexed, only in cisplatin. Considering the different pathological subtypes and treatment strategies, I suggest the author to present the efficacy data for SCC and LUAD separately rather than combining them together.

      Thanks for your comment. In this cohort of 31 patients who underwent pathological evaluation, there were 16 cases of squamous cell carcinoma (SCC) and 15 cases of lung adenocarcinoma (LUAD). Upon comparing the groups, no statistically significant difference was observed in treatment efficacy between SCC and LUAD patients.

      Author response table 1.

      (4) In the discussion, the authors mention that during the adjuvant treatment phase, "no significant change was observed in evenness or clonality of TCR" (Page 13, Line 364). However, in Figure 3E, it can be seen that the evenness and clonality of TCR during the adjuvant treatment phase (i.e., C3) are significantly increased (P < 0.05).

      Thanks for your comment. For the TCR repertoire evenness and clonality, we present these metrics in Fig. S2 B-C. Throughout the treatment process of all patients, there were no significant changes in the Pielou index (representing evenness) or clonality. In Fig. 3E, we defined TCR clones with a frequency greater than 0.001 as "large clones" and examined their changes during cycle 1 and cycle 3. Therefore, although there was a significant increase in large clones during cycle 3, the overall TCR evenness and clonality did not show notable changes.
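      To illustrate how the number of large clones can change while evenness and clonality do not, the Python sketch below assumes the common definitions of Shannon entropy H = -Σ p ln p, Pielou evenness J = H / ln S (S = number of distinct clones), and clonality = 1 - J, together with the 0.001 frequency threshold for large clones quoted above. These formulas are an assumption about the study's implementation, and both repertoires in the example are invented.

```python
import math

LARGE_CLONE_THRESHOLD = 0.001  # frequency cut-off for "large clones" quoted in the response


def repertoire_metrics(counts):
    """Return diversity metrics for a list of per-clone read counts.

    Assumes the common definitions: Shannon H = -sum(p ln p),
    Pielou evenness J = H / ln(S), clonality = 1 - J.
    """
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    richness = len(freqs)
    if richness < 2:
        raise ValueError("Pielou evenness needs at least two distinct clones")
    shannon = -sum(p * math.log(p) for p in freqs)
    evenness = shannon / math.log(richness)
    large = [p for p in freqs if p > LARGE_CLONE_THRESHOLD]
    return {
        "shannon": shannon,
        "evenness": evenness,
        "clonality": 1.0 - evenness,
        "large_clone_count": len(large),
        "large_clone_fraction": sum(large),
    }


# Two invented, perfectly even repertoires: 1,000 clones at frequency 0.001
# (none exceed the threshold) versus 900 clones at about 0.0011 (all exceed it).
before = repertoire_metrics([1] * 1000)
after = repertoire_metrics([1] * 900)
for label, m in (("before", before), ("after", after)):
    print(label, f"evenness={m['evenness']:.3f}", f"large clones={m['large_clone_count']}")
```

      Both repertoires are perfectly even, so evenness and clonality are identical, yet the large-clone count jumps from 0 to 900. A threshold-based metric and a whole-repertoire metric can therefore move independently, consistent with the distinction drawn in the response.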

      (5) The authors indicated that low-dose chemotherapy does not inhibit TCR expansion; however, due to the lack of a control group, we cannot conclude that "standard doses would affect TCR expansion." To better explore this possibility, please analyze the differences in TCR expansion between patients with bone marrow suppression and those without.

      We analyzed the incidence of bone marrow suppression in patients who underwent blood TCR testing. The statistical results are shown in the figure below. Patients were grouped based on the presence or absence of bone marrow suppression to compare differences in TCR clonal dynamics between the two groups during neoadjuvant therapy. As shown in the figure below, patients in the non-bone marrow suppression group exhibited higher TCR diversity (SW score) during treatment compared to those in the bone marrow suppression group. During neoadjuvant therapy, the dominant clones in both groups significantly increased from c2d0 to c2d5. However, from c1d0 to c2d0, there was no significant change observed in the non-bone marrow suppression group, possibly due to the limited sample size. Additionally, Patient P11 in the non-bone marrow suppression group showed a downward trend in dominant clones from c1d5 to c2d0, which may have influenced the overall results for this group during this phase.

      Author response table 2.

      Author response image 4.

      (6) In the analysis of ctDNA maxVAF, I noticed that one patient showed a significant drop at T1 (after C1 chemotherapy), followed by a notable rebound at T2 (after C1 delayed immunotherapy), and then a decline again at T3 (after C2 chemotherapy). Theoretically, maxVAF can reflect tumor burden and should change in accordance with treatment response. Could this indicate that the patient has a poor response to the delayed immunotherapy without concurrent chemotherapy? Additionally, please examine this patient's efficacy separately. What is the status of dynamic TCR? Does it show a trend opposite to that of maxVAF?

      Thanks for your comment. For Patient P7, the radiological evaluation reached PR, while the pathological assessment was non-MPR. The naming of time points has been revised as requested: T0, T1, T2, and T3 were changed to c1d0, c1d5, c2d0, and c2d5, respectively. Combining both radiological and pathological evaluations, the patient experienced a certain degree of tumor shrinkage during neoadjuvant therapy but still retained some residual tumor cells. Theoretically, maxVAF can reflect the tumor burden in the bloodstream as a real-time indicator of treatment response. For patients with long-term benefits, maxVAF is expected to decrease as tumors are eliminated. However, in the short term, the release of large amounts of clonal ctDNA from destroyed tumor cells may lead to a temporary increase in maxVAF. Therefore, it is not possible to conclude from a single case that this patient had an adverse response to delayed immunotherapy. The increase in maxVAF from c1d5 to c2d0 might result from the extensive release of newly exposed antigens. During this period, the top 20 and large clone TCRs did not show significant changes, suggesting that the patient's immune response was insufficient, leading to suboptimal neoadjuvant treatment efficacy and failure to achieve MPR. Additionally, there were no noticeable changes in maxVAF or TCR metrics from c1d0 to c2d0 for this patient, indicating that there is no evidence to suggest an inverse trend between TCR and maxVAF.

      Author response image 5.

      (7) In line with the previous question, another patient's maxVAF shows a significant rebound at T3. Please examine this patient's efficacy as well as the status of dynamic TCR.

      Thanks for your comment. For Patient P4, the radiographic assessment showed SD, while the pathological assessment indicated an MPR. Although the reduction in tumor volume in this patient was small, the tumor cell content within the lesion was less than 10%, which suggests that this patient had a good response to neoadjuvant therapy. From c1d0 to c2d0, the maxVAF of this patient showed a downward trend, while there was no significant change in the dominant clone indices of the TCR. From c2d0 to c2d5, both the maxVAF and the TCR dominant clone indices increased significantly. This suggests that this patient had a stronger immune response than Patient P7.

      Author response image 6.

      Minor Comments:

      (1) Figure 2E shows only OS, but the corresponding description in the text mentions that OS and DFS are not reached.

      Thanks for your comment. Both OS and disease-free survival (DFS) records are available in Table S1. As of January 31, 2025, the follow-up data for 31 patients have been updated in Supplementary Table 1. Among them, three patients experienced tumor recurrence, one of whom died. Additionally, seven patients were lost to follow-up. As a result, neither OS nor DFS has accrued enough events to reach its median. Since neither OS nor DFS has reached its median value, we opted to display only OS in Fig. 2E.

      (2) In the Discussion section, it is mentioned that there is controversy regarding chemotherapy combined with immunotherapy. I disagree with this statement. I believe that chemotherapy combined with immunotherapy is a consensus. The wording should be revised accordingly.

      Thanks for your comment. Yes, as you said, the combination of chemotherapy and immunotherapy has become a consensus. What we intended to convey is that how to optimize the timing and dosage of administration merits further exploration. We will revise the text accordingly (Discussion, lines 328-331).

      (3) The authors mentioned that the study involves multi-omics, but only ctDNA and TCR levels are included, with no RNA-related content observed. Perhaps a different term could be used.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole transcriptome, ctDNA, and TCR sequencing. RNA-related analyses are presented in Figure S3. Given that our primary focus is on the impact of this modified treatment on immune cells, we used the RNA sequencing results to estimate immune cell compositions with the xCell and ImmuCellAI algorithms. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      Reviewer #2 (Recommendations for the authors):

      Additional comment to the authors:

      The methods section refers to mRNA sequencing of the tumour tissue to define immune cell populations. Figure 3A also identifies that up to two timepoints were to be sequenced for individual patients. I could not find the results in the document.

      Please review the methods section and remove experimental methods where no data are presented.

      Thanks for your comment. As shown in Fig. 3A, for the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). We then used the RNA sequencing results to estimate immune cell compositions with the xCell and ImmuCellAI algorithms. The estimated immune cell data have been added to Supplementary Tables 5 and 6, and comparisons of T cell proportions are shown in Fig. S3. Whole transcriptome sequencing and immune cell abundance estimation are described in detail in the Methods section.

    1. eLife Assessment

      Glioblastoma is among the most aggressive cancers without a cure, and its cells are characterized by high mitochondrial membrane potential. This manuscript provides solid evidence that glioblastoma tumorigenesis is closely linked to mitochondrial stress. The study makes a valuable contribution to the field by advancing our understanding of the metabolic mechanisms driving glioblastoma and highlighting potential therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated.

      The conclusions of this paper are mostly well supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Your acknowledgment of our study's pioneering elements is greatly appreciated.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. The conclusions of this paper are mostly well-supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We are grateful for your acknowledgment of our study’s innovative approach and its possible influence on cancer therapy. We sincerely appreciate your valuable feedback. In response, this updated manuscript presents substantial new findings that reinforce our central argument. Moreover, we have broadened our data analysis and interpretation, as well as refined our methodological descriptions.

      Reviewer #2 (Public Review):

      This work explores the connection between glioblastoma, mito-RQC, and msiCAT-tailing. They build upon previous work concluding that ATP5alpha is CAT-tailed and explore how CAT-tailing may affect cell physiology and sensitivity to chemotherapy. The authors conclude that when ATP5alpha is CAT-tailed, it either incorporates into the proton pump or aggregates and that these events dysregulate MPTP opening and mitochondrial membrane potential and that this regulates drug sensitivity. This work includes several intriguing and novel observations connecting cell physiology, RQC, and drug sensitivity. This is also the first time this reviewer has seen an investigation of how a CAT tail may specifically affect the function of a protein. However, some of the conclusions in this work are not well supported. This significantly weakens the work but can be addressed through further experiments or by weakening the text.

      We appreciate the recognition of our study's novelty. To address your concerns about our conclusions, we have revised the manuscript. This revision includes new data and corrections of identified issues. Our detailed responses to your specific points are outlined below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1B, please replace the high-exposure blots of ATP5 and COX with representative results. The current results are difficult to interpret clearly. Additionally, it would be helpful if the author could explain the nature of the two different bands in NEMF and ANKZF1. Did the authors also examine other RQC factors and mitochondrial ETC proteins? I'm also curious to understand why CAT-tailing is specific to C-I30, ATP5, and COX-V, and why the authors did not show the significance of COX-V.

      We appreciate your inquiry regarding the data. Additional attempts were made using new patient-derived samples; however, these results did not improve upon the existing ATP5α, NDUS3 (C-I30), and COX4 signals presented in the figure. This is possibly because CAT-tail-modified mitochondrial proteins represent only a small fraction of the total protein in these cells. We acknowledge that the small tails visible above the prominent main bands are not particularly distinct. To address this, the revised version includes updated images that better illustrate the differences. We believe the assertion that GBM/GSCs possess CAT-tailed proteins is substantiated by the combination of subsequent experimental findings. The figure (refer to new Fig. 1B) serves primarily as an introduction. Importantly, CAT-tailed ATP5α plays a vital role in modulating mitochondrial potential and glioma phenotypes, a function demonstrated through subsequent experiments.

      We acknowledge that CAT-tail modification is not exclusive to the ATP5α protein. ATP5α was selected as the primary focus of this study because of its abundance in mitochondria and its specific involvement in cancer development, as noted by Chang YW et al. Future research will explore the possibility of CAT tails on other mitochondrial ETC proteins. Currently, NDUS3 (C-I30), ATP5α, and COX4 serve as examples confirming the existence of these modifications. Endogenous CAT-tailing remains challenging to detect, and bulk proteomics is not yet feasible for this purpose. We consider COX4 significant: we hypothesize that CAT-tailed COX4 may function similarly to the previously studied C-I30 (Wu Z, et al.), potentially causing substantial mitochondrial proteostasis stress.

      Concerning RQC proteins, our blotting analysis of GBM cell lines now includes additional RQC-related factors. In our assessment, the primary, more prominent bands (indicated by arrowheads) are the intended bands for NEMF and ANKZF1. Subsequent blotting analyses showed only a single band for each of ANKZF1 and NEMF. The additional, higher-molecular-weight band of NEMF, which we had initially considered analyzing for modifications (phosphorylation, ubiquitination, etc.), was not examined further because it did not appear in subsequent experiments (refer to new Fig. S1C).

      References:

      Chang YW, et al. Spatial and temporal dynamics of ATP synthase from mitochondria toward the cell surface. Communications biology. 2023;6(1).

      Wu Z, et al. MISTERMINATE Mechanistically Links Mitochondrial Dysfunction With Proteostasis Failure. Molecular cell. 2019;75(4).

      (2) In addition to Figure 1B, it would be interesting to explore CAT-tailed mETC proteins in cancer tissue samples.

      This is an excellent point, and we appreciate the question. We stained for ATP5α and key RQC proteins in both tumor and normal mouse tissues. Notably, ATP5α in GBM exhibited a greater tendency to form clustered punctate patterns than in normal brain tissue, and not all of it co-localized with the mitochondrial marker TOM20 (refer to new Fig. S3C-E). Crucially, we observed a significant increase in NEMF expression within mouse xenograft tumor tissues, alongside a decrease in ANKZF1 expression (refer to new Fig. S1A, B). These findings align with our observations in human samples.

      (3) Please knock down ATP5 in the patient's cells and check whether both the upper band and lower band of ATP5 have disappeared or not.

      This control was essential and has been executed now. To validate the antibody's specificity, siRNA knockdown was performed. The simultaneous elimination of both upper and lower bands upon siRNA treatment (refer to new Fig. S2A) confirms they represent genuine signals recognized by the antibody.

      (4) In Figures 1C and 1D, add long exposures to visualize aggregates and oligomers. In Figure 1D, please add the blots in which control and ATP5 are also shown in NHA and SF (similar to SVG and GSC827).

      New data are included in the revised manuscript to address these queries. Specifically, the new Fig. 1D now displays the full panel of cell lines as requested, featuring blots for Control, ATP5α, AT3, and AT20. Our analysis reveals that AT20 aggregates exhibit higher expression and accumulation rates in GSC and SF cells.

      Fig. 1C has been updated to include experimental groups treated with cycloheximide and sgNEMF. Our results show that sgNEMF effectively inhibits CAT-tailing in GBM cell lines, whereas cycloheximide has no effect. After consulting the reporter's original creator and optimizing expression conditions, we observed no significant aggregates with the β-globin-non-stop protein, potentially due to the length of endogenous CAT-tail formation (as noted by Inada, 2020, in Cell Reports). Our analysis therefore focused on the ratio of CAT-tailed (red-boxed blots) to non-CAT-tailed proteins (green-boxed blots). Comparing these ratios revealed that both anisomycin treatment and sgNEMF effectively hinder the CAT-tailing process, while cycloheximide has no effect.

      (5) In Figure 1E, please double-check the results against the figure legend. ATP5A aggregates should be shown endogenously. The number of aggregates shown in the bar graph is not represented in the micrographs. Please replace the images. For Figure 1E, to confirm that the aggregates are ATP5-specific, it would be better if the authors showed endogenous immunostaining of C-I30 and COX-IV.

      Labels in Fig. 1E were corrected to reflect that the bar graph in Fig. 1F indicates the number of cells with aggregates, not the number of aggregates per cell. The presence of endogenous ATP5α is accurately shown. To address the specificity of ATP5α, immunostaining for endogenous NDUS3 was conducted. This revealed NDUS3 aggregates lacking TOM20 signal in GBM cells (SF and GSC), as demonstrated in the new Fig. S3A, B. These findings suggest that NDUS3 also undergoes CAT-tailing modification, similar to ATP5α.

      (6) Figure 3A. Please add representative images in the anisomycin sections. It is difficult to address the difference.

      We appreciate your feedback. Upon re-examining the Calcein fluorescence intensity data in Fig. 3A, we believe the images accurately represent the statistical variations presented in Fig. 3B. To address your concerns more effectively, please specify which signals in Fig. 3A you find potentially misleading. We are prepared to revise or substitute those images accordingly.

      (7) Figure 3D. If NEMF is overexpressed, is the CAT-tailing of ATP5 reversed?

      Thank you. Your prediction aligns with our findings. We've added data to the revised Fig. S6A, B, which demonstrates that both NEMF overexpression and ANKZF1 knockdown lead to elevated levels of CRC. This increase, however, was not statistically significant in GSC cells. A plausible explanation for this discrepancy is that the MPTP of GSC cells is already closed, thus any additional increase in CAT-tailing activity does not result in further amplification.

      (8) Figure 3G. Why are the AT20 aggregates on the BN-PAGE not the same as those shown in Figure 2E?

      We appreciate your inquiry regarding the ATP5α blots, specifically those in the original Fig. 3G (left) and Fig. 2E (right). Careful comparison of the ATP5α band placement in these figures reveals a high degree of similarity: in both, aggregates are present at the top and the diffuse signals extend downwards. Because this is a gradient native polyacrylamide gel, the acrylamide concentration diminishes towards the top. Consequently, the non-rigid nature of the Blue Native PAGE gel may lead to slight variations in the aggregate signals; however, the overall patterns are very similar. To mitigate potential misinterpretation, we have rearranged the blot order in the new Fig. 3M.

      (9) Figure 4D. AT20 mediates more aggregation than AT3. Why are no such drastic differences between AT3 and AT20 observed in the TUNEL assay?

      The previous Figure 4D presents the quantification of cell migration from the experiment depicted in Figure 4C. But this is a good point. TUNEL staining results are directly influenced by mitochondrial membrane potential and the state of mitochondrial permeability transition pores (MPTP), not by the degree of protein aggregation. Our previous experiments showed comparable effects of AT3 and AT20 on mitochondria (Fig. 2E, 3K), which aligns with the expected similar outcomes on TUNEL staining. As for its biological nature, this could be very complicated. We hope to explore it in future studies.

      (10) Figure 5C: The role of NEMF and ANKZF1 can be further clarified by conducting Annexin-PI assays using FACS. The inclusion of these additional data points will provide more robust evidence for CAT-tailing's role in cancer cells.

      In response to your suggestion, we have incorporated additional data into the revised version.

      Using the Annexin-PI kit, we labeled apoptotic cells and detected them by flow cytometry (FACS). Our findings indicate that anisomycin pretreatment, NEMF knockdown (sgNEMF), and ANKZF1 upregulation (oeANKZF1) significantly increase the rate of STS-induced apoptosis compared with the control group (refer to new Fig. S9D-G).

      (11) Figure 5F: STS is a known apoptosis inhibitor. Why is it not showing PARP cleavage?

      Also, cell death would be more pronounced if it were shown at a later time point. What are the effects of STS and anisomycin at 24 h or 48 h? Since PARP is cleaved, it would also be better if the authors could include caspase blots.

      We believe the intended statement is: "Staurosporine is a protein kinase inhibitor that can induce apoptosis in multiple mammalian cell lines." Our study observed PARP cleavage even in GSCs, which are typically more resistant to staurosporine-induced apoptosis (C-PARP in Fig. S9B), and the ratio of C-PARP to total PARP increased. We selected a 180-minute treatment duration because longer treatments with STS + anisomycin (e.g., 24 or 48 hours) led to late-stage apoptosis and non-specific protein degradation, making PARP comparisons less meaningful. Following your suggestion, we also examined caspase 3/7 activity in GSC cells treated with DMSO, CHX, and anisomycin, and found that anisomycin treatment also activated caspases (Fig. S9A).

      (12) In Figure 5, adding an explanation of how CAT-tailing can induce cell death, such as the BAX-BCL2 ratio and cytochrome c release from the mitochondria, would provide more information.

      Thank you for your suggestion. In this study, we state that specific CAT-tails inhibit GSC cell death/apoptosis rather than inducing it. Therefore, we do not expect that examining BAX-BCL2 and mitochondrial cytochrome c release would offer additional insights.

      (13) To confirm the STS resistance, it would be better if the author could do the experiments in the STS-resistant cell line and then perform the Anisomycin experiments.

      Thank you. We should emphasize that our data primarily originates from GSC cells. These cells already exhibit STS-resistance when compared to the control cells (Fig. S8A-C).

      (14) It would be more advantageous if the author could show ATP5 CATailed status under standard chemotherapy conditions in either cell lines or in vivo conditions.

      This is an interesting question and worth exploring; however, GSC cells exhibit strong resistance to standard chemotherapy treatments such as temozolomide (TMZ).

      Additionally, we did not detect changes in CAT-tailed ATP5α and therefore did not include those data.

      (15) In vivo (cancer mouse model or cancer fly model) data will add more weight to the story.

      We appreciate your intriguing question. An effective approach would be to test the RQC pathway's function using the Drosophila Notch overexpression-induced brain tumor model. However, Khaket et al. have conducted similar studies, stating, "The RNAi of Clbn, VCP, and Listerin (Ltn), homologs of key components of the yeast RQC machinery, all attenuated NSC over-proliferation induced by Notch OE (Figs. 5A and S5A–D, G)." This data supports our theory, and we have incorporated it into the Discussion. While the mouse model more closely resembles the clinical setting, it is not covered by our current IACUC proposal. We intend to verify this hypothesis in a future study.

      Reference:

      Khaket TP, Rimal S, Wang X, Bhurtel S, Wu YC, Lu B. Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus. 2024 Aug 13;3(8):pgae321.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1B, C: To demonstrate that globin, ATP5alpha, and C-I30 are CAT-tailed, it is necessary to show that the high-mobility band disappears after NEMF deletion or mutagenesis of the NFACT domain of NEMF. This can be done in a cell line. The anisomycin experiment is not convincing because the intensity of the bands drops and because no control is done to show that the effects are not due to translation inhibition (e.g., cycloheximide, which inhibits translation but not CAT-tailing). Establishing ATP5alpha as a bona fide RQC substrate and CAT-tailed protein is critical to the relevance of the rest of the paper.

      Thank you for suggesting this crucial control experiment.

      To confirm that the observed signal is indeed a bona fide CAT tail, it is essential to demonstrate that NEMF is necessary for the CAT-tailing process. We have incorporated data from NEMF knockdown (sgNEMF) and cycloheximide treatment into the revised manuscript. Our findings show that both sgNEMF and anisomycin treatment effectively inhibit the formation of CAT-tailing signals on the reporter protein (Fig. 1C). Similarly, NEMF knockdown in a GSC cell line also effectively eliminated CAT tails on overexpressed ATP5α (Fig. S2B).

      In general, the text should be weakened to reflect that conclusions were largely gleaned from artificial CAT tails made of AT repeats rather than endogenously CAT-tailed ATP5alpha. CAT tails could have other sequences or be made of pure alanine, as has been suggested by some studies.

      Thank you for your reminder. We have reviewed the recent studies by Khan et al. and Chang et al., and we found their analysis of CAT tail components to be highly insightful. We concur with your suggestion regarding the design of the CAT tail sequence. We aimed to design a tail that maintained stability and resisted rapid degradation, regardless of its length. In the revised version, we clarify that our conclusions are based on artificial CAT tails, specifically those composed of AT repeat sequences (p. 9). We acknowledge that the presence of other sequence components may lead to different outcomes (p. 19).

      Reference:

      Khan D, Vinayak AA, Sitron CS, Brandman O. Mechanochemical forces regulate the composition and fate of stalled nascent chains. bioRxiv [Preprint]. 2024 Oct 14:2024.08.02.606406.

      Chang WD, Yoon MJ, Yeo KH, Choe YJ. Threonine-rich carboxyl-terminal extension drives aggregation of stalled polypeptides. Mol Cell. 2024 Nov 21;84(22):4334-4349.e7.

      Throughout the work (e.g. 3B, C), anisomycin effects should be compared to those with cycloheximide to observe if the effects are specific to a CAT tail inhibitor rather than a translation inhibitor.

      We agree that including cycloheximide control experiments is crucial. The revised version now incorporates new data, as depicted in Fig. S5A, B, illustrating alterations in the on/off state of MPTP following cycloheximide treatment. Furthermore, Fig. S6A, B present changes in Calcium Retention Capacity (CRC) under cycloheximide treatment. The consistency of results across these experiments, despite cycloheximide treatment, suggests that anisomycin's role is specifically as a CAT tail inhibitor, rather than a translation inhibitor.

      Line 110, it is unclear what "short-tailed ATP5" is. Do you mean ATP5alpha-AT3? If so this needs to be introduced properly. Line 132: should say "may indicate accumulation of CAT-tailed protein" rather than "imply".

      We acknowledge your points. We have clarified that the "short-tailed ATP5α" refers to ATP5α-AT3 and incorporated the requested changes into the revised manuscript.

      Figure 1C: how big are those potential CAT-tails (need to be verified as mentioned earlier)?

      They look gigantic. Include a ladder.

      In the revised Fig. 1D, molecular weight markers have been included to denote signal sizes. The aggregates in the previous Fig. 1C, also present in the control plasmid, are likely a result of signal overexposure. The CAT-tailed protein is observed just above the intended band in these blots. These aggregates have been re-presented in the updated figures, and their signal intensities quantified.

      Line 170: "indicating that GBM cells have more capability to deal with protein aggregation".

      This logic is unclear. Please explain.

      We appreciate your question and have thoroughly re-evaluated our conclusion. We offer several potential explanations for the data presented in Fig. 1D: (1) ATP5α-AT20 may demonstrate superior stability. (2) GSC (GBM) cells might lack adequate mechanisms to monitor protein accumulation. (3) GSC (GBM) cells could possess an increased adaptive capacity to the toxicity arising from protein accumulation. This discussion has been incorporated into the revised manuscript (lines 166-169).

      Line 177: how do you know the endogenous ATP5alpha forms aggregates due to CAT-tailing? Need to measure in a NEMF hypomorph.

      We understand your concern and have addressed it. Revised Fig. 3G, H demonstrates that a reduction in NEMF levels, achieved through sgNEMF in GSC cells, significantly diminishes ATP5α aggregation. This, in conjunction with the Anisomycin treatment data presented in revised Fig. 3E, F, confirms the substantial impact of the CAT-tailing process on this aggregation.

      Line 218: really need a cycloheximide or NEMF hypomorph control to show this specific to CAT-tailing.

      We have revised the manuscript to include data from sgNEMF and cycloheximide treatments, specifically Fig. 3G, H, and Fig. S5C, D, as detailed in our response above.

      Lines 249 and 266, Figure 5A: The mentioned experiments would benefit from controls, including an extension of ATP5alpha that is not alanine and threonine (perhaps a Gly-Ser linker), as well as an NEMF hypomorph.

      We sincerely appreciate your insightful comments. In response, the revised manuscript now incorporates control data for ATP5α featuring a poly-glycine-serine (GS) tail. This data is specifically presented in Figs. S2E-G, S4E, S7A, D, E, and S8F, G. Our experimental findings consistently demonstrate that the overexpression of ATP5α, when modified with GS tails, had no discernible impact on protein aggregation, mitochondrial membrane potential, GSC cell mobility, or any other indicators assessed in our study.

      Figure S5A should be part of the main figures and not in the supplement.

      This has been moved to the main figure (Fig. 5C).

    1. Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some textual weaknesses have been addressed since resubmission, other specific weaknesses remain. The major weaknesses are the n number and the analysis methods. Two biological replicates per group are not enough to support any solid conclusions. The validation data were too limited (only cell % data) and did not always support the findings (e.g., Figure 3D does not match Figures 3B/4A). Other examples include:

      (1) There are not enough cells to justify the analysis: only 300-1500 myeloid cells per group, with few of these being neutrophils or the apparent 'Ly6G- neutrophils'.

      (2) The dynamic range of RNA measurement using scRNA-seq is known to be limited: how do we know whether genes are not expressed or simply fell below the detection limit? This links to the Ly6G-negative neutrophil comments, but in general the apparent lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper, not just the RNA-sequencing data, are not strong enough to support any solid conclusion.

      (3) There are no data supporting the presence of Ly6G-negative neutrophils. In the flow cytometry, only Ly6G+ cells are shown, with no evidence of Ly6G-negative neutrophils (assuming equal CD11b expression). No new data supporting this claim have been provided since resubmission, and the new Figures 4C and D actually show that there are no Ly6G-negative cells: the cells that the authors deem Ly6G negative are actually positive, but the red S100A8 overlay is so strong that it blocks out the green signal. In the Ly6G single stains (green only), the reported S100A8+Ly6G- cells all show Ly6G (with different staining intensities).

      (4) Eosinophils are heavily involved in lung macrophage biology but are missing from the analysis; it is highly likely that the RNA sequencing picked out eosinophils as Ly6G- neutrophils, rather than the 'digestion issues' the authors claim.

      (5) After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

    2. Reviewer #3 (Public review):

      This work aims to establish cell-type-specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with the innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is increased in abundance in the treatment groups, while gene expression remains consistent, which could indicate impaired function. The increased abundance of CD4+ and CD8+ T cells, along with their associated markers for proliferation and cytotoxicity, is thought to result from activation following this decline in the neutrophil-mediated immune response.

      Strengths:

      - Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      - Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      - The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      - Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      - The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      - Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      - Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      - The manuscript has some validation of findings but not very comprehensive.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      Comments on revisions:

      The reviewers have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      - We appreciate the nuance of this experimental design, and the reviewers have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      - The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      - We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      - Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      - Please change the scale bars to white so they are more visible in each channel.

      - We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.
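      As an illustration of the normalization point about cell counts, the Python sketch below converts per-cell annotations for each sample into percentages of that sample's total. The labels, sample sizes, and function name are hypothetical and not taken from the manuscript; they show how the same raw count of 50 neutrophils corresponds to 5% of a 1,000-cell sample but 10% of a 500-cell sample.

```python
from collections import Counter


def cell_type_percentages(cell_labels):
    """Return each cell type's share (%) of all cells in one sample."""
    counts = Counter(cell_labels)
    total = sum(counts.values())
    return {cell_type: 100.0 * n / total for cell_type, n in counts.items()}


# Hypothetical per-cell annotations for two samples of different sizes.
sample_a = ["neutrophil"] * 50 + ["T cell"] * 150 + ["epithelial"] * 800  # 1,000 cells
sample_b = ["neutrophil"] * 50 + ["T cell"] * 100 + ["epithelial"] * 350  # 500 cells

pct_a = cell_type_percentages(sample_a)
pct_b = cell_type_percentages(sample_b)
print(pct_a["neutrophil"], pct_b["neutrophil"])  # 5.0 10.0
```

      Identical raw counts can therefore hide a twofold difference in relative abundance, which is why proportions are the comparable quantity across groups.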

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about e-cigarette use among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated effective exposure to fruit-, menthol-, and tobacco-flavored e-cigarettes, with air and PG:VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNA-seq analysis of exposure to the flavored e-cig aerosols, including tobacco flavor, showed the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further subclustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are sex-dependent. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in the immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell cytotoxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data are thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      Weaknesses:

      The authors included a valuable control group, the PG:VG group, since PG:VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analyses comparing the air group with the PG:VG group, and the PG:VG group with the individual flavored e-cig groups, would provide clearer insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that the PG:VG group is the foundation of the e-liquid formulation, and hence comparisons with this group are important for understanding the effect of individual flavors on the cell population. Although we compared the flavored e-cig groups with the PG:VG group, we did not discuss these comparisons in detail within the manuscript to avoid confusion in interpreting this study. However, we have now included the comparisons with the PG:VG group as Supplementary Files S13-S18 in our revised manuscript to facilitate proper interpretation of our omics data by interested readers.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail because fruit-flavored e-liquids have now been regulated/banned from sale in the US. Thus, from a regulatory point of view, the effects of tobacco-flavored e-liquids hold the most interest. Because fruit flavors were still on the market when this study was conducted, we have included the data; however, studying them further was not the focus of this work.

      The characterization of Ly6G+ vs Ly6G- neutrophils is interesting and potentially very impactful. Key results like this from scRNA-seq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al. reported Ly6G+ macrophages in the lung, which could confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we have designed a neutrophil-specific experiment to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. To address the concerns raised by the reviewer, we stained lung tissue samples from air- and tobacco-flavored e-cig aerosol-exposed mice for Ly6G and S100A8 (a universal neutrophil marker) to assess the infiltration of Ly6G+ vs Ly6G- neutrophils in the lungs of exposed and unexposed mice. Results from this study showed that exposure to tobacco-flavored e-cig aerosol affects the neutrophil population within the mouse lungs. In fact, the changes were more pronounced in female mice. The data are now shown in Figure 4.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low number of replicates and a lack of effective validation methods, which reduce the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological replicates per group is not an acceptable basis for any solid conclusions. The validation data were too limited (only cell % data) and did not always support the findings (e.g., Figure 4D does not match 4C). Often, n seems to be combined and only one data point is shown; it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for recognizing the strengths of this manuscript while pointing out the errors, allowing us to improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions, but this was a pilot study to identify changes in the mouse lung at the single-cell level upon acute exposure to flavored e-cig aerosols. So far, the e-cig field has primarily focused on toxicological studies that help regulatory bodies set standards and enforce laws to better regulate the manufacture, sale, and distribution of e-cig products. However, adolescents and young adults are still gaining access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposure. Single-cell technology is a powerful tool for analyzing gene expression changes within cell populations to study cell heterogeneity and function. Yet it is a costly tool, so conducting such analyses on large sample sizes is not ideal. This pilot study was designed to obtain initial leads for our future studies involving larger sample sizes and chronic exposures. However, given the wealth of information provided by a single-cell RNA sequencing experiment, we intend to share it with a larger audience to support research and further study in this area. We understand that the validations in our current work are limited, so we have now performed co-immunostaining to validate the Ly6G+ and Ly6G- neutrophil populations. We now present the single-cell findings alongside validation experiments using classical methods, including ELISA, immunostaining, and flow cytometry, and have revamped the whole manuscript. It is important to mention, however, that such validations are sometimes challenging, because many of these techniques assay whole tissue, whereas the changes shown in single-cell analyses pertain mainly to a single cell type. This is well illustrated by the flow cytometry results for neutrophils, where we used Ly6G, which is found only in the mature neutrophil population, as a marker to stain for neutrophils.

      Only 71,725 cells means only ~7,172 per group, which is ~3,586 per animal. How many of these were neutrophils, T cells, and macrophages? This was not shown and could be too low.

      We agree that the number of cells could be too low. To account for this, we did not study gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations (myeloid, lymphoid, endothelial, stromal, and epithelial) and identified changes in gene expression. Of these, only the two clusters (myeloid and lymphoid) with more than ~1000 cells per cell type per group were studied in detail. We have included the cell count information in the revised manuscript to allow better interpretation of our results. From a single-cell point of view, a cell count of ~3500 each, with over 20000 features (genes), has good statistical strength and merit in our opinion.

      The dynamic range of RNA measurement using scRNA-seq is known to be limited; how do we know whether genes are not expressed or simply fell below the detection limit? This relates to the Ly6G-negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n and few cells.

      This is a well-taken point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited and that low cell numbers could lead to bias. However, no clusters with fewer than 150 cells were included in the differential gene expression analyses. To avoid confusion, we now show immunofluorescence results to validate the findings. We are confident that these validation experiments will convince the reviewer of the loss of the Ly6G marker from neutrophils and the lack of a proper neutrophilic response in exposed mouse lungs compared with controls.
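      The detection-limit concern can be made concrete with a deliberately simplified model. The Python sketch below assumes that transcript capture in each cell follows a Poisson distribution with a given mean UMI count per cell; this model, the function names, and the example values are illustrative assumptions, not the analysis used in the manuscript. Under it, a gene that is genuinely expressed but captured at a mean of 0.1 UMIs per cell records zero counts in roughly 90% of individual cells, even though it is almost certain to appear somewhere in a 150-cell cluster.

```python
import math


def prob_zero_counts(mean_umis_per_cell):
    """P(a cell records 0 UMIs for a gene), assuming Poisson capture."""
    return math.exp(-mean_umis_per_cell)


def prob_detected_in_any(mean_umis_per_cell, n_cells):
    """P(the gene is seen in at least one of n_cells independent cells)."""
    return 1.0 - prob_zero_counts(mean_umis_per_cell) ** n_cells


# Hypothetical capture rates for an expressed but lowly captured gene.
for mean_umis in (0.1, 0.5, 2.0):
    print(f"mean={mean_umis}: {prob_zero_counts(mean_umis):.1%} of cells show zero counts")
print(f"detected in >=1 of 150 cells: {prob_detected_in_any(0.1, 150):.6f}")
```

      Under this simplification, a per-cell zero is weak evidence of absence, which is why orthogonal protein-level validation of a "negative" population carries weight.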

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA-seq findings would be interesting. However, flow cytometry and the preparation of single-cell suspensions for sequencing were performed in parallel in this study. We used a basic flow panel with single markers to identify individual immune cell types. We did identify changes in the Ly6G population in our treated and control samples using scRNA-seq and intend to exclude it as a marker in our future flow cytometry studies. Unfortunately, the same analyses could not be performed for the current batch of samples. To address some of the concerns raised here, we have now included results from IHC staining to identify the Ly6G+ and Ly6G- populations in lung tissues from control and treated mice in the revised manuscript.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove excess RBCs during lung digestion when preparing the single-cell suspension for scRNA-seq in this study. Reports suggest that RBC lysis could adversely affect eosinophil number and function. We did not identify any cell cluster expressing eosinophil markers in our scRNA-seq data, and we believe our lung digestion protocol could be the reason. We did study eosinophil changes in these samples by flow cytometry and found significant changes. However, because we could not find an eosinophil cluster in the scRNA-seq data, we did not include these results in the original manuscript. To avoid confusion and maintain transparency, we have now included the eosinophil changes measured by flow cytometry in the revised manuscript (Figure S4).

      The figures had no titles, so they were difficult to navigate.

      We have now revamped the figures to make them easier for readers to navigate.

      PGVG is not defined and not introduced early enough.

      We have made the necessary changes in the revised manuscript.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We have now removed the cell cycle scoring information from the revised manuscript. Performing a BrdU assay was not possible for these tissues due to limited samples and resources. However, we may consider performing it in future studies.

      It was not clear how statistical tests were chosen, or why Table S2 used a good comparison (two-way ANOVA with sex as a variable) that was not used for other data, particularly the more functional RNA markers (Table S2 also lacks the interaction statistic, which is the most useful here).

      We have now included the two-way ANOVA statistics (Supplementary File S3) for other data included in the revised manuscript. It is important to note that since we did not identify any significant changes upon two-way ANOVA, the interaction statistics were not available for the abovementioned statistical test. We have included the interaction information wherever available.

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PGVG. In some cases, the carrier PGVG looks worse than some of the flavors (which have nicotine).

      We agree with the reviewer's comment; however, comparisons with PG:VG were not included because of the low cell numbers for PG:VG samples obtained after quality control and filtering in the scRNA-seq analyses. Nevertheless, in response to the reviewer's question, we now include the comparisons with PG:VG as Supplementary Files S13-S18 in the revised manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than by any real difference; e.g., is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco it randomly hit the significance mark? This is but one example of the problems with the analysis and conclusions.

      We agree in part with the concern raised here. In our opinion, an omics study is not necessarily aimed at finding changes at the transcript level with absolute certainty, but rather at identifying probable cell and gene targets to validate in subsequent work. We do not claim that our findings are absolute outcomes; rather, we note the limitation of sample number and the need for further research at every step. The strength of this work is that it is the first study of its kind to examine changes in lung cell populations at the single-cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating in future work. We still strongly believe that a dataset like this is a useful resource for a wider audience.

      The data in Figure 7A are confusing: if this is a comparison to air, then why does air vs air not equal 1? Even if this was a comparison to the average of air between males and females, this doesn't explain why CCL12 is >1 in both. Is this a z-score instead? Regardless, the data are difficult to interpret in this format.

      We have now changed the format of data representation in the figure.

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      Wherever needed, the n number has been included in the Figure legend. Additionally, the n number is shown in Figure 1A. With respect to the second comment, however, we respectfully disagree with the reviewer. Each scRNA-seq dataset had 2 samples, one male and one female, as clearly shown in the current figures. The pooling of cells mentioned in the comment happened when the cell suspension was prepared from each sex/group at the start of sequencing. We therefore show results from pooled samples, which cannot capture the variability among the pooled animals; we acknowledge this as a shortcoming of our work. In terms of heat map representation and data analyses, we have included all the information needed to uphold the transparency of our study design and data visualization for each figure and would like to keep the current representations. However, the validation cohort did not involve any sample pooling and still agrees with most of the deductions made in this study. We are therefore confident that no overstatements have been made in this work and that we provide a useful dataset to inform future research in this area.

      Reviewer #3 (Public review):

      This work aims to establish cell-type-specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared with control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and downregulated genes are heavily associated with the innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils increases in abundance in the treatment groups while gene expression remains consistent, which could indicate impaired function. The increase in CD4+ and CD8+ T cells, along with their associated markers of proliferation and cytotoxicity, is thought to result from activation following this decline in the neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance, such as selecting mice whose ages corresponded to those of human adolescents.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to examine the effects of chronic exposure on lung tissue. We were interested in delineating the effects of acute exposure, for which the proposed study design was chosen. Previous work by our group has used similar exposures and has been well received by the community. We understand that chronic exposures would be interesting to examine, but that was beyond the scope of this pilot study. Longer (chronic) exposures will be conducted to examine the disease-modifying effects of e-cigarettes.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we have now included the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript. 

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and have included all the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      As pointed out by the reviewers themselves, the strength of this work lies in the first scRNA-seq analyses of mice exposed in vivo to differently flavored e-cig aerosols. We also show cell-specific differential gene expression and address some of the major questions in e-cig research, including the release of metals on a day-to-day basis from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which has been discussed as a shortcoming. Nevertheless, the major strength of this work lies not in identifying specific trends but in determining possible cell and gene targets for expanded studies with longer (chronic) exposures and larger sample groups. We have discussed the significance of the study with respect to the effects of vaping on cellular heterogeneity leading to deleterious effects.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We have now included some validation experiments and revamped the revised manuscript to support scRNA seq findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some cell clusters (especially epithelial cells) were too low. So, although we performed the differential gene expression analyses on all the cell clusters, we refrained from discussing them in the manuscript to avoid over-interpretation of our results. Only clusters with more than 150 cells per sex per group were used to plot the heatmaps. We have now included the cell numbers for each cell type in the revisions to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript. This will enable interested readers to access the raw data and study the cell types of interest in detail based on their study requirements. This data will be a useful resource for all in this community to inform and design future studies.
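      The cluster-size filter described in this response can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the (cluster, group, sex) data layout, the function name, and the cell counts are hypothetical; only the rule of keeping clusters with more than 150 cells in every sex/group sample comes from the response. A cluster missing from a sample is treated as having zero cells there.

```python
from collections import Counter

MIN_CELLS = 150  # clusters need more than this many cells in every sample


def clusters_passing_threshold(cells, min_cells=MIN_CELLS):
    """cells: iterable of (cluster, group, sex) tuples, one tuple per cell.

    Returns the sorted clusters that have more than `min_cells` cells in
    every (group, sex) sample observed in the data.
    """
    counts = Counter(cells)
    clusters = {cluster for cluster, _, _ in counts}
    samples = {(group, sex) for _, group, sex in counts}
    return sorted(
        cluster
        for cluster in clusters
        if all(counts.get((cluster, g, s), 0) > min_cells for g, s in samples)
    )


# Hypothetical annotations: the epithelial cluster is too small in one sample.
cells = (
    [("myeloid", "air", "F")] * 1200 + [("myeloid", "air", "M")] * 1100
    + [("myeloid", "tobacco", "F")] * 1300 + [("myeloid", "tobacco", "M")] * 1000
    + [("epithelial", "air", "F")] * 90 + [("epithelial", "air", "M")] * 200
    + [("epithelial", "tobacco", "F")] * 160 + [("epithelial", "tobacco", "M")] * 175
)
print(clusters_passing_threshold(cells))  # ['myeloid']
```

      Requiring the threshold in every sample, rather than in the pooled total, keeps a cluster from entering the comparison on the strength of one large sample.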

      Recommendations for the authors:

      Major comments

      Mouse experiments are extremely variable and an n of 2 is not enough. Because of the complexity of separating male and female mice, the analyses are not adequately powered to support conclusions. The two-way ANOVA style approach to consider sex as a separate variable was a great idea in Table S2 - but this was not used elsewhere, and there is a need to show the interaction statistic (which would say if there is a flavor effect dependent on sex).

      We thank the reviewers for this recommendation. We agree that the experiments are highly variable. However, this is not merely an outcome of the small sample size (which we address as one of the limitations). It is important to mention that validating results from single-cell technologies using standard molecular biology techniques is challenging, and the results may not align completely. This is because the former compares single-cell populations, whereas the latter measures a heterogeneous cell population. Nevertheless, in light of this comment, we have toned down our conclusions and performed additional experiments to validate the single-cell findings. We also provide the results of two-way ANOVA for all the figures/experiments performed in this work.
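      To illustrate what the requested interaction statistic measures, the pure-Python sketch below computes a balanced two-way ANOVA with exposure group and sex as fixed factors, including the group × sex interaction. The function name and values are hypothetical and do not come from the manuscript. The sketch also makes one design constraint explicit: the interaction can be separated from residual error only when every group/sex combination has at least two independent replicates, because with a single observation per combination the residual has zero degrees of freedom.

```python
from itertools import product
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction (fixed effects).

    data maps (group, sex) -> list of replicate values. Returns sums of
    squares, degrees of freedom, and F statistics for group, sex, and the
    group x sex interaction.
    """
    groups = sorted({g for g, _ in data})
    sexes = sorted({s for _, s in data})
    r = len(next(iter(data.values())))
    if set(data) != set(product(groups, sexes)):
        raise ValueError("every group x sex combination must be present")
    if r < 2 or any(len(v) != r for v in data.values()):
        # With one value per cell, residual df = 0 and the interaction
        # cannot be tested separately from error.
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean(x for v in data.values() for x in v)
    g_mean = {g: mean(x for s in sexes for x in data[(g, s)]) for g in groups}
    s_mean = {s: mean(x for g in groups for x in data[(g, s)]) for s in sexes}
    cell = {k: mean(v) for k, v in data.items()}
    ng, ns = len(groups), len(sexes)

    ss = {
        "group": ns * r * sum((m - grand) ** 2 for m in g_mean.values()),
        "sex": ng * r * sum((m - grand) ** 2 for m in s_mean.values()),
        "group x sex": r * sum(
            (cell[(g, s)] - g_mean[g] - s_mean[s] + grand) ** 2 for g, s in data
        ),
        "error": sum((x - cell[k]) ** 2 for k, v in data.items() for x in v),
    }
    df = {"group": ng - 1, "sex": ns - 1, "group x sex": (ng - 1) * (ns - 1),
          "error": ng * ns * (r - 1)}
    ms_error = ss["error"] / df["error"]
    if ms_error == 0:
        raise ValueError("zero residual variance; F is undefined")
    f = {k: (ss[k] / df[k]) / ms_error for k in ("group", "sex", "group x sex")}
    return ss, df, f


# Hypothetical readout: the tobacco effect is larger in females than in males.
data = {
    ("air", "F"): [1.0, 1.2], ("air", "M"): [1.1, 0.9],
    ("tobacco", "F"): [3.0, 3.2], ("tobacco", "M"): [1.4, 1.6],
}
ss, df, f = two_way_anova(data)
print({k: round(v, 2) for k, v in f.items()})
```

      A large F for the interaction term is what would support a sex-dependent flavor effect; main effects alone cannot show that. A p-value would then come from the F distribution with the listed degrees of freedom.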

      More validatory data with PCR, immunostaining, and flow cytometry would be very helpful. This includes validating the neutrophil functional and phenotype data and the T-cell data by flow cytometry.

      To validate the presence of Ly6G+ and Ly6G- neutrophil populations, we performed co-immunostaining experiments and showed that exposure to tobacco-flavored e-cig aerosols increases the percentages of both neutrophil populations in female mice. We also re-analyzed our flow cytometry data to align with the scRNA-seq results. A multiplex protein assay was used as another technique to show altered innate/adaptive immune responses upon exposure to differently flavored e-cig aerosols. Of note, given the short duration of exposure, we did not identify significant changes in cell numbers or inflammatory responses. Nevertheless, we have now validated our scRNA-seq results using various techniques to draw meaningful conclusions.

      The in vivo experimental design seems to model very short-term exposure. In the literature, including the papers cited in the references, much longer time points are used, extending from several weeks to months of exposure. There seem to be few examples of papers using 5-day exposure and those that do are inspired by traditional cigarette smoke rather than e-cig aerosols or model acute exposure by making the daily duration longer. It is important to consider the possibility that the greatest number of up- or down-regulated genes are found in immune cell populations solely because they are the first to be affected by e-cig exposure and the other cell types just do not have time to become dysregulated in 5 days.

      We thank the reviewers for this comment. We do not dispute that our observation of major changes in the immune cell populations may be due to the short duration of exposure. This was one of the first studies using single-cell technologies to examine cell-specific changes in the lungs of mice exposed to e-cig aerosols. Future experiments in our lab use a more controlled approach to mimic chronic exposure to e-cig aerosols in order to identify changes in other cell types and the long-term effects of e-cig exposure in vivo. However, since this was not the focus of this work, we have not discussed it in detail.

      The validity of the claims pertaining to septal thickening and mean linear intercept (MLI) are questionable due to the poor lung inflation of the treatment group, which the authors acknowledge. Thus, MLI cannot be accurately used. It is contradictory to state that the fruit-flavored treatment group presented challenges with inflation but then concluded that there is a phenotype. In addition, inflation with low-melting agarose is not an ideal method because it does not use a liquid column to maintain constant pressure. For these metrics to be used and evaluated, it is imperative that all lobes are properly inflated. Therefore, these data should either be repeated or removed.

      We agree with this critique and have removed the MLI quantification from the revised manuscript; we also avoid strong claims regarding histological changes upon exposure. Further work is needed to better understand the effects of differently flavored e-cig aerosol exposure on mouse lungs.

      What is the purpose of analyzing cell cycle scores? Why is it relevant that neutrophils are in G2M-phase? Figure 3B shows that neutrophils are clearly in both G1- and G2M-phase and this cluster includes both Ly6G+ and Ly6G- subsets, so it does not seem accurate to claim that they are in the G2M-phase of the cell cycle, nor does it reveal anything novel about Ly6G- neutrophils. Is it possible that the cell cycle score is noting a point in differentiation when neutrophils acquire/begin expressing Ly6G? Ly6G expression in neutrophils has been found to be associated with differentiation and maturation. To rule out the possibility that this is a cell state being identified, differential gene expression between the 2 neutrophil subsets should be shown in a volcano plot. It would also be useful to stain for Ly6G+/- neutrophils using either IF or RNAscope to prove they are present. If the claim is that Ly6G- neutrophils are a "unique" population, it must be established to what extent they are unique. Immune cells cluster together on UMAPs, so what if these are a different cell type entirely, like another immature myeloid lineage, and this is an artifact of clustering? This could be clarified with a trajectory analysis and further subsetting of the immune population.

      We thank the reviewers for this comment. We now realize that analyzing the cell cycle scores was not serving the intended purpose in this work. Moreover, because pooled samples were used for the scRNA-seq analyses, such downstream analyses may not be appropriate for our datasets. We have thus removed these graphs from the revised version and have tried to simplify the conclusions of our study for readers.

      Our main take-home message from this study is the increase in the numbers of mature (Ly6G+) and immature (Ly6G-) neutrophils in tobacco-flavored e-cig aerosol-exposed mouse lungs compared with the air control. This result was validated using co-immunofluorescence in the revised manuscript (Figure 4).

      In vivo validation of findings should be included, especially for the claimed changes. As of now, this paper serves more as a dataset that could be further explored by other groups, which in itself is valuable, but it is just one single cell sequencing experiment without validation.

      We thank the reviewers for this comment. We have used multiple techniques (flow cytometry, multiplex protein assay, co-immunofluorescence) in the revised manuscript to validate the scRNA-seq findings. However, this was a preliminary study designed to generate a small dataset for future experiments, and we do not have the resources to add more validation experiments to this study. We are currently designing chronic e-cig exposure studies to elaborate on certain hypotheses generated by this study.

      Minor Comments

      There are several examples of typos or small errors in the text that would benefit from proofreading. Examples: line 51 "in the many countries including (the) United States (US), (the) United Kingdom..."; on line 54, the reference cited states that 9.4% of middle schoolers are daily users, not 9.2%; on line 55 the reference cited states that these are the most commonly used flavors, not the most preferred, which explains why the percentages do not add up to 100; line 120 "the lungs were in a collapsed state than the other groups"; line 127 "to confirm out speculations"; line 136 "PGVG" instead of the previously used "PG:VG"; line 140 "(single cell capture))"; line 999 "result in" rather than "results in" for Figure 4 title, etc.

      We thank the reviewer for this comment. The manuscript has been thoroughly proofread and edited to avoid typos and grammatical errors.

      If this is a "pilot study" (as it is stated in the introduction) it is meant to assess the validity of experimental design on a small scale to later test a hypothesis. The authors should change the phrasing.

      We have now changed the phrasing as suggested.

      The introduction lacked the necessary context and background. Some information described in the results section could be addressed in the intro. For example: What is the significance of neutrophils having a Ly6G deficiency? Why was the exposure duration of 1 hour a day for 5 days chosen? Why use nose-only exposure when many models use whole-body exposure? Why look at cell-type-specific changes?

      We have made the necessary amendments in the introduction.

      Some figure titles address only certain panels rather than summarizing the figure as a whole. For example, the title of Figure 1 refers only to panel D and is unrelated to serum cotinine levels, septal thickening, or mean linear intercept. The text draws conclusions about septal thickening and Lm values for the fruit-flavored treatment group, so these panels are as relevant to the figure as the metal levels.

      We have now changed the Figures and Figure legends to summarize the figure.

      The significance level is not defined in the Figure 1 legend, although it is used in Figure 1C.

      The Figure legend has now been updated.

      Figure 1E does not include a scale bar.

      We have now included the scale bar in updated figures.

      The multiplex ELISA shown in the experimental design schematic is not further discussed in the paper. Flow cytometry plots should be displayed in addition to the data they generated.

      The flow cytometry plots have now been included (Figures 3 and 5), and the multiplex ELISA results are shown in Figure S3D and described in lines 327-342 of the revised manuscript.

      In Figure 1F, a multivariate ANOVA should be used so that multiple groups can be compared across sex, rather than plotting in a sex-specific manner and claiming there exists a sex bias. The small sample size also introduces an issue because a p-value cannot be generated with so few samples.

      Following the earlier suggestions, Figure 1F has now been removed from the revised manuscript.

      The protocol for obtaining a single-cell suspension should be detailed in the methods section. As written, the section describes only sample collection and preparation. Including the protocol would help explain to the reader why the UMAP shows such a large abundance of immune cells.

      We have now included the protocol in the revised manuscript.

      Clarify whether PG:VG was used as a control in the scRNA sequencing in addition to air to generate the UMAP in Figure 2A.

      Yes, PG:VG was used as one of the controls; this is now illustrated as a groupwise comparison in Figure 2D. We have also included comparisons of the various treatment groups versus PG:VG to identify DEGs in the myeloid and lymphoid clusters (Supplementary Files S13-S18).

      A UMAP should be shown for each treatment group/flavor. The overall UMAP in Figure 1A is good, but there could be another panel with separate projections for each condition.

      A groupwise UMAP has now been included in Figure 2D.

      In Figure 2C, relative cell percentage is not a reliable method to quantify cell type and the histogram is not a great way to visualize the data or its statistical significance. These claims should also be validated in tissue.

      We thank the reviewers for this comment and have tried to validate the findings using flow cytometry. However, we would note that changes observed with single-cell technologies cannot be validated using simple molecular biology techniques, because the markers used to define cell clusters in scRNA-seq are far more specific than those that could be included in the flow cytometry panel designed for this work. Our main purpose in using cell percentages was to show flavor-specific changes in broad cell populations in mouse lungs, so we have retained these graphs in the revised manuscript.

      Figure 2D could be better illustrated with a volcano plot to show which genes are being dysregulated rather than just how many. Knowing which genes are affected is more valuable than knowing just the number of genes.

      Figure 2D is no longer part of the revised manuscript. For the other comparisons, we have retained heatmaps because they also show sex-specific changes in gene expression, which would have been difficult to depict in volcano plots.
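      To illustrate the reviewer's suggestion: a volcano plot places each gene at its log2 fold change (x-axis) against −log10 of its adjusted p-value (y-axis), so individual dysregulated genes can be labeled rather than only counted. The sketch below is a minimal, hypothetical Python example that assumes a differential-expression table of (gene, log2FC, adjusted p) rows; the gene names, values, and cutoffs (|log2FC| ≥ 1, adjusted p < 0.05) are illustrative assumptions, not taken from the manuscript.

```python
import math
from typing import List, Tuple

# Illustrative cutoffs (assumptions, not values used in the manuscript).
LFC_CUTOFF = 1.0    # minimum |log2 fold change| to call a gene changed
PADJ_CUTOFF = 0.05  # maximum adjusted p-value to call a gene significant


def volcano_points(results: List[Tuple[str, float, float]]):
    """Convert (gene, log2FC, padj) rows into volcano-plot coordinates.

    Returns (gene, x, y, label) tuples, where x is the log2 fold change,
    y is -log10(padj), and label is "up", "down", or "ns".
    """
    points = []
    for gene, lfc, padj in results:
        # Guard against padj == 0, which would make log10 undefined.
        y = -math.log10(max(padj, 1e-300))
        if padj < PADJ_CUTOFF and lfc >= LFC_CUTOFF:
            label = "up"
        elif padj < PADJ_CUTOFF and lfc <= -LFC_CUTOFF:
            label = "down"
        else:
            label = "ns"
        points.append((gene, lfc, y, label))
    return points


# Hypothetical genes and values, for illustration only.
demo = [("GeneA", 2.3, 1e-4), ("GeneB", -1.8, 0.01), ("GeneC", 0.4, 0.2)]
for gene, x, y, label in volcano_points(demo):
    print(f"{gene}\t{x:+.2f}\t{y:.2f}\t{label}")
```

      The labeled points can then be passed to any plotting library; labeling genes by name in this way is what makes a volcano plot more informative than a count of DEGs.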

      Assuming Figure 3C is representative of all conditions, then Figures 3C and D demonstrate that Ly6G- neutrophils are present in all conditions including controls. To see whether they are truly present in different abundances between treatment and control groups, separate UMAPs of the neutrophil subsets should be made per condition or use a dot plot for Figure 3A. This also applies to Figure 3B.

      We thank the reviewers for pointing this out. We have now substantially revised the manuscript and used additional validation experiments to show the presence of Ly6G- and Ly6G+ neutrophil populations upon exposure to tobacco-flavored e-cig aerosols.

      Figure 3E shows that there is no statistically significant change in % of Ly6G+ neutrophils across treatment groups, but the text claims that there is "an increase in the levels of Ly6G+ neutrophils in lung digests from mouse lungs exposed to tobacco-flavored e-cig aerosols" (lines 207-209). The text also claims that "The observed increase was more pronounced in males as compared to females" (lines 209-210), but there was no statistical analysis across sexes to support this statement. It is clear that the change in % of Ly6G+ neutrophils is more pronounced in males than females, but it is still not statistically significant. This figure should also be repeated for analysis of Ly6G- neutrophils. Lines 272-274 mention that the % increase is higher for Ly6G- neutrophils than for Ly6G+ neutrophils, but there is not an analogous histogram to demonstrate this. The claims made in lines 275-280 are not clearly shown in any figure.

      We thank the reviewer for this query. This was an error on our part. We have now added sex-specific analyses using scRNA-seq, flow cytometry, and co-immunofluorescence experiments, which show that the more pronounced changes in the Ly6G+ and Ly6G- neutrophil populations occur in female mice, not males.

      Figures 4 and 6 have an overwhelming amount of heatmaps. Volcano plots with downstream analyses could be used to make some of this data more legible. The main findings should be validated in vivo/in tissue.

      We have now reorganized the figures and data to make them more legible and to reduce the amount of data shown in each figure.

      For Figure 5, show cell type by condition and do differential gene expression analysis displayed in a volcano plot. Then, stain tissue to validate the findings. Compare across sex during statistical analysis.

      The necessary changes have been made.

      Figure 6 error: panels E and F should be labeled as "tobacco" rather than "fruit".

      Error has now been fixed.

      Figure 7C can be placed in the supplemental materials.

      It has now been included in supplemental materials.

      The Figure 6E title should have been tobacco instead of fruit.

      This error has now been fixed.

      Line 381 mentioned the wrong subfigure. (Figure 7B instead of 7E).

      We have now made the necessary edits.

    4. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression observed in distinct cell populations, namely myeloid and lymphoid cells, upon short-term exposure to e-cig aerosols with various flavors. Their findings are useful because they provide a single cell sequencing data resource for assessing which genes and cellular pathways are most affected by e-cig aerosols and their components. However, the evidence is incomplete due to limited analyses and replicates per condition, as well as the lack of in vivo validation.

    5. Reviewer #1 (Public review):

      Summary:

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies, but at least 2 or 3 biological replicates for each experimental group are required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNAseq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations, but no solid conclusions can be made from the data presented.

      (1) The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are of low resolution and quality, and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It is hard to distinguish positive cells from autofluorescence in both the Ly6g and S100a8 channels. No statistical analysis is provided for the quantification.

      (2) It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.
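      The reviewer's request amounts to normalizing each population to its parent gate instead of reporting raw event counts, which removes differences in the total number of cells acquired per sample. A minimal Python sketch, assuming event counts have already been exported for each gate; the function name and the counts are hypothetical and are not data from the study:

```python
def percent_of_parent(count: int, parent_count: int) -> float:
    """Return a gated population as a percentage of its parent gate.

    Example: Ly6G+ neutrophil events as a percentage of CD45+ events.
    """
    if parent_count <= 0:
        raise ValueError("parent gate must contain at least one event")
    return 100.0 * count / parent_count


# Hypothetical event counts for one sample (illustration only).
cd45_pos = 20000
ly6g_pos_neutrophils = 1500
print(f"{percent_of_parent(ly6g_pos_neutrophils, cd45_pos):.1f}% of CD45+")  # prints: 7.5% of CD45+
```

      Percentages computed this way per animal can then be compared across groups with an appropriate statistical test, which raw counts do not support as cleanly.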

    1. eLife Assessment

      This fundamental study uncovers the unique molecular features of Arabidopsis phloem companion cells that highly express FLOWERING LOCUS T (FT). These FT-expressing cells constitute a distinct subpopulation marked by elevated ATP biosynthesis and co-expression of small mobile proteins such as FLP1 and BFT, highlighting a fine balance between florigen and anti-florigen signals. Motif analyses and transgenic studies further identify NIGT1 transcription factors as direct, nitrogen-inducible repressors of FT, providing a mechanism for delayed flowering under nitrogen-rich conditions. Together, the compelling findings show that florigen-producing companion cells integrate energy metabolism, systemic protein signals, and nutrient-responsive repression to fine-tune the seasonal and nutritional regulation of flowering.

    2. Reviewer #1 (Public review):

      Summary:

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors.

      Strengths:

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT.

      Weaknesses:

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all.

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed?

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions?

      Comments on revisions:

      I think the authors took my comments seriously and addressed most of my concerns. Overall, I find this to be a very interesting paper.

    3. Reviewer #2 (Public review):

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time.

      During the initial review process, I proposed the following two points for improving this manuscript:

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in the cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of the constructs the authors used to make the transformants.

      The revised manuscript has addressed my comments well. I am deeply grateful for the authors' efforts to address concerns raised by me and other reviewers. I have no doubt that the manuscript in its current form is worthy of publication in this journal and will provide valuable insights into flowering time for many readers.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors. 

      Strengths: 

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. 

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step toward describing the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We believe this finding will attract a broader audience, as the molecular factors coordinating plant nutrition status with flowering time remain largely unknown even though the phenomenon itself is well known.

      Weaknesses: 

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all. 

      We agree with this comment; we did not intend to suggest that FT is not produced in companion cells other than the subpopulation we identified. We revised the title to reflect this point more accurately. The new title is “Companion cells with high florigen production express other small proteins and reveal a nitrogen-sensitive FT repressor.”

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we realized that we had not clearly explained the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. However, scRNA-seq of plant cells requires protoplasting, which has several drawbacks for isolating our target cells from the phloem. First, it is technically challenging to efficiently isolate protoplasts from companion cells, which are deeply embedded in plant tissues. Typically, at least several hours of enzymatic incubation are required to obtain protoplasts from companion cells (often using semi-isolated vasculature), and the efficiency of protoplasting vascular cells remains low. Second, preserving time-of-day information is crucial for our analysis, so we employed a more rapid isolation method. In the revised manuscript, we added four new sentences to the Introduction to explain our rationale for choosing snRNA-seq given these technical limitations.

      Reviewer 1 also raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. Although we believe that shallow reads do not necessarily indicate low quality, and we are confident in the accuracy of our snRNA-seq data as supported by detailed follow-up experiments (e.g., imaging analysis in Fig. 4B), we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding data quality.

      We believe the primary reason for the low read counts per nucleus is the small amount of RNA present in each Arabidopsis vascular cell nucleus that we isolated. For bulk nuclei RNA-seq, we collected 15,000 nuclei, but the total RNA yield was only approximately 3 ng. This indicates that each isolated nucleus contains a very limited amount of RNA (by a simple calculation, 3,000 pg / 15,000 nuclei = 0.2 pg/nucleus). Cells and nuclei appear to still be small in 2-week-old seedlings, so each nucleus may contain less RNA. During optimization, we also tried fixing the tissues in the hope of retaining nuclear RNA, but in our hands fixation caused the nuclei to aggregate, which hindered sorting and made the samples unsuitable for single-nucleus RNA-seq.
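      The per-nucleus estimate in the paragraph above follows from a single unit conversion (1 ng = 1,000 pg), dividing the total RNA yield by the number of sorted nuclei:

```latex
\frac{3\ \text{ng}}{15{,}000\ \text{nuclei}}
  = \frac{3{,}000\ \text{pg}}{15{,}000\ \text{nuclei}}
  = 0.2\ \text{pg per nucleus}
```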

      Reviewer 1 suggested that we repeat the snRNA-seq experiment. We agree that more cells would increase the reliability of the data. However, to our knowledge, higher cell numbers increase confidence in clustering but not the read counts per cell. In our snRNA-seq data, our target FT-expressing cells were found in cluster 7, which projected at a clear distance from the other clusters. Therefore, we think that more nuclei would not substantially help to separate the high-FT cluster 7 cells from other cell types, although we might obtain more DEGs from cluster 7. Considering the cost and time required for additional snRNA-seq experiments, we think that adding follow-up molecular biology experiments is more practical. We clearly stated the limitations of our approach in the Discussion section: “A drawback of our snRNA-seq analysis was shallow reads per nucleus. It appears mainly due to the low abundance of mRNA in nuclei from 2-week-old leaves. Based on our calculation, the average mRNA level per nucleus is approximately 0.2 pg (3,000 pg mRNA from 15,000 sorted nuclei). Future technological advances are needed to improve the data quality.”

      In this revised version of the manuscript, we silenced FT gene expression using an amiRNA against FT driven by tissue-specific promoters [pROXY10, cluster 7; pSUC2, companion cells; pPIP2.6, cluster 4 (for the spatial expression pattern of PIP2.6, please see the new data shown in Fig. S8F); pGC1, guard cells]. Given that both FT and ROXY10 were highly expressed in cluster 7 of our snRNA-seq dataset, we anticipated the late flowering phenotype of pROXY10:amiRNA-ft. As we expected, pROXY10:amiR-ft but not pPIP2.6:amiR-ft lines showed delayed flowering phenotypes (Fig. S14A), supporting the validity of our snRNA-seq approach. We are also now more confident in the resolution of our snRNA-seq analysis, since cluster 4-specific PIP2.6 did not cause late flowering despite its higher basal expression than ROXY10 (Fig. S14B).

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed? 

      Because the original manuscript showed only limited images of the spatial overlap between FT and other cluster 7 genes (Fig. 4B), this comment is entirely understandable. In response, we added whole-leaf images showing the spatial expression of FT and other cluster 7 genes (Fig. S12). These data indicate that cluster 7 genes, including FT, are highly expressed in minor veins in the distal part of the leaf but only weakly in the main vein. We also added enlarged images of the spatial expression of FT and cluster 7 genes (FLP1 and ROXY10) to show that these genes do not overlap completely (Fig. S13).

      In contrast to cluster 7 genes, genes highly expressed in cluster 4, such as LTP1 and MLP28, are reportedly highly expressed in the main leaf vein. To further confirm it, we established a transgenic line that expresses a GFP-fusion protein controlled by the promoter of a cluster 4-specific gene PIP2.6 (Fig. S8F). It also showed strong GFP signals in the main vein, consistent with previous observations of LTP1 and MLP28.   In summary, FT-expressing cells (cluster 7 cells) are enriched in companion cells in the minor vein, and their expression patterns show a clear distinction from genes expressed in the main vein (e.g., cluster 4-specific genes). 

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions? 

      We agree with reviewer 1 that more experiments are required to establish the role of NIGT1 in FT regulation, beyond our Y1H data, the flowering time of NIGT1 overexpressors, and FT expression in NIGT1 overexpressors and the nigtQ mutant.

      First, to test the direct regulation of NIGT1s on FT transcription, we conducted a transient luciferase (LUC) assay in tobacco leaves using effectors (p35S:NIGT1.2, p35S:NIGT1.4, and p35S:GFP) and reporters [pFT:LUC (FT promoter fused with LUC) and pFTm:LUC (the same FT promoter with mutations in NIGT1-binding sites fused with LUC)]. Our result showed that NIGT1.2 and NIGT1.4, but not GFP, decreased the activity of pFT:LUC but not pFTm:LUC (Fig. 5C). This indicates that NIGT1s directly repress the FT gene.

      Second, to address reviewer 1’s suggestion about the effect of the nigtQ mutation on flowering time, we grew WT and nigtQ plants on 20 mM and 2 mM NH₄NO₃. Under 20 mM NH₄NO₃, the nigtQ line bolted earlier than WT; under 2 mM NH₄NO₃, nigtQ and WT bolted at almost the same time (Fig. S17D and E). This result suggests that the nigtQ mutation affects flowering timing depending on nitrogen status. However, the leaf numbers of bolted plants did not differ between WT and nigtQ lines (Fig. S17E). Therefore, the nigtQ mutation appears to accelerate overall plant growth rather than specifically promoting flowering. We also measured flowering time by counting the leaves of nigtQ and WT plants at bolting on nitrogen-rich soil. The mutant produced slightly more leaves than WT at flowering (Fig. S17G). These results suggest that NIGT-mediated fine-tuning of FT regulation is conditional on high nitrogen conditions.

      Minor: 

      (1) Abstract: "Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and in true leaves differed transcriptionally.". This sentence is not informative. What exactly is the difference in FT-expressing cells between cotyledons and true leaves? 

      We modified the sentence to clarify the differences between cotyledons and true leaves. “Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and true leaves showed differences especially in FT repressor genes.”

      (2) As a standard practice, to support the direct regulation of FT by NIGT1, the authors should provide EMSA and ChIP-seq data. Ideally, they should also generate promoter constructs with deletions or mutations in the NIGT1 binding sites. 

      To test the direct interaction of NIGT1 with the FT promoter, we performed a transient reporter assay using an FT promoter-driven luciferase reporter (Fig. 5C). NIGT1.2 and NIGT1.4 repressed FT promoter activity; however, this repression was not observed when the NIGT1-binding sites were mutated, indicating that NIGT1 binds to cis-elements in the FT promoter to repress its transcription.

      (3) Sorting: Did the authors fix the samples before preparing the nuclei suspension? If not, could this be the reason the authors observed the JA-responsive clusters (Fig. 2J)? Please provide more details related to nuclei sorting in the Methods section. 

      We added a new subsection to the Materials and Methods section describing the nuclei sorting procedure in detail. We did not include a sample fixation step. We tried formaldehyde fixation; however, it clumped the nuclei, which made them unsuitable for snRNA-seq. Moreover, according to the 10x Genomics guidelines, fixation steps generally reduce read counts in single-cell RNA-seq.

      We agree that JA responses were triggered during FANS nuclei isolation. Therefore, we added the following sentence: “Since our FANS protocol did not include a sample fixation step to avoid clumping, these cells likely triggered wounding responses during the chopping and sorting process (Fig. S1B).”

      Reviewer #2 (Public review): 

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time. 

      We are grateful that Reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript. 

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as an upstream regulator of gene expression in the FT-expressing cluster 7. The flowering phenotypes of the nigtQ mutant, and of transgenic plants expressing NIGT1.2 under the SUC2 promoter, support the conclusion that NIGT1.2 functions as a floral repressor upstream of FT. Nevertheless, the expression pattern of NIGT1.2 does not appear to overlap much with those of its downstream genes in cluster 7 (Figs S14 and F3). An explanation for this should be provided in the Discussion section. 

      We agree with Reviewer 2 that the spatial expression patterns of NIGT1.2 and the cluster 7 genes do not overlap much, and that this should be discussed in the manuscript. Although we do not have a definitive explanation for this observation, we obtained new data showing that NIGT1.2 and NIGT1.4 directly repress the FT gene in planta (Fig. 5C). Because NIGT1.2/1.4 are negative regulators of FT, they may suppress FT expression in non-cluster 7 cells to prevent its misexpression. We added this point to the Results section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Because general readers will not be familiar with NTF without reading reference 16, some explanation of NTF is needed in the manuscript. Please also provide a schematic of the constructs used to make the transformants.

      As Reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is a fusion protein consisting of a nuclear envelope-targeting WPP domain, GFP, and a biotin acceptor peptide. It was originally designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables the isolation of bulk nuclei from specific tissues. Although our original intention was to use INTACT to profile the bulk transcriptome of nuclear mRNAs in FT-expressing cells, we ultimately used our NTF transgenic lines for snRNA-seq analysis. To explain NTF to readers, we included a schematic diagram of NTF (Fig. S1A) and added more explanation about NTF to the Results section.

      Again, we appreciate all reviewers’ careful and constructive comments. With these changes, we hope our revised manuscript is now satisfactory.

    1. eLife Assessment

      This manuscript presents an important finding that D1- and D2-striatal neurons receive distinct cortical inputs, offering key insights into corticostriatal function. For instance, in the context of striatal-dependent learning, this distinction is highly informative for interpreting synaptic physiology data, particularly when inputs to one neuron subtype may change independently of the other. The strength of the evidence is solid, with anatomical and electrophysiological findings aligning well with results from optogenetic and behavioral studies. The study would be of interest to neuroscientists studying basal ganglia circuits in health and disease.

    2. Joint Public Review:

      Summary:

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1- or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs).

      Strengths:

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure.

      Editors' note:

      The concerns raised by Reviewers #1 and #2 were addressed during the first round of revision. The specific concern raised by Reviewer #3 concerns rabies virus-based circuit tracing itself. This version of the work has been assessed by the editors without going back to the reviewers.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments. 

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      We thank the reviewer for the comments and for raising additional interpretations of our results. We agree that determining the relative number of D1- versus D2-SPN starter cells would allow a more accurate estimate of connectivity. However, due to current technical limitations, achieving this level of precision remains challenging. As the reviewer also noted, differences in the number of cortical neurons targeting D1- versus D2-SPNs could introduce additional complexity to the functional effects observed in the behavioral experiments. Moreover, functional heterogeneity is likely to exist not only among cortical neurons projecting to striatal D1- or D2-SPNs, but also within the striatal D1- and D2-SPN populations themselves. Addressing these questions at the single-neuron level will require more refined viral tools in combination with improved recording and manipulation techniques. Despite these limitations, our results suggest that a subpopulation of cortical neurons selectively targets striatal D1-SPNs, supporting a functional dichotomy of pathway-specific corticostriatal subcircuits in the control of behavior.   

      Reviewer #2 (Public review): 

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1- or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure. 

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. 

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim was to delineate the probability that cortical neurons connect to D1- or D2-SPNs. The hypotheses we considered included the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or a combination of both. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to determine which scenario applies.

      In the cortical areas we examined, MCC and M1, the behavioral results were consistent with our electrophysiological results. Specifically, when we stimulated D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated locomotion in the open field test, similar to activation of D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons produced behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. We interpret these results to mean that activating D1-projecting cortical neurons induces behavioral changes akin to D1-SPN activation, whereas activating D2-projecting cortical neurons produces a combined effect of D1- and D2-SPN activation. This suggests that at least the cortical regions we tested follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021). 

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are also described in the section on ex vivo brain slice electrophysiology, specifically in the Methods section, lines 453–459:

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and NA.

      We appreciate the reviewer’s thoughtful evaluation and comments. We have added a short discussion of Cui et al.’s study on optogenetic stimulation of D1-SPNs in the DLS (lines 341-343), which reports findings that contrast with ours and those of other studies.

      Reviewer #3 (Public review): 

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were addressed satisfactorily, particularly the clarification of cortical innervation of the striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence of the conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not. 

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity, and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points. 

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below. 

      Major: 

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results. 

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      We thank the reviewer for the comments and suggestions. Because the same fluorophore (mCherry) was used in both TVA- and ChR2-expressing viruses, it was not possible to distinguish true starter SPNs from TVA-only SPNs or monosynaptically labeled SPNs. This limitation makes it difficult to precisely assess the efficiency of rabies labeling and retrograde tracing in our experimental setup. Moreover, differences in rabies replication efficiency between D1- and D2-SPNs could potentially lead to an apparent lower connection probability from D1-projecting cortical neurons to D2-SPNs than from D2-projecting cortical neurons to D1-SPNs. We have added this clarification to the Discussion (lines 280-297).

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included. 

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2. 

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Although we performed the in vivo optogenetic behavioral experiments during a period when rabies-infected cells are generally considered relatively healthy, some deficits in starter cells may still occur and could contribute to the observed effects of optogenetic cortical stimulation. We have added this clarification to the Discussion (lines 298-306).

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity. This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, inputs from the thalamus are not considered in the slice recordings, yet they likely contribute to the observed results. Relatedly, in the in vivo optogenetic experiments, if thalamic or other inputs to the dorsal striatum also project to the cortex, the method will target not only cortical neurons but also the terminals of these other excitatory inputs. If this cannot be ruled out, the claim that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down. 

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations. 

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed. 

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the putative inputs to D1-MSNs gave the expected result (increased locomotion), while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion, and a null effect here cannot be interpreted. 

      I think that the caveat of showing no clear effects of inputs to D2 stimulation should be pointed out. Yes, I understand that the viruses appeared to express etc., but again it remains possible that the results are driven by a lack of e.g., sufficient ChR2 expression. Aside from a full quantification of the number of cells expressing ChR2, overlap in fiber placement and ChR2 expression (which I don't suggest), this remains a possibility and should be pointed out, as it remains a possibility. 

      In the light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in Figure 4 - the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments. 

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant. 

      We thank the reviewer for the valuable comments and suggestions. In line with the reviewer’s recommendation, we have incorporated these explanations into the Discussion (lines 242–279) to help interpret the complex behavioral outcomes of optogenetic stimulation of cortical neurons projecting to D1- or D2-SPNs.

      Reviewer #2 (Recommendations for the authors): 

      I appreciate the authors' responses, which helped clarify some experimental choices. I appreciate that the experiment in Fig S3 serves as a reasonable light control for the optogenetics experiments. The careful comparison with the methods in Cui et al. (2021) is useful, although it was not added to the main manuscript. Some of the other citations here do not really address the controversy (e.g., Kravitz et al. is in DMS), but perhaps fully addressing this issue is outside the scope of the current manuscript and awaits further experiments. I also appreciate the clarification of recording locations, that "This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry." However, this statement from the reviewer response does not seem to have been added to the manuscript's Methods, where I think it would be helpful. The criteria for choosing recorded cells are still somewhat vague without a map of recording locations and histology. There is also the problem that mCherry-positive cells could be either starter cells or monosynaptically traced cells, so the extent of the starter cell population in these experiments cannot be known for sure. My evaluation of the manuscript remains largely the same as the original, but I have adjusted my public review to incorporate the authors' responses. I still think this paper has valuable information, suggesting an interesting and previously unappreciated structure of corticostriatal inputs that I hope this group and others will continue to investigate and incorporate into models of basal ganglia function.

      We thank the reviewer for the valuable suggestions. We have now included a comparison with Cui et al. in the Discussion. In addition, we have added the criteria for selecting recorded cells to the Methods section: ‘This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.’

    1. eLife Assessment

      This work introduces a new Python package, Avian Vocalization Analysis (AVN) that provides several key analysis pipelines for birdsong research. This tool is likely to prove useful to researchers in neuroscience and beyond, as demonstrated by convincing experiments using a wide range of publicly available birdsong data.

    2. Reviewer #2 (Public review):

      Summary:

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists with limited coding experience working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      As with any software package, this one necessarily makes a number of design choices, which may or may not fit the needs of all users. Those who prefer a more automated pipeline with fewer knobs to turn may appreciate AVN in cases where the existing recipes fit their needs, while those who require more customization and flexibility may require a more bespoke (and thus code-intensive) approach.

      Strengths:

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses:

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: It's important to note that the package is trying to do many things, of which it is likely to do several well and a few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows.

      In the revised version of the paper, the authors have expanded their case for the design choices made in AVN and remain committed to maintaining the tool. Given the low cost for users in trying new methods and the work the authors have put into further reducing this overhead via documentation, those curious about the package are likely best served by simply downloading it and giving it a try on their own data.

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While, to my knowledge, this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.
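      To make the comparison concrete, the toy sketch below computes both quantities for two small sets of 2-D "syllable embeddings". It is not the AVN or Goffinet et al. implementation: the brute-force EMD (Wasserstein-1 with Euclidean cost, exact only for equal-size samples with uniform weights), the Gaussian kernel, and its bandwidth `sigma` are illustrative assumptions. Both are integral probability metrics: EMD takes the supremum over 1-Lipschitz test functions, while MMD takes it over the unit ball of a kernel's reproducing kernel Hilbert space, so the two differ mainly in the class of test functions, which for MMD is set by the kernel choice.

```python
import itertools
import math


def emd_equal_size(xs, ys):
    """Earth mover's distance between two equal-size point clouds with
    uniform weights and Euclidean ground cost.

    For equal sizes and uniform weights an optimal transport plan is a
    permutation, so brute force over assignments is exact. It is O(n!)
    and only suitable for toy examples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch requires two non-empty, equal-size samples")
    best = min(
        sum(math.dist(x, ys[j]) for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(len(ys)))
    )
    return best / len(xs)


def gaussian_kernel(u, v, sigma=1.0):
    return math.exp(-math.dist(u, v) ** 2 / (2 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


# Toy "tutor" and "pupil" syllable embeddings in 2-D (hypothetical data).
tutor = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pupil = [(0.1, 0.0), (1.1, 0.0), (0.1, 1.0)]  # tutor shifted by 0.1 along x
print(emd_equal_size(tutor, pupil))  # approximately 0.1: each point moves 0.1
print(mmd_squared(tutor, pupil))
```

      Because the MMD value depends on `sigma`, scores computed this way are only comparable when the kernel bandwidth is reported alongside them.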

      Update: The authors now provide an extensive comparison with the Goffinet et al. paper and also consider differences between MMD and EMD. This comparison both adds value to the original paper and provides useful benchmarking for others looking to develop latent space comparison methods.

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

    3. Reviewer #3 (Public review):

      This paper introduces the Avian Vocalization Network (AVN), a novel birdsong analysis pipeline using deep learning. By automating vocal annotation tasks, the AVN generates interpretable song features and song similarity scores on novel datasets without retraining. The performance of the network is solid and is comparable to that of human annotators.

      The authors have improved the manuscript in several aspects, such as the comparison with the Goffinet work. Overall, the AVN feature set could become a useful tool for evaluating birdsongs. However, the authors chose not to address several criticisms, some issues remain poorly addressed, and the work is not reproducible at this stage. With a little effort, these issues could be resolved in my view. I will focus on four issues that I think can be easily addressed:

      (1) Limitation of feature set: They claim that AVN satisfies the criteria (line 60) of "creating a common feature space for the comparison of behavioural phenotypes ..." (line 51), but then, in the LDA analysis explained on line 910, they say "excluding amplitude and amplitude modulation features as they were found to vary". Since their feature set is not stable and not truly 'common' to all tasks, this limitation needs to be addressed in the discussion (namely, that some features seem to vary undesirably and need to be excluded based on criteria that should be defined).

      (2) Missing information on classification training loss: The authors insist that their triplet loss is not related to classification, and they brush off my request for more information. In their rebuttal, they write: 'The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different.' Perplexingly, however, in the revised paper the authors themselves speak of 'classes', in line 1004: 'this allows the model to begin learning an easier task, of separating syllables of different classes by a smaller margin.' So it seems the authors actually agree with me that there is an underlying classification task. I am therefore going to make what I'm asking for a bit more explicit here, hoping this will better resonate with them.

      In line 984 they define their loss function and in lines 994-996 they define 'hard' and 'semi-hard' triplets. The authors then train a system to minimize the loss with a ratio of 75 percent semi-hard triplets and 25 percent hard triplets and a final margin parameter value alpha = 0.7. What I'm asking for is the 'classification' loss their trained model achieves or, in other words, the fraction of triplets that end up producing a loss, of either the 'hard' or 'semi-hard' type. For example, if their model manages to separate all possible triplets by a margin of at least alpha, then the loss would be zero. If the model separates all triplets except one, then the loss would correspond to the amount by which the difference between the anchor-negative and anchor-positive distances falls short of alpha for that triplet. So, an important number to provide in the paper is the fraction of triplets that incur a nonzero loss, i.e., the combined fraction of semi-hard and hard triplets. Another important quantity is the fraction of hard triplets, i.e., the fraction of triplets that would incur a loss if alpha were set to zero or, in other words, the triplets for which the negative sample is closer to the anchor than the positive sample. By the way, I assume this latter fraction of hard cases will be zero, i.e., that their model does not confuse any positive and negative training samples.

      Note: the quantification chosen by the authors, termed 'contrast index', is interesting, but it is a derived quantity; it is not the quantity the authors chose to optimize during training. If the authors were to report both the training loss achieved and the 'contrast index', follow-up work could be benchmarked against both of these quantities. If, for example, a follow-up model achieves a smaller loss but worse contrast, then the loss is not a good proxy for optimizing contrast. Alternatively, follow-up work could use the contrast index as the training objective, obviating the need for the triplet loss as an intermediate step (I don't buy the authors' argument that such an optimization would be infeasible).
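      The two quantities requested above are straightforward to compute once embeddings exist. A minimal sketch, assuming a plain Euclidean hinge loss max(d(a, p) - d(a, n) + alpha, 0) with the margin alpha = 0.7 quoted above; the manuscript's exact loss (for example, squared distances as in FaceNet-style implementations) may differ, and `triplet_diagnostics` is a hypothetical helper, not part of AVN.

```python
import math


def triplet_diagnostics(triplets, alpha=0.7):
    """Summarize a set of (anchor, positive, negative) embedding triplets.

    Returns (mean loss, fraction with nonzero loss, fraction of hard
    triplets). Nonzero loss covers both semi-hard triplets (correct order
    but inside the margin) and hard triplets (negative closer to the
    anchor than the positive)."""
    if not triplets:
        raise ValueError("need at least one triplet")
    losses, n_nonzero, n_hard = [], 0, 0
    for anchor, positive, negative in triplets:
        d_ap = math.dist(anchor, positive)
        d_an = math.dist(anchor, negative)
        loss = max(d_ap - d_an + alpha, 0.0)  # hinge on the margin alpha
        losses.append(loss)
        n_nonzero += loss > 0
        n_hard += d_an < d_ap  # would still incur a loss with alpha = 0
    total = len(triplets)
    return sum(losses) / total, n_nonzero / total, n_hard / total


# Hypothetical 2-D embeddings illustrating the three cases.
triplets = [
    ((0.0, 0.0), (0.1, 0.0), (2.0, 0.0)),  # easy: zero loss
    ((0.0, 0.0), (0.5, 0.0), (1.0, 0.0)),  # semi-hard: right order, inside margin
    ((0.0, 0.0), (1.0, 0.0), (0.5, 0.0)),  # hard: negative closer than positive
]
mean_loss, frac_nonzero, frac_hard = triplet_diagnostics(triplets, alpha=0.7)
```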

      (3) Reproducibility: they explain how they train the CNN with triplet loss to produce the embeddings, but the actual scripts on GitHub to train and run inference from scratch are missing, as are the model weights and even the hyperparameters they used. The authors only provide the architecture, and I don't think that is enough to be considered replicable by today's standards. I would suggest they release complete model checkpoint weights for the results they report, the exact data splits, the hyperparameters they used, and the training and testing code, so that one can very easily verify their claims and apply their methods to other datasets. Note: for example, the code to extract the embeddings is incomplete (the function definition of single_bird_extract_embeddings cannot be found on GitHub) and the model weights they used are missing.
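      One low-cost way to make "the exact data splits" reproducible is to derive each bird's split deterministically from its ID and publish the resulting manifest with the checkpoint. The sketch below is hypothetical (the `assign_split` helper, the salt string, and the 20% test fraction are assumptions, not something AVN is known to do); splitting by bird rather than by syllable keeps every syllable of a held-out bird out of training.

```python
import hashlib


def assign_split(bird_id, test_fraction=0.2, salt="example-split-v1"):
    """Deterministically assign a bird to 'train' or 'test'.

    The same (salt, bird_id) always hashes to the same bucket, so the split
    can be regenerated exactly from the published salt and bird list."""
    digest = hashlib.sha256(f"{salt}:{bird_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "test" if bucket < test_fraction else "train"


# Hypothetical bird IDs; the manifest would be released with the weights.
birds = ["bird_01", "bird_02", "bird_03", "bird_04", "bird_05"]
manifest = {bird: assign_split(bird) for bird in birds}
print(manifest)
```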

      (4) With regard to the age prediction model, the authors should specify that this model is mainly useful for comparisons across studies but less so for precise evaluation of the effects of a treatment within a study. Namely, the effect of a treatment on song is best assessed by comparison to the same bird's past song and to age-matched control birds (ideally siblings) raised in identical conditions, rather than by invoking a generic model trained on other birds from different colonies and breeding conditions, as the authors propose to do. In other words, introducing a generic model for evaluating song maturity adds measurement noise from the additional birds and their variable conditions, which can hinder precise assessment of treatment effects. Note that the fact that such maturity models were used in past work is not, scientifically speaking, a good justification.

      Finally, the authors write that methods for syllable segmentation have not been systematically compared, but the WhisperSeg work they use performed such a comparison. So the authors should revise their claim to be the first to compare syllable segmentation methods.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared on new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper around existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al., 2024), TweetyNet (Cohen et al., 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g., Aronov et al., 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization, the syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al. 2020, so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact and how reliable the fully automatically generated labels are for downstream analysis. We have added a couple of sentences to the introduction to emphasize the novelty of this approach and validation.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables are plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020.
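      The ordering steps described above (index-based positions, alignment on the first motif syllable, grouping of identical sequences) can be sketched as follows. This is an illustrative reconstruction from the description, not AVN's plotting code: the function name is hypothetical, and dropping bouts that never contain the motif-start label is an assumption.

```python
from collections import Counter


def syntax_raster_rows(bouts, motif_start):
    """Order syllable sequences for a syntax-raster-style plot.

    Each bout is a list of syllable labels; timing is discarded, so a
    syllable's column is simply its index in the (aligned) sequence.
    Bouts are trimmed to start at the first occurrence of `motif_start`,
    which removes leading introductory notes, and then sorted so that
    identical sequences are adjacent, most frequent sequences first."""
    aligned = [
        tuple(bout[bout.index(motif_start):])
        for bout in bouts
        if motif_start in bout  # assumption: skip bouts without a motif start
    ]
    counts = Counter(aligned)
    # Most common sequence first; ties broken lexicographically for stability.
    return sorted(aligned, key=lambda seq: (-counts[seq], seq))


# Hypothetical bouts: "i" marks introductory notes, "a" starts the motif.
bouts = [["i", "i", "a", "b", "c"], ["i", "a", "b", "c"], ["a", "c", "b"], ["x", "y"]]
for row in syntax_raster_rows(bouts, motif_start="a"):
    print(row)
```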

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as input (each triplet is essentially 3 images stacked together such that you have (anchor, positive, negative)). The anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A, and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring that the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then used the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach of Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor-pupil song.
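      For readers less familiar with the setup, triplet construction as described in this paragraph might look like the sketch below. Treating each (bird, label) pair as its own class, since label "a" in one bird is unrelated to label "a" in another, and sampling negatives uniformly from all other classes are assumptions; the hard/semi-hard mining schedule discussed elsewhere in these reviews is not reproduced, and `sample_triplets` is hypothetical.

```python
import random


def sample_triplets(syllables, n_triplets, seed=0):
    """Sample (anchor, positive, negative) triplets from labeled syllables.

    `syllables` is a list of (bird_id, label, item) tuples, where `item`
    stands in for a spectrogram. Positives share the anchor's bird and
    label; negatives come from any other (bird, label) class, i.e., a
    different syllable of the same bird or a syllable of another bird."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    by_class = {}
    for bird, label, item in syllables:
        by_class.setdefault((bird, label), []).append(item)
    # An anchor class needs at least two items so the positive is distinct.
    anchor_classes = [c for c, items in by_class.items() if len(items) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need one class with >= 2 items and at least two classes")
    triplets = []
    for _ in range(n_triplets):
        cls = rng.choice(anchor_classes)
        anchor, positive = rng.sample(by_class[cls], 2)
        negative_cls = rng.choice([c for c in by_class if c != cls])
        negative = rng.choice(by_class[negative_cls])
        triplets.append((anchor, positive, negative))
    return triplets


# Hypothetical data: strings stand in for spectrograms.
syllables = [("A", "a", "A_a1"), ("A", "a", "A_a2"), ("A", "b", "A_b1"), ("B", "a", "B_a1")]
print(sample_triplets(syllables, 3))
```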

      We thank the reviewer for this suggestion. We have included a comparison of our triplet loss embedding model to the VAE model proposed in Goffinet et al. 2021. We also included comparisons of similarity scoring using each of these embedding models combined with either earth mover’s distance (EMD) or maximum mean discrepancy (MMD) to calculate the similarity of the embeddings, as was done in Goffinet et al. 2021. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed primarily for ease of use by experimenters with limited coding experience. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is easily accessible to new users, and allows users with intermediate coding experience to navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files, which can be opened in Excel) to make it more accessible to new users, something that would be challenging to reconcile with VocalPy’s emphasis on its `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility versus standardization of analysis pipelines. VocalPy is designed to facilitate mixing and matching of different spectrogram generation, segmentation, annotation, etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future, and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand arise. That said, the Python package works on all operating systems, so non-Windows users can still use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While, to my knowledge, this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      We recognize the similarities between these approaches and have included comparisons of the VAE and MMD, as in the Goffinet paper, with our triplet loss model and EMD. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the triplet loss model with MMD performs best for evaluating song learning on new birds, not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.  

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks, which they use to train deep networks. They then use these networks as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%.

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use: By being fully automated, the method is straightforward to apply and should pose little barrier to adoption by other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations, released as public datasets, are potentially very valuable.

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks, since researchers will clearly know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior.

      We appreciate this reviewer’s concerns and apologize for not providing a sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be the final culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed that the LDA model, which can discriminate between typical, deaf, and isolate birds’ songs, not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine its own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, which lack a tutor song memory, but not those of deaf birds, which lack both a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers.

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds that have undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We have added a few lines to the ‘Comparing Song Disruptions with AVN Features’ and ‘Tracking Song Development with AVN Features’ sections of the results to make this clearer. 
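      A minimal sketch of the lead/lag comparison described above, assuming a trained age regression model has already produced a predicted age for each bird. The function `mean_age_offset` and all numbers are hypothetical illustrations, not part of AVN; a real analysis would compare offsets between groups with an appropriate statistical test and would also need to account for the tendency of regression estimates to be pulled toward the center of the training age range.

```python
from statistics import mean


def mean_age_offset(true_ages, predicted_ages):
    """Mean of (predicted - true) age, in the same units as the inputs.

    Positive values: songs 'sound older' than the birds are (a lead).
    Negative values: songs 'sound younger' than the birds are (a lag).
    """
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(p - t for t, p in zip(true_ages, predicted_ages))


# Hypothetical toy values in days post hatch (not real data).
true_dph = [50, 60, 70, 80]
control_pred = [52, 59, 71, 79]      # scattered around the true ages
manipulated_pred = [58, 67, 77, 86]  # systematically "older" than true

print("control offset:", mean_age_offset(true_dph, control_pred))
print("manipulated offset:", mean_age_offset(true_dph, manipulated_pred))
```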

      Along similar lines, the authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      This reviewer appears to have misunderstood our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and have added a paragraph to the ‘Measuring Song Imitation’ section of the results explaining this rationale more briefly.

      First, nowhere are we training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function depends on the relative distances between embeddings of syllables with the same or different labels, not on the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied to its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (a same vs. different category classification) and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower-dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021. 
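      To make the distinction concrete, below is a minimal sketch of the standard triplet loss of Schroff et al. 2015 for a single triplet, L = max(0, ‖f(a) − f(p)‖² − ‖f(a) − f(n)‖² + α), where a and p share a syllable label and n does not. The margin α = 0.2 and the 2-D toy vectors are assumptions for illustration; this is not AVN’s training code, which operates on batches of network embeddings of spectrograms.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) triplet.

    anchor and positive share a syllable label; negative has a different one.
    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; no same/different prediction is made.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (AVN's embeddings are 8-D; these values are made up).
anchor = [0.0, 0.0]
easy = triplet_loss(anchor, positive=[0.1, 0.0], negative=[1.0, 0.0])
hard = triplet_loss(anchor, positive=[0.3, 0.0], negative=[0.2, 0.0])
print(easy, hard)
```

      The "easy" triplet already satisfies the margin and contributes no loss; the "hard" one, where the different-label syllable sits closer than the same-label one, is penalized in proportion to the distance violation.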

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower-dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we have included a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript. 

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, this reviewer seems not to understand our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and would require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research. 

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).  

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and Amplitude thresholding) as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods, as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.  

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently from the F1 scores we report here, and they are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach, where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared directly. 

      The discrepancy between the F1_seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due partly to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and non-stationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see Figure 2–figure supplement 1b), while its precision scores remain very high (Figure 2–figure supplement 1a). Although WhisperSeg frequently merged these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Figure 3c) or syllable duration entropy (Figure 3–figure supplement 2a). For that reason, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
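      Why consistent merging lowers recall but not precision can be seen in a small sketch that scores predicted syllable onsets against ground-truth onsets. The 10 ms matching tolerance, the greedy one-to-one matching, and the function name are assumptions for illustration and may differ from the exact F1 definitions used by AVN or by Gu et al. 2023.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall and F1 for onset detection (times in seconds).

    Each predicted onset is matched to the closest still-unmatched true
    onset within `tolerance`; each true onset can be matched only once.
    """
    unmatched = sorted(true_onsets)
    hits = 0
    for p in sorted(pred_onsets):
        candidates = [t for t in unmatched if abs(t - p) <= tolerance]
        if candidates:
            unmatched.remove(min(candidates, key=lambda t: abs(t - p)))
            hits += 1
    precision = hits / len(pred_onsets) if pred_onsets else 0.0
    recall = hits / len(true_onsets) if true_onsets else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Ground truth splits two closely spaced elements (onsets at 0.25 s and 0.28 s);
# a hypothetical segmenter merges them into one syllable starting at 0.25 s.
truth = [0.10, 0.25, 0.28, 0.50]
merged = [0.10, 0.25, 0.50]
print(onset_scores(truth, merged))
```

      Every predicted onset is correct, so precision stays at 1.0, while the one missed onset lowers recall to 0.75 and F1 to about 0.86, the same qualitative pattern described above for WhisperSeg.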

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.  

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well on birds from Rockefeller as on birds from UTSW, if not better. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia, but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. We did apply AVN to recordings shared with us by the Wada lab in Japan; while we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets. 

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.  

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology, and is outside the scope of our current efforts.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.  

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human feedback into AVN was never the goal of our pipeline, would require significant changes to AVN’s design and is outside the scope of this manuscript.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one.  

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types, as mentioned in lines 222-226 of the revised manuscript. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#Syllable-Repetitions for further details.
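      As an illustration of the kind of repetition feature described here, the sketch below counts runs of consecutive identical labels in a labeled bout. The function names and the example bout are hypothetical; the exact repetition features computed by AVN’s syntax module are those defined in the Methods and the linked documentation, not this code.

```python
from collections import defaultdict
from itertools import groupby


def repetition_runs(labels):
    """Map each syllable label to the lengths of its runs of consecutive repeats."""
    runs = defaultdict(list)
    for label, group in groupby(labels):
        runs[label].append(sum(1 for _ in group))
    return dict(runs)


def mean_run_length(labels):
    """Average number of consecutive renditions per run, for each label."""
    return {label: sum(r) / len(r) for label, r in repetition_runs(labels).items()}


# Hypothetical bout: introductory note 'i' repeated, then two motifs 'abcc'.
bout = list("iiiabccabcc")
print(repetition_runs(bout))
print(mean_run_length(bout))
```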

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.  

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We have expanded our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (Figure 2–figure supplement 1). In rare cases, WhisperSeg also fails to detect syllables entirely, which further lowers its recall score. TweetyNet hardly ever misses syllables entirely, but it does occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables, whereas WhisperSeg hardly ever segments non-vocal sounds. 

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types: one corresponding to ‘a’, one corresponding to ‘b’, and one corresponding to the merged ‘ab’. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label (or is missing a label entirely), but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or whether it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, rare syllable types with their own distinct labels in the manual annotation are sometimes merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not by occasional misclassifications of one manual label type as another.
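      For readers less familiar with the metric, the v-measure is the harmonic mean of homogeneity (each cluster contains only one manual label) and completeness (each manual label falls into a single cluster). The sketch below implements the standard entropy-based definition from scratch (scikit-learn’s `sklearn.metrics.v_measure_score` computes the same quantity) and shows how splitting one manual label into two clusters lowers completeness but leaves homogeneity intact. The toy labels are invented for illustration.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy (nats) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(x, y):
    """H(X | Y) for two equal-length label sequences."""
    n = len(x)
    y_counts = Counter(y)
    return -sum(c / n * log(c / y_counts[yy])
                for (_, yy), c in Counter(zip(x, y)).items())


def v_measure(true_labels, cluster_labels):
    """Harmonic mean of homogeneity and completeness."""
    h_c, h_k = _entropy(true_labels), _entropy(cluster_labels)
    homogeneity = 1.0 if h_c == 0 else 1 - _conditional_entropy(true_labels, cluster_labels) / h_c
    completeness = 1.0 if h_k == 0 else 1 - _conditional_entropy(cluster_labels, true_labels) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


manual = list("aaaabbbb")
perfect = [0, 0, 0, 0, 1, 1, 1, 1]
split_a = [0, 0, 2, 2, 1, 1, 1, 1]  # manual label 'a' split across two clusters
print(v_measure(manual, perfect), v_measure(manual, split_a))
```

      Here every cluster is still pure, but ‘a’ is spread across two clusters, giving completeness 2/3 and a v-measure of 0.8.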

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in Figure 2f and Figure 2–figure supplement 3b&e. We also added two paragraphs to the end of the ‘Accurate, fully unsupervised syllable labeling’ section of the Results in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.  

      We apologize for not making this distinction sufficiently clear in the manuscript and have added a paragraph to the ‘Measuring Song Imitation’ section of the Results explaining the rationale for using an embedding model for similarity scoring. 

      We chose to use UMAP for syllable labeling because it is a common embedding step preceding hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD or MMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. It cannot be achieved with UMAP, because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD or MMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes infeasible without dedicated high-performance computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD and MMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space. 
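      To illustrate why a shared, fixed embedding space matters, the sketch below computes a biased estimate of the squared maximum mean discrepancy (MMD) between two sets of embedding vectors using a Gaussian kernel. The kernel choice, the bandwidth `sigma = 1.0`, and the toy 2-D points are assumptions for illustration, not AVN’s implementation; the resulting scores are only comparable across bird pairs because every set is assumed to come from the same embedding model.

```python
from math import exp


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel between two embedding vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-sq / (2 * sigma ** 2))


def mmd_squared(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between sets x and y."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in x for b in x) / len(x) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in y for b in y) / len(y) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in x for b in y) / (len(x) * len(y))
    return k_xx + k_yy - 2 * k_xy


# Toy 2-D "embeddings" standing in for syllables of a tutor and two pupils,
# all assumed to be produced by the same fixed (hypothetical) embedding model.
tutor = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
good_pupil = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
poor_pupil = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]
print(mmd_squared(tutor, good_pupil), mmd_squared(tutor, poor_pupil))
```

      If each pair were instead embedded with its own UMAP, the coordinates, and therefore these kernel values, would live in different spaces, which is why the scores could not be compared across pairs.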

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

      There is indeed a stochastic element to UMAP embeddings, which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. We observed that v-measure scores were quite consistent within birds across repeated runs of UMAP, and have added an additional supplementary figure to the revised manuscript showing this (Figure 2–figure supplement 4).

      Reviewer #1 (Recommendations For The Authors):

      (1) Benchmark their similarity score to the method used by Goffinet et al, 2021 from the Pearson group. Such a comparison would be really interesting and useful.  

      This has been added to the paper. 

      (2) Please clarify exactly what is new and what is applied from existing methods to help the reader see the novelty of the paper.  

      We have added more emphasis on the novel aspects of our pipeline to the paper’s introduction. 

      Minor:

      It's unclear if AVN is appropriate as the paper deals only with zebra finch song - the scope is more limited than advertised.

      We assume this is in reference to ‘Birdsong’ in the paper’s title and ‘Avian’ in Avian Vocalization Network. There is a brief discussion of how these methods are likely to perform on other commonly studied songbird species at the end of the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      A few points for the authors to consider that might strengthen or inform the paper:

      (1) In the public review, I detailed some ways in which the SSL+EMD approach is unlikely to be appreciably distinct from the VAE+MMD approach -- in fact, one could mix and match here. It would strengthen the authors' claim if they showed via experiments that their method outperforms VAE+MMD, but in the absence of that, a discussion of the relation between the two is probably warranted.  

      This comparison has been added to the paper.

      (2) ll. 305-310: This loss of accuracy near the edge is expected on general Bayesian grounds. Any regression approach should learn to estimate the conditional mean of the age distribution given the data, so ages estimated from data will be pulled inward toward the location of most training data. This bias is somewhat mitigated in the Brudner paper by a more flexible model, but it's a general (and expected) feature of the approach.
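      The reviewer’s point can be illustrated with a toy simulation: when age is regressed on a noisy song feature, the fitted slope is shrunk below 1, so predictions for the youngest birds are biased upward and those for the oldest birds downward. The age range (40 to 90 days), the noise level, and the single-feature ordinary least squares model are assumptions chosen only for illustration; they are not the data or model used in the paper.

```python
import random


def edge_bias(n=2000, noise_sd=10.0, seed=0):
    """Mean (predicted - true) age for the youngest and oldest simulated birds."""
    rng = random.Random(seed)
    ages = [rng.uniform(40, 90) for _ in range(n)]
    # A single hypothetical song feature that tracks age linearly, plus noise.
    feats = [a + rng.gauss(0, noise_sd) for a in ages]
    mean_f, mean_a = sum(feats) / n, sum(ages) / n
    # Ordinary least squares fit of age on the feature.
    slope = (sum((f - mean_f) * (a - mean_a) for f, a in zip(feats, ages))
             / sum((f - mean_f) ** 2 for f in feats))
    intercept = mean_a - slope * mean_f
    errors = [(a, intercept + slope * f - a) for a, f in zip(ages, feats)]
    young = [e for a, e in errors if a < 50]
    old = [e for a, e in errors if a > 80]
    return sum(young) / len(young), sum(old) / len(old)


young_bias, old_bias = edge_bias()
print(f"youngest birds: {young_bias:+.1f} days, oldest birds: {old_bias:+.1f} days")
```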

      (3) While the online AVN documentation looks good, it might benefit from a page on design philosophy that lays out how the various modules fit together, something between the tutorials and the nitty-gritty API. That way, users would be able to get a sense of where they should look if they want to harness pieces of functionality beyond the tutorials.

      Thank you for this suggestion. We will add a page on AVN’s design philosophy to the online documentation. 

      (4) While the manuscript does compare AVN to packages like TweetyNet and AVA that share some functionality, it doesn't really mention what's been going on with the vocalpy ecosystem, where the maintainers have been doing a lot to standardize data processing, integrate tools, etc. I would suggest a few words about how AVN might integrate with these efforts.

      We thank the reviewer for this suggestion.

      (5) ll. 333-336: It would be helpful to provide a citation to some of the self-supervised learning literature this procedure is based on. Some citations are provided in methods, but the general approach is worth citing, in my opinion. 

      We have added a paragraph to the results section with more background on self-supervised learning for dimensionality reduction, particularly in the context of similarity scoring.

      (6) One software concern for medium-term maintenance: AVN docs say to use Python 3.8, and GitHub says the package is 3.9 compatible. I also saw in the toml file that 3.10 and above are not supported. It's worth noting that Python 3.9 reaches its end of life in October 2025, so some dependencies may have to be altered or changed for the package to be viable going forward.  

      Thank you for this comment. We will continue to maintain AVN and update its dependencies as needed.

      Minor points:

      (1) It might be good to note that WhisperSeg is a different install from AVN. May be hard for novice users, though there's a web interface that's available. 

      We’ve added a line to the methods section making this clear. 

      (2) Figure 6b: Some text in the y-axis labels is overlapping here. 

      This has been fixed. Thank you for bringing it to our attention. 

      (3) The name of the Python language is always capitalized.  

      We’ve fixed this capitalization error throughout the manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors improve the motivation of the chosen tasks and data or choose new tasks that more clearly speak to the optimizations they want to perform. 

      We have included more details about the motivation for our LDA classification analysis, age prediction model and embedding model for similarity scoring in the results of the revised manuscript, as discussed in more detail in the above responses to this reviewer. Thank you for these suggestions. 

      (2) They need to rigorously report the (classification) scores on the test datasets: these are the scores associated with the cost function used during training.  

      Based on this reviewer’s ‘Weaknesses: 3’ comment in the public reviews, we believe that they are referring to a classification score for the triplet loss model. As we explained in response to that comment, this is not a classification task; therefore, there is no classification score to report. The model was trained with a triplet loss function. While we could report these loss values, they are not informative about how well this approach would perform in a similarity scoring context, as explained above. As such, we prefer to include contrast index and tutor contrast index scores to compare the models’ performance on similarity scoring, as these are directly relevant to the task and are established in the field for said task.

      (3) They need to explain the reasons for the poor performance (or report on the inconsistencies with previous work) and why they prefer a fully automated system rather than one that needs some fine-tuning on bird-specific data.

      We’ve addressed this comment in the public response to this reviewer’s weakness points 3, 5, and 6. 

      (4) They should consider applying their method to data from Japanese and European labs.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 4.

      (5) The need to document the failure modes and report all details about the human annotations.  

      We’ve added additional description of the failure modes for our segmentation and labeling approaches in the results section of the revised manuscript.

      Details: 

      The introduction is very vague, it fails to make a clear case of what the problem is and what the approach is. It reads a bit like an advertisement for machine learning: we are given a hammer and are looking for a nail.  

      We thank the reviewer for this viewpoint; however, we disagree and have decided to keep our Introduction largely unchanged. 

      L46 That interpretability is needed to maximize the benefits of machine learning is wrong, see self-driving cars and chat GPT.  

      This line states that ‘To truly maximize the benefits of machine learning and deep learning methods for behavior analysis, their power must be balanced with interpretability and generalizability’. We firmly believe that interpretability is critically important when using machine learning tools to gain a deeper scientific understanding of data, including animal behavior data in a neuroscience context. We believe that the introduction and discussion of this paper already provide strong evidence for this claim. 

      L64 What about zebra finches that repeat a syllable in the motif, how are repetitions dealt with by AVN?  

      This is already described in the results section in lines 222-226, and in the methods in the ‘Syntax Features: Repetition Bouts’ section.

      L107 Say a bit more here, what exactly has been annotated?  

      We’ve added a sentence in the introduction to clarify this (lines 113-115).

      L112 Define spectrogram frames. Do these always fully or sometimes partially contain a vocalization? 

      Spectrogram frames are the individual time bins of a spectrogram computed with a short-term Fourier transform. As described in the ‘Methods: Labeling: UMAP Dimensionality Reduction’ section, our spectrograms are computed using ‘The short term Fourier transform of the normalized audio for each syllable […] with a window length of 512 samples and a hop length of 128 samples’. Given that the song files have a standard sampling rate of 44.1 kHz, each time bin represents 11.6 ms of song data, with successive frames advancing in time by 2.9 ms. Each frame therefore contains only a small fraction of a vocalization. 
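      The frame arithmetic above can be checked with a few lines of Python. The 44.1 kHz sampling rate, 512-sample window and 128-sample hop come from the text; the frame-count helper and the 100 ms example syllable are hypothetical, and the count assumes no edge padding (many STFT implementations pad or center frames by default, which changes the count).

```python
SAMPLE_RATE_HZ = 44_100  # standard sampling rate of the song files
WINDOW_SAMPLES = 512     # STFT window length
HOP_SAMPLES = 128        # STFT hop length

window_ms = 1000 * WINDOW_SAMPLES / SAMPLE_RATE_HZ
hop_ms = 1000 * HOP_SAMPLES / SAMPLE_RATE_HZ


def n_frames_no_padding(duration_s):
    """Number of full STFT windows in a clip, assuming no edge padding."""
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    if n_samples < WINDOW_SAMPLES:
        return 0
    return 1 + (n_samples - WINDOW_SAMPLES) // HOP_SAMPLES


print(f"window {window_ms:.1f} ms, hop {hop_ms:.1f} ms")
print(n_frames_no_padding(0.100))  # a hypothetical 100 ms syllable
```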

      L122 The reported TweetyNet score of 0.824 is lower than the one reported in Figure 2a.  

      The center line in the box plot in Figure 2a represents the median of the distribution of TweetyNet v-measure scores. Given that there are a couple of outlying birds with very low scores, the mean (0.824, as reported in the text of the results section) is lower than the median. This is not an error.

      L155 Some of the differences in performance are very small, reporting of the P value might be necessary. 

      The differences between these methods' validation scores are unlikely to be statistically significant. This does not mean that we cannot use the reported mean and median values to justify favoring one method over another. This is why we have chosen not to report p-values here.

      L161 The authors have not really tested more than a single clustering method, failing to show a serious attempt to achieve good performance.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 2.

      L186 Did isolate birds produce stereotyped syllables that can be clustered? 

      Yes, they did. The validation for clustering of isolate bird songs can be found in Figure 2–figure supplement 4. 

      Fig. 3e: How were the multiple bouts aligned?

      This is described in lines 857-876 in the ‘Methods: Song Timing Features: Rhythm Spectrograms’ section of the paper.

      L199 There is a space missing in front of (n=8).  

      Thank you for bringing this to our attention. It’s been corrected in the updated manuscript. 

      L268 Define classification accuracy.  

      We’ve added a sentence in lines 953-954 of the methods section defining classification accuracy. 

      L325 How many motifs need to be identified, why does this need to be done manually? There are semiautomated methods that can allow scaling, these should be  cited here. Also, the mention of bias here should be removed in favor of a more extensive discussion on the experimenter bias (traditionally vs Texas bias (in this paper).  

      All of the methods cited in this line have graphical user interfaces that require users to select a file containing song and manually highlight the start and end of each motif to be compared. The exact number of motifs required depends on the context (e.g., more examples are needed to detect subtler differences or changes in song similarity), but it is fairly standard for reviewers to score 30–100 pairs of motifs.

      We’ve discussed the tradeoffs between full automation and supervised or human-in-the-loop methods in response to this reviewer’s public comments ‘weakness #5 and 6’. Briefly, AVN’s aim is to standardize song analysis, allowing direct comparisons of song features and similarity scores across research groups. We believe, as explained in the paper, that this is best achieved by having different research groups use the same deep learning models, which perform consistently well across those groups. Introducing semi-automated methods would defeat this benefit of AVN.

      We’ve also addressed the question of ‘Texas bias’ in response to this reviewer’s public comment ‘Weakness #4’.

      L340 How is EMD applied? Syllables are points in 8-dim space, but now suddenly authors talk about distributions without explaining how they got from points to distributions. Same in L925.  

      We apologize for the confusion here. The syllable points in the 8-d space are collectively an empirical distribution, not a probability distribution. We referred to them simply as ‘distributions’ to limit technical jargon in the results of the paper, but have changed this to more precise language in the revised manuscript.
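      To make the step from points to distributions concrete: each set of syllable embeddings is treated as an empirical distribution that places equal weight on every point, and the earth mover's distance (EMD, also called the Wasserstein-1 distance) is the minimum average distance needed to move one point cloud onto the other. The Python sketch below assumes two equally sized sets with uniform weights. In that case the optimal transport plan is a one-to-one matching, which SciPy's `linear_sum_assignment` finds. Sets of unequal size need a general optimal-transport solver. The embeddings here are random stand-ins, and the manuscript's implementation may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def emd_equal_size(x: np.ndarray, y: np.ndarray) -> float:
    """Earth mover's distance between two equally sized point clouds.

    Each row is one syllable embedding, and each cloud is an empirical
    distribution with weight 1/n per point. With equal sizes and uniform
    weights, EMD equals the mean Euclidean cost of the optimal one-to-one
    assignment between the two clouds.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally sized point clouds")
    cost = cdist(x, y, metric="euclidean")   # pairwise ground distances
    rows, cols = linear_sum_assignment(cost)  # minimum-cost perfect matching
    return float(cost[rows, cols].mean())


# Hypothetical 8-d embeddings (random stand-ins, not real syllables).
rng = np.random.default_rng(0)
pupil = rng.normal(size=(200, 8))
tutor_like = pupil + 0.1 * rng.normal(size=(200, 8))  # small perturbation
unrelated = rng.normal(loc=1.0, size=(200, 8))        # shifted distribution

print(emd_equal_size(pupil, tutor_like), emd_equal_size(pupil, unrelated))
```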

      L351 Why do authors now use 'contrast index' to measure performance and no longer 'classification accuracy'?  

      We’ve addressed this comment in the public response to this reviewer’s weakness points 1 and 2.

      Figure 6 What is the confusion matrix, i.e. how well can the model identify pupil-pupil pairings from pupil-tutor and from pupil-unrelated pairings? I guess that would amount to something like classification accuracy.  

      There is no model classifying comparisons as pupil-pupil vs. pupil-tutor etc. These comparisons exist only to show the behavior of the similarity scoring approach, which consists of a dissimilarity measure (MMD or EMD) applied to low-dimensional representations of syllables generated by the triplet loss model or VAE. This was clarified further in our public response to this reviewer’s weakness points 1 and 2.
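      As an illustration of the other dissimilarity measure, maximum mean discrepancy (MMD) compares two sets of embeddings through averages of a kernel function. It is zero when the sets coincide and grows as their distributions diverge. The sketch below uses the standard biased estimator of squared MMD with a Gaussian kernel and the common median-distance heuristic for the bandwidth. Both choices are assumptions, as are the random stand-in embeddings. AVN's actual kernel and settings may differ.

```python
import numpy as np


def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding


def mmd2(x: np.ndarray, y: np.ndarray, bandwidth: float) -> float:
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def kernel(a, b):
        return np.exp(-sq_dists(a, b) / (2.0 * bandwidth**2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


def median_bandwidth(x: np.ndarray, y: np.ndarray) -> float:
    """Median heuristic: median pairwise distance within the pooled sample."""
    z = np.vstack([x, y])
    d = np.sqrt(sq_dists(z, z))
    return float(np.median(d[np.triu_indices_from(d, k=1)]))


# Hypothetical 8-d embeddings (random stand-ins, not real syllables).
rng = np.random.default_rng(0)
pupil = rng.normal(size=(200, 8))
tutor_like = pupil + 0.1 * rng.normal(size=(200, 8))
unrelated = rng.normal(loc=1.0, size=(200, 8))

bw = median_bandwidth(pupil, unrelated)  # one bandwidth for both comparisons
print(mmd2(pupil, tutor_like, bw), mmd2(pupil, unrelated, bw))
```

      Keeping the bandwidth fixed across comparisons makes the resulting scores directly comparable, since MMD values computed with different kernels are on different scales.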

      L487 What are 'song files', and what do they contain?   

      ‘Song files’ are .wav files containing recordings of zebra finch song. They typically contain a single song bout, but they can include multiple song bouts if they are produced close together, or incomplete song bouts if the introductory notes were very soft or the bouts were very long (>30s from the start of the file). Details of these recordings are provided in the ‘Methods: Data Acquisition: UTSW Dataset’ section of the manuscript.

      L497 Calls were only labelled for tweetynet but not for other tasks.  

      That is correct. The rationale for this is provided in the ‘Methods: Manual Song Annotation’ section of the manuscript. 

      L637 There is a contradiction (can something be assigned to the 'own manual annotation category' when the same sentence states that this is done 'without manual annotation'?) 

      We believe there is confusion here between automated annotation and validation. Any bird can be annotated automatically, without any existing manual annotations for that individual bird. However, validating the method requires manual labels as a reference against which the automatically generated annotations can be compared.

      L970 Spectograms of what? (what is the beginning of a song bout, L972). 

      The beginning of a song bout is the first introductory note produced by a bird after a period without vocalizations. This is standard.

    1. eLife Assessment

      This valuable study tests whether prediction error or prediction uncertainty controls how the brain segments continuous experience into events. The paper uses validated models that predict human behavior to analyze multivariate neural pattern changes during naturalistic movie watching. The authors provide solid evidence that there are overlapping but partially distinct brain dynamics for each signal.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

    3. Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

    4. Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      Weaknesses:

      Regarding question 3, I am less convinced by the results. They show that overlapping but somewhat distinct sets of brain regions relate to uncertainty and error boundaries over time. And that some regions show distinct patterns of temporal progressions in pattern change with both types of boundaries. However, most of the effects they observe in this analysis may still be driven by shared variance, as suggested by the results in Figure 6 and the high correlation between the two boundary time series. More specific comments are provided below.

      Impact:

      If these comments can be addressed sufficiently, I expect that this work will impact the field in its thinking on what drives event boundaries and spur interest in understanding the mechanisms behind the temporal progression of neural activity around these boundaries.

      Comments

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error—the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold, indicating that the current model no longer adequately captures ongoing activity. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability"), in addition to error, serves as the control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021)."
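      The contrast between a reactive error trigger and a proactive uncertainty trigger can be illustrated with a toy Python sketch. This is not the authors' computational model. The signal values, the threshold of 0.5, and the helper `threshold_boundaries` are all hypothetical, and the only rule assumed is that a boundary is marked when a signal first rises above a fixed threshold.

```python
import numpy as np


def threshold_boundaries(signal, threshold):
    """Indices where `signal` crosses from <= threshold to > threshold.

    Only upward crossings count, so a sustained high value yields a single
    boundary rather than one per timepoint (hypothetical rule).
    """
    above = np.asarray(signal, dtype=float) > threshold
    previous = np.concatenate(([False], above[:-1]))  # value at t - 1
    return np.flatnonzero(above & ~previous)


# Hypothetical signals around one event change at t = 6: uncertainty starts
# rising before the change, whereas error spikes only once it has happened.
uncertainty = [0.1, 0.1, 0.2, 0.4, 0.6, 0.7, 0.8, 0.4, 0.2, 0.1]
error = [0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.9, 0.7, 0.3, 0.1]

print("uncertainty-driven:", threshold_boundaries(uncertainty, 0.5))
print("error-driven:      ", threshold_boundaries(error, 0.5))
```

      In this toy example the uncertainty rule fires two timepoints before the error rule, which is the sense in which uncertainty-based updating can be proactive.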

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.

      "To account for the correlation between uncertainty-driven boundaries and error-driven boundaries, we also fitted a FIR model that predicts pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. Then, we performed two likelihood ratio tests: combined FIR to error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR to uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis also revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."
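      The nested-model comparison described above can be sketched in Python on simulated data. The assumptions are as follows. FIR regressors are lagged boundary indicators. Models are fit by ordinary least squares with Gaussian, independent errors. The lag range, boundary rates, and response profile are made up for illustration. Real fMRI residuals are autocorrelated, which makes this simple test anticonservative, so this is not the manuscript's exact pipeline.

```python
import numpy as np
from scipy import stats


def fir_design(boundaries, lags):
    """FIR regressors: the column for lag k is 1 at t if a boundary occurred at t - k.

    Negative lags model timepoints before a boundary, positive lags after it.
    """
    b = np.asarray(boundaries, dtype=float)
    n = b.size
    cols = []
    for k in lags:
        col = np.zeros(n)
        if k >= 0:
            col[k:] = b[: n - k]
        else:
            col[: n + k] = b[-k:]
        cols.append(col)
    return np.column_stack(cols)


def ols_loglik(X, y):
    """Gaussian log-likelihood of an OLS fit with an intercept (MLE variance)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    n = len(y)
    ll = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    return ll, np.linalg.matrix_rank(X1)  # rank = number of free parameters


def likelihood_ratio_test(X_full, X_reduced, y):
    """LRT for nested models; df = difference in number of parameters."""
    ll_full, k_full = ols_loglik(X_full, y)
    ll_red, k_red = ols_loglik(X_reduced, y)
    stat = 2.0 * (ll_full - ll_red)
    return stat, stats.chi2.sf(stat, df=k_full - k_red)


# Simulated parcel (hypothetical): the two boundary series are correlated,
# but the dissimilarity time course depends only on uncertainty boundaries.
rng = np.random.default_rng(1)
n, lags = 600, range(-4, 5)
unc = rng.random(n) < 0.05
err = np.where(rng.random(n) < 0.5, unc, rng.random(n) < 0.05)
X_unc, X_err = fir_design(unc, lags), fir_design(err, lags)
profile = np.array([0.2, 0.4, 0.8, 1.0, 1.0, 0.6, 0.2, 0.0, -0.4])
y = X_unc @ profile + 0.5 * rng.normal(size=n)

X_both = np.hstack([X_err, X_unc])                          # combined FIR
stat_unc, p_unc = likelihood_ratio_test(X_both, X_err, y)  # unique uncertainty
stat_err, p_err = likelihood_ratio_test(X_both, X_unc, y)  # unique error
print(f"uncertainty: chi2={stat_unc:.1f}, p={p_unc:.2g}")
print(f"error:       chi2={stat_err:.1f}, p={p_err:.2g}")
```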

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from the boundary."
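      The baseline interpretation can be made explicit with a generic FIR model. The notation here is ours, not the manuscript's. The pattern dissimilarity $d_t$ at timepoint $t$ is regressed on lagged boundary indicators $b_{t-k}$, where $b_t = 1$ if a boundary occurs at $t$ and 0 otherwise:

```latex
d_t = \beta_0 + \sum_{k=-K}^{K} \beta_k \, b_{t-k} + \varepsilon_t,
\qquad
\beta_k = \mathbb{E}\left[d_{t_0+k}\right] - \beta_0 \ \text{for an isolated boundary at } t_0 .
```

      When $t$ is more than $K$ timepoints from every boundary, all indicators are zero and the expected dissimilarity is $\beta_0$, the baseline. If two boundaries fall within $K$ timepoints of each other, their contributions add. The spacing between boundaries therefore affects how the estimated time courses should be interpreted.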

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This is related to Reviewer 2’s comment and is addressed below.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how better to set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway—for instance, a mystery solved before the story formally ends—higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another (Baldassano et al., 2017) during movie viewing; these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex...

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries, because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. For example, one open question is whether shifts in higher-order regions precede or follow shifts in lower-level regions. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries could provide evidence about the control processes for event model updating.”
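      For readers less familiar with moment-by-moment pattern measures, one common choice is the correlation distance (1 − Pearson r) between a parcel's voxel patterns at successive timepoints. The Python sketch below assumes that definition and a (timepoints × voxels) array with hypothetical data. The manuscript's exact dissimilarity measure and lag may differ.

```python
import numpy as np


def pattern_dissimilarity(data: np.ndarray) -> np.ndarray:
    """Correlation distance (1 - Pearson r) between successive spatial patterns.

    data: array of shape (timepoints, voxels) for one parcel. Returns an array
    of length timepoints - 1; entry t compares timepoint t with t + 1.
    Rows with zero variance would cause a division by zero and are not handled.
    """
    z = data - data.mean(axis=1, keepdims=True)       # center each pattern
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # scale to unit length
    r = np.sum(z[:-1] * z[1:], axis=1)                # Pearson r of neighbours
    return 1.0 - r


# Hypothetical parcel: one stable pattern for 5 timepoints, then a new one.
rng = np.random.default_rng(2)
pattern_a, pattern_b = rng.normal(size=50), rng.normal(size=50)
data = np.vstack([pattern_a + 0.1 * rng.normal(size=(5, 50)),
                  pattern_b + 0.1 * rng.normal(size=(5, 50))])
diss = pattern_dissimilarity(data)
print(np.round(diss, 2))
```

      The resulting series is near zero within each stable period and peaks at the switch, giving a per-timepoint measure that can then be related to boundary times.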

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5 s before) in dorsal attention and parietal regions; these correspond to regions that Geerligs et al. (2022) identified as shifting their patterns on an intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds before event boundaries in higher-order regions within anterior temporal and prefrontal cortex, which Geerligs et al. (2022) identified as slow-changing regions. This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts that neural shifts in higher-order regions follow those in lower-order regions. By contrast, our findings indicate the opposite sequence. This suggests that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021).
For example, in a narrative where the main goal is achieved midway—such as a detective solving a mystery before the story formally ends—higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented are limited to the cortex, and subcortical contributions would be interesting to explore. Further, the 20-second temporal window around event boundaries is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively far from the event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.
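      As a rough illustration of the combination of pattern similarity and FIR modeling described above, the Python sketch below builds a pattern-change time series (1 minus the Pearson correlation between consecutive multivoxel patterns) and an FIR design matrix with one indicator column per TR-sized time bin in a ±20 s window around each boundary. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the use of consecutive-scan correlation as the similarity metric, the repetition time passed as `tr_s`, and the placeholder data are all hypothetical, since the review does not specify them.

```python
import numpy as np


def pattern_change(patterns):
    """Return 1 - Pearson r between consecutive rows of a (scans x voxels) array.

    The first scan has no predecessor, so its value is set to 0.
    """
    change = np.zeros(patterns.shape[0])
    for t in range(1, patterns.shape[0]):
        r = np.corrcoef(patterns[t - 1], patterns[t])[0, 1]
        change[t] = 1.0 - r
    return change


def fir_design_matrix(onsets_s, n_scans, tr_s, window_s=(-20.0, 20.0)):
    """Build an FIR design matrix with one indicator column per time bin.

    Column k is 1 at scans lying lags[k] scans from a boundary onset, so no
    hemodynamic response shape is assumed. Returns the matrix and the bin
    times (in seconds) relative to the boundary.
    """
    first = int(np.floor(window_s[0] / tr_s))
    last = int(np.ceil(window_s[1] / tr_s))
    lags = np.arange(first, last + 1)
    X = np.zeros((n_scans, lags.size))
    for onset in onsets_s:
        onset_scan = int(round(onset / tr_s))
        for col, lag in enumerate(lags):
            scan = onset_scan + lag
            if 0 <= scan < n_scans:  # drop bins that fall outside the run
                X[scan, col] = 1.0
    return X, lags * tr_s


# Hypothetical example: 80 scans, TR = 2 s, boundaries at 40 s and 100 s.
X, bin_times = fir_design_matrix([40.0, 100.0], n_scans=80, tr_s=2.0)
rng = np.random.default_rng(0)
y = pattern_change(rng.normal(size=(80, 50)))  # placeholder data, not real fMRI
betas, *_ = np.linalg.lstsq(X, y, rcond=None)  # one estimate per time bin
```

      Regressing the pattern-change series on the design matrix gives one coefficient per time bin, tracing the average pattern change from 20 s before to 20 s after a boundary without assuming a canonical hemodynamic response shape.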

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.
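      The single-model analysis the reviewer asks for can be sketched as a standard variance-partitioning (commonality) analysis: fit each regressor set alone and both together, then take differences in R². The Python sketch below assumes hypothetical inputs `x_a` and `x_b` (for example, FIR regressors for uncertainty-driven and error-driven boundaries) and an outcome `y`; it illustrates the general technique and is not the analysis reported by the authors.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()


def partition_variance(x_a, x_b, y):
    """Split explained variance into unique and shared parts for two regressor sets."""
    r2_a = r_squared(x_a, y)
    r2_b = r_squared(x_b, y)
    r2_ab = r_squared(np.column_stack([x_a, x_b]), y)  # both in a single model
    return {
        "full": r2_ab,
        "unique_a": r2_ab - r2_b,  # what x_a adds beyond x_b
        "unique_b": r2_ab - r2_a,  # what x_b adds beyond x_a
        "shared": r2_a + r2_b - r2_ab,
    }
```

      Here `unique_a` is the variance explained by `x_a` over and above `x_b`, and `shared` is what the two explain jointly; with correlated regressors (r = 0.49 in this case), a large `shared` term relative to the unique terms would support the reviewer's concern. The shared term can be negative when the regressors act as suppressors.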

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We also share the intuition that increases in uncertainty are early indicators of a boundary and that errors tend to occur later. If that were the case, we would expect a lag between prediction uncertainty and prediction error. We examined the lagged correlation between prediction uncertainty and prediction error, and the optimal lag is 0 for both the uncertainty-driven and error-driven models. This indicates that prediction error rises at the same time as prediction uncertainty.
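      A lagged-correlation check of the kind described above can be sketched as follows. The function correlates one time series with time-shifted copies of another and returns the lag with the highest correlation; under the sign convention used here, a positive optimal lag would mean that prediction error trails prediction uncertainty. The inputs, the function name, and the use of the maximum Pearson r as the "optimal" lag are assumptions for illustration; the response does not state how the authors defined the optimal lag.

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson r between x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    Returns the lags, the correlations, and the lag with the largest r.
    A positive best lag means changes in y follow changes in x.
    """
    lags = np.arange(-max_lag, max_lag + 1)
    rs = []
    for lag in lags:
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        rs.append(np.corrcoef(a, b)[0, 1])
    rs = np.array(rs)
    best = int(lags[np.argmax(rs)])
    return lags, rs, best
```

      Applied to model-derived uncertainty and error time series, an optimal lag of 0, as the authors report, would indicate that the two signals peak together rather than in sequence.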

      Author response image 1.

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      "For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for the uncertainty-driven and error-driven models were 28.96 s and 24.7 s, respectively."

      (4) Given that there is an early-onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty that are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether they are general markers of error and uncertainty.

      We appreciate this point; it is similar to reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12 s before boundaries in higher-order cortex and ~4.5 s before boundaries in dorsal attention/parietal areas. Region-dependent BOLD delays reported in the literature (~1–2 s; Taylor et al., 2018) are much smaller than the temporal structure we observe, making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      Because of page-length constraints, we did not include all timepoints. We have uploaded an animation of all timepoints to OpenNeuro for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. eLife Assessment

      This manuscript advances the prior finding that antigen recognition in the skin helps establish skin resident memory in CD8 T cells by elucidating the role of TGFBR3 in regulating CD8+ TRM skin persistence upon topical antigen exposure. The key novelty of the work lies in the generation and use of the CD8+ T cell-specific TGFBR3 knockout model, which allows the authors to demonstrate the role of TGFBR3 in fine-tuning the degree of CD8+ T cell skin persistence and that TGFBR3 expression is promoted by CD8+ TRM encountering their cognate antigen upon initial skin entry. This is an important finding and is supported by convincing evidence. There are concerns about the use of FTY720 and about the need to establish active TGFbeta-limiting conditions to further test this working model.

    2. Reviewer #1 (Public review):

      Summary:

      Weiss et al. seek to delineate the mechanisms by which antigen-specific CD8+ T cells outcompete bystanders in the epidermis when active TGF-b is limiting, resulting in selective retention of these cells and more complete differentiation into the TRM phenotype.

      Strengths:

      They begin by demonstrating that at tissue sites where cognate antigen was expressed, CD8+ T cells adopt a more mature TRM transcriptome than cells at tissue sites where cognate antigen was never expressed. By integrating their scRNA-Seq data on TRM with the much more comprehensive ImmGenT atlas, the authors provide a very useful resource for future studies in the field. Furthermore, they conclusively show that these "local antigen-experienced" TRM have increased proliferative capacity and that TCR avidity during TRM formation positively correlates with their future fitness. Finally, using an elegant experimental strategy, they establish that TCR signaling in CD8+ T cells in the epidermis induces TGFBRIII expression, which likely contributes to endowing them with a competitive advantage over antigen-inexperienced TRM.

      Weaknesses:

      The main weakness in this paper lies in the authors' reliance on a single experimental model to derive conclusions on the role of local-antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulating effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not rely solely on TRM precursors present at the DNFB-treated skin site at the time of peptide treatment. These concerns are somewhat alleviated by the fact that in a prior publication (PMID: 33212014), the group has already established a role for local antigen encounter in skin in a setting where they compared contralateral ears infected with VV-OVA and VV expressing an irrelevant antigen.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local-antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains, however, unclear why only antigen encounter in skin, and not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding.

      A minor caveat of the study pertains to the use of FTY720 to block T cell egress from lymphoid tissues and thereby prevent a contribution of circulating memory OT-I T cells to the local recall response in skin. Since the half-life of FTY720 is less than a day in mice, its effects wear off rapidly. In their experiments, the authors discontinued treatment at the time of re-challenge, which may have allowed circulating T cells to contribute to the local recall response in skin, somewhat limiting the interpretability of the results. This concern is alleviated by the use of a second method (anti-Thy1.1-depleting antibodies) to eliminate circulating memory cells. For the benefit of readers intending to use this experimental strategy, it should, however, be noted that FTY720 needs to be dosed continually (e.g. 3x/week at an appropriate dose) in order to sustain its effect.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to dissect the mechanistic basis of their previously published finding that encountering cutaneous antigen augments the persistence of CD8+ memory T cells that enter skin (TRM) (Hirai et al., 2021, Immunity). Here they use the same murine model to study the fate of CD8+ T cells after antigen priming in the lymph nodes: (1) those that re-encounter antigen in the skin via vaccinia virus (VV) versus (2) those that do not encounter antigen in skin but rather are recruited via topical dinitrofluorobenzene (DNFB) (so-called "bystander TRM"). The authors' previous publication establishes that this first group of CD8+ TRM has a persistence advantage over bystander TRM under TGFb-limiting conditions. The current paper advances this finding by elucidating the role of TGFBR3 in regulating CD8+ TRM skin persistence upon topical antigen exposure. The key novelty of the work lies in the generation and use of the CD8+ T cell-specific TGFBR3 knockout model, which allows them to demonstrate the role of TGFBR3 in fine-tuning the degree of CD8+ T cell skin persistence and that TGFBR3 expression is promoted by CD8+ TRM encountering their cognate antigen upon initial skin entry. Future work directly measuring active TGFb in the skin under different conditions would help identify physiologic scenarios that yield active-TGFb-limiting conditions, thus establishing physiologic relevance.

      Strengths:

      Technical strengths of the paper include (1) complementary imaging and flow cytometry analyses, (2) integration of their scRNA-seq data with the existing CD8+ TRM literature via pathway analysis, and (3) use of orthogonal models where possible. Using a vaccinia virus (VV) model, with and without ovalbumin (OVA), the authors investigate how topical antigen exposure and TCR strength regulate CD8+ TRM skin recruitment and retention. The authors use both FTY720 and a Thy1.1-depleting antibody to demonstrate that skin CD8+ TRM expand locally following both a primary and a secondary recall response to topical OVA application.

      A conceptual strength of the paper is the authors' observation that TCR signal strength upon initial TRM tissue entry helps regulate the extent of their local re-expansion on subsequent antigen re-exposure. They achieved this by applying peptides of varying affinity for the OT-I TCR on the DNFB-exposed flank in tandem with initial VV-OVA + DNFB treatment. They then measured TRM expansion after OVA peptide rechallenge, revealing that encountering a higher-affinity peptide upon skin entry leads to greater subsequent re-expansion. Additionally, by generating an OT-I Thy1.1+ E8i-creERT2 huNGFR Tgfbr3fl/fl (Tgfbr3∆CD8) mouse, the authors were able to elucidate a unique role for TGFBR3 in CD8+ TRM persistence when active TGFb in skin is limited.

      Weaknesses:

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high- versus low-avidity topical OVA-peptide exposure. The authors could further increase the impact of the paper by fully exploring whether TGFBR3 is regulated at the RNA or protein level; analysis of scRNAseq data included in the rebuttal did show an increase in Tgfbr3 RNA transcript levels in VV-treated compared to DNFB-treated back skin.

      Quantification of the skin TRM population relies primarily on imaging analysis, which the authors indicate is more sensitive and consistent for quantifying this population. While flow cytometry is used to perform some phenotyping of TRMs, there remain some missed opportunities for more extensive analysis of markers expressed by this population. Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model.

      This work heavily utilizes models developed and defined in previously published work (Hirai, T., et al., Competition for Active TGFβ Cytokine Allows for Selective Retention of Antigen-Specific Tissue- Resident Memory T Cells in the Epidermal Niche. Immunity, 2021. 54(1): p. 84-98.e5). Rather than repeating control experiments for this manuscript, the authors reference data included in this prior work. Thus, readers interested in a more in-depth understanding of these tools and concepts would be encouraged to read both papers.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulating effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not rely solely on TRM precursors present at the DNFB-treated skin site at the time of peptide treatment. 

      Thank you for pointing out these potential caveats to our work. We have considered the possibility that late application of peptide or cell-extrinsic differences could affect the interpretation of our results. We would like to highlight that in our prior publication on this topic [1], we found that OT-I responses in mice infected with VV-OVA and VV-N (irrelevant antigen) were the same as in our VV-OVA/DNFB models. In addition, in both our prior publication and our current manuscript, application of peptide to DNFB-painted sites results in TRM with a phenotype similar to that of TRM at the VV-OVA site. Thus, we are confident that it is the presence of cognate antigen in the skin that drives the augmented TRM fitness that we observe.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding. 

      This is an intriguing point. We do not understand why expression of TGFbR3 in TRM requires antigen encounter in the skin when TRM at all sites have clearly encountered antigen during priming in the LN. We speculate that durable TGFbR3 expression may require antigen encounter in the context of additional cues present in the periphery, or only once cells have committed to the TRM lineage. A more detailed characterization of the dynamics of TGFbR3 expression in multiple tissues would be informative and represents a promising future direction for this project. We note that a reporter mouse would likely be required to perform these experiments robustly.

      Reviewer #2 (Public review): 

      Weaknesses: 

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in-situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high versus low avidity topical OVA-peptide exposure.

      Thank you for these helpful suggestions. We did not attempt these experiments because, given the relatively modest differences in expansion observed with the altered peptide ligands (APLs), we were concerned that resolving differences in TGFbR3 expression and BrdU incorporation would prove unreliable. However, this is something that we could attempt as we continue working on this project.

      The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could perform analysis of their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB versus VV-treated skin. 

      As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest. These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data. However, as suggested, we have re-analyzed our scRNAseq data for expression of Tgfbr3. Pseudobulk analysis of cells isolated from VV or DNFB sites suggests that Tgfbr3 expression is elevated in antigen-experienced TRM at steady state (Author response image 1).

      Author response image 1.

      Pseudobulk analysis by average gene expression of Tgfbr3 in cells isolated from either VV or DNFB treated flanks, divided by the average gene expression of Tgfbr3 in naïve CD8 T cells from the same dataset.

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds. 

      We appreciate this feedback, and we have clarified this in the text.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data.

      Thank you for these suggestions. Regarding quantification of TRM numbers, we have previously shown that flow cytometry can yield as much as a 36-fold lower cell count than direct visualization by immunofluorescence [1]. Thus, we rely primarily on direct IF visualization to enumerate TRM and use flow cytometry primarily for phenotyping.

      Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only treated mice, would help contextualize the results seen in dually-treated mice in Figure 2.

      Without a source of inflammation (e.g. VV infection or DNFB), we see very few TRM in untreated skin. A representative image is provided (Author response image 2). A single DNFB stimulation does not recruit any CD8+ T cells to the skin without prior sensitization [2].

      Author response image 2.

      Representative images of epidermal whole mounts of VV treated flank skin, and an untreated site from the same mouse isolated on day 50 post infection and stained for CD8a.

      In figure legends, we suggest clearly reporting unpaired T tests comparing relevant metrics within VV or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F).

      Thank you for this suggestion.  The figure legends have been amended.

      Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model. 

      We quantified the numbers of CD8 T cells in the left and right skin-draining lymph nodes by flow cytometry in mice at day 50 after VV infection and DNFB pull. We observe similar numbers of cells at both sites (Author response Image 3).

      Author response Image 3.

      Quantification of total number of CD8+ T cells in left and right inguinal lymph nodes. Each symbol represents paired data from the same individual animal, and this is representative of 3 separate experiments.

      Reviewer #1 (Recommendations for the authors): 

      (1) Figures 1D and S1C demonstrate that 80-90 % of TRM at both VV and DNFB sites express CD103+. In contrast, the sequencing data suggests the TRM at the VV site has much higher Itgae expression. Also, clusters 3 and 4, which express significantly more Itgae than all other clusters, together comprise only ~30% of CD8+ T cells at the VV-infected skin site. How can these discrepancies between transcript and protein expression be explained? 

      Thank you for these excellent comments. TRM at both VV and DNFB sites appear to express similarly high levels of CD103 protein, both in the OT-I system, as we previously published [1], and in a polyclonal system using tetramers. We attribute the lower penetrance of Itgae expression in the scRNAseq data to the limited sensitivity that is common with this modality. The relatively increased expression of Itgae in clusters 3 and 4 is interesting and may suggest increased Itgae transcript production or stability. However, in the absence of any effect on protein expression, we chose not to focus on these mRNA differences.

      (2) For the experiments in Figure 3D, FTY720 should have been administered throughout the recall response, not only before its initiation, in order to exclude a contribution from circulating memory cells. The effect of FTY720 wears off quickly, so the current experimental setting likely allows circulating cells to enter the skin. This concern is mitigated by the results of anti-Thy1.1 mAb treatment, but documenting the experiment as in Figure 3D will likely confuse readers.

      Thank you for this comment. We relied on the literature indicating that the half-life of FTY720 in blood is longer than 6 days [3-5]. However, on reviewing this again, we found other reports suggesting a shorter half-life. Thank you for pointing out this potential caveat. As mentioned above, we do not think this affects the interpretation of our data, as similar results were obtained with anti-Thy1.1.

      (3) As described in the weaknesses section, the data on TGFBRIII expression are lacking. When is TGFBRIII induced? Is it induced in the LN during primary activation and then sustained by secondary antigen exposure at the peripheral target tissue site? Or is it induced only in the peripheral tissue, in which case there is interesting biology to uncover regarding how the TCR induces it only after secondary exposure?

      Thank you for these comments. As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest.  These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data and are part of our future directions.

      (4) As described in the weaknesses section, there could be TCR-independent differences between the VV-OVA and DNFB sites that lead to phenotypic changes in the TRM formed there, both CD8+ T cell-intrinsic (kinetics, with regard to time after initial priming) and extrinsic (microenvironmental differences due to the nature of the challenge, recruited cell types, cytokines, chemokines, etc.). Since the authors report the use of both VV and VV-OVA, we recommend an experimental strategy that controls for this by challenging one site with VV and another with VV-OVA concomitantly, followed by repeating the key experiments reported in this manuscript.

      As discussed above, we have previously published a very similar experiment using VV-OVA and VV-N infection on opposite flanks [1].

      (5) In Figure 6J, please indicate means and provide more statistics comparing the groups (such as VV-WT vehicle vs VV-KO vehicle), and consider displaying the data on a linear scale, as in all the other figures showing cells/mm2, to help convince the reader of the conclusions and to support secondary findings mentioned in the text, such as "Notably, numbers of Tgfbr3ΔCD8 TRM in cohorts treated with vehicle remained at normal levels indicating that loss of TGFβRIII does not affect TRM epidermal residence in the steady state", even though the graph appears to show a decrease.

      We appreciate the feedback on the readability of this figure, and so have updated Figure 6J to be on a linear scale and added helpful statistics to the figure legend. The difference between Tgfbr3<sup>WT</sup> and Tgfbr3<sup>∆CD8</sup> at steady state is an excellent point, and we agree that there could be a trend towards reduction in the huNGFR+ T<sub>RM</sub> across both groups, even without CWHM12 administration. Although we did not see statistically significant reductions in steady-state Tgfbr3<sup>∆CD8</sup> T<sub>RM</sub>, the slight reduction in both VV-OVA- and DNFB-treated flanks suggests that TGFβRIII may play a role in steady-state maintenance of all T<sub>RM</sub>. Perhaps with more sensitive tools to better visualize TGFβRIII expression, we could identify stepwise upregulation of TGFβRIII depending on TCR signal strength, possibly starting in the lymph node. We have also amended our description of this figure in the text to allow for the possibility that a low amount of TGFβRIII, below the limit of detection, could play a role in steady-state maintenance of both local antigen-experienced and bystander T<sub>RM</sub>.

      Minor points: 

      (1) In describing Figure 4B, the term "doublets" for pairs of connected dividing cells is confusing. 

      Thank you for this comment; the term has been revised to “dividing cells” in the text and figure.

      (2) Figure legend 4F: BrdU is not "expressed".

      Very true, it has been changed to “incorporation”.

      (3) Do CreERT2 and/or huNGFR expressed by transferred OT-I cells act as foreign antigens in C57BL/6 mice, potentially causing elimination of circulating memory cells? If that were the case, it would not necessarily confound the read-out of TRM persistence studied here, since skin TRM are likely protected from at least antibody-mediated deletion and their numbers are not maintained by recruitment of circulating cells at steady state. However, it would be useful to be aware of this potential limitation of this and similar models.

      Thank you for raising this important technical concern. In our prior work [1] and in this work, we monitored the levels of transferred OT-I cells in the blood over time. We have not observed rejection of huNGFR+ cells. We also note that others using the same system have not observed rejection [6].

      (4) In Figure 6J, means or medians should be indicated.

      This has been updated in Figure 6J.

      (5) Using the term "antigen-experienced" to specifically refer to TRM at the VV site could be confusing, since those at the DNFB site are also Ag-experienced (in the LN draining the VV skin site). 

      We agree that it is a challenging term, as all T<sub>RM</sub> are memory cells. That is why in the text we refer to T<sub>RM</sub> isolated from the VV site as “local antigen-experienced T<sub>RM</sub>,” to distinguish them from bystanders that did not experience local antigen.

      (6) The title essentially restates what was already reported in the authors' prior study. If the data supporting the TGFBRIII-mediated mechanism are studied in more depth, adding this aspect to the title may be useful.

      Thank you for this suggestion. We think the current title is probably most suitable for the current manuscript, but we are willing to change it should the editors support an alternative title.

      Reviewer #2 (Recommendations for the authors): 

      (1) Definition of bystander CD8+ TRM: The first paragraph of the introduction defines CD8+ TRM. To improve the clarity of this definition, we suggest being explicit that bystander TRM experience cognate antigen in the SDLNs but, in contrast to other TRM, do not experience cognate antigen in the skin. 

      Thank you; we have clarified this in the text.

      (2) Consider softening the language when comparing the efficiency of CD8+ recruitment to the skin between DNFB- and VV-treated flanks. For example, substitute "equal efficiency" with "comparable efficiency", since it is difficult to directly compare the extent of inflammation between viral and hapten-based treatments.

      We have adjusted this terminology throughout the paper.

      (3) Throughout figure legends, we appreciate the indication of the number of experimental repeats performed. We suggest, either through statistics or supplemental figures, demonstrating the degree of variability between experiments to aid readers in understanding the reproducibility of results. 

      Thank you for this suggestion.  In key figures we show data from individual mice across multiple experiments. Thus, inter-experiment variability is captured in our figures.  

      (4) Figure 1: 

      a) Add control mice treated with either vaccinia virus or DNFB and harvest back skin at day 52 to demonstrate baseline levels of polyclonal and B8R tetramer-positive CD8s in the epidermis. These controls would clarify the background CD8+ expansion that might occur in DNFB-treated mice in the absence of vaccinia virus. 

      This point was addressed above.

      b) Figure 1: It would be helpful to see the %Tet+ population specifically in the CD103+ population, recognizing that the majority of the CD8+ from the skin are CD103+. 

      We did look only at CD103+ CD8 T cells from the skin for our tetramer analysis, so this has been clarified in the figure legend.

      c) Provide a UMAP, very similar to 1H, where CD8+ T cells, vaccinia virus, and DNFB-treated flanks are overlaid.

      Thank you for this suggestion.  A UMAP combining aspects of 1G (cell types from the whole ImmgenT dataset) with 1H (our data) results in a figure that is very difficult to interpret.  Thus, we have separated cell types across the entire ImmgenT data set (e.g. CD8+ T cells) and our data into 2 separate panels.

      d) 1D: left flow plot has numbered axis while the right flow plot does not. 

      Thank you, this has been fixed.

      (5) Figure 2: 

      a) In the figure legend, define what is meant by the grey line present in Figures 2C and 2D. 

      This has been updated in the figure legend.

      b) Edit the Y axis of 2C and 2D to specify the TRM signature score. 

      This has been updated in the figure.

      c) Include panel 1D from 1S into Figure 2 to help clarify for the reader what genes are expressed in the 0 - 5 clusters.

      We appreciate the feedback, but we found the heatmap made the figure look too busy, so we feel comfortable keeping it available within supplemental figure 1.

      d) In the body of the text, explicitly discuss that the TRM module used to calculate a signature score was created using virus infection models (HSV, LCMV and influenza), and thus some of the transcriptional similarity between the authors' vaccinia virus-treated CD8+ TRM and the TRM module might be due to viral infection rather than TRM status.

      Thank you for this comment.  We have now emphasized this point in the text.

      (6) Figure 3: 

      a) If there are leftover tissue sections, it would be optimal to show specific staining for CD103. We recognize that this data has been previously published by the lab, but it would be ideal to show it once in this paper. 

      Unfortunately, we do not have leftover tissue sections, so we are unable to measure CD103 by I.F. in these experiments.

      b) If you did collect skin draining lymph nodes in the Thy1.1 depletion model, it would be nice to see flow data showing the depletion effects in the skin draining lymph nodes in addition to the blood. 

      Unfortunately, we did not collect the skin draining lymph nodes, and do not have that data for the relevant experiments.

      c) Figure 3F & G: Perform a t-test comparing vaccinia virus PBS to FTY720, and isotype to anti-Thy1.1, within the same treatment group. Showing no significance in these two comparisons would strengthen the authors' claims. Statistics can be described in the legend.

      We have included this analysis in the figure legend.

      (7) Figure 4: 

      a) It would be helpful to have the CD69+/CD103+ population in this model discussed/defined more. The CD69 expression seen in 4E is lower than the reviewers would've predicted, and it would be interesting to see CD103 expression as well.

      We have found that CD103 is generally a stronger marker for T<sub>RM</sub> in the skin by flow cytometry, as CD69 staining is somewhat less robust in the colors we have chosen. By way of example, we present the gating we performed upstream in that experiment, gated previously on live CD45+CD3+CD8+ events (Author response image 4).

      Author response image 4.

      Representative flow cytometric plots showing CD69 and CD103 expression in gated live CD45+CD8+CD90.1+ cells isolated from VV-OVA- or DNFB-treated flanks.

      (8) Figure 5: 

      a) Define APL and its purpose in both the body of text and the figure legend. 

      We have clarified this in the text and the figure legend.

      b) Using in-vivo BrdU, compare proliferation between high avidity N4 and low avidity Y3 OVA-peptide at the primary recall timepoint. 

      We considered this, but due to the lack of sensitivity of the BrdU incorporation and the relatively subtle phenotype of the Y3, we did not think the assay would be sensitive enough to identify differences.

      (9) Figure 6: 

      a) Compare TGFBR3 expression in CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 OVA-peptide at the primary recall timepoint. 

      This point was discussed above.

      b) Either 1) examine TGFBR3 mRNA expression in VV vs DNFB skin from scRNA-seq dataset or 2) perform a qPCR on epidermal CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 at the primary recall timepoint. This would help distinguish whether TGFBR3 regulation occurs at the mRNA versus protein level. 

      This point has been discussed above.

      c) Figure 6A: Not required, but it seems like the TGFBR3 gate could be shifted to the right a bit. 

      The gates were set using FMO.

      d) Figure 6C: What comparison is the asterisk indicating significance referring to?

      It refers to Dunnett’s test comparing VV-OVA with DNFB-treated and untreated skin; the figure has been amended to clarify this point.

      e) Figure 6: To increase the rigor of the claim that CWHM12 creates a TGFβ-limiting condition, the authors could either 1) perform an ELISA or cell-based assay measuring active TGFβ, 2) recapitulate the results of 6J using a monoclonal antibody against αvβ6, as done in Hirai et al., 2021, Immunity, or 3) examine Tgfbr3 mRNA expression in your single-cell RNA-seq data, comparing cluster 0 and cluster 3.

      We are pleased to have the opportunity to show Tgfbr3 mRNA, which is shown above in Figure R1.

      (10) Material and methods: 

      Specify how the localization of the back skin used for imaging was made consistent between the right and left flanks. 

      We have updated this methodology in the text.

      Literature Cited

      (1) Hirai, T., et al., Competition for Active TGFβ Cytokine Allows for Selective Retention of Antigen-Specific Tissue-Resident Memory T Cells in the Epidermal Niche. Immunity, 2021. 54(1): p. 84-98.e5.

      (2) Manresa, M.C., Animal Models of Contact Dermatitis: 2,4-Dinitrofluorobenzene-Induced Contact Hypersensitivity, in Animal Models of Allergic Disease: Methods and Protocols, K. Nagamoto-Combs, Editor. 2021, Springer US: New York, NY. p. 87-100.

      (3) Müller, H.C., et al., The Sphingosine-1 Phosphate receptor agonist FTY720 dose dependently affected endothelial integrity in vitro and aggravated ventilator-induced lung injury in mice. Pulmonary Pharmacology & Therapeutics, 2011. 24(4): p. 377-385.

      (4) Nofer, J.-R., et al., FTY720, a Synthetic Sphingosine 1 Phosphate Analogue, Inhibits Development of Atherosclerosis in Low-Density Lipoprotein Receptor–Deficient Mice. Circulation, 2007. 115(4): p. 501-508.

      (5) Brinkmann, V., et al., Fingolimod (FTY720): discovery and development of an oral drug to treat multiple sclerosis. Nat Rev Drug Discov, 2010. 9(11): p. 883-97.

      (6) Andrews, L.P., et al., A Cre-driven allele-conditioning line to interrogate CD4<sup>+</sup> conventional T cells. Immunity, 2021. 54(10): p. 2209-2217.e6.

    1. eLife Assessment

      This study addresses an important question and shows how social navigation in homing pigeons can be explained by simple averaging, without requiring any complex cognitive abilities. The evidence, based on a rigorous and systematic comparison of seven models and data on how social routes can be generated from solitary routes, is compelling. The authors should be commended for their willingness to critically re-examine established interpretations.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.
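      To illustrate why a chaining design can dilute earlier individuals' contributions under equal averaging, the following is a minimal Python sketch. It is not the authors' code; it assumes (hypothetically) that generation 1 is one bird flying alone, that each later pair averages the experienced bird's learned route and a naive bird's solo route with weight 1/2 each, that routes are resampled to the same number of points, and that "effective group size" is measured with the Kish-style formula 1 / sum(w_i^2). The function names are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chain_weights(n_generations: int) -> List[float]:
    """Weight of each bird's solo route in the final collective route.

    Hypothetical model: generation 1 is a single bird flying solo; each later
    generation pairs the experienced bird (whose route is the previous
    collective route) with one naive bird, averaging the two with weight 1/2.
    """
    if n_generations < 1:
        raise ValueError("need at least one generation")
    weights = [1.0]  # generation 1: the first bird flies alone
    for _ in range(n_generations - 1):
        # every earlier contribution is halved; the newcomer enters at 1/2
        weights = [w / 2 for w in weights] + [0.5]
    return weights


def effective_group_size(weights: Sequence[float]) -> float:
    """Kish-style effective number of contributors, 1 / sum(w_i^2).

    Equals N for N equal weights and is smaller when weights are uneven.
    """
    return 1.0 / sum(w * w for w in weights)


def average_route(routes: Sequence[Sequence[Point]],
                  weights: Sequence[float]) -> List[Point]:
    """Weighted point-wise average of routes sampled at the same points."""
    n_points = len(routes[0])
    if any(len(r) != n_points for r in routes):
        raise ValueError("routes must be resampled to a common length")
    total = sum(weights)
    return [
        (
            sum(w * r[i][0] for r, w in zip(routes, weights)) / total,
            sum(w * r[i][1] for r, w in zip(routes, weights)) / total,
        )
        for i in range(n_points)
    ]


if __name__ == "__main__":
    for g in (2, 3, 5, 10):
        print(g, round(effective_group_size(chain_weights(g)), 3))
```

      Under these assumptions, the weight of the first bird halves with every new generation, and the effective group size approaches 3 no matter how long the chain grows, because the sum of squared weights converges to 1/3.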

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      Weaknesses:

      The code and data for this manuscript are not available, which is a particular concern given that it critically examines and proposes alternative hypotheses for an important published work.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear appropriate.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works; pigeons do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences).
Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates how collective navigation improvements arise in homing pigeons. Building on the Sasaki & Biro (2017) experiment on homing pigeons, the authors use simulations to test seven candidate social learning strategies of varying cognitive complexity, ranging from simple route averaging to potentially cognitively demanding selective propagation of superior routes. They show that only the simplest strategy, equal route averaging, quantitatively matches the experimental data in both route efficiency and social weighting. More complex strategies, while potentially more effective, fail to align with the observed data. The authors also introduce the concept of "effective group size," showing that the chaining design leads to a strong dilution of earlier individuals' contributions. Overall, they conclude that cognitive simplicity rather than cumulative cultural evolution explains collective route improvements in pigeons.

      Strengths:

      The manuscript addresses an important question and provides a compelling argument that a simpler hypothesis is necessary and sufficient to explain findings of a recent influential study on pigeon route improvements, via a rigorous systematic comparison of seven alternative hypotheses. The authors should be commended for their willingness to critically re-examine established interpretations. The introduction and discussion are broad and link pigeon navigation to general debates on social learning, wisdom of crowds, and CCE.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The code and data for this manuscript are not available, which is a particular concern given that it critically examines and proposes alternative hypotheses for an important published work.

      We thank the reviewer for their comment. The code and data for our manuscript are an important aspect of the study, and we had intended to make them publicly available upon publication. The link to our code and data on figshare can be found here: (https://doi.org/10.6084/m9.figshare.28950032.v1). We will further add this link to the Data Availability Statement of our revised version.  

      Reviewer #2 (Public review):

      Summary:

      The manuscript investigates which social navigation mechanisms, with different cognitive demands, can explain experimental data collected from homing pigeons. Interestingly, the results indicate that the simplest strategy - route averaging - aligns best with the experimental data, while the most demanding strategy - selectively propagating the best route - offers no advantage. Further, the results suggest that a mixed strategy of weighted averaging may provide significant improvements.

      The manuscript addresses the important problem of identifying possible mechanisms that could explain observed animal behavior by systematically comparing different candidate models. A core aspect of the study is the calculation of collective routes from individual bird routes using different models that were hypothesized to be employed by the animals, but which differ in their cognitive demands.

      The manuscript is well-written, with high-quality figures supporting both the description of the approach taken and the presentation of results. The results should be of interest to a broad community of researchers investigating (collective) animal behavior, ranging from experiment to theory. The general approach and mathematical methods appear reasonable and show no obvious flaws. The statistical methods also appear appropriate.

      Strengths:

      The main strength of the manuscript is the systematic comparison of different meta-mechanisms for social navigation by modeling social trajectories from solitary trajectories and directly comparing them with experimental results on social navigation. The results show that the experimentally observed behavior could, in principle, arise from simple route averaging without the need to identify "knowledgeable" individuals. Another strength of the work is the establishment of a connection between social navigation behavior and the broader literature on the wisdom of crowds through the concept of effective group size.

      We thank the reviewer for their positive comments.

      Weaknesses:

      However, there are two main weaknesses that should be addressed:

      (1) The first concerns the definition of "mechanism" as used by the authors, for example, when writing "navigation mechanism." Intuitively, one might assume that what is meant is a behavioral mechanism in the sense of how behavior is generated as a dynamic process. However, here it is used at a more abstract (meta) level, referring to high-level categories such as "averaging" versus "leader-follower" dynamics. It is not used in the sense of how an individual makes decisions while moving, where the actual route followed in a social context emerges from individuals navigating while simultaneously interacting with conspecifics in space and time. In the presented work, the approach is to directly combine (global) route data of solitary birds according to the considered "meta-mechanisms" to generate social trajectories. Of course, this is not how pigeon social navigation actually works; pigeons do not sit together before the flight and say, "This is my route, this is your route, let's combine them in this way." A mechanistic modeling approach would instead be some form of agent-based model that describes how agents move and interact in space and time. Such a "bottom-up" approach, however, has its drawbacks, including many unknown parameters and often strongly simplifying (implicit) assumptions. I do not expect the authors to conduct agent-based modeling, but at the very least, they should clearly discuss what they mean by "mechanism" and clarify that while their approach has advantages, such as naturally accounting for the statistical features of solitary routes and allowing a direct comparison of different meta-mechanisms, it is also limited, as it does not address how behavior is actually generated. For example, the approach lacks any explicit modeling of errors, uncertainty, or stochasticity more broadly (e.g., due to environmental influences).
Thus, while the presented study yields some interesting results, it can only be considered an intermediate step toward understanding actual behavioral mechanisms.

      We thank the reviewer for their comment and thoughtful suggestions. We agree that the inherent behavioral mechanisms and their biological basis cannot be determined from the navigational data alone. For instance, it remains unexplored whether pigeons adapt their behavior based only on social cues from their partners or also use other navigational features such as landmarks or roads, the location of the sun, geomagnetic cues, or previously learnt routes. However, we agree (as also pointed out by the reviewer) that these behavioral rules generate an emergent ‘meta-mechanism’ in which the bird pairs behave as if their preferred routes are averaged during a flight. It will be important in future work to explore the biological basis of these mechanisms, but our current approach only allows us to describe the mechanisms with confidence in a meta sense. Considering this, we believe that our analysis is a top-down approach that describes the outcomes of these underlying mechanisms in an abstract sense. We would also like to point the reviewer to Dalmaijer (2024) [1], who used a bottom-up approach with naive agents and showed that cumulative route improvements emerged in the absence of any sophisticated communication in the same dataset, in agreement with our approach. Considering these points, in our revised version we will clearly elaborate on what the definition of ‘mechanism’ should include, in line with the reviewer’s feedback.

      (2) While the presented study raises important questions about the applicability and viability of cumulative cultural evolution (CCE) in explaining certain animal behaviors such as social navigation, I find that it falls short in discussing them. What are the implications regarding the applicability of CCE to animal data and to previously claimed experimental evidence for CCE? Should these experiments be re-analyzed or critically reassessed? If not, why? What are good examples from animal behavior where CCE should not be doubted? Furthermore, what about the cited definitions and criteria of CCE? Are they potentially too restrictive? Should they be revised, and if so, how? Conversely, if the definitions become too general, is CCE still a useful concept for studying certain classes of animal behavior? I think these are some of the very important questions that could be addressed or at least raised in the discussion to initiate a broader debate within the community.

      We thank the reviewer for their comments and interesting questions regarding our study. We agree with the reviewer that our study opens up new avenues for critically analysing the criteria previous studies have used to provide evidence of CCE in non-human animals. Our literature review suggests that the field has usually thought about CCE in a ‘process’-focused manner (Reindl et al. [2]), in which individuals are able to compare strategies and select those resulting in higher individual fitness. This preferential selection of strategies, termed innovations, produces the stereotypical ratcheting effect seen in CCE. In our study, we propose that in the case of homing pigeons, the ratcheting effect is a statistical outcome rather than the result of deliberate individual judgement. We believe that this strategy is amenable to certain task types (in our study, homing route choice) and may differ for others (for example, solving a puzzle box), and that the task also needs to be sufficiently complex for animals to benefit from the use of social information (Caldwell et al. 2008 [3]). Thus, we recommend that future work address which classes of problems fit well within the definition of “emergent” CCE and which do not. Within this framework, studies should clearly state which definition of CCE they use, and claims of CCE should be critically evaluated with respect to the underlying task type and cognitive mechanisms. Considering these points, we will expand our discussion to highlight these key questions for future research.

      References:

      (1) Dalmaijer ES (2024) Cumulative route improvements spontaneously emerge in artificial navigators even in the absence of sophisticated communication or thought. PLoS Biol. 22:e3002644.

      (2) Reindl, E., Gwilliams, A.L., Dean, L.G. et al. (2020) Skills and motivations underlying children’s cumulative cultural learning: case not closed. Palgrave Commun 6, 106.

      (3) Caldwell CA, Millen AE (2008) Studying cumulative cultural evolution in the laboratory. Phil. Trans. R. Soc. B 363:3529-3539.

    1. eLife Assessment

      This valuable study examines the cleavage of motor neuron nucleoporins by proteases 2A and 3C of enterovirus D68, a pathogen associated with acute flaccid myelitis. The evidence supporting the effects of EV-D68 proteases on nuclear import and export is solid and confirms previous results on the specific targeting of nucleoporins by proteases from other enteroviruses. However, the claim that cleavage of nucleoporins by EV-D68 2A is neurotoxic, though intriguing, is incomplete, as the evidence is largely indirect.

    2. Reviewer #1 (Public review):

      Summary:

      Zinn and colleagues investigated the effects of proteases 2A and 3C of enterovirus D68 (EVD68), an emerging pathogen associated with outbreaks of acute flaccid myelitis (AFM), a polio-like disease, on nucleocytoplasmic trafficking in different systems, including human neurons derived from pluripotent cells. They found that 2A specifically cleaved Nup98 and POM121. Using reporter proteins and RNA synthesis and trafficking assays in cells expressing viral proteases, they showed that 2A induces broad loss of the nuclear pore barrier function, but, surprisingly, RNA export appears to be minimally affected. Since nucleocytoplasmic trafficking defects are known to be associated with neuropathologies, they propose the hypothesis that 2A-dependent cleavage of nucleoporins in motoneurons underlies the development of EVD68-induced AFM. They further show that a 2A-specific inhibitor increases the survival of human neurons differentiated from stem cells upon EVD68 infection.

      Strengths:

      Use of multiple methods to investigate the effect of 2A and 3C expression on nucleoporin cleavage and nucleocytoplasmic trafficking.

      Weaknesses:

      Overall, the paper follows several others that have extensively investigated the cleavage of nucleoporins by enterovirus 2As, so the results are of limited novelty. The hypothesis that infection of motoneurons is the cause of EVD68-induced neurological complications is so far supported by only one autopsy report. Other data suggest that infection of other cell types, such as astrocytes, and/or inflammatory cell infiltration in the CNS are likely to be responsible for the symptoms. In any case, the claim that EVD68 is specifically neurotoxic because of the 2A-dependent cleavage of nucleoporins in neurons is unfounded, as the virus will be just as "toxic" for other infected cell types.

      The paper also requires a more convincing presentation of the data.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the role of EV-D68 proteases 2A and 3C in nuclear pore complex (NPC) dysfunction and their contribution to motor neuron toxicity. The authors demonstrate that both proteases cleave only a limited number of nucleoporins, with 2A^pro showing the strongest impact by inhibiting nuclear import and export of proteins and disrupting NPC permeability without affecting RNA export. Importantly, treatment with the 2A^pro inhibitor telaprevir reduced neuronal cell death in a dose-dependent manner, achieving neuroprotection at concentrations below those required to inhibit viral replication. The study addresses a relevant mechanism underlying EV-D68-induced neuropathology and explores a potential therapeutic intervention.

      Strengths:

      (1) Provides significant mechanistic insight into how EV-D68 proteases alter NPC function and contribute to neuronal toxicity.

      (2) The use of recombinant 2A and 3C proteins allows clear dissection of the specific contribution of each protease.

      (3) Demonstrates a therapeutic effect of telaprevir, with neuroprotection independent of viral replication inhibition, adding translational value to the findings.

      (4) The topic is highly relevant given the association of EV-D68 with acute flaccid myelitis.

      Weaknesses:

      (1) Most experiments were performed with recombinant proteases, lacking validation in the context of viral infection, where both proteases act simultaneously.

      (2) The conclusion that RNA export is unaffected requires confirmation during actual infection.

      (3) The reduction of neurotoxicity by telaprevir does not fully demonstrate that the protective effect is solely mediated through NPC preservation; additional analyses of eIF4G cleavage, nucleoporin integrity, and stress granules are needed.

      (4) The study would be strengthened by including another 2A inhibitor (e.g., boceprevir) to confirm the specificity of telaprevir's protective effects.

    4. Reviewer #3 (Public review):

      Summary:

      The authors showed that expression of the EV-D68 viral proteases 2Apro and 3Cpro led to cleavage of specific components of the nuclear pore complex (Nup98 and POM121 by 2Apro), and that expression of 2A, but not 3C, altered nuclear import and export. Similar nucleocytoplasmic transport deficits are observed in EV-D68-infected RD cells and iPSC-derived motor neurons (diMNs). The 2A inhibitor telaprevir partially rescued the nucleocytoplasmic transport deficits and suppressed neuronal cell death after infection. While it is clear that 2A can cleave NPC proteins and affect nuclear transport, the link to neurotoxicity after EV-D68 infection is less convincing.

      This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in the paralysis known as acute flaccid myelitis. Notably, this hypothesis runs counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death, and that paralysis in EV-D68-infected mice can be prevented by depletion of CD8 T cells even though the spinal cord remains infected. However, even if 2Apro is not ultimately responsible for motor neuron death in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM regain some motor function after the acute period of paralysis, but most still have some residual paralysis for years or for life. It is possible that 2Apro could mediate the acute onset of weakness, while T cells killing neurons could determine the extent of long-term, residual paralysis.

      Strengths:

      The characterization of nuclear pore complex components that appear to be targets of both poliovirus and EV-D68 proteases is quite thorough and expansive, so this data set alone will be a useful reference for the field. The process by which the authors narrowed their focus to EV-D68 2Apro reducing Nup98 and POM121 as consequential to both import and export of nuclear cargo, but not RNA, was technically impressive, thorough, and convincing. As detailed below, when the authors move from studying over-expressed proteases in transformed cell lines to studying actual virus infection in both transformed cell lines and iPSC-derived neurons, some of the data only indirectly support their conclusions; however, the quality of the experiments performed is still high. So even if the claim that 2Apro causes neurotoxicity is circumstantial, the data are certainly intriguing and justify further study of the effects of EV-D68 2Apro on the NPC and how this impacts pathogenesis. This is a convincing start to an intriguing line of inquiry.

      Weaknesses:

      This study falls a bit shy of actually showing that 2Apro effects are causing motor neuron toxicity because the evidence of this is fairly indirect. At points, the authors do admit these limitations, but at other times, they claim to have shown the link directly. The following are reasons why these claims are only indirectly supported:

      (1) Cleavage of Nup98 and POM121 after EV-D68 infection in RD cells and diMNs is never demonstrated.

      (2) Telaprevir was able to rescue nucleocytoplasmic transport in RD cells at low concentrations (Figure 4A). It is not shown whether this correlates with its antiviral effect in RD cells or with inhibition of 2A cleavage of Nup98 or POM121, which is never measured.

      (3) Building off of the prior point, the authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.

      (4) The use of mixed virus isolates only in the diMNs is problematic because different EV-D68 isolates are known to have drastically different effects on pathogenesis in mice. Since all initial data were generated with the MO isolate, adding the additional MD isolate to the diMN experiments actually adds uncertainty to the conclusions. It is not clear if the authors infected different cultures with the different isolates and combined the data or infected all cultures with a mixture of the two isolates. If the former, then the data should be reported separately to see the effect of each individual strain, which would be interesting to EV-D68 virologists. If the latter, then there is no way to know from these data whether one of the two isolates had increased fitness over the other and exerted a dominant effect. If the MD isolate overtook the MO isolate, from which all other data in this manuscript are derived, then we have much less of an idea how much the data from the first three figures supports the final figure.

    5. Author response:

      We thank the reviewers for their detailed and thoughtful comments on the manuscript. In general, the reviewers found the data supporting the role of Enterovirus D68 proteases in disrupting the composition of the nuclear pore complex, the 2A protease disrupting nucleocytoplasmic transport of protein cargoes, and the mechanistic dissection of this process to be convincing and potentially relevant to the pathogenesis of AFM. Reviewers requested additional experiments evaluating our observation that RNA export was not similarly impaired, particularly in the context of viral infection rather than solely expression of recombinant proteases. They also requested that cleavage of POM121 and Nup98 by 2A protease, which was demonstrated in 2A<sup>pro</sup>-transfected cells and in biochemical assays, also be demonstrated in motor neurons infected by EV-D68. Finally, reviewers noted that while suggestive, the evidence falls short of demonstrating that the toxicity of 2A<sup>pro</sup> is mediated through nuclear pore complex dysfunction.

      To address these critiques, we aim to do the following:

      (1) Determine the impact of live virus infection on RNA export by repeating the ethynyl uridine pulse-chase assay in the setting of live virus infection. We will also provide representative images for these data and for the previously reported data from transfection with GFP-2A<sup>pro</sup> and GFP-3C<sup>pro</sup>.

      (2) Evaluate cleavage of POM121 and Nup98 in EV-D68-infected diMNs and inhibition of cleavage by telaprevir by Western blot.

      (3) Present motor neuron survival data in Figure 4 as separate graphs for each of the viral strains tested, rather than pooling the data. To clarify Reviewer #3’s concern, these were not mixed cultures.

      We agree that we have not demonstrated conclusively that the mechanism by which 2A<sup>pro</sup> is toxic to motor neurons is via NPC dysfunction.  Future work will determine the extent to which NPC dysfunction contributes to 2A<sup>pro</sup>-mediated motor neuron toxicity versus other potential targets of 2A<sup>pro</sup>.  We feel that the additional experiments required to achieve this will be extensive and are beyond the scope of the present manuscript, which represents a key first step in this line of inquiry.

      In addition to the above, there were several points of disagreement between reviewers.  We would like to respond to those as follows:

      Reviewer #1: “The hypothesis that infection of motoneurons is the cause of EVD68-induced neurological complications so far is supported by only one autopsy report.  Other data suggest that infection of other cell types, such as astrocytes, and/or inflammatory cell infiltration in the CNS, are likely to be responsible for the symptoms.”

      Reviewer #3: “This study opens up a very intriguing hypothesis: that EV-D68 2Apro could be directly responsible for motor neuron cell death, mediated by POM121 and possibly Nup98 cleavage, that ultimately results in paralysis known as acute flaccid myelitis. This hypothesis notably does run counter to other published data showing that human neuronal organoids derived from iPSCs can support productive EV-D68 infection for weeks without cell death and that EV-D68-infected mice can have paralysis prevented by depletion of CD8 T cells, still with EV-D68 infection of the spinal cord. However, even if 2Apro is not ultimately responsible for motor neurons dying in human infections, that does not exclude the possibility that cleavage of nups could still disrupt motor neuron function. Notably, most children with AFM have some amount of motor function return after their acute period of paralysis, but most still have some residual paralysis for years to life. It is possible that 2A pro could mediate the acute onset of weakness, while T cells killing neurons could determine the amount of long-term, residual paralysis.”

      The infection of motor neurons is strongly supported not only by the aforementioned autopsy data [1], but also by mouse model data demonstrating replication of EV-D68 within motor neurons in the anterior horn of the spinal cord [2]. There are also extensive reports of electromyography and nerve conduction studies from human AFM patients demonstrating that the site of pathology is the spinal motor neuron [3-10]. By contrast, infection of astrocytes has been demonstrated only in primary murine astrocyte cultures in which no neurons were present [11]. Therefore, while the available data suggest that EV-D68 infection of astrocytes is possible, in the in vivo context of human and mouse spinal cord, tropism to motor neurons appears to be preferential. The relative contributions to toxicity of neuron-autonomous vs non-autonomous processes, such as glial dysfunction and inflammatory cell infiltration, remain to be elucidated and are not mutually exclusive.

      Our working hypothesis is more in line with that of Reviewer #3.  Motor neuron dysfunction and motor neuron death may ultimately prove to have dissociable causes, each of which may be neuron-autonomous, non-neuron-autonomous, or a mixture thereof.  The infection of motor neurons is likely the initiating event, with multiple downstream consequences.  Much additional work will be required to resolve this controversy.

      Reviewer #2: “Demonstrates a therapeutic effect of telaprevir, with neuroprotection independent of viral replication inhibition, adding translational value to the findings.”

      Reviewer #3: “The authors' claim that the neuroprotective effect of telaprevir is independent of its antiviral effect is not well-founded. Figure 4E (neuroprotection) was done with MOI 5, and Figure 4G (virus growth) was MOI 0.5. Telaprevir neuroprotection is not shown at MOI 0.5, nor is the neuroprotective effect correlated with inhibition of 2A cleavage of Nup98 or POM121.”

      The selection of MOIs for these two experiments was limited by technical considerations.  If the viral growth curve were to be performed at MOI 5, it would be confounded by cell death.  Further, a low MOI is required in order to allow multiple rounds of infection, replication, and spread within the culture, and is therefore more sensitive for assaying the effect of telaprevir on viral replication.  On the other hand, at MOI 0.5 diMN death is very gradual, and in the neuroprotection assay we would have lacked the statistical power to determine whether a rescue of this small magnitude of toxicity is significant.  The EC<sub>50</sub> of telaprevir is not expected to vary significantly at different MOIs.

      References:

      (1) Vogt, M. R. et al. Enterovirus D68 in the Anterior Horn Cells of a Child with Acute Flaccid Myelitis. N Engl J Med 386, 2059-2060 (2022). https://doi.org/10.1056/NEJMc2118155

      (2) Hixon, A. M. et al. A mouse model of paralytic myelitis caused by enterovirus D68. PLoS Pathog 13, e1006199 (2017). https://doi.org/10.1371/journal.ppat.1006199

      (3) Andersen, E. W., Kornberg, A. J., Freeman, J. L., Leventer, R. J. & Ryan, M. M. Acute flaccid myelitis in childhood: a retrospective cohort study. Eur J Neurol 24, 1077-1083 (2017). https://doi.org/10.1111/ene.13345

      (4) Elrick, M. J. et al. Clinical Subpopulations in a Sample of North American Children Diagnosed With Acute Flaccid Myelitis, 2012-2016. JAMA Pediatr 173, 134-139 (2018). https://doi.org/10.1001/jamapediatrics.2018.4890

      (5) Hovden, I. A. & Pfeiffer, H. C. Electrodiagnostic findings in acute flaccid myelitis related to enterovirus D68. Muscle Nerve 52, 909-910 (2015). https://doi.org/10.1002/mus.24738

      (6) Knoester, M. et al. Twenty-Nine Cases of Enterovirus-D68 Associated Acute Flaccid Myelitis in Europe 2016; A Case Series and Epidemiologic Overview. Pediatr Infect Dis J 38, 16-21 (2018). https://doi.org/10.1097/INF.0000000000002188

      (7) Martin, J. A. et al. Outcomes of Colorado children with acute flaccid myelitis at 1 year. Neurology 89, 129-137 (2017). https://doi.org/10.1212/WNL.0000000000004081

      (8) Saltzman, E. B. et al. Nerve Transfers for Enterovirus D68-Associated Acute Flaccid Myelitis: A Case Series. Pediatr Neurol 88, 25-30 (2018). https://doi.org/10.1016/j.pediatrneurol.2018.07.018

      (9) Van Haren, K. et al. Acute Flaccid Myelitis of Unknown Etiology in California, 2012-2015. JAMA 314, 2663-2671 (2015). https://doi.org/10.1001/jama.2015.17275

      (10) Natera-de Benito, D. et al. Acute Flaccid Myelitis With Early, Severe Compound Muscle Action Potential Amplitude Reduction: A 3-Year Follow-up of a Child Patient. J Clin Neuromuscul Dis 20, 100-101 (2018). https://doi.org/10.1097/CND.0000000000000217

      (11) Rosenfeld, A. B., Warren, A. L. & Racaniello, V. R. Neurotropism of Enterovirus D68 Isolates Is Independent of Sialic Acid and Is Not a Recently Acquired Phenotype. mBio (2019). https://doi.org/10.1128/mBio

    1. eLife Assessment

      This important study provides evidence for our understanding of HIV transmission dynamics by age and sex in Zambia during the PopART trial; by combining phylogenetic and individual-based mathematical modelling (IBM), it adds depth to the epidemiological literature and may inform more strategic allocation of HIV prevention resources in sub-Saharan Africa. The authors employ two complementary and well-established methodologies (phylogenetics and IBM), and this dual approach is a notable strength. However, the evidence supporting key conclusions is incomplete, with several claims insufficiently substantiated by the data presented. Improvements in data presentation (e.g., quantification of qualitative statements, statistical estimates, and clearer description of results) would substantially strengthen the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the results of phylogenetic and epidemiological modeling of the PopART community cohorts in Zambia. The current manuscript draft is methodologically strong, but needs revision to strengthen the take-home messages. As written, there are many possible take-away conclusions. For example, the agreement between IBM and phylogenetic analysis is noteworthy and provides a methodological focus. The revealed age patterns of transmission could be a focus. The effects of the PopART intervention and the consequences of a 1-year disruption could be a focus. It is important, though, that any main messages summarized by the authors are substantiated by the evidence provided and do not extrapolate beyond the data that have been generated. I recommend that the authors think deeply about what the most important, well-supported messages are and reframe the discussion and abstract accordingly.

      Strengths/weaknesses by section:

      (1) ABSTRACT

      The Abstract summarizes qualitative findings nicely, but the authors should incorporate quantitative results for all of the qualitative findings statements.

      The ending claim is not substantiated by the modeling scenarios that have been run: "targeted interventions for demographic groups such as under-35 men may be the key to finally ending HIV." It is straightforward to run this specific scenario in the model to determine whether or not this is true.

      The authors should add confidence intervals to the quantitative metrics, such as the 93.8% and 62.1% incidence reduction.

      (2) RESULTS

      The authors should check the Results section for any qualitative claims not substantiated by the analyses performed, and ensure the corresponding analyses are presented to support the claims.

      The Results and Methods describe the model's implementation of the PopART intervention differently. The Methods describes it as including VMMC, TB, and STI services, while the Results only mentions intensified HIV testing and linkage.

      A limitation of the model is that HIV disease progression is based on the ATHENA cohort in the Netherlands, which is a different HIV subtype (B) than the one in the research setting (C). The model should be configured using subtype C progression data, which have been published, or at least a sensitivity analysis should be conducted with respect to disease progression assumptions.

      In Table 2, the authors should consider adding a p-value to establish whether or not IBM and phylogenetics estimates are different.

      (3) DISCUSSION

      The literature review and comparison of study results to previously published phylogenetic studies is very nice. The authors could strengthen this by providing quantitative estimates with CIs for a more scientific comparison of the study results vs. prior studies, perhaps as a table or figure.

      The authors state that due to "the narrow geographical catchment area... The results should not be automatically extrapolated to apply to other SSA settings." The authors should exercise this caution when comparing the results to studies in South Africa and elsewhere.

      There are many other limitations to the analysis, including some mentioned above, that are not acknowledged. The authors should think carefully about what the most important limitations are and acknowledge them honestly at the end of the Discussion section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors analyzed PopART data to better characterize the age and sex-specific heterosexual HIV transmission dynamics in Zambia, with the goal of allocating resources.

      Strengths:

      This is an important analysis that homes in on the key driver of HIV transmission in Zambia, which hopefully can be used to tune prevention efforts to maximize effect while limiting required resources. Two analytic approaches were used, and while the phylogenetic data were markedly more limited, they mirrored the simulated epidemic. The authors did a nice job of reviewing the limitations of the data and the analyses, and of providing analyses to support their goals and hypothesis. This work may have more impact now that resources for HIV prevention and treatment in SSA may become scarcer.

      Weaknesses:

      To increase the impact and utility of this work, it would be helpful to parse the analysis a bit further to estimate the roles of undiagnosed, diagnosed but untreated, and treated subpopulations in this transmission. PopART is a multifaceted intervention, but the cost, effort, and approach to re-engagement in care versus testing and treatment can be quite different.

    4. Author response:

      We thank the editors and reviewers for their positive and constructive comments. The three most substantial points raised by the public review are the following:

      No explicit modelling of targeting young men as a route to ending HIV.

      We did not intend to imply that the epidemic could be ended by this alone, or even that targeting young men was the optimum strategy if resources were available for more general preventative interventions. The “last mile” for HIV will be a very complex scenario in which key populations will start to play an outsize role, and our modelling framework was not developed to consider it. As a result, we would not have confidence in modelling the decline of the viral population to zero. We shall be qualifying the existing language in the paper in order to make this clear.

      Subtype-specific disease progression data. 

      The criticism is that our modelling of disease progression was based on subtype B, while the HIV viral population in Zambia is overwhelmingly subtype C. Sensitivity to subtype has not been looked at in detail in this analysis as the literature suggests that the rate of CD4 decline does not differ between subtypes B and C.

      While some studies have shown differences in CD4 cell decline between subtypes, they have generally highlighted that subtype D progresses faster than other subtypes. Little evidence has been published on the differences between subtype B and C, and studies that do include both subtypes concluded that there was no significant difference in rates of CD4 decline between subtypes.

      No significant difference in the rate of CD4 decline by subtype is evidenced in the following publications:
      - Klein et al. (2014) (N=9772)
      - Bouman et al. (2023) (although no subtype B)
      - Easterbrook et al. (2010) (N=861)

      While some studies have illustrated that "progression changes with HIV subtype", an interrogation of the underlying data highlights that subtype B is not included, e.g.:
      - Kanki et al. (1999) looked at A versus "non-A subtype" but included no subtype B data.
      - Vasan et al. (2006) claims differences in rate of CD4 decline by subtype when compared to subtype D but includes no subtype B data.
      - Baeten et al. (2007) claims subtype D has faster progression than subtype A but includes no subtype B data.
      - Kiwanuka et al. (2008) claims differences in rate of CD4 decline but includes no subtype B data.
      - Amornkul et al. (2013) has no subtype B data.

      Furthermore, to explain why we used subtype B data to parameterise the model: statistical analyses of CD4 count progression usually do not report parameters in a form that can be directly imported into models, and building models from summary statistics results in under-specified models of disease progression in simulations. For this reason we use the estimates from Cori et al. (2015), in which the statistical analysis was specifically tailored to generate modelling parameters. The trade-off is therefore between subtype C data with model misspecification and subtype B data without it; neither choice is perfect, and we chose the correctly specified subtype B estimates.

      The role of undiagnosed versus diagnosed and untreated subpopulations. 

      We will add an analysis comparing age differences between sources and recipients according to the diagnostic status of the source.

      The rest of the comments in the public review ask for improvements in data presentation (including some additional statistical analyses) and to make sure qualitative claims are fully justified. We are happy to oblige with these, and will make our thinking clear on all points in the full response.

    1. eLife Assessment

      This useful study describes a mechanism of microbial modulation of anti-tumor immunity, which is of considerable interest in the field. However, the experimental support for the key mechanistic claim, the interaction between RadD and NKp46, is not robust. Multiple experimental inconsistencies, especially in vivo, weaken the conclusions, making the strength of evidence incomplete. Additional controls, direct binding assays, and clarification of the in vivo mechanistic relevance would strengthen the work.

    2. Reviewer #1 (Public review):

      In this manuscript, Rishiq et al. investigate whether natural killer (NK) cells can interact with Fusobacterium nucleatum and identify the molecular mediators involved in this interaction. The authors propose that the bacterial adhesin RadD may bind to the activating NK cell receptor NKp46 (NCR1 in mice), leading to NK cell activation and tumor control. While the topic is of significant interest and the hypothesis intriguing, the manuscript lacks critical experimental evidence, contains several technical concerns, and requires substantial revisions.

      Major Concerns:

      (1) Lack of Direct Evidence for RadD-NKp46 Interaction

      The central claim that RadD interacts with NKp46 is not formally demonstrated. A direct binding assay (e.g., Biacore, ELISA, or pull-down with purified proteins) is essential to support this assertion. The absence of this fundamental experiment weakens the mechanistic conclusions of the study.

      (2) Figure 2: Binding Specificity and Bacterial Strains

      A CEACAM1-Ig control should be included in all binding experiments to distinguish between specific and non-specific Ig interactions. There is differential Ig binding between strains ATCC 23726 and 10953. The authors should quantify RadD expression in each strain to determine if the difference in binding is due to variation in RadD levels.

      (3) Figure 3: Flow Cytometry Inconsistencies and Missing Controls

      What do the FITC-negative, Ig-negative events represent? The authors should clarify whether these are background signals, bacterial aggregates, or debris.

      In Panel B, CEACAM1-Ig binding appears markedly increased compared to WT bacteria. The reason for this enhancement should be discussed: does it reflect upregulation of the bacterial ligand or an artifact of overexpression? Fluorescence compensation should be carefully reviewed for the NKp46/NCR1-Ig binding assays to ensure that the signals are not due to spectral overlap or nonspecific binding. Importantly, binding experiments using the FadI/RadD double knockout strain are missing and should be included. This control is essential.

      In Panel E, the basis for calculating fold-change in MFI is unclear. Please indicate the reference condition to which the change is normalized.

      (4) Figure 4: Binding Inhibition and Receptor Sensitivity

      Panel A lacks representative FACS plots and is currently difficult to interpret. Differences in the sensitivity of human vs. mouse NKp46 to arginine inhibition should be discussed, given species differences in receptor-ligand interactions. What are the inhibition results using F. nucleatum strains deficient in FadI?

      In Panel B, CEACAM1-Ig and RadD-deficient bacteria must be included as negative controls for binding specificity upon anti-NKp46 blocking.

      (5) Figure 5: Functional NK Activation and Tumor Killing

      In Panels B and C, the key control condition (NK cells + anti-NKp46, without bacteria) is missing. This is needed to evaluate if NKp46 recognition is involved in tumor killing. The authors should explicitly test whether pre-incubation of NK cells with bacteria enhances their anti-tumor activity. Alternatively, could bacteria induce stress signals in tumor cells that sensitize them to NK killing? This distinction is critical.

      (6) Figure 5D: Mechanism of Peripheral Activation

      It is suggested that contact between bacteria and NK cells in the periphery leads to their activation. Can the authors confirm whether this pre-activation leads to enhanced killing of tumor targets, or if bacteria-tumor co-localization is required? The literature indicates that F. nucleatum localizes intracellularly within tumor cells. If so, how is RadD accessible to NKp46 on infiltrating NK cells?

      (7) Figure 5E and In Vivo Relevance

      Surprisingly, F. nucleatum infection is associated with increased tumor burden. Does this reflect an immunosuppressive effect? Are NK cells inhibited or exhausted in infected mice (TIGIT, Siglec-7...)? If NK cell activation leads to reduced tumor control in the infected context, the role of RadD-induced activation needs further explanation. RadD-deficient bacteria, which do not activate NK cells, result in even poorer tumor control. This paradox needs to be addressed: how can NK activation impair tumor control while its absence also reduces tumor control?

      (8) NKp46-Deficient Mice: Inconsistencies

      In Ncr1⁻/⁻ mice, infection with WT or RadD-deficient F. nucleatum has no impact on tumor burden. This suggests that NKp46 is dispensable in this context and casts doubt on the physiological relevance of the proposed mechanism. This contradiction should be discussed more thoroughly.

    3. Reviewer #2 (Public review):

      Summary:

      In the present study, Rishiq et al. investigated whether the RadD protein expressed by Fusobacterium nucleatum subsp. nucleatum serves as a natural ligand for the NK-activating receptor NKp46, and whether the RadD-NKp46 interaction enhances NK cell cytotoxicity against tumor cells. To address this, the authors first performed an association analysis of F. nucleatum abundance and NKp46 expression in head and neck squamous cell carcinoma (HNSC) and colorectal cancer (CRC) using the TCMA and TCGA databases. While NKp46⁺ F. nucleatum⁺ status was associated with improved overall survival in HNSC patients, no such correlation was found in CRC.

      Next, they examined the binding of NKp46-Ig to various F. nucleatum strains. To confirm that this interaction was mediated specifically by RadD, they employed a RadD-deficient mutant strain. Finally, to establish the functional relevance of the RadD-NKp46 interaction in promoting NK cell cytotoxicity and anti-tumor responses, they used a syngeneic mouse breast cancer model. In this setup, AT3 cells were orthotopically implanted into the mammary fat pad of C57BL/6 wild-type (WT) or Ncr1-deficient (Ncr1⁻/⁻; Ncr1 encodes the murine orthologue of human NKp46) mice, followed by intravenous inoculation with either WT F. nucleatum or the ∆RadD mutant strain.

      Strengths:

      A notable strength of the work is that it identifies a previously unrecognized activating interaction between F. nucleatum RadD and the NK cell receptor NKp46, demonstrating that the same bacterial protein can engage distinct NK cell receptors (activating or inhibitory) to exert context-dependent effects on anti-tumor immunity. This dual-receptor insight adds depth to our understanding of F. nucleatum-immune interactions and highlights the complexity of microbial modulation of the tumor microenvironment.

      Weaknesses:

      (1) A previous study by this group (PMID: 38952680) demonstrated that RadD of F. nucleatum binds to NK cells via Siglec-7, thereby diminishing their cytotoxic potential. They further proposed that the RadD-Siglec-7 interaction could act as an immune evasion mechanism exploited by tumor cells. In contrast, the present study reports that RadD of F. nucleatum can also bind to the activating receptor NKp46 on NK cells, thereby enhancing their cytotoxic function.

      While F. nucleatum-mediated tumor progression has been documented in breast and colon cancers, the current study proposes an NK-activating role for F. nucleatum in HNSC. However, it remains unclear whether tumor-infiltrating NK cells in HNSC exhibit differential expression of NKp46 compared to Siglec-7. Furthermore, heterogeneity within the NK cell compartment, particularly in the relative abundance of NKp46⁺ versus Siglec-7⁺ subsets, may differ substantially among breast, colon, and HNSC tumors. Such differences could have been readily investigated using publicly available single-cell datasets. A deeper understanding of this subset heterogeneity in NK cells would better explain why F. nucleatum is positively associated with a favorable prognosis in HNSC but correlates with poor outcomes in breast and colon cancers.

      (2) The in vivo tumor data (Figure 5D-F) appear to contradict the authors' claims. Specifically, Figure 5E suggests that WT mice engrafted with AT3 breast tumors and inoculated with WT F. nucleatum exhibited an even greater tumor burden compared to mice not inoculated with F. nucleatum, indicating a tumor-promoting effect. This finding conflicts with the interpretation presented in both the results and discussion sections.

      (3) Although the authors acknowledge that F. nucleatum may have tumor context-specific roles in regulating NK cell responses, it is unclear why they chose a breast cancer model in which F. nucleatum has been reported to promote tumor growth. A more appropriate choice would have been the well-established preclinical oral cancer model, such as the 4-nitroquinoline 1-oxide (4NQO)-induced oral cancer model in C57BL/6 mice, which would more directly relate to HNSC biology.

      (4) Since RadD of F. nucleatum can bind to both Siglec-7 and NKp46 on NK cells, exerting opposing functional effects, the expression profiles of both receptors on intratumoral NK cells should be evaluated. This would clarify the balance between activating and inhibitory signals in the tumor microenvironment and provide a more mechanistic explanation for the observed tumor context-dependent outcomes.

    1. Author response:

      Reviewer #1 (Public review):

      For summary:

      Thank you for your insightful and rigorous review. We fully agree with your core concern: establishing a causal link between MORC2 phase separation (PS) and its gene regulatory function is not only a key need in the phase separation field but also essential to elevating the overall utility of our work. To resolve the current gap in causal evidence, we will design experiments that explicitly distinguish the role of phase-separated condensates from soluble MORC2 complexes: We will generate a phase-separation-deficient but dimerization-competent MORC2 mutant by mutating key hydrophobic residues in the IDRa region (critical for IDR-IBD multivalent interactions driving phase separation) without disrupting the CC3 domain’s dimerization interface. In addition, we plan to investigate whether introducing a KS sequence[1] at the C-terminus can effectively attenuate the phase separation propensity of MORC2. These mutants will allow us to decouple “phase separation capacity” from “protein dimerization” (a prerequisite for both soluble complex formation and condensates).

      For strengths:

      We appreciate the reviewer’s recognition of our characterization of MORC2 phase separation and its structural basis. Our understanding of the CW domain’s function remains preliminary. Although we observed that the CW domain can influence condensate size, the IDR, IBD, and CC3 domains constitute the core structural elements driving phase separation. Consequently, the CW domain was not a primary focus of the current study. Nonetheless, investigating its functional contributions represents an interesting avenue for future work.

      For weaknesses:

      (1) We appreciate the reviewer’s rigorous concern. Our RNA-seq data were generated from fully independent transfections performed in triplicate across different time points and cell culture batches, aiming to maximize sample independence. However, for sensitive sequencing experiments, we observed that variability in transfection efficiency and cell culture across batches can introduce experimental differences, resulting in variable regulation of differentially expressed genes across samples. During differential expression analysis, p-value filtering excluded a further 40 genes that also overlap with the reference set. Without this filter, 61 genes in total overlapped with those reported in reference 22[2] (ZNF91, ZNF721, ZNF66, ZNF493, ZNF462, ZNF221, ZNF121, VGLL3, TUFT1, TLE4, TGFB2, SYS1-DBNDD2, STXBP6, SPRY2, SAMD9, ROR1, PTGES, PLK2, PLCXD2, PEA15, PDE2A, OLR1, NYAP2, NTN4, NRXN3, NEXN, MYLK, MPP7, MDGA1, MAMDC2, LBH, KRT80, ITGB8, IGFBP3, IGF2BP2, ICAM1, HIVEP3, GRB14, GPRC5A, GLCE, GJB3, GADD45B, GADD45A, FOXE1, FOSL1, FGF2, ETV5, ERBB3, DNAJC22, DIRAS1, DBNDD2, CXCL16, CRB2, COL9A3, CLDN1, BDNF, ATP8A1, AMOTL2, AHNAK2, ADAMTS16, ACSF2). To further enhance reproducibility, we will perform additional sequencing experiments.

      (2) Disease-associated mutants of MORC2

      At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity, also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent[2]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      We are fully committed to implementing these revisions with strict rigor and plan to complete them within 8–10 weeks. We will submit a comprehensive response letter alongside the revised manuscript, explicitly mapping how each of your concerns has been addressed, and ensuring that our conclusions about MORC2 PS’s functional role are supported by solid, reproducible data. We believe these revisions will transform our study from a strong “mechanism-focused” work to a comprehensive one that bridges PS mechanisms and biological function—aligning with the high standards of the phase separation field. Thank you again for your invaluable guidance in improving our work.

      Reviewer #2 (Public review):

      For summary:

      Thank you for your thorough and constructive review of our manuscript. We fully agree with the key concerns you raised and have developed a detailed revision plan to address each point comprehensively. We will perform additional control and validation experiments to directly link MORC2’s condensate-forming capacity with its gene silencing function. At the current stage, the results for disease-associated mutations are descriptive. While we observed that certain mutations clustered at the N-terminus can affect MORC2 condensate formation, ATPase activity, and DNA binding, we did not identify a mechanistic explanation for these correlations. Notably, the T424R mutation, previously reported to significantly enhance ATPase activity[3], also increased both intracellular condensate formation and in vitro DNA binding in our experiments. In contrast, other mutations did not show such consistent effects. Previous studies have established that MORC2’s ATP-binding and DNA-binding activities are independent[4]. Our results further suggest that MORC2’s phase separation behavior is also independent of both ATP and DNA binding, although existing evidence hints at potential cross-regulatory interactions among these three functions.

      For strengths:

      We thank the reviewer for their appreciation of the key findings presented in this manuscript.

      For weaknesses:

      We thank the reviewer for their careful assessment of MORC2’s DNA-binding properties and its relationship with ATPase and transcriptional activities. We would like to offer the following clarifications to address these concerns, which will also be incorporated into the Discussion section of the revised manuscript.

      (1) Recent work by Tan et al.[4] similarly identified multiple DNA-binding sites in MORC2, consistent with our findings, though there are discrepancies in the precise binding regions. In particular, they reported that isolated CC1 and CC2 domains do not bind 60 bp dsDNA, which contrasts with our observations. We attribute this difference to the types of DNA used in the assays. In our study, we employed 601 DNA, a defined nucleosome-positioning sequence, which differs substantially from randomly designed short dsDNA. For instance, prior work by Christopher H. Douse et al.[3] also confirmed that MORC2’s CC1 domain can bind 601 DNA.

      (2) In the study by Fendler et al.[2], DNA binding was reported to reduce MORC2’s ATPase activity, an observation that appears inconsistent with the results presented in our Fig. 5j. A critical distinction between the two studies lies in the experimental systems used: Fendler et al. employed a truncated MORC2 construct (residues 1–603) and 35 bp double-stranded DNA (dsDNA), whereas our experiments used full-length MORC2 and 601 DNA (a sequence with high nucleosome assembly potential). These differences, including the absence of potentially regulatory C-terminal regions in the truncated construct and the different lengths and structural properties of the DNA substrates, introduce variables that substantially complicate direct comparison of ATPase activity outcomes.

      Separately, Douse et al.[3] demonstrated that the efficiency of HUSH complex-dependent epigenetic silencing decreases as MORC2’s ATP hydrolysis rate increases, implying an inverse relationship between ATPase activity and silencing function. Notably, our current work has not established a direct mechanistic link between MORC2 phase separation and its ATPase activity. Thus, we refrain from inferring that the effect of MORC2 phase separation on transcriptional repression is mediated through modulation of its ATPase function; this remains an important question to address in future studies.

      (3) Finally, we plan to perform additional experiments to rule out the potential effects of CC3 dimerization. We will generate a phase-separation-deficient but dimerization-competent MORC2 mutant by mutating key hydrophobic residues in the IDRa region (critical for IDR-IBD multivalent interactions driving phase separation) without disrupting the CC3 domain’s dimerization interface. In addition, we plan to investigate whether introducing a KS sequence[1] at the C-terminus can effectively attenuate the phase separation propensity of MORC2. These mutants will allow us to decouple “phase separation capacity” from “protein dimerization”.

      We are committed to implementing these revisions with strict rigor and plan to complete them within 8–10 weeks. We will submit a detailed response letter alongside the revised manuscript, explicitly mapping how each of your concerns has been addressed, and ensuring the Discussion section is robust, context-rich, and fully integrates our work with the existing literature. We believe these improvements will significantly enhance the reliability, contextual relevance, and impact of our study, and we sincerely thank you for guiding us to elevate its quality.

      Reviewer #3 (Public review):

      For summary:

      Thank you for your insightful review and constructive suggestions, which have been invaluable in refining our manuscript. We greatly appreciate your recognition of the study’s strengths, including its logical structure, integration of multi-disciplinary approaches (in vitro LLPS assays, cellular studies, NMR, and crystallography), and the establishment of a functional link between MORC2 phase separation, DNA binding, and transcriptional control. Your identification of areas needing stronger evidence has provided clear, actionable directions for improvement, and we are fully committed to addressing each point comprehensively.

      For Major comments:

      To strengthen the manuscript as per your recommendations:

      (1) For the characterization of IDR-IBD interactions in PS: We will perform systematic in vitro assays, including PS turbidity measurements and confocal imaging of MORC2 variants lacking IDR or IBD (ΔIDR, ΔIBD) and truncated constructs (IDR alone, IBD alone). These experiments will quantify how each domain individually or synergistically contributes to phase separation propensity (e.g., critical concentration, condensate size/distribution).

      (2) To assess DNA’s influence on PS: We will generate phase diagrams by testing a range of MORC2 concentrations (0.5–10 μM) in the absence or presence of 601 DNA (147 bp) at concentrations of 0–2 μM, using turbidity assays and microscopy to map phase boundaries. This will systematically clarify how DNA modulates MORC2 phase separation.

      We plan to complete these experiments within 3–4 weeks, with rigorous quantification and statistical analysis to support our conclusions. The revised manuscript will include a detailed response letter mapping each of your suggestions to specific data additions, ensuring enhanced robustness and conviction. We believe these revisions will significantly strengthen the study’s conclusions, and we sincerely thank you for guiding us to improve its quality.

      Reference:

      [1] Mensah, M. A., Niskanen, H., Magalhaes, A. P., Basu, S., Kircher, M., Sczakiel, H. L., Reiter, A. M. V., Elsner, J., Meinecke, P., Biskup, S., et al. (2023). Aberrant phase separation and nucleolar dysfunction in rare genetic diseases. Nature 614, 564-571. https://doi.org/10.1038/s41586-022-05682-1.

      [2] Fendler, N. L., Ly, J., Welp, L., Lu, D., Schulte, F., Urlaub, H., and Vos, S. M. (2024). Identification and characterization of a human MORC2 DNA binding region that is required for gene silencing. Nucleic Acids Res 53, gkae1273. https://doi.org/10.1093/nar/gkae1273.

      [3] Douse, C. H., Bloor, S., Liu, Y. C., Shamin, M., Tchasovnikarova, I. A., Timms, R. T., Lehner, P. J., and Modis, Y. (2018). Neuropathic MORC2 mutations perturb GHKL ATPase dimerization dynamics and epigenetic silencing by multiple structural mechanisms. Nat Commun 9, 651. https://doi.org/10.1038/s41467-018-03045-x.

      [4] Tan, W., Park, J., Venugopal, H., Lou, J. Q., Dias, P. S., Baldoni, P. L., Moon, K. W., Dite, T. A., Keenan, C. R., Gurzau, A. D., et al. (2025). MORC2 is a phosphorylation-dependent DNA compaction machine. Nat Commun 16, 5606. https://doi.org/10.1038/s41467-025-60751-z.

    2. Reviewer #3 (Public review):

      Summary:

      The manuscript by Zhang et al. demonstrates that MORC2 undergoes liquid-liquid phase separation (LLPS) to form nuclear condensates critical for transcriptional repression. Using a combination of in vitro LLPS assays, cellular studies, NMR spectroscopy, and crystallography, the authors show that a dimeric scaffold formed by CC3 drives phase separation, while multivalent interactions between an intrinsically disordered region (IDR) and a newly defined IDR-binding domain (IBD) further promote condensate formation. Notably, LLPS enhances MORC2 ATPase activity in a DNA-dependent manner and contributes to transcriptional regulation, establishing a functional link between phase separation, DNA binding, and transcriptional control. Overall, the manuscript is well-organized and logically structured, offering mechanistic insights into MORC2 function, and most conclusions are supported by the presented data. Nevertheless, some of the claims are not sufficiently supported by the current data and would benefit from additional evidence to strengthen the conclusions.

      The following suggestions may help strengthen the manuscript:

      Major comments:

      (1) The central model proposes that multivalent interactions between the IDR and IBD promote MORC2 LLPS. However, the characterization of these interactions is currently limited. It is recommended that the authors perform more systematic analyses to investigate the contribution of these interactions to LLPS, for example, by in vitro assays assessing how the IDR or IBD individually influence MORC2 phase separation.

      (2) The authors mention that DNA binding can promote MORC2 LLPS. It is recommended that they generate a phase diagram to systematically assess how DNA influences phase separation.

      (3) The authors use the N39A mutant as a negative control to study the effect of DNA binding on ATP hydrolysis. Given that N39A is defective in DNA binding, it could also be employed to directly test whether DNA binding influences MORC2 phase separation.

      (4) Many of the cellular and in vitro LLPS experiments employ EGFP fusions. The authors should evaluate whether the EGFP tag influences MORC2 phase separation behavior.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Zhang et al. focuses on how phase separation of the chromatin-associated protein MORC2 could regulate gene expression. Their study shows that MORC2 forms dynamic nuclear condensates in cells. In vitro, MORC2 phase separation is driven by dimerization and multivalent interactions involving the C-terminal domain. A key finding is that the intrinsically disordered region (IDR) of MORC2 exhibits strong DNA binding. They report that DNA binding enhances MORC2's phase separation and its ATPase activity, offering new insights into how MORC2 contributes to chromatin organization and gene regulation. The authors try to correlate MORC2's condensate-forming ability with its gene silencing function, but this warrants additional controls and validation. Moreover, they investigate the effect of disease-linked mutations in the N-terminal domain of MORC2 on its ability to form cellular condensates, ATPase activity, and DNA binding, though the findings appear inconclusive in the manuscript's current form.

      Strengths:

      The authors determined a 3.1 Å resolution crystal structure of the dimeric coiled-coil 3 (CC3) domain of MORC2, revealing a hydrophobic interface that stabilizes dimer formation. They present extensive evidence that MORC2 undergoes liquid-liquid phase separation (LLPS) across multiple contexts, including in vitro, in cellulo, and in vivo. Through systematic cellular screening, they identified the C-terminal domain of MORC2 as a key driver of condensate formation. Biophysical and biochemical analyses further show that the IDR within the C-terminal domain interacts with the C-terminal end region (IBD) and also exhibits strong DNA-binding capacity, both of which promote MORC2 phase separation. Together, this study emphasizes that interactions mediated by multiple domains (CC3, IDR, and IBD) drive MORC2 phase separation. Finally, the authors quantified the effect of removing CC3 on the upregulation and downregulation of target gene expression.

      Weaknesses:

      Though the findings appear compelling in isolation, the study lacks discussion of how its findings compare with previous studies. Particularly in the context of MORC2-DNA binding, previous studies have extensively explored MORC2-DNA binding (Tan, W., Park, J., Venugopal, H. et al. Nat Commun 2025) and its effect on ATPase activity (ref 22). The contradictory results in ref 22 about the impact of DNA binding on ATPase activity, and of ATPase activity on transcriptional repression, warrant proper discussion. The authors performed extensive in cellulo screening to investigate domain contributions to MORC2 condensate formation, but the study does not consider or discuss possible indirect contributions from the complex cellular environment. Alternatively, the domain-specific contributions could be quantified in vitro by comparing phase diagrams for the variants. While the basis of this study is to investigate the mechanism of MORC2 condensate-mediated gene silencing, the findings in Figure 6 appear incomplete because the CC3 deletion affects not only phase separation of MORC2 but also its dimerization. Furthermore, the investigation of disease-linked MORC2 mutations appears very preliminary and inconclusive because there are no obvious trends in the data. Overall, the discussion appears weak, as it is missing references to previous studies and, most importantly, a comparison of the present findings with those of others.

    4. Reviewer #1 (Public review):

      Summary:

      This work demonstrates that MORC2 undergoes phase separation (PS) in cells to form nuclear condensates, and the authors demonstrate convincingly the interactions responsible for this phase separation. Specifically, the authors make good use of crystallography and NMR to identify multiple protein:protein interactions and use EMSA to confirm protein:DNA interactions. These interactions work together to promote in vitro and in cell phase separation and boost ATPase activity by the catalytic domain of MORC2.

      However, the authors have very weak evidence supporting their potentially valuable claim that MORC2 PS is important for the appropriate gene regulatory role of MORC2 in cells. Exploring causal links between PS and function is an important need in the phase separation field, particularly as regards the role of condensates in gene regulation, and is a non-trivial matter. Any study with convincing data on this matter will be very important. For this reason, it is crucial to properly explore the alternative possibility that soluble complexes, existing in the same conditions as phase-separated condensates, are the functional species. It is also critical to keep in mind that, while a specific protein domain may be essential for PS, this does not mean its only important function pertains to PS.

      In this study, the authors do not sufficiently explore the role that soluble MORC2 complexes may play alongside MORC2 condensates. Neither do they include enough data to solidly show that domain deletion leads to phenotypes via a loss of phase separation per se, rather than the loss of phase separation being a microscopically visible result, not cause, of an underlying shift in protein function. For these reasons, the authors' conclusions regarding the functional role of MORC2 condensates are based on incomplete data. This also dampens the utility of this work as a whole, since the very nice work detailing the mechanism of MORC2 PS is not paired with strong data showing the importance of this observation.

      Strengths:

      Static light scattering and crystallography are nicely used to demonstrate the dimerization of MORC2FL and to discover the structure of the CC3 domain dimer, presumably responsible for the dimerization of MORC2FL (Figure 1).

      Extensive use of deletion mutants in multiple cell lines is used to identify regions of MORC2 that are important for forming condensates in the nucleus: the IBD, IDR, and CC3 domains are found to be essential for condensate formation, while the CW domain plays an unknown role in condensate morphology (Figure 3). The authors use NMR to further identify that the IBD domain seems to interact with the first third of the centrally located IDR, termed IDRa, but not with the latter two-thirds of the IDR domain (Figure 4). This leads them to propose that phase separation is the product of IBD:IDRa interaction, CC3 dimerization, and an unknown but important role for the CW domain.

      Based on the observation that removal of the NLS resulted in diffuse cytoplasmic localization, they hypothesized that DNA may play an important role in MORC2 PS. EMSA was used to demonstrate interaction between DNA and several MORC2 domains: CC1, CC2, IDR, and TCD-CC3-IBD. Further in vitro microscopy with purified MORC2 showed that DNA addition significantly reduces MORC2 saturation concentration (Figure 5).

      These assays convincingly demonstrate that MORC2 phase separates in cells, and identify the protein domains and interactions responsible for this phenomenon, with the notable caveat that the role of the CW domain here is left unexplored.

      Weaknesses:

      Although the authors demonstrated phase separation of MORC2FL, their evidence that this plays a functional role in the cell is incomplete.

      Firstly, looking at differentially upregulated genes under MORC2FL overexpression, the authors acknowledge that only 10% are shared with differentially regulated genes identified in other MORC2FL overexpression studies (Figure 6c,d). No explanation is given for why this overlap is so low, making it difficult to trust conclusions from this data set.

      Secondly, of the 21 genes shared in this study and in earlier studies, the authors note that the differential regulation is less pronounced when a phase-separation-deficient MORC2 mutant is overexpressed, rather than MORC2FL (Figure 6e). This is taken as evidence that phase separation is important for the proper function of MORC2. However, no consideration is made for the alternative possibility that the mutant, lacking the CC3 dimerization domain, may result in non-functional complexes involving MORC2, eliminating the need for a PS-centric conclusion. To take the overexpression data as solid evidence for a functional role of MORC2 PS, the authors would need to test the alternative, soluble complex hypothesis. Furthermore, there seems to be low replicate consistency for the MORC2 mutant condition (Figure S6a), with replicate 3 being markedly upregulated when compared to replicates 1 and 2.

      Thirdly, the authors close by examining the in-cell PS capabilities and ATPase activity of several disease-associated mutants of MORC2 (Figure 7). However, the relevance of these mutants to the past 6 figures is unclear. None of these mutations is in regions identified as important for PS. Two of the mutations result in a higher percentage of the cell population being condensate-positive, but this is not seemingly connected to ATPase activity, as only one of these two mutants has increased ATPase activity. Figure 7 does not add any support to the main hypotheses in the paper, and nowhere in the paper do the authors investigate the protein regions where the mutations in Figure 7 are found.

    5. eLife Assessment

      This useful study has demonstrated that MORC2 undergoes phase separation in cells and established multiple interactions responsible for the phase separation. While the characterizations of protein-protein and protein-DNA interactions are solid, there is currently incomplete evidence supporting the claim that MORC2 phase separation contributes to the gene regulatory role of MORC2 in cells. With a stronger link between MORC2 phase separation and cellular function, and further analysis of how disease-linked mutations impact condensation propensity, this study would be of significant interest to biophysicists and molecular biologists working on the role of condensates in gene regulation.

    1. eLife Assessment

      This important study demonstrates that ocular organoids can generate both retina and lens through a non-canonical, "inside-out" morphogenetic route. The work is solid, with well-designed experiments combining imaging, molecular analyses, and transcriptomics to establish that lens formation in organoids follows conserved molecular programs despite an alternative morphogenesis. These findings expand our understanding of self-organization and developmental plasticity, and will be of broad interest to researchers working on eye development, organoids, and tissue engineering.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture in minimal growth factor-containing medium supplemented with HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear at the interface between the organoid surface and the central cell mass, which later develops into a spherical lens. Prox1+ cells thus cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker LFC starts to be expressed. In addition, foxe3:GFP+ cells incorporate EdU, indicating that they have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of the retinal organoids to form an eyecup-like structure, although this "inside-out" morphogenetic mechanism differs from the "outside-in" cellular mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from in vivo morphogenesis.

      Overall, the manuscript is well presented. However, some aspects of the underlying mechanism remain unclear. My comments are shown below.

      Major comments:

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer cell sheet of retinal precursor cells. I wonder whether the formation of this structure can be explained by differential cell adhesion or mechanical tension between lens core cells and the retinal cell sheet, as in the previous study by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may become enclosed by the retinal cell mass through cell sorting driven by direct interactions between retinal cells and lens cells, or between lens cells and the culture medium. After day 1, the movement of the lens core towards the surface of the retinal organoids could also be explained if the adhesive or tensile state of lens core cells changes through secretion of extracellular matrix. Could the authors measure the physical properties (adhesive activity and stiffness) of retinal precursor cells and lens core cells? If retinal organoids at day 1 are dissociated and re-cultured, do they show the same pattern of an internal lens core covered by the outer retinal cell sheet?

      (2) The optic cup evaginates from the lateral wall of the diencephalic neuroepithelium. In zebrafish, cells move from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder whether the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation during days 1-2, whereas lens formation seems to be intact in the presence of an FGF receptor inhibitor during days 0-1. I suggest that the authors examine which tissue is the target of FGF signaling in retinal organoids, using markers such as pea3, a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      (4) Fig. 3f and 3g indicate that there is a cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What cell type occupies the interface between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is this depth the thickness of the Rx3-expressing cell sheet that covers the central lens core in the organoids? If so, I wonder whether the total number of cells in the Rx3-expressing sheet differs with the number of seeded cells, because the thickness is the same across seeded-cell numbers, but the surface area may differ depending on the size of the underlying lens core. Please clarify this point.

      (6) Noggin application at days 0-1 inhibits lens formation. BMP signaling regulates formation of the lens placode and olfactory placode at early stages of development. It would be interesting to examine whether the olfactory placode area expands in Noggin-treated organoids. Please check forebrain territory markers.

      Significance:

      Strength: This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simpler and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, in an unconstrained, embryo-free environment.

      Limitation: The description is adequate, but the analysis lacks depth. More molecular and cellular level analysis is necessary, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement: The current study is descriptive. It needs a conceptual advance that would impact the cell biology field or medical science.

      Audience: The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond specific fields, the authors need to formulate a general principle for cell and developmental biology.

    3. Reviewer #2 (Public review):

      Summary:

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process - an "inside-out" mechanism where the lens forms centrally and moves outward, rather than the normal "outside-in" embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications. The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the expression pattern appears punctate. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock position).

      How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      Significance:

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Major Comments:

      - The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. HEPES supplementation is mentioned, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      - The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing one to determine whether progenitors first differentiate as placode and neuroepithelium, and subsequently into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      - The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      - The fact that the lens becomes specified in the centre of the organoid is striking, but I find it difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Significance:

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

    5. Author response:

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media supplemented with HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker LFC starts to be expressed. In addition, foxe3:GFP+ cells incorporate EdU, indicating that foxe3:GFP+ cells have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of the retinal organoids to form an eye-cup-like structure, although this "inside-out" morphogenetic mechanism differs from the "outside-in" cellular mechanism of eye cup formation in vivo. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from that of in vivo morphogenesis.

      Overall, the manuscript presentation is nice. However, some points regarding the underlying mechanism remain unclear. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer sheet of retinal precursor cells. I wonder if the formation of this structure may be explained by differential cell adhesion or mechanical tension between lens core cells and the retinal cell sheet, as in the previous study by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429-436 (2008)). Lens core cells may become integrated inside the retinal cell mass by cell sorting through direct interactions between retinal cells and lens cells, or between lens cells and the culture media. After day 1, the movement of the lens core towards the surface of the retinal organoids could also be explained if the adhesive/tensile state of lens core cells changes through the secretion of extracellular matrix. I wonder whether the authors have measured the physical properties (adhesive activity and solidity) of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they re-establish the same pattern, with an internal lens core covered by the outer retinal cell sheet?

      The question of whether differential adhesive activity is involved in cell sorting and lens formation is indeed very intriguing. To address this point, we will include an additional experiment (see Revision Plan, experiment 1), based on the dissociation and re-aggregation of lens-forming organoids as suggested by the reviewer. To monitor cell type-specific sorting, we will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific reporter Rx2::H2B-RFP. If differential adhesive activities of lens and retinal progenitor cells drive cell sorting, dissociation and re-aggregation should result in cells sorting according to their identity.

      (2) The optic cup evaginates from the lateral wall of the diencephalic neuroepithelium. In zebrafish, cells move from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder whether the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Looking into the details of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We assume that the formation of the cup-like structure of the ocular organoids is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with retina on the surface and lens in the center (see Figure S2 d, Figure 3e and Figure 4). Subsequent displacement of the centrally formed lens through the retinal layer then places the lens at the organoid periphery, while retinal cells remain static. We assume that the “cup-like” shape is acquired by extrusion of the lens from the center of the organoid. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2) and follow lens- and retina-fated cells (by employing the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #2).

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation during days 1-2, whereas lens formation seems to be intact in the presence of an FGF receptor inhibitor during days 0-1. I suggest that the authors examine which tissue is the target of FGF signaling in retinal organoids, using markers such as pea3, a downstream target of the ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling in the organoids (cross-reference to Reviewer #3) is indeed an important point. To address which tissue is the target of FGF signaling, we will include additional experiments and assess the phosphorylation status of ERK (pERK) and the expression of the ERK downstream target pea3, as suggested by the reviewer (see Revision Plan, experiment 3). This will allow us to identify the tissue within the organoid that responds to FGF signaling.

      Lens core size of organoids treated with SU5402 from day 0 to day 1 is fully comparable to the control (please see Figure 6b).

      (4) Fig. 3f and 3g indicate that there is a cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What cell type occupies the interface between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      This is certainly an interesting question. We are aware of this population of cells, but we do not yet have data that would clarify the fate of these cells with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is this depth the thickness of the Rx3-expressing cell sheet that covers the central lens core in the organoids? If so, I wonder whether the total number of cells in the Rx3-expressing sheet differs with the number of seeded cells, because the thickness is the same across seeded-cell numbers, but the surface area may differ depending on the size of the underlying lens core. Please clarify this point.

      Yes. Figure 5e indicates the thickness of the cell sheet expressing Rx3 that lies on the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript.

      (6) Noggin application at days 0-1 inhibits lens formation. BMP signaling regulates formation of the lens placode and olfactory placode at early stages of development. It would be interesting to examine whether the olfactory placode area expands in Noggin-treated organoids. Please check forebrain territory markers.

      Which tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question. To address the identity of cells that differentiate under this condition, we will include an additional experiment as suggested by the reviewer (see Revision Plan, experiment 4). We will check for the expression of Lhx2, Otx2 and HuC/D to address this point.

      I have no minor comments.

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable, and that they provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, the pigment epithelium and the optic stalk. However, retinal organoids seem to be simpler and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, in an unconstrained, embryo-free environment.

      Limitation:

      The description is adequate, but the analysis lacks depth. More molecular and cellular level analysis is necessary, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. It needs a conceptual advance that would impact the cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, perhaps translational/clinical rather than basic biology. To reach beyond specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process – an “inside-out” mechanism where the lens forms centrally and moves outward, rather than the normal “outside-in” embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Looking into the details of how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. We assume that the formation of cup-like structures of the ocular organoids is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with retina on the surface and lens in the center (see Figure S2 d, Figure 3e and Figure 4). Subsequent displacement of the centrally formed lenses through the retinal layer then places the lens at the organoid periphery, while retinal cells remain static. We assume that the “cup-like” shape is acquired by extrusion of the lens. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (by employing the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #1).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      The question of how the retinal and lens domains are established in this specific manner is indeed intriguing. We dedicated a part of the discussion to this topic, where we discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments (see Revision Plan, experiment 3) addressing the source and target tissues of FGF and BMP signaling in the organoid will ultimately bring more clarity to our understanding of the tissue arrangement in the organoid.

      Although analysis of the proliferation rates of cells at the surface and in the central region of the organoid might show some differences between lens and retinal cells, we have no indication that the proliferation rate itself is instructive for, or upstream of, cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on the acquisition and specification of Foxe3-expressing lens placode progenitors. If these are not present, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens forms under unperturbed conditions (as assessed by the expression of crystallin proteins). Under such conditions, organoids that do not have a lens do not contain Foxe3-expressing cells.

      In the absence of the lens, the organoid is composed of retinal neuroepithelium, which does not form an optic cup (for details of such phenotypes please see Zilova et al., 2021, eLife).

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the expression pattern appears punctate. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o'clock position).

      How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. We were not clear in describing our observation. Indeed, Matrigel is not required for the acquisition of lens fate, as demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on the structural aspects of organoid formation. Matrigel is essential for the organization of retina-committed cells into the retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium can indeed negatively impact the cellular organization and the overall lens structure. To clarify the contribution of Matrigel to the speed of organoid lens development and to the overall structure of the organoid lens, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #3).

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. As HEPES is mainly used to regulate the pH of culture media, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #3) and therefore cannot be addressed in the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments.

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      - The manuscript presents a beautiful set of high-quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. HEPES supplementation is mentioned, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      The role of HEPES in lens formation is indeed very intriguing and under current investigation. As HEPES is mainly used to regulate the pH of culture media, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #2) and therefore unfortunately cannot be addressed in the current manuscript.

      To clarify the contribution of Matrigel to organoid lens development, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel (cross-reference with Reviewer #2).

      - The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing one to determine whether progenitors first differentiate as placode and neuroepithelium, and subsequently into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      Yes. The figures show the expression of lens and retinal markers in the embryo at later developmental stages; the timing of their expression can be documented with higher temporal resolution. In the revised version of the manuscript, we will provide information about the onset of expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens) (see Author response image 1). Rx3 is one of the earliest markers labeling the presumptive eye field within the anterior neural plate (S16, late gastrula). Foxe3::GFP expression can be detected within the head surface ectoderm before the lens placode forms, showing that Foxe3 is a suitable marker of placodal progenitors in medaka.

      We are convinced that the Rx3- and Foxe3-driven reporters switch on early enough to support our claim that the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids have separate origins.

      Author response image 1.

      - The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Indeed, identifying the source of BMP and FGF activation would clarify the mechanism of retina/lens specification within the ocular organoids (cross-reference with Reviewer #1). To address this point, we will include additional experiments (see Revision Plan, experiment 3). We will analyze the expression of the respective ligands (Bmp4 and Fgf8) and the activation of downstream effectors of the BMP and FGF signaling pathways within the ocular organoids, as suggested by Reviewer #1 and Reviewer #3.

      - The fact that the lens becomes specified in the centre of the organoid is striking, but I find it difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the extruding lens in vivo is indeed a very relevant suggestion. To clarify how ocular organoids form with respect to tissue rearrangements and cell movements, we will include an additional experiment (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells through the process of lens extrusion, using the lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

      Revision Plan:

      (1) To address whether differential adhesion properties of retinal and lens progenitors mediate the cell sorting that establishes retina and lens domains in the organoids (Reviewer #1, comment 1), we will dissociate the organoids on day 1 and subsequently re-aggregate them. This experiment will allow us to follow the cell-type-specific adhesion properties of lens and retinal progenitor cells. We will use the lens progenitor reporter line Foxe3::GFP and the retina-specific reporter line Rx2::H2B-RFP to monitor cell-type-specific sorting by fluorescence microscopy.

      (2) Multiple reviewers (Reviewer #1, Reviewer #2, Reviewer #3) asked for a detailed in vivo imaging experiment showing the individual contributions of retina- and lens-fated cells to the resulting tissue organization within the ocular organoid. To address this point, we will perform an in vivo live imaging experiment following the movements of individual lens (Foxe3::GFP) and retinal (Rx2::H2B-RFP) cells from day 1 to day 2 of organoid development.

      (3) Reviewer #1 and Reviewer #3 raised questions about the role of FGF and BMP signaling, and about the sources of these signaling activities, in ocular organoid tissue arrangement. To address this point and shed more light on the molecular mechanisms regulating lens and retina tissue arrangement in the organoid, we will perform an additional experiment. We will assess the expression of candidate FGF and BMP ligands (Fgf8, Bmp7 and Bmp4), the activation of downstream effectors (p-ERK, p-SMAD) and the expression of the direct transcriptional target of FGF signaling (Pea3) in the developing organoids. This will identify the tissue producing each ligand on one side and the tissue responding to the signal on the other, and should help narrow down the molecular mechanism controlling tissue arrangement in the organoid.

      (4) We will analyze the expression of forebrain territory markers in organoids treated with the BMP inhibitor to determine which tissue differentiates at the expense of lens under BMP inhibition (suggested by Reviewer #1). To address this point, we will label Noggin-treated organoids with antibodies against Lhx2, Otx2 and HuC/D.

      (5) We will provide a more comprehensive analysis of organoids grown without Matrigel and compare them with organoids grown in the presence of Matrigel (mentioned by Reviewer #2 and Reviewer #3). Using the lens progenitor-specific Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of organoids grown without Matrigel.

      Description of analyses that authors prefer not to carry out

      Reviewer #1:

      (4) Fig. 3f and 3g indicate that there is a cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What cell type occupies the interface between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      This is certainly an interesting question, and we are aware of this population of cells. We currently do not have data that would clarify the fate of these cells with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      Reviewer #2:

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      The role of HEPES in lens formation is indeed very intriguing and is under current investigation. HEPES is mainly used to regulate the pH of the culture medium, and pH might affect multiple cellular processes. Dissecting the molecular mechanism underlying the effect of HEPES on lens formation will therefore require a significant time investment (cross-reference with Reviewer #3) and cannot be addressed in the current manuscript.

      Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      Although analysing proliferation of cells at the surface and in the central region of the organoid might reveal some differences in proliferation rate between lens and retinal cells, we have no indication that the proliferation rate itself is instructive for, or upstream of, cell fate decisions.

    1. eLife Assessment

      This important work examines how microexons contribute to brain activity, structure, and behavior. The authors find that loss of microexon sequences generally has subtle impacts on these metrics in larval zebrafish, with few exceptions. The evidence is convincing, using modern high-throughput phenotyping methodology in zebrafish. Overall, this work will be of interest to neuroscientists and generate further studies of interest to the field.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are largely convincing and discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful.

      Strengths:

      The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function through regulation of behavior and development.

      The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

    3. Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. Inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research, since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identifies transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      Overall this is a study with robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript by Lopez-Blanch and colleagues, 21 microexons are selected for a deep analysis of their impacts on behavior, development, and gene expression. The authors begin with a systematic analysis of microexon inclusion and conservation in zebrafish and use these data to select 21 microexons for further study. The behavioral, transcriptomic, and morphological data presented are for the most part convincing. Furthermore, the discussion of the potential explanations for the subtle impacts of individual microexon deletions versus loss-of-function in srrm3 and/or srrm4 is quite comprehensive and thoughtful. One major weakness: data presentation, methods, and jargon at times affect readability / might lead to overstated conclusions. However, overall this manuscript is well-written, easy to follow, and the results are of broad interest.

      We thank the Reviewer for their positive comments on our manuscript. In the revised version, we will try to improve readability, reduce jargon and avoid overstatements.  

      Strengths:

      (1) The study uses a wide variety of techniques to assess the impacts of microexon deletion, ranging from assays of protein function to regulation of behavior and development.

      (2) The authors provide comprehensive analyses of the molecular impact of their microexon deletions, including examining how host-gene and paralog expression is affected.

      Weaknesses:

      Major Points:

      (1) According to the methods, it seems that srrm3 social behavior is tested by pairing a 3mpf srrm3 mutant with a 30dpf srrm3 het. Is this correct? The methods seem to indicate that this decision was made to account for a slower growth rate of homozygous srrm3 mutant fish. However, the difference in age is potentially a major confound that could impact the way that srrm3 mutants interact with hets and the way that srrm3 mutants interact with one another (lower spread for the ratio of neighbour in front value, higher distance to neighbour value). This reviewer suggests testing het-het behavior at 3 months to provide age-matched comparisons for del-del, testing age-matched rather than size-matched het-del behavior, and also suggests mentioning this in the main text / within the figure itself so that readers are aware of the potential confound.

      Thank you for bringing up this point. For the tests shown in Figure 5, we indeed decided to match the pairs involving srrm3 mutant fish by fish size, since we reasoned this would be more comparable to the other lines, both biologically and methodologically (in terms of video tracking, etc.). However, we are confident the results would be very similar if matched by age, since the differences in social interactions between the srrm3 homozygous mutants and their control siblings are very dramatic at any age. In line with the Reviewer's suggestion, this can be appreciated in Videos S2 and S3, which show groups of five 5 mpf fish that are either srrm3 mutant or wild type. The behavior of 5 mpf WT fish (Video S3) is very similar to that of 1 mpf WT fish pairs, with very small interindividual distances, while the difference with respect to the srrm3 mutant group (Video S2) is dramatic. We nonetheless agree that this decision on the experimental design should be clearly stated in the main text and figure legend, and we have done so in the revised version.

      (2) Referring to srrm3+/+; srrm4-/- controls for double mutant behavior as "WT for simplicity" is somewhat misleading. Why do the authors not refer to these as srrm4 single mutants?

      This comment applies to Figure 4 as well as the associated figure supplements. We reasoned that this made the understanding of plots easier, but the Reviewer is correct that it can be misleading. As a middle ground, we have now changed Figure 4 to follow the nomenclature of Figure 3D (WD, HD, DD), which is further explained in the legend, but kept the original format in the figure supplements for consistency with the (many) other plots in those figures.

      (3) It's not completely clear how "neurally regulated" microexons are defined / how they are different from "neural microexons"? Are these terms interchangeable?

      Yes, they are interchangeable. We have now double checked the wording to avoid confusion and for consistency.

      (4) Overexpression experiments driving srrm3 / srrm4 in HEK293 cells are not described in the methods.

      We apologize for this omission. We now describe the data and associated methods in more detail in the revised version; however, please note that the data were obtained from a previous publication (Torres-Mendez et al., 2019), where the detailed methodology is reported.

      (5) Suggest including more information on how neurite length was calculated. In representative images, it appears difficult to determine which neurites arise from which soma, as they cross extensively. How was this addressed in the quantification?

      We have added further details to the revised version. With regards to the specific question, we would like to mention that this has not been a very common issue for the time points used in the manuscript (10 hap and 24 hap). At those stages, it was nearly always evident how to track each individual neurite. Dubious cases were simply ignored and not measured, as we aimed for 100 neurites per well. Of course, such complex cases become much more common at later time points (48 and 72 hap), which were not used in this study.

      Reviewer #2 (Public review):

      Summary:

      This manuscript explores in zebrafish the impact of genetic manipulation of individual microexons and two regulators of microexon inclusion (Srrm3 and Srrm4). The authors compare molecular, anatomical, and behavioral phenotypes in larvae and juvenile fish. The authors test the hypothesis that phenotypes resulting from Srrm3 and 4 mutations might in part be attributable to individual microexon deletions in target genes.

      The authors uncover substantial alterations in in vitro neurite growth, locomotion, and social behavior in Srrm mutants but not any of the individual microexon deletion mutants. The individual mutations are accompanied by broader transcript level changes which may resemble compensatory changes. Ultimately, the authors conclude that the severe Srrm3/4 phenotypes result from additive and/or synergistic effects due to the de-regulation of multiple microexons.

      Strengths:

      The work is carefully planned, well-described, and beautifully displayed in clear, intuitive figures. The overall scope is extensive with a large number of individual mutant strains examined. The analysis bridges from molecular to anatomical and behavioral read-outs. Analysis appears rigorous and most conclusions are well-supported by the data.

      Overall, addressing the function of microexons in an in vivo system is an important and timely question.

      Weaknesses:

      The main weakness of the work is the interpretation of the social behavior phenotypes in the Srrm mutants. It is difficult to conclude that the mutations indeed impact social behavior rather than sensory processing and/or vision which precipitates apparent social alterations as a secondary consequence. Interpreting the phenotypes as "autism-like" is not supported by the data presented.

      The Reviewer is absolutely right. It was not our intention to imply that these social defects should be interpreted simply as autism-like. It is indeed very likely that the main reason for the social alterations displayed by the srrm3 mutants is their impaired vision. We have now added this discussion point explicitly in the revised version.

      Reviewer #3 (Public review):

      Summary:

      Microexons are highly conserved alternative splice variants, the individual functions of which have thus far remained mostly elusive. The inclusion of microexons in mature mRNAs increases during development, specifically in neural tissues, and is regulated by SRRM proteins. Investigation of individual microexon function is a vital avenue of research since microexon inclusion is disrupted in diseases like autism. This study provides one of the first rigorous screens (using zebrafish larvae) of the functions of individual microexons in neurodevelopment and behavioural control. The authors precisely excise 21 microexons from the genome of zebrafish using CRISPR-Cas9 and assay the downstream impacts on neurite outgrowth, larvae motility, and sociality. A small number of mild phenotypes were observed, which contrasts with the more dramatic phenotypes observed when microexon master regulators SRRM3/4 are disrupted. Importantly, this study attempts to address the reasons why mild/few phenotypes are observed and identify transcriptomic changes in microexon mutants that suggest potential compensatory gene regulatory mechanisms.

      Strengths:

      (1) The manuscript is well written with excellent presentation of the data in the figures.

      (2) The experimental design is rigorous and explained in sufficient detail.

      (3) The identification of a potential microexon compensatory mechanism by transcriptional alterations represents a valued attempt to begin to explain complex genetic interactions.

      (4) Overall this is a study with a robust experimental design that addresses a gap in knowledge of the role of microexons in neurodevelopment.

      Thank you very much for your positive comments on our manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor Suggestions

      (1) Axes are often scaled differently even between panels in the same figure. For example in Figure 5 - supplement 10, the srrm3_17 y axis scales from 0-20, while the neighboring panels scale from ~1-2.5. This somewhat underrepresents the finding that srrm3 mutants have much larger inter-individual distances. Similarly, in the panel above (src_1), the y-axis is scaled to include a single point around 17cm. As a result, it appears at first glance that the src_1 trials resulted in much lower inter-individual distance. Suggest scaling all of these the same to improve readability.

      While the Reviewer is certainly correct, after careful consideration we decided to use autoscaled axes, prioritizing comparisons within plots (i.e., among genotypes within an experiment) over comparisons across plots (i.e., among experiments and lines).

      (2) Attention to italicizing gene names.

      Thanks.

      (3) In many points in the methods, we are instructed to "see below." Suggest directing the reader to a particular section heading.

      We found only one such instance, and we directed the reader to the specific section, as suggested.

      (4) In Methods, remove "in the corpus callosum." This is not an accurate descriptor for the site at which Mauthner axons cross.

      This is absolutely correct, apologies for this mistake.

      Clarify:

      (1) In the results section, "tissue-specific regulation was validated..." - suggest mentioning that this was performed in adult tissues / describe dissection in the methods.

      Added.

      (2) In the results section, the meaning of "no event ortholog" is not clear. Does this mean that a microexon does not have a human homolog? If so, suggest stating more clearly.

      Correct. We have added additional information.

      (3) In the results, the authors state that 78% of microexons are affected by srrm3/4 loss-of-function. Suggest stating the method used here (e.g. RNA-seq in mutants as compared to siblings)

      Added.

      (4) It is not clear what "siblings for the main founders" means, for example in 3D. Is this effectively the analysis of microexon knockouts across multiple independent lines? Are the lines pooled for stats, for example in 3C?

      The main founder corresponds to the line listed as _1, which is used by default in experiments where only one founder is used. We now state this explicitly.

      For 3C, the lines are not pooled for stats; the stats correspond only to the main founder for each line. However, for each main founder line, multiple experiments are usually analyzed together and the stats are done taking their data structure into account (i.e. not simply pooling the values).

      (5) The purpose and a general description of NanoBRET assays should be included in the results.

      We added the main purpose of the NanoBRET assays (testing protein-protein interactions).

      (6) Specify that baseline behavior is analyzed in the light.

      Added.

      (7) In Figure 4A, adult fish are schematized being placed into a 96-well plate. Suggest using the larval diagram as in Figure 6 for accuracy.

      Done.

      (8) In Figure 4, plot titles could be made more accessible, especially in 4 F. Suggest removing extraneous information / italicizing gene names, etc. In G, suggest writing out Baseline, Dark, and Light to make it more accessible. Same in 4B.

      We have implemented some of the suggestions. In particular, italics were not used, since we are referring to the founder line, not the gene.

      (9) Figure 6 legend B - after (barplots), suggest inserting the word "and", to make clear that barplots indicate host gene *and* closely related paralogs are indicated by dots.

      Done.

      (10) In methods: "To better capture all microexons..." This sentence is difficult to understand. Suggested edit: "we excluded *from our calculation?* tissues with known or expected partial overlap... from comparison (for example, ...).

      Done.

      (11) In the methods, "which were defined with similar parameters but -min_rep 2." Suggest spelling this out, e.g. "with similar parameters, but requiring sufficient read coverage in at least n=2 samples per valid tissue group, whereas we only required one.".

      Done.

      (12) RNA was extracted for event and knockout validations. What does event mean here?

      Event refers to the validation of the exon regulatory pattern in WT tissues. We added this information.

      Provide definitions for abbreviations:

      (1) (Figure 6) Delta corrected VST Expression.

      Done.

      (2) "Mic-hosting genes" paralogs.

      Done.

      (3) In Figure 1F, "emic" is not defined.

      Done.

      Misspellings:

      All corrected.

      (1) Figure 6B (percentile is spelled percentil).

      (2) Figure 6B legend (bottom or top decile*).

      (3) Figure 6D - Schizophrenia* genes.

      (4) In Zebrafish husbandry and genotyping: suggest "srrm3 mutants grew more slowly.".

      (5) In results, "reduced body size at 90pdf" > 90dpf.

      Reviewer #2 (Recommendations for the authors):

      (1) Characterization of microexon mutants (Figure 2): The semi-quantitative PCR with flanking primers (Figure 2, supplement1) is well-suited to assess successful deletion of the exon and enables detection of potential mis-splicing around the alternative segment. However, it does not quantify the impact on total transcript levels. The authors should complement those experiments with qPCR measures of the transcript levels - otherwise, it is difficult to link mutant phenotypes to isoforms (as opposed to alterations in the level of gene expression). This point is somewhat addressed in Figure 6 by the RNA Seq analysis but it might help to add data specifically in Figure 2.

      As the Reviewer says, this point is explicitly addressed in Figure 6, where we show the change in host gene expression that follows the removal of some microexons. We prefer to keep this in Figure 6 for consistency, as we believe this change is not a direct (regulatory) consequence of the removal but more likely a compensation effect.

      (2) Social behavior alterations in juvenile fish: The authors report "increased leadership" in Srrm3 mutant fish. However, these fish have impaired vision. Thus, "increased leadership" may simply reflect the fact that they do not perceive their conspecifics and, thus, do not follow them. The heterozygous conspecific will then mostly follow the Srrm3 mutant which appears as the mutant exhibiting an increase in leadership. Figure 5D suggests that Srrm3 del and het fish have the same ratio of "neighbor in front" which would be consistent with the hypothesis that the change in this metric is a consequence of a loss of following behavior due to a loss of vision. The authors should either adjust the discussion of this point or assess with additional experiments whether this is indeed a "social phenotype" or rather a secondary consequence of a loss of vision.

      The Reviewer is absolutely correct, and we have thus modified the short discussion directly related to these patterns.

      (3) The discussion centers on potential reasons why only mild phenotypes are observed in the single microexon mutants. One caveat of the phenotypic analysis provided in the manuscript is that it does not very deeply explore the phenotypic space of neuronal morphologies or circuit function. The behavioral and anatomical read-outs are rather coarse. There are no experiments exploring fine-structure of neuronal projections in vivo or synapse number, morphology, or function. Moreover, no attempts are made to explore which cell types normally express the microexons to potentially focus the loss-of-function analysis to these specific cell types. Of course, such analysis would substantially expand the scope of a study that already covers a large number of mutant alleles. However, the authors may want to add a discussion of these limitations in the manuscript.

      The Reviewer is correct. We aimed at covering this when referring to "(i) we may not be assessing the traits that these microexons are impacting, (ii) we may not have the sensitivity to robustly measure the magnitude of the changes caused by microexon removal". We have now added some of the specific points raised by the Reviewer as examples.

      (4) Note typos in Figure 6D: "schizoFrenia", "WNT signIalling"

      Done.

      Reviewer #3 (Recommendations for the authors):

      I only have a few minor suggestions for the authors.

      (1) It is interesting that a not insignificant number of microexon deletions (3/21) result in cryptic inclusions of intron fragments, and perhaps alludes to an as yet unreported molecular function of microexons in the regulation of host gene expression. Is it possible that microexon inclusion in these 3 genes could be important for expression? I think this requires some further discussion, as (if I'm not mistaken) microexons have thus far only been hypothesised to act as modulators of protein function, not as gene regulatory units.

      While we see that microexon removal can impact expression of the host gene (Figure 6), this is likely a compensatory mechanism (or so we suggest). We do not think these three cases reflect a putative physiological regulation, since the cryptic exons appear only in the deletion lines. Rather, we think these are "regulatory artifacts" that originate in the non-WT mutated context. That is, we removed the exon, but some splicing signals remained in the intron; these are then recognized by the spliceosome, which incorrectly includes a different piece of the intron.

      (2) The flow of the text accompanying the molecular investigation of microexon function for evi5b and vav in Figure 3 could be improved. The text currently fades out with a speculative explanation for the lack of evi5b interaction phenotype. This final sentence could be moved to the discussion and replaced with a more general summary of the data.

      We have now swapped the order in which these results are described and have left out the discussion of evi5b's microexon function.

      (3) Is this a co-submission with Calhoun et al? If so, both papers should reference each other in the discussion and discuss the relative contributions of each.

      Done

      (4) "1 × 104 cells" in methods Nanobret paragraph should be superscript.

      Done

    1. eLife Assessment

      This Review Article explores the intricate relationship between humans and Mycobacterium tuberculosis (Mtb), providing an additional perspective on TB disease. Specifically, this review focuses on the utilization of systems-level approaches to study TB, while highlighting challenges in the frameworks used to identify the relevant immunologic signals that may explain the clinical spectrum of disease. The work could be further enhanced by better defining key terms that anchor the review, such as "unified mechanism" and "immunological route." This review will be of interest to immunologists as well as those interested in evolution and host-pathogen interactions.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting and useful review highlighting the complex pathways through which pulmonary colonisation or infection with Mycobacterium tuberculosis (Mtb) may progress to develop symptomatic disease and transmit the pathogen. I found the section on immune correlates associated with individuals who have clearly been exposed to and reacted to Mtb but did not develop latent infections particularly valuable. However, several aspects would benefit from clarification.

      Strengths:

      The main strengths lie in the arguments presented for a multiplicity of immune pathways to TB disease.

      Weaknesses:

      The main weaknesses lie in clarity, particularly in the precise meanings of the three figures.

      I accept that there is a 'goldilocks zone' that underpins the majority of TB cases we see and predominantly reflects different patterns of immune response, but the analogies used need to be more clearly thought through.

    3. Reviewer #2 (Public review):

      Summary:

      This is a thought-provoking perspective by Reichmann et al., outlining supportive evidence that Mycobacterium tuberculosis co-evolved with its host Homo sapiens to both increase susceptibility to infection and reduce rates of fatal disease through decreased virulence. TB is an ancient disease where two modes of virulence are likely to have evolved through different stages of human evolution: one before the Neolithic Demographic Transition, where humans lived in sparse hunter-gatherer communities, which likely selected for prolonged Mtb infection with reduced virulence to allow for transmission across sparse populations. Conversely, following the agricultural and industrial revolutions, Mtb virulence is likely to have evolved to attack a higher number of susceptible individuals. These different disease modalities highlight the central idea that there are different immunological routes to TB disease, which converge on a disease phenotype characterized by high bacterial load and destruction of the extracellular matrix. The writing is very clear and provides a lot of supportive evidence from population studies and the recent clinical trials of novel TB vaccines, like M72 and H56. However, some areas supporting the thesis are described only in broad strokes, including the impact of host and Mtb genetic heterogeneity on this selection, and the alternative model that there are likely different TB diseases (as opposed to different routes to the same disease), as described by several groups advancing the concept of heterogeneous TB endotypes. I expand on specific points below.

      Strengths:

      (1) The idea that Mtb evolved to both increase transmission (and possible commensalism with humans) with low rates of reactivation is intriguing. The heterogeneous TB phenotypes in the Collaborative Cross model (PMID: 35112666) support this idea, where some genetic backgrounds can tolerate a high bacterial load with minimal pathology, while others show signs of pathogenesis with low bacterial loads. This supports the idea that the underlying host state, driven by a number of factors like genetics and nutrition, is likely to explain whether someone will co-exist with Mtb without pathology, or progress to disease. I particularly enjoyed the discussion of the protective advantages provided by Mtb infection, which may have rewired the human immune system to provide protection against heterologous pathogens; this is supported by recent studies showing that Mtb infection provides moderate protection against SARS-CoV-2 (PMIDs: 35325013 and 37720210), and may have applied to other viruses that are likely to have played a more significant role in the past in the natural selection of Homo sapiens.

      (2) Modeling from Marcel Behr and colleagues (PMID: 31649096) indeed suggests that there are at least two TB clinical phenotypes that likely mirror the two distinct phases of Mtb co-evolution with humans. Most TB disease progression occurs rapidly (within 1-2 years of exposure), and the remaining cases are slow reactivation over time. I enjoyed the discussion of the difference between the types of immune hits needed to progress to disease in the two scenarios, where you may need severe immune hits for rapid progression, a phenotype that likely evolved after the Neolithic transition to larger human populations. On the other hand, a series of milder immune events leading to reactivation after a long period of asymptomatic infection likely mirrors slow progression in the hunter-gatherer communities, to allow for prolonged transmission in sparse populations. Perhaps a clearer analysis of these models would be helpful for the reader.

      Weaknesses:

      (1) The discussion of genetic heterogeneity is limited and only discusses evidence from MSMD studies. Genetics is an important angle to consider in the co-evolution of Mtb and humans. There is a large body of literature on both host and Mtb genetic associations with TB disease. The very fact that host variants in one population do not necessarily cross-validate across populations is evidence in support of population-specific adaptations. Specific Mtb lineages are likely to have co-evolved with distinct human populations. A key reference is missing (PMID: 23995134), which shows that different lineages co-evolved with human migrations. Also, meta-analyses of human GWAS studies to define variants associated with TB are very relevant to the topic of co-evolution (e.g., PMID: 38224499). eQTL studies can also highlight genetic variants associated with regulating key immune genes involved in the response to TB. The authors do mention that Mtb itself is relatively clonal with ~2K SNPs marking Mtb variation, much of which has likely evolved under the selection pressure of modern antibiotics. However, some of this limited universe of variants can still explain co-adaptations between distinct Mtb lineages and different human populations, as shown recently in the co-evolution of lineage 2 with a variant common in Peruvians (PMID: 39613754).

      (2) Although the examples of anti-TNF and anti-PD1 treatments are relevant as drivers of TB in limited clinical contexts, the bigger picture is that they highlight major distinct disease endotypes. These restricted examples show that TB can be driven by immune deficiency (as in the case of anti-TNF, HIV, and malnutrition) or hyperactivation (as in the case of anti-PD1 treatment), but there are still certainly many other routes leading to immune suppression or hyperactivation. Considering the idea of hyper-activation as a TB driver, the apparent higher rate of recurrence in the H56 trial referenced in the review is likely due to immune hyperactivation, especially in the context of residual bacteria in the lung. These different TB manifestations (immune suppression vs immune hyperactivation) mirror TB endotypes described by DiNardo et al (PMID: 35169026) from analysis of extensive transcriptomic data, which indicate that it's not merely different routes leading to the same final endpoint of clinical disease, but rather multiple different disease endpoints. A similar scenario is shown in the transcriptomic signatures underlying disease progression in BCG-vaccinated infants, where two distinct clusters mirrored the hyperactivation and immune suppression phenotypes (PMID: 27183822). A discussion of how to think about translating the extensive information from system biology into treatment stratification approaches, or adjunct host-directed therapies, would be helpful.

    4. Reviewer #3 (Public review):

      Summary:

      This perspective article by Reichmann et al. highlights the importance of moving beyond the search for a single, unified immune mechanism to explain host-Mtb interactions. Drawing from studies in immune profiling, host and bacterial genetics, the authors emphasize inconsistencies in the literature and argue for broader, more integrative models. Overall, the article is thought-provoking and well-articulated, raising a concept that is worth further exploration in the TB field.

      Strengths:

      Timely and relevant in the context of the rapidly expanding multi-omics datasets that provide unprecedented insights into host-Mtb interactions.

      Weaknesses (Minor):

      (1) Clarity on the notion of a "unified mechanism". It remains unclear whether prior studies explicitly proposed a single unifying immunological model. While inconsistencies in findings exist, they do not necessarily demonstrate that earlier work was uniformly "single-minded". Moreover, heterogeneity in TB has been recognized previously (PMIDs: 19855401, 28736436), which the authors could acknowledge.

      (2) Evolutionary timeline and industrial-era framing. The evolutionary model is outdated. Ancient DNA studies place Mtb's most recent common ancestor (MRCA) at ~6,000 years BP (PMIDs: 25141181; 25848958). The Industrial Revolution is cited as a driver of TB expansion, but this remains speculative without bacterial-genomics evidence and should be framed as a hypothesis. Additionally, the claim that Mtb genomes have been conserved only since the Industrial Revolution (lines 165-167) is inaccurate; conservation extends back to the MRCA (PMID: 31448322).

      (3) Trained immunity and TB infection. The treatment of trained immunity is incomplete. While BCG vaccination is known to induce trained immunity (ref 59), revaccination does not provide sustained protection (ref 8), and importantly, Mtb infection itself can also impart trained immunity (PMID: 33125891). Including these nuances would strengthen the discussion.

    5. Author response:

      We thank the reviewers for their primarily positive comments and the critiques about where the manuscript could be improved. We agree with the vast majority of points raised. In our revised submission, we will:

      • Clarify some of the wording such as “unified mechanism” so that our intended meaning is clear to all readers

      • Completely change figure 2, as we accept the critique that an X-Y plot is not the logical way to present this concept

      • Amend the legends of figures 1 and 3 so that the disease pathways we are attempting to illustrate are clear for all readers

      • Expand on the genetic interactions between humans and TB and cite the manuscripts suggested

      • Add further discussion on multiple disease endotypes, and the immunological events that may lead to these distinct end points, along with how this may inform treatment stratification approaches

      • Extend the discussion about trained immunity

      • Make specific changes to address each of the reviewers’ points in the recommendations to authors

      • In the minority of cases where we feel a change is not necessary, we will justify this in our response to reviews

    1. eLife Assessment

      This study uses all-optical electrophysiology methods to provide a valuable insight into the organization of cortical networks and their ability to balance the activity of groups of neurons with similar functional tuning. The all-optical approach used in this study is impressive and the claim that the effects of optical stimulation correspond to a specific homeostatic mechanism is solid. The work will be of interest to neurobiologists and to developers of optical approaches for interrogating brain function.

    2. Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved.

      Weaknesses:

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset.

    3. Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precise ability of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of out-of-focus unwanted excitation (Suppl Fig1).

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response. The boundary between assemblies is smooth, as the rebalancing process occurring in one assembly seems to also affect the response of the other assembly, comprising cells tuned to the other frequency. This trend is not significant but is visible for both tested frequencies in Fig. 3 and Fig. S3.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations. 

      Strengths: 

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved. 

      Weaknesses: 

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset. 

      Reviewer #2 (Public review): 

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role. 

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process. 

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precise ability of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of out-of-focus unwanted excitation (Suppl Fig1). 

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response. 

      One surprising result is the clear boundary between assemblies: a rebalancing process occurring in one assembly does not affect the response in another assembly comprising cells tuned to a different frequency. However, this is slightly challenged by the data shown in Figure 3. 

      Figure 3B-left, for example, shows that, compared to controls, non-target 16 kHz-preferring neurons only decrease their response to a 16 kHz pure tone when the cells targeted by the opto stimulation also prefer 16 kHz, but not when the targeted cells prefer 54 kHz. However, the inverse is not entirely true. Again compared to controls, Figure 3B (right) shows that non-target 54 kHz-preferring neurons decrease their response to a 54 kHz pure tone when the targeted cells also prefer 54 kHz; however, they also tend to be inhibited when the targeted cells prefer 16 kHz. 

      The authors suggest this may be due to the partial activation of 54 kHz-preferring cells by 16 kHz tones and propose examining the response of highly selective neurons. The results are shown in Figure 3F. It would have been more logical to show the same results as in Figure 3B, but with the left part restricted to highly 16 kHz-selective cells and the right part to highly 54 kHz-selective cells. However, the authors chose to pool all responses to 16 kHz and 54 kHz tones in every triplet of conditions (control, opto stimulation on 16 kHz-preferring cells and opto stimulation on 54 kHz-preferring cells), which blurs the result of the analysis. 

      We thank the reviewers for highlighting the strengths of our work and for their valuable feedback. In this revision, we mainly developed the manuscript in response to Reviewer 2's point about the overall effect presented as the main result. One of the main reasons we pooled all tone-preferring cells, rather than only highly selective cells, was to ensure that the observed effect was not driven by a small group of neurons but was present at the population level, particularly at the subject level in Figure 3B. While Figure 3F shows that highly selective cells for each frequency play a major role in the effect, we have now added results restricted to highly selective neurons as Supplementary Figure 3: the left panel restricts the population to neurons highly selective for 16 kHz and the right panel to neurons highly selective for 54 kHz, at the cell-population level. 

      We appreciate an additional raised point by Reviewer 1 regarding the stimulation effect on population coding. Our primary focus in this manuscript was to establish single cell level effects of holographic stimulation, and we believe that population coding analyses would benefit from a more cell-type-specific approach. We plan to pursue such analyses in follow-up studies where cell types can be better identified and linked to network dynamics. 

      Reviewer #1 (Recommendations for the authors): 

      The authors have appropriately addressed my concerns. 

      As this dataset will be of general interest, it would be helpful to include a doi/link to their data repository in the data availability section. 

      Updating the data repository on the institutional server is currently in progress. We will provide the correct DOI or link as soon as it becomes available. In the meantime, we will share the data with anyone who contacts us directly. 

      Reviewer #2 (Recommendations for the authors): 

      Many references to Figures have not been updated between the two versions of the manuscript. See lines 107, 128, 297, 321 and 346. 

      We are sorry for the confusion with mislabelled figures. We now have updated all the figure numbers accordingly.

      In the paragraph beginning on line 266, there is no explicit reference to Figure 3C. 

      We now added Figure 3C reference in the main text (line 290). 

      If the new analysis includes 15 FOV for stim on 54 kHz-preferring cells, as indicated in the rebuttal, the corresponding numbers should be corrected in lines 152 and 180. 

      We now updated the number of FOVs accordingly. 

      The added model is not explained well enough. How are the calcium traces simulated? It is difficult to ascertain whether the result shown in Figure 3C is merely a trivial consequence of the hypothesis that suppression is applied to co-tuned neurons or to all neurons. 

      We are sorry for the lack of important details in the explanation of the model. We simulated time-varying sound-evoked calcium transients, applying different decay time constants (faster decay for co-tuned neurons and slower decay for non-co-tuned neurons) to closely match the real data. A more detailed explanation is now included in the manuscript (lines 644 – 650). Since our data do not currently allow us to identify specific cell types, we focused on modelling the stronger suppression observed in co-tuned neurons, especially by adapting the stimulation effect of target cells from the real data. In this revision, we now added data showing that 'Randomly selected cells' from the two groups (co-tuned or non-co-tuned cell groups) did not exhibit any stimulation effect (added column in Figure 3D) to further indicate that suppression specific to co-tuned neurons is the key factor underlying the observed effects in the real data. We hope to build on this work in future studies to identify cell-type-specific effects and their computational roles. 
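      One way to generate traces of this kind is a double-exponential kernel whose decay constant differs between groups. The sketch below is only an illustration under stated assumptions, not the authors' model: the rise and decay constants, noise level, 30 Hz sampling rate and analysis window are all hypothetical. A faster decay shortens the transient and lowers the response averaged over a fixed window, which is how a group-specific decay constant can mimic stronger suppression of co-tuned neurons.

```python
import numpy as np


def calcium_transient(t, onset, amplitude, tau_rise, tau_decay):
    """Double-exponential dF/F transient starting at `onset` (s); zero before it."""
    s = np.clip(t - onset, 0.0, None)  # time since onset, 0 before onset
    return amplitude * (np.exp(-s / tau_decay) - np.exp(-s / tau_rise))


def simulate_trials(n_trials, tau_decay, rng, fs=30.0, duration=5.0, onset=1.0,
                    amplitude=1.0, tau_rise=0.1, noise_sd=0.05):
    """Noisy single-trial traces (trials x samples) for one hypothetical neuron."""
    t = np.arange(0.0, duration, 1.0 / fs)
    clean = calcium_transient(t, onset, amplitude, tau_rise, tau_decay)
    # Independent Gaussian noise on every sample (an assumption, not fitted).
    return t, clean + rng.normal(0.0, noise_sd, size=(n_trials, t.size))


rng = np.random.default_rng(0)
# Hypothetical decay constants: faster decay for co-tuned neurons.
t, co_tuned = simulate_trials(50, tau_decay=0.5, rng=rng)
_, non_co_tuned = simulate_trials(50, tau_decay=1.5, rng=rng)
window = (t >= 1.0) & (t < 3.0)  # 2 s response window after tone onset
print(co_tuned[:, window].mean(), non_co_tuned[:, window].mean())
```

      With these illustrative values the windowed mean for the faster-decaying group is roughly 40% of that for the slower group, even though both transients share the same amplitude.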

      Although the rebuttal clearly states that experiments are carried out on awake animals, this information is still missing from the manuscript. 

      We now stated ‘Fully awake animals’ in the experimental procedures.

    1. eLife Assessment

      This study presents a useful method based on flow cytometry to study partitioning noise during cell division. The methods, data and analysis are convincing and support the claims of the authors. This work will be of interest to cell biologists and biophysicists working on asymmetric partitioning during cell division.

    2. Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. Fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise. In the revised version of the manuscript, they argue that in their setup, noise due to production and degradation processes are negligible but noise due to extrinsic sources such as those stemming from cell-cycle length variability may still be important. To investigate the robustness of their modelling approach to such noise, they simulated cells following a sizer-like division strategy, a scenario that maximizes the coupling between fluctuations in cell-division time and partitioning noise. They find that estimates remain within the pre-established experimental error margin.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. The approach focuses on a particular case, where the dynamics of the labelled component depends predominantly on partitioning, while turnover of components is not taken into account. The description of the methods is significantly clearer than in the previous version of the manuscript. I have only two comments left:

      • In eq. (1) the notation has been changed/corrected, but the text immediately after it still refers to the old notation.

      • Maybe I don't fully understand the reasoning provided by the authors, but it is still not entirely clear to me why microscopy-based estimates are expected to be larger. Fewer samples will increase the estimation uncertainty, but this can go either way in terms of the inferred variability.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The aim of this paper is to develop a simple method to quantify fluctuations in the partitioning of cellular elements. In particular, they propose a flow-cytometry-based method coupled with a simple mathematical theory as an alternative to conventional imaging-based approaches.

      Strengths:

      The approach they develop is simple to understand and its use with flow-cytometry measurements is clearly explained. Understanding how the fluctuations in the cytoplasm partition vary for different kinds of cells is particularly interesting.

      Weaknesses:

      The theory only considers fluctuations due to cellular division events. This seems a large weakness because it is well known that fluctuations in cellular components are largely affected by various intrinsic and extrinsic sources of noise and only under particular conditions does partitioning noise become the dominant source of noise.

      We thank the Reviewer for her/his evaluation of our manuscript. The point raised is indeed a crucial one. In a cell division cycle, there are at least three distinct sources of noise that affect component numbers [1]:

      (1) Gene expression and degradation, which determine fluctuations in component numbers during cell growth.

      (2) Variability in cell division time, which depending on the underlying model may or may not be a function of protein level and gene expression.

      (3) Noise in the partitioning/inheritance of components between mother and daughter cells.

      Our approach specifically addresses the latter, with the goal of providing a quantitative measure of this noise source. For this reason, in the present work, we consider homogeneous cancer cell populations that can be considered stationary at the population level. By tracking the time evolution of the distribution of tagged components via live fluorescent markers, we aim to isolate partitioning noise effects. However, as noted by the Reviewer, other sources of noise are present, and depending on the system considered, the relative contributions of the different sources may change. Thus, we agree that quantifying the effect of the various noise sources on the accuracy of our measurements will improve the reliability of our method.

      In this respect, assuming independence between noise sources, we reasoned that variability in cell cycle length would affect the timing of population emergence but not the intrinsic properties of those populations (e.g., Gaussian variance). To test this hypothesis, we conducted a preliminary set of simulations in which cell division times were drawn from an Erlang distribution (mean = 18 h, k = 4). The results, showing the behavior of the mean and variance of the component distributions across generations, are presented in Supplementary Information - Figure 1. Under the assumption of independence between different noise sources, no significant effects were observed even for high asymmetries of the partitioning distribution.
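      The independence argument can be illustrated with a small simulation. This sketch is not the authors' code; it assumes a hypothetical model in which founder cells carry Poisson-distributed component numbers, division times are Erlang-distributed (shape k = 4, mean 18 h) and drawn independently of content, and at each division one daughter receives a Binomial(n, p) share while its sister receives the remainder. Cells alive at a single observation time are grouped by generation, mimicking a snapshot, and their moments are compared with a timing-free recurrence for the same hypothetical partitioning model. Function names, the observation time and all parameter values are illustrative.

```python
import numpy as np


def simulate_snapshot(n0=2000, mean_count=1000.0, p=0.5, k=4, mean_tau=18.0,
                      t_obs=60.0, seed=0):
    """Return {generation: component counts of cells alive at time t_obs}."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mean_count, size=n0)  # generation-0 component numbers
    birth = np.zeros(n0)                       # all founders born at t = 0
    snapshot, g = {}, 0
    while counts.size > 0:
        # Erlang(k) = gamma with integer shape k; scale chosen so the mean is mean_tau.
        tau = rng.gamma(shape=k, scale=mean_tau / k, size=counts.size)
        divides = birth + tau <= t_obs
        if (~divides).any():
            snapshot[g] = counts[~divides]     # cells still alive at t_obs
        parents = counts[divides]
        first = rng.binomial(parents, p)       # one daughter's share
        second = parents - first               # sister receives the remainder
        t_div = (birth + tau)[divides]
        counts = np.concatenate([first, second])
        birth = np.concatenate([t_div, t_div])
        g += 1
    return snapshot


def predicted_moments(mu0, var0, p, n_gen):
    """Per-generation (mean, variance) under binomial splitting, ignoring timing.

    Pooling both daughters gives mu' = mu / 2 and
    var' = var * q / 2 + mu * p * (1 - p) + mu**2 * (2p - 1)**2 / 4,
    with q = p**2 + (1 - p)**2.
    """
    q = p ** 2 + (1 - p) ** 2
    mu, var = mu0, var0
    out = [(mu, var)]
    for _ in range(n_gen):
        mu, var = mu / 2, var * q / 2 + mu * p * (1 - p) + mu ** 2 * (2 * p - 1) ** 2 / 4
        out.append((mu, var))
    return out


snap = simulate_snapshot(p=0.7, seed=1)
pred = predicted_moments(1000.0, 1000.0, 0.7, max(snap))
for g, c in sorted(snap.items()):
    if c.size >= 1000:  # only well-populated generations
        print(g, c.size, round(c.mean(), 1), round(c.var(), 1), pred[g])
```

      Because timing never enters `predicted_moments`, agreement between the two columns is what one would expect if cycle-length variability only shifts when each generation appears, rather than its distribution.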

      Next, we quantified the accuracy of our measurements in the presence of cross-talk between the various noise sources. Indeed, cells may adopt different growth and division strategies, which can be grouped into three categories based on what triggers division:

      ● Sizer-like cells divide upon reaching a certain size;

      ● Timer-like cells divide after a fixed time (corresponding to the previously treated case with independent noise);

      ● Adder-like cells divide once their volume has increased by a finite amount.
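      The three triggers can be compared with a short sketch that computes when a newborn cell divides under each rule. It assumes, purely for illustration, exponential growth v(t) = v_b e^{λt} with a hypothetical 18 h doubling time; the sizer target v_target, the timer period and the adder increment delta are hypothetical parameters, not values from the manuscript.

```python
import math

DOUBLING_H = 18.0                        # hypothetical doubling time (h)
GROWTH_RATE = math.log(2) / DOUBLING_H   # exponential growth rate lambda (1/h)


def sizer_time(v_birth, v_target=2.0):
    """Sizer: divide when the volume reaches v_target."""
    return max(0.0, math.log(v_target / v_birth) / GROWTH_RATE)


def timer_time(v_birth, period=DOUBLING_H):
    """Timer: divide after a fixed time, regardless of birth volume."""
    return period


def adder_time(v_birth, delta=1.0):
    """Adder: divide once the volume has increased by delta since birth."""
    return math.log((v_birth + delta) / v_birth) / GROWTH_RATE


if __name__ == "__main__":
    for vb in (0.8, 1.0, 1.2):
        print(f"v_b={vb}: sizer {sizer_time(vb):.1f} h, "
              f"timer {timer_time(vb):.1f} h, adder {adder_time(vb):.1f} h")
```

Cells born larger divide sooner under the sizer rule, somewhat sooner under the adder rule, and at the same time under the timer rule; to the extent that component content tracks size, only the first two couple a cell's content to its division time.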

      A detailed discussion of these strategies, including their mathematical formulation, can be found in [2]. Here we assumed that cells follow a sizer-like model. In this way, we study a system in which cells with a higher number of components have shorter division times. Hence, older generations are depleted, and newer ones populated, starting from the cells with the highest component content.
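      The sketch below shows how such coupling can shift the per-generation statistics. It does not reproduce the manuscript's simulations: the coupling rule is a hypothetical stand-in in which an Erlang draw (mean 18 h, k = 4) is multiplied by the ratio between the content expected for generation g (N0/2^g) and the content the cell actually carries, so that cells richer in components divide sooner. The partitioning fraction is taken, hypothetically, as Gaussian with mean 0.5 and standard deviation 0.1, and one random daughter is followed per division.

```python
import random
import statistics

N0 = 1000.0  # hypothetical initial component amount (arbitrary units)


def simulate_sizer_like(n_cells=20000, t_obs=72.0, sd_f=0.1, seed=2):
    """Track one random daughter per division, with content-dependent timing.

    Returns a dict mapping generation -> list of component amounts at t_obs.
    """
    rng = random.Random(seed)
    by_gen = {}
    for _ in range(n_cells):
        t, n, g = 0.0, N0, 0
        while True:
            expected = N0 / 2**g
            # hypothetical sizer-like rule: content above the generation's
            # expected value shortens the Erlang(k=4, mean 18 h) cycle
            t += rng.gammavariate(4, 18.0 / 4) * expected / n
            if t > t_obs:
                break
            f = min(0.95, max(0.05, rng.gauss(0.5, sd_f)))
            # the tracked daughter inherits f or 1 - f with equal probability
            n *= f if rng.random() < 0.5 else 1.0 - f
            g += 1
        by_gen.setdefault(g, []).append(n)
    return by_gen


def mean_ratio(by_gen, g):
    """Observed mean content of generation g relative to N0 / 2**g."""
    return statistics.fmean(by_gen[g]) / (N0 / 2**g)


if __name__ == "__main__":
    data = simulate_sizer_like()
    for g in sorted(data):
        print(f"gen {g}: n={len(data[g])}, mean/expected={mean_ratio(data, g):.3f}")
```

With this rule, cells that have already reached generation 5 at 72 h carry more content than N0/2^5 on average, while cells still in generation 2 carry less, which mirrors the depletion and population pattern described in the text.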

      As can be observed in Supplementary Information - Figure 2, higher levels of division asymmetry increase the fluctuations of the system relative to the analytically expected behavior, particularly in later generations.

      The result in Supplementary Information - Figure 3 demonstrates the robustness of our method, as the estimates remain within the pre-established experimental error margin. We have now discussed this aspect both in the main text and in the Supplementary Information, and we thank the Reviewer for pointing it out.

      (1) Soltani, Mohammad, et al. "Intercellular variability in protein levels from stochastic expression and noisy cell cycle processes." PLoS Computational Biology 12.8 (2016): e1004972.

      (2) Miotto, Mattia, et al. "A size-dependent division strategy accounts for leukemia cell size heterogeneity." Communications Physics 7.1 (2024): 248.

      Reviewer #2 (Public review):

      Summary:

      The authors present a combined experimental and theoretical workflow to study partitioning noise arising during cell division. Such quantifications usually require time-lapse experiments, which are limited in throughput. To bypass these limitations, the authors propose to use flow-cytometry measurements instead and analyse them using a theoretical model of partitioning noise. The problem considered by the authors is relevant and the idea to use statistical models in combination with flow cytometry to boost statistical power is elegant. The authors demonstrate their approach using experimental flow cytometry measurements and validate their results using time-lapse microscopy. However, while I appreciate the overall goal and motivation of this work, I was not entirely convinced by the strength of this contribution. The approach focuses on a quite specific case, where the dynamics of the labelled component depend purely on partitioning. As such it seems incompatible with studying the partitioning noise of endogenous components that exhibit production/turnover. The description of the methods was partly hard to follow and should be improved. In addition, I have several technical comments, which I hope will be helpful to the authors.

      We are grateful to the Reviewer for the comments. Indeed, both partitioning noise and production/turnover noise are, in general, fundamental processes. At present, the only way to consider them together is through time-consuming and costly transfection/microscopy/tracking experiments. In this work, we aimed to develop a method that effectively pinpoints the first of these, i.e., partitioning noise; we therefore opted to separate the two noise sources.

      Below, we provide a point-by-point response that we hope will clarify all the concerns raised.

      Comments:

      (1) In the theoretical model, copy numbers are considered to be conserved across generations. As a consequence, concentrations will decrease over generations due to dilution. While this consideration seems plausible for the considered experimental system, it seems incompatible with components that exhibit production and turnover dynamics. I am therefore wondering about the applicability/scope of the presented approach and to what extent it can be used to study partitioning noise for endogenous components. As presented, the approach seems to be limited to a fairly small class of experiments/situations.

      We see the Reviewer's point. Indeed, we are proposing a high-throughput and robust procedure to measure the partitioning/inheritance noise of cell components through flow cytometry time courses. By using live-cell staining of cellular compounds, we can track the effect of partitioning noise on the fluorescence intensity distribution across successive generations. This procedure is purposely optimized to isolate partitioning noise from other sources and, as it stands, cannot track endogenous components or dyes that require fixation. While this certainly limits the proposed approach, there are numerous contexts in which our methodology could be used to explore the role of asymmetric inheritance. Among others, (i) investigating how specific organelles are differentially partitioned and how this influences cellular behavior could provide deeper insight into fundamental biological processes: asymmetric segregation of organelles is a key factor in cell differentiation, aging, and stress response. During cell division, organelles such as mitochondria, the endoplasmic reticulum, lysosomes, peroxisomes, and centrosomes can be unequally distributed between daughter cells, leading to functional differences that influence their fate. For instance, Katajisto et al. [1] proposed that asymmetric division of mitochondria in stem cells is associated with the retention of stemness traits in one daughter cell and differentiation in the other. As organisms age, stem cells accumulate damage; to prevent exhaustion and compromised tissue function, cells may use asymmetric inheritance to segregate older or damaged subcellular components into one daughter cell. (ii) Asymmetric division has also been linked to therapeutic resistance in cancer stem cells [2]. Although the functional consequences are not yet fully determined, the asymmetric inheritance of mitochondria is recognized as playing a pivotal role [3].

      Another potential application of our methodology is (iii) the inheritance of lysosomes, which, together with mitochondria, appears to play a crucial role in determining the fate of human blood stem cells [4]. Furthermore, similar to studies conducted on liquid tumors [5][6], our approach could be extended to investigate cell growth dynamics and the origins of cell size homeostasis in adherent cells [7][8][9]. These case studies can be readily addressed with our approach, which is applicable whenever live-cell dyes can be used. We have added a discussion of the strengths and limitations of the method to the Discussion section of the revised manuscript.

      (1) Katajisto, Pekka, et al. "Asymmetric apportioning of aged mitochondria between daughter cells is required for stemness." Science 348.6232 (2015): 340-343.

      (2) Hitomi, Masahiro, et al. "Asymmetric cell division promotes therapeutic resistance in glioblastoma stem cells." JCI Insight 6.3 (2021): e130510.

      (3) García-Heredia, José Manuel, and Amancio Carnero. "Role of mitochondria in cancer stem cell resistance." Cells 9.7 (2020): 1693.

      (4) Loeffler, Dirk, et al. "Asymmetric organelle inheritance predicts human blood stem cell fate." Blood, The Journal of the American Society of Hematology 139.13 (2022): 2011-2023.

      (5) Miotto, Mattia, et al. "Determining cancer cells division strategy." arXiv preprint arXiv:2306.10905 (2023).

      (6) Miotto, Mattia, et al. "A size-dependent division strategy accounts for leukemia cell size heterogeneity." Communications Physics 7.1 (2024): 248.

      (7) Kussell, Edo, and Stanislas Leibler. "Phenotypic diversity, population growth, and information in fluctuating environments." Science 309.5743 (2005): 2075-2078.

      (8) McGranahan, Nicholas, and Charles Swanton. "Clonal heterogeneity and tumor evolution: past, present, and the future." Cell 168.4 (2017): 613-628.

      (9) De Martino, Andrea, Thomas Gueudré, and Mattia Miotto. "Exploration-exploitation tradeoffs dictate the optimal distributions of phenotypes for populations subject to fitness fluctuations." Physical Review E 99.1 (2019): 012417.

      (2) Similar to the previous comment, I am wondering what would happen in situations where the generations could not be as clearly identified as in the presented experimental system (e.g., due to variability in cell-cycle length/stage). In this case, it seems to be challenging to identify generations using a Gaussian Mixture Model. Can the authors comment on how to deal with such situations? In the abstract, the authors motivate their work by arguing that detecting cell divisions from microscopy is difficult, but doesn't their flow cytometry-based approach have a similar problem?

      The point raised is an important one, as it highlights the fundamental role of the gating strategy. The ability to identify the distributions of different generations using the Gaussian Mixture Model (GMM) strongly depends on the degree of overlap between them: the more the distributions overlap, the harder it is to separate them accurately.
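      As an illustration of this gating step, the sketch below fits a one-dimensional GMM by expectation-maximization to log2 fluorescence and labels the brightest component as generation 0. It runs on synthetic data with hypothetical, well-separated peaks (log2 intensities 10, 9, 8 and 7, standard deviation 0.15). It is not the authors' pipeline; in practice a library implementation such as scikit-learn's GaussianMixture would typically be used.

```python
import math
import random
import statistics


def fit_gmm_1d(x, k, n_iter=50):
    """Fit a k-component 1D Gaussian mixture by EM; returns (weights, means, variances)."""
    xs = sorted(x)
    # initialise means at evenly spaced quantiles, equal weights, broad variances
    mu = [xs[int((j + 0.5) * len(xs) / k)] for j in range(k)]
    var = [statistics.pvariance(xs) / k**2] * k
    w = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp for numerical stability
        resp = []
        for xi in x:
            logp = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (xi - mu[j]) ** 2 / (2 * var[j]) for j in range(k)]
            top = max(logp)
            e = [math.exp(lp - top) for lp in logp]
            s = sum(e)
            resp.append([v / s for v in e])
        # M-step: update weights, means and variances
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = max(sum(rj), 1e-12)
            w[j] = max(nj / len(x), 1e-12)
            mu[j] = sum(r * xi for r, xi in zip(rj, x)) / nj
            var[j] = max(sum(r * (xi - mu[j]) ** 2 for r, xi in zip(rj, x)) / nj, 1e-6)
    return w, mu, var


def assign_generations(x, mu, var, w):
    """Label each event with a generation: the brightest component is generation 0."""
    order = sorted(range(len(mu)), key=lambda j: -mu[j])
    rank = {j: g for g, j in enumerate(order)}
    labels = []
    for xi in x:
        scores = [math.log(w[j]) - 0.5 * math.log(var[j])
                  - (xi - mu[j]) ** 2 / (2 * var[j]) for j in range(len(mu))]
        labels.append(rank[max(range(len(mu)), key=scores.__getitem__)])
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [10.0, 9.0, 8.0, 7.0]  # hypothetical log2 peaks, generations 0..3
    x = [rng.gauss(m, 0.15) for m in true_means for _ in range(400)]
    w, mu, var = fit_gmm_1d(x, k=4)
    print(sorted(round(m, 3) for m in mu))
```

Increasing the peak standard deviation in this sketch makes neighboring components overlap and the generation labels less reliable.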

      The extent of overlap is influenced by the coefficients of variation (CV) of both the partitioning distribution function and the initial component distribution. Specifically, the component distribution at time t results from the convolution of the component distribution itself at time t−1 and the partitioning distribution function. Therefore, starting with a narrow initial component distribution allows for better separation of the generation peaks. The balance between partitioning asymmetry and the width of the initial component distribution is thus crucial.

      As shown in Supplementary Information - Figure 5, increasing the CV of either distribution reduces the ability to distinguish between different generations.
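      The dependence on both CVs can be made explicit under a simple multiplicative assumption, introduced here for illustration: the content of a tracked cell is n_t = f n_{t-1}, with the partitioning fraction f independent of n_{t-1}. Then E[n_t^2] = E[f^2] E[n_{t-1}^2], so 1 + CV_t^2 = (1 + CV_f^2)(1 + CV_{t-1}^2); in log space, where the convolution picture above holds exactly, adjacent peaks sit approximately one log2 unit apart. The sketch uses a lognormal approximation for peak widths and a hypothetical separation index, 1/(s_g + s_{g+1}), where s is the standard deviation of log2 content.

```python
import math


def cv_at_generation(cv0, cv_f, g):
    """CV after g divisions, assuming n_t = f * n_{t-1} with f independent of n_{t-1}."""
    return math.sqrt((1.0 + cv0**2) * (1.0 + cv_f**2) ** g - 1.0)


def log2_sd(cv):
    """Standard deviation of log2(n) under a lognormal approximation."""
    return math.sqrt(math.log(1.0 + cv**2)) / math.log(2.0)


def separation_index(cv0, cv_f, g):
    """Hypothetical index: ~1 log2 unit between peaks g and g+1, over their summed widths."""
    width = (log2_sd(cv_at_generation(cv0, cv_f, g))
             + log2_sd(cv_at_generation(cv0, cv_f, g + 1)))
    return 1.0 / width


if __name__ == "__main__":
    for cv0 in (0.1, 0.2, 0.4):
        for cv_f in (0.05, 0.15, 0.3):
            print(f"CV0={cv0}, CV_f={cv_f}: index between generations 3 and 4 = "
                  f"{separation_index(cv0, cv_f, 3):.2f}")
```

Raising either the initial CV or the partitioning CV lowers the index, which is the trend stated in the text.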

      However, the variance of the initial distribution cannot be reduced arbitrarily. While selecting a narrow distribution facilitates a better reconstruction of the distributions, it simultaneously limits the number of cells available for the experiment. Therefore, for components exhibiting a high level of asymmetry, further narrowing of the initial distribution becomes experimentally impractical.

      In such cases, an approach previously tested on liquid tumors [1] involves applying the Gaussian Mixture Model (GMM) in two dimensions by co-staining another cellular component with lower division asymmetry.

      Regarding time-lapse fluorescence microscopy, the main challenge lies not in disentangling the interplay of different noise sources, but rather in obtaining sufficient statistical power from experimental data. While microscopy provides detailed insights into the division process and component partitioning, its low throughput limits large-scale statistical analyses. Current segmentation algorithms still perform poorly in crowded environments and with complex cell shapes, requiring a substantial portion of the image analysis pipeline to be performed manually, a process that is time-consuming and difficult to scale. In contrast, our cytometry-based approach bypasses this analysis bottleneck, as it enables a direct population-wide measurement of the system's evolution. We have added a detailed discussion of this argument in the Supplementary Material of the manuscript and added a clarification of the role of the gating strategy in the main text.

      (1) Peruzzi, Giovanna, et al. "Asymmetric binomial statistics explains organelle partitioning variance in cancer cell proliferation." Communications Physics 4.1 (2021): 188.

      (3) I could not find any formal definition of division asymmetry. Since this is the most important quantity of this paper, it should be defined clearly.

      We thank the Reviewer for the note. By division asymmetry we mean a quantity that reflects how similar the two daughter cells are likely to be in terms of inherited components after a division. We measure it via the coefficient of variation (the square root of the variance divided by the mean) of the partitioning fraction distribution. We have added this definition to the revised version of the manuscript.
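      A minimal sketch of this definition, assuming paired daughter measurements (n1, n2) are available, for example from time-lapse data: each division contributes the fractions n1/(n1 + n2) and n2/(n1 + n2), and the asymmetry is the standard deviation of these fractions divided by their mean. The function name and the use of the population standard deviation are our choices.

```python
import statistics


def division_asymmetry(daughter_pairs):
    """CV of the partitioning fraction, pooling both daughters of each division."""
    fractions = []
    for n1, n2 in daughter_pairs:
        total = n1 + n2
        fractions.extend((n1 / total, n2 / total))
    return statistics.pstdev(fractions) / statistics.fmean(fractions)


if __name__ == "__main__":
    print(division_asymmetry([(40, 60), (50, 50)]))
```

Because both daughters are pooled, the mean fraction is exactly 1/2, so the CV is twice the standard deviation of the fractions; for the two example divisions it is about 0.141.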

      (4) The description of the model is unclear/imprecise in several parts. For instance, it seems to me that the index "i" does not really refer to a cell in the population, but rather a subpopulation of cells that has undergone a certain number of divisions. Furthermore, why is the argument of Equation 11 suddenly the fraction f as opposed to the component number? I strongly recommend carefully rewriting and streamlining the model description and clearly defining all quantities and how they relate to each other.

      We have carefully amended the text to avoid reusing variable names and to clarify each step of the computation. In Equation 11, the variable f refers to the fluorescence intensity; the notation will be changed to improve clarity.

      (5) Similarly, I was not able to follow the logic of Section D. I recommend carefully rewriting this section to make the rationale, logic, and conclusions clear to the reader.

      We have updated the manuscript to clarify the scope of Section D and its results. In brief, Section A presents a general model to derive the variance of the partitioning distribution from flow cytometry time-course data without making any assumptions about the shape of the distribution itself. In Section D, our goal is to interpret the origin of the asymmetry and propose a possible form for the partitioning distribution. Since the dyes used bind non-specifically to cytoplasmic amines, the tagged proteins are expected to be uniformly distributed throughout the cytoplasm and present in large numbers. Given these assumptions, the simplest model of division is a binomial distribution, with a parameter that measures the bias in the process. We therefore performed a computation similar to that in Section A, which allows us to estimate not only the variance but also the degree of biased asymmetry. Finally, we fitted the data to this new model and proposed an experimental interpretation of the results.
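      One concrete reading of a binomial model with a bias parameter, given here only as an assumption for illustration (the manuscript's exact parametrization may differ): each of N tagged units goes to a designated daughter with probability p, and that daughter is either of the two with equal probability. The fraction inherited by a randomly chosen daughter then has mean 1/2 and variance p(1 - p)/N + (p - 1/2)^2, i.e., a binomial counting term plus a bias term. The sketch checks this by simulation; N = 200 and p = 0.55 are hypothetical values.

```python
import random
import statistics


def binomial(rng, n, p):
    """Binomial draw as a sum of Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def simulate_fractions(n_units=200, p=0.55, n_divisions=5000, seed=3):
    """Fraction of units inherited by a randomly chosen daughter, per division."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_divisions):
        k = binomial(rng, n_units, p)  # units sent to the favoured daughter
        # pick one daughter at random: the favoured one (k units) or the other (n - k)
        fractions.append(k / n_units if rng.random() < 0.5 else 1 - k / n_units)
    return fractions


def predicted_variance(n_units, p):
    """Binomial sampling term plus the squared bias term."""
    return p * (1 - p) / n_units + (p - 0.5) ** 2


if __name__ == "__main__":
    fr = simulate_fractions()
    print(statistics.pvariance(fr), predicted_variance(200, 0.55))
```

Because the bias term does not shrink as N grows, under this reading a CV well above the pure-binomial value for an abundant component points to a systematic bias rather than to counting noise.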

      (6) Much theoretical work has been done recently to couple cell-cycle variability to intracellular dynamics. While the authors neglect the latter for simplicity, it would be important to further discuss these approaches and why their simplified model is suitable for their particular experiments.

      We agree with the Reviewer and have added a discussion of this topic to the Introduction and Discussion sections of the main text.

      (7) In the discussion the authors note that the microscopy-based estimates may lead to an overestimation of the fluctuations due to limited statistics. I could not follow that reasoning. Due to the gating in the flow cytometry measurements, I could imagine that the resulting populations are more stringently selected as compared to microscopy. Could that also be an explanation? More generally, it would be interesting to see how robust the results are in terms of different gating diameters.

      The Reviewer is right about the importance of the sorting procedure. As discussed in a previous point, the gating strategy we employed plays a fundamental role: it reduces the overlap of fluorescence distributions as generations progress; it enables the selection of an initial distribution distinct from the fluorescence background, allowing proliferation to be tracked for longer; and it synchronizes the initial population. The narrower the initial distribution, the more separated the peaks of different generations will be. However, this also leaves fewer cells available for the experiment, requiring a careful balance between precision and experimental feasibility. In the case of microscopy, a similar procedure, although it would certainly reduce the estimation error, would be impractical. Indeed, the primary limitation and source of error there is the number of recorded events. Our pipeline allowed us to track on the order of hundreds of division events, but the analysis time scales non-linearly with the number of events, so significantly enlarging the dataset would have been extremely time-consuming. Restricting the analysis to cells with similar fluorescence, although possible in principle, would have reduced the statistics to a level where sampling error would dominate the measurement. Moreover, different experiments would hardly have been comparable, since cells of equal size could display different fluorescence levels. In light of these factors, we expect a higher CV for the microscopy measurements than for the flow cytometry ones. The plots below show the behavior of the mean and standard deviation of N numbers sampled from a Gaussian distribution N(0, 1) as a function of the sample size N. The larger N is, the closer the sampled distribution is to the true one. The region of hundreds of samples is still very noisy; to do much better, we would need to reach the order of thousands.

      We have added a discussion of these aspects to the revised version of the manuscript, with a more detailed description of the importance of the sorting procedure in the Supplementary Material.

      Author response image 1.

      Standard deviation and mean value of a distribution of points sampled from a Gaussian distribution with mean 0 and standard deviation 1, versus the number of samples, N. Increasing N leads to a closer approximation of the expected values. The orange band highlights the Microscopy Working Region (Microscopy WR), i.e., the number of samples attainable in our microscopy experiments. The yellow band marks the region we would have to reach to lower the estimation error, which is, however, very expensive in terms of analysis time.
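      The trend in Author response image 1 can be reproduced with a short sketch (sample sizes, repetition count and seed are our choices): for each N, draw N points from N(0, 1) repeatedly and measure how much the estimated standard deviation varies between repetitions. For Gaussian data this spread is approximately 1/sqrt(2(N - 1)), about 0.07 at N = 100 and about 0.01 at N = 5000.

```python
import math
import random
import statistics


def sd_estimate_spread(n, repeats=100, seed=4):
    """Spread (across repeats) of the sample SD of n draws from N(0, 1)."""
    rng = random.Random(seed)
    estimates = [statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(repeats)]
    return statistics.stdev(estimates)


def approx_spread(n):
    """Large-sample approximation for Gaussian data: 1 / sqrt(2 (n - 1))."""
    return 1.0 / math.sqrt(2.0 * (n - 1))


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, round(sd_estimate_spread(n), 4), round(approx_spread(n), 4))
```

The 1/sqrt(N) scaling is why moving from hundreds to thousands of tracked divisions would be needed to cut the sampling error substantially.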

      (8) It would be helpful to show flow cytometry plots including the identified subpopulations for all cell lines; currently, they are shown only for HCT116 cells. More generally, very little raw data is shown.

      We have provided the requested plots for the other cell lines, together with additional raw data from simulations, in the Supplementary Material.

      (9) The title of the manuscript could be tailored more to the considered problem. At the moment it is very generic.

      We see the Reviewer's point. The proposed title aims to convey the wide applicability of the presented approach, which ultimately allows the assessment of fluctuations in the levels of cellular components at division. These fluctuations in turn reflect the asymmetry of the division.

      Reviewer #1 (Recommendations for the authors):

      (1) I am quite concerned about the fact that the theory only considers fluctuations due to cellular division events, since intrinsic and extrinsic noise sources are often dominant. I suggest that the authors simulate a full model of cell growth and division (one that accounts for fluctuations in gene expression, cell-cycle dynamics, and cell division) to generate a controlled synthetic dataset, and then use this as input to their method to understand how robust their results are to the influence of noise sources other than partitioning.

      We thank the Reviewer for the suggestions. Following this advice, we performed two sets of simulations that take into account the effect of the other noise sources. A detailed description of the results and methods has been added to the Supplementary Material, and the topic is also addressed in the main text. A cell proliferation cycle is affected by different sources of variability: (i) production and degradation of molecules; (ii) variability in the length of the cell cycle; (iii) partitioning noise, i.e., asymmetric inheritance of components between the two daughter cells. However, the experimental approach and the model have been formulated to specifically address the effects of partitioning noise. Indeed, since we are dealing with components tagged via live fluorescent markers, production of new fluorophores is impossible and can therefore be neglected. Degradation, instead, is a global effect that influences the behavior of the mean of the distribution in a time-dependent manner. However, the experimental data in Figure 1 of the main text show no significant depletion of fluorescence, or at least any depletion is hidden by the experimental fluctuations of the measurement. Fluctuations in cell cycle length, by contrast, require a more careful evaluation. We conducted two sets of simulations. In the first, we assumed independence between fluctuations in cell cycle length and partitioning noise.

      Cell division times were drawn from an Erlang distribution (mean = 18 h, k = 4), and the results, showing the behavior of the mean and variance of the component distributions across generations, are presented in Supplementary Information - Figure 1. Under the assumption of independence between different noise sources, no significant effects were observed even for high asymmetries of the partitioning distribution. The second set of simulations considered a situation in which a cell's component content and division time are coupled. We assumed a sizer-like division strategy, in which bigger cells have a shorter division time; the results of these simulations are shown in Supplementary Information - Figure 2.

      As can be observed, higher levels of division asymmetry increase the fluctuations of the system relative to the analytically expected behavior, particularly in later generations.

      The result in Supplementary Information - Figure 3 demonstrates the robustness of our method, as the estimates remain within the pre-established experimental error margin. A detailed description of this topic has been added to both the Supplementary Information and the main text.

      (2) I find the use of the Cauchy distribution somewhat odd since this does not have a finite mean or a variance and I suspect it is unlikely this mimics a naturally measurable distribution in their experiments. This should either be justified biologically or else replaced by a more realistic distribution.

      Following the Reviewer's suggestion, we have changed the distribution to a Gaussian one.
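      The concern can be seen numerically in a short sketch: the sample variance of Gaussian draws settles near its true value as N grows, whereas for a standard Cauchy distribution (generated here by the inverse-CDF transform tan(π(u - 1/2)), with u uniform on [0, 1)) it is dominated by a few extreme draws and does not converge, because the Cauchy variance does not exist. Sample sizes and the seed are arbitrary choices.

```python
import math
import random


def pvar(xs):
    """Population variance computed with math.fsum for accuracy."""
    m = math.fsum(xs) / len(xs)
    return math.fsum((x - m) ** 2 for x in xs) / len(xs)


def sample_variances(n_values=(100, 1_000, 10_000, 100_000), seed=5):
    """Sample variance of Gaussian and Cauchy draws for increasing sample sizes."""
    rng = random.Random(seed)
    n_max = max(n_values)
    gauss = [rng.gauss(0.0, 1.0) for _ in range(n_max)]
    # standard Cauchy via the inverse CDF
    cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n_max)]
    return {n: (pvar(gauss[:n]), pvar(cauchy[:n])) for n in n_values}


if __name__ == "__main__":
    for n, (g_var, c_var) in sample_variances().items():
        print(f"N={n}: Gaussian var={g_var:.3f}, Cauchy var={c_var:.3g}")
```

Since the approach relies on the variance of the partitioning distribution, a distribution with finite moments, such as the Gaussian, is required for the simulations to be meaningful.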

      (3) There is a large body of literature on gene expression models that incorporate a large amount of detail including cell-cycle dynamics and cell division which are relevant to their discussion but not referenced. I suggest they read the following and see how to incorporate at least some of them in their discussion:

      Frequency domain analysis of fluctuations of mRNA and protein copy numbers within a cell lineage: theory and experimental validation. Physical Review X 11.2 (2021): 021032.

      Exact solution of stochastic gene expression models with bursting, cell cycle and replication dynamics. Physical Review E 101.3 (2020): 032403.

      Coupling gene expression dynamics to cell size dynamics and cell cycle events: Exact and approximate solutions of the extended telegraph model. iScience 26.1 (2023).

      Models of protein production along the cell cycle: An investigation of possible sources of noise. PLoS One 15.1 (2020): e0226016.

      Sources, propagation and consequences of stochasticity in cellular growth. Nature Communications 9.1: 4528.

      Intrinsic and extrinsic noise of gene expression in lineage trees. Scientific Reports 9.1 (2019): 474.

      We thank the Reviewer for the suggested articles. We have expanded both the Introduction and the Discussion to comment on them, also in response to the second Reviewer's comments.

      Reviewer #2 (Recommendations for the authors):

      (1) Even when it is used only during simulation for the sake of illustration, the Cauchy distribution is a somewhat unfortunate choice as its moments do not exist and hence, the authors' approach would not apply. I would recommend using another distribution instead.

      Following the Reviewer's suggestion, we have changed the distribution to a Gaussian one.

      (2) "cells population" should be "cell population".

      We have amended this mistake in the text.

    1. eLife Assessment

      This important study addresses mechanisms of feedback inhibition between planar cell polarity protein complexes during convergent extension movements in Xenopus embryos. The authors propose a conceptually new model, in which non-canonical Wnt ligand stimulates transition of Dishevelled from its complex with Vangl to Frizzled, with essential roles of Prickle and Ror in this process. The main observations supporting molecular interactions are interesting and convincing but do not directly assess convergent extension, and the immunoprecipitations carried out with overexpressed proteins show subtle effects. With the analysis of cell intercalations supporting the main conclusions, this work would be significant and of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Planar cell polarity core proteins Frizzled (Fz)/Dishevelled (Dvl) and Van Gogh-like (Vangl)/Prickle (Pk) are localized on opposite sides of the cell and engage in reciprocal repression to modulate cellular polarity within the plane of a static epithelium. In this interesting manuscript, the authors explore how the anterior core proteins (Vangl/Pk) inhibit the posterior core protein (Dvl). The authors propose that Pk assists Vangl2 in sequestering both Dvl2 and Ror2, while Ror2 is essential for Dvl to transition from Vangl to Fz in response to non-canonical Wnt signaling. Nevertheless, there are several major and minor points that affect the strength of the authors' proposed model (listed below).

      Strengths:

      The strengths of the manuscript lie in the very interesting and new concept, along with supportive data for a model of how non-canonical Wnt induces Dvl to transition from Vangl to Fz, with an opposing role for Pk and Vangl2 in suppressing Dvl during convergent extension movements. Ror is a key player required for the transition and antagonizes Vangl.

      Weaknesses:

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to general whole-embryo morphology, which is used as evidence for CE defects, two forms of data are presented, co-expression and IP, along with a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) for the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo; 2) the microscopy would benefit from super-resolution imaging, since in many cases the differences in protein localization are not very pronounced; 3) the IP and Western analysis data often show very subtle differences, which in some cases are not apparent.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the authors showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

    3. Reviewer #2 (Public review):

      The authors use Xenopus embryos to study feedback interactions between the planar cell polarity (PCP) proteins in the context of convergence and extension. They show that binding of the cytoplasmic polarity protein Pk2 to Vangl2 is needed for them to synergistically suppress defects in convergence and extension caused by Dvl overexpression. They then examine protein localizations in animal cap cells, and show that Wnt11-induced accumulation of Fzd7, Ror2 and Dvl into plasma membrane patches is disrupted by the functional Vangl2/Pk complex. This disperses Fzd and causes its endocytosis, while Dvl remains at the plasma membrane. Interestingly, Ror2 and Vangl2 tend to have a broader localization within the membrane patches than Fzd7/Dvl, leading to a model in which Ror2 mediates the transfer of Dvl from Vangl2/Pk to Fzd7 in response to Wnt11.

      This work uses a mixture of biochemical approaches, phenotypic assays in Xenopus and imaging. The data is carefully quantitated and the imaging is high quality. This is an interesting paper, showing mechanisms by which Vangl2/Pk can functionally antagonize Fzd/Dvl during planar cell polarity.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to whole-embryo morphology, which is used as evidence for convergent extension (CE) defects, two forms of data are presented, co-expression and IP, along with a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) for the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo; 2) the study would benefit from high- or super-resolution microscopy, since in many cases the differences in protein localization are not very pronounced; 3) the IP and Western analysis data often show subtle differences, which in some cases are not apparent; 4) it is not clear how many biological repeats were performed or how and whether statistical analyses were performed.

      (1) To more objectively assess the convergent extension phenotypes, we developed a Fiji macro to automatically quantify the LWR in various injected Xenopus embryos, as detailed in the Methods section. We acknowledge that a limitation in the current manuscript is how to link our mechanistic model at the molecular level with the actual cellular behavior during convergent extension, and we plan to perform cell biological studies in the future to elucidate the link;

      (2) We have repeated some of the imaging experiments in DMZ explants using a Zeiss LSM 900 confocal equipped with Airyscan2 detector that can increase the resolution to ~100 nm. The new data are in Suppl. Fig. 4, 9, 11, 16;

      (3) We have repeated all IP and western blots at least three times and provided quantification and statistical analyses;

      (4) We have added the information on biological repeats and statistical analyses in all figures and figure legends.

      Reviewer #2 (Public Review):

      The protein localization experiments in animal cap assays are for the most part convincing, but with the caveat that the authors assume that the proteins are acting within the same cell. As Fzd and Vangl2 are thought to localize to opposite cell ends in many contexts, can the authors be sure that the effects they observe are not due to trans interactions? 

      In our previous publication, we provided evidence that Vangl is necessary and sufficient to recruit Dvl to the plasma membrane within the same cell (Figure 3 in 10.1093/hmg/ddx095). In a more recent publication (10.1038/s41467-025-57658-0), we further elucidated a mechanism through which Dvl oligomerization switches its binding from Vangl to Fz, and determined that Dvl binding to Vangl and to Fz is mediated by its PDZ and DEP domains, respectively. In the current manuscript, we also performed co-IP experiments under various conditions to demonstrate binding between Dvl and Vangl. We feel that this evidence, taken together, provides a strong argument for our model in which Vangl2 acts within the same cell to sequester Dvl from Fz.

      Regarding the Dvl patches induced by Wnt11 (Fig. 3 and Suppl. Fig. 9), we separately injected EGFP- and mSc-tagged Dvl into adjacent blastomeres and demonstrated that the Wnt11-induced patches arise from symmetrical accumulation of Dvl at the contact between two neighboring cells (Suppl. Fig. 9a-c’). This scenario differs from epithelial PCP, in which Fz/Dvl and Vangl/Pk accumulate asymmetrically at the contact between two adjacent cells.

      The authors propose a model whereby Vangl2 acts as an adaptor between Dvl and Ror, to first prevent ectopic activation of signaling, and then to relay Dvl to Fzd upon Wnt stimulation. This is based on the observation that Ror2 can be co-IPed with Vangl2 but not Dvl; and secondly that the distribution of Ror2 in membrane patches after Wnt11 stimulation is broader than that of Fzd7/Dvl, while Vangl2 localizes to the edges of these patches. The data for both these points is not wholly convincing. The co-IP of Ror2 and Vangl2 is very weak, and the input of Dvl into the same experiment is very low, so any direct interaction could have been missed. Secondly, the broader distribution of Ror2 in membrane patches is very subtle, and further analysis would be needed to firm up this conclusion. 

      (1) We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental conditions (including the expression levels of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not by Dvl (Suppl. Fig. 15b). Whereas these data confirm our previous conclusion, we acknowledge that negative data do not fully exclude the possibility of direct binding between Ror and Dvl.

      (2) We re-analyzed the signal intensity of Dvl and Ror in Wnt11-induced patches. By quantifying the intensity ratio between Ror and Dvl along the patches, we found a more than two-fold increase at the border of the patches (Fig. 7j, bottom panel). We interpret these data to suggest that Ror accumulates to a higher level than Dvl at the patch borders.

      A final caveat to these experiments is that in the animal cap assays, loss of function and gain of function both cause convergence and extension defects, so any genetic interactions need to be treated with caution i.e. two injected factors enhancing a phenotype does not imply they act in the same direction in a pathway, in particular as there are both cis/trans and positive/negative feedbacks between the PCP proteins. 

      We agree with the reviewer that a difficulty in studying PCP/non-canonical signaling is that both loss and gain of function of any of its components can cause convergence and extension defects. Genetic interactions, especially synergistic interactions, should be interpreted with caution. But we do want to point out that, in a number of cases, we were also able to demonstrate epistasis. For instance, we found that CE defects induced by Dvl2 over-expression can be rescued by Pk over-expression (Fig. 1e and f), whereas the severe CE defects induced by Vangl/Pk co-injection can be reciprocally rescued by Dvl2 over-expression (Fig. 1g). Likewise, we showed that CE defects induced by Fz2/Dvl2 co-injection can be rescued by wild-type Vangl2 but not by the Vangl2 RH mutant (Suppl. Fig. 6b), and that Ror2 can rescue the CE defect induced by Vangl2 over-expression (Suppl. Fig. 14). Collectively, these functional interaction data consistently demonstrate an antagonism between Dvl/Fz/Ror2 and Vangl2/Pk, which agrees with our imaging and biochemical studies.

      As you can see from the reviews, the referees generally agree that your paper is a potentially valuable contribution to the field. Your observations are important because of the novel model based on the inhibitory feedback regulation between planar cell polarity (PCP) protein complexes. However, the reviewers also stated that the model is only partly supported by data because of insufficient clarity and missing controls in several experiments supporting the proposed model. The paper would be significantly improved if your conclusions are backed up by additional experimentation. Specifically, the referees wanted to see the reproducibility of the results shown in Figures 3, 4, 8, S3, S7, S12. 

      We hope that you are able to revise the paper along the lines suggested by the referees to increase the impact of your study on the current understanding of PCP signaling mechanisms. 

      We thank the reviewers for their careful reading of our manuscript and for their constructive critiques and suggestions. We have repeated the animal cap studies in original Figures 3, 4, 8 and S3 with DMZ explants, and the new data are in Supplementary Figs. 9, 11, 16 and 4, respectively. We also repeated the biochemical studies in original Figures S7 and S12, and the new data are in Supplementary Figs. 8 and 15.

      Reviewer #1 (Recommendations For The Authors):

      Major points: (1) The author conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). To validate the model proposing that 'non-canonical Wnt induces Dvl to transition from Vangl to Fz, while PK inhibits this transition, and they function synergistically with Vangl to suppress Dvl during Convergent Extension (CE),' it is crucial to assess the subcellular localization of PCP core proteins in dorsal marginal zone (DMZ) cells, which are known to undergo CE. Notably, the overexpression of Wnt11 alone, as employed by the author, does not induce animal cap elongation. Therefore, the use of animal cap explants may not be sufficient to substantiate the model during Convergent Extension (CE). Indeed, previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants. However, the results presented in this manuscript appear to differ from this established understanding. Consequently, to provide more robust support for the proposed model, it is advisable to replicate the key experiments (Figures 3, 4, 8, and Figure S3) using DMZ explants.

      We repeated the experiments in Figures 3, 4, 8 and S3 with DMZ explants, and the new data are in new Supplementary Figs. 9, 11, 16 and 4, respectively. Regarding the statement that “previous knowledge indicates that Vangl2 and Pk localize to the anterior region in DMZ explants”, we are aware of Vangl/Pk localization to the anterior cell cortex in the neural epithelium from studies by the Sokol and Wallingford labs, but are not aware of similar reports in DMZ explants. When we examined the localization of a small amount of injected EGFP-mPk2 (0.1 ng mRNA) in DMZ explants, we saw a somewhat uniform distribution on the plasma membrane (Suppl. Fig. 4). In addition, in a related recent publication, we examined endogenous XVangl2 protein localization in activin-induced animal cap explants that do undergo CE. We observed that whereas low levels of injected Dvl2 and Fz form clusters on the plasma membrane, endogenous XVangl2 remains uniformly distributed on the plasma membrane (Suppl. Fig. 3S-Z in 10.1038/s41467-025-57658-0). These observations may suggest potential differences in PCP protein localization between neural and mesodermal convergence and extension.

      (2) The author suggests that 'Vangl2 and Pk together synergistically disrupt Fz7-Dvl2 patches.' As shown in Figure 4 (panels J' to I'), it is evident that the co-expression of Pk and Vangl2 increases Fz7 endocytosis. Nevertheless, a significant amount of Fz7 still co-localizes with Dvl2. To strengthen the author's hypothesis, an additional, more definitive assay, such as a Fluorescence resonance energy transfer (FRET) assay, is required.

      We appreciate this valuable advice. Since none of the tagged Fz/Dvl/Vangl proteins we had were suitable for FRET, we made proteins tagged with mClover and mRuby2, which have been reported as an optimized FRET pair. But in our hands mRuby2 seems to require a very long time (~2 days) to mature and become detectable at room temperature, and is not suitable for our Xenopus experiments. We are in the process of establishing a luciferase-based NanoBiT system to detect Fz-Dvl and Dvl-Vangl interactions in live cells and cell lysates, and will use it in future studies to investigate their interaction dynamics.

      For the current manuscript, we reason that a substantial reduction of Fz7-Dvl2 clusters with Vangl2/ Pk co-injection would still support our idea that Vangl2 and Pk act synergistically to sequester Dvl from Fz to prevent their clustering in response to non-canonical Wnt ligands.

      (3) The IP data is less clear and evident. A couple of examples are: a) Fig 2g where the authors report that the Vangl2 R177H variant reduced Vangl2 interaction with Pk and recruitment of Pk to the plasma membrane, but it appears that the variant interacts slightly better than WT Vangl2 with Pk. In Fig. S7a, the authors state that Pk overexpression can indeed significantly reduce Wnt11-induced dissociation of EGFP-Vangl2 and Flag-Dvl2 in the DMZ. However, there is a minimal impact when compared to the Wnt11 absent control. Based on the results presented in Fig S12a the authors indicate that Wnt11 reduces the association between Vangl2 and Dvl2, which can be discerned, but loss of Ror2 does not change this in any obvious way - but the authors indicate it does. In S12b, the authors have suggested that Ror and Dvl do not form a direct binding interaction. However, the interpretation of Figure S12b is not entirely convincing due to several issues. Notably, the expression levels of each protein appear inconsistent, the bands are not sufficiently clear, and there is the detection of three different tag proteins on a single blot. To strengthen the validity of these findings, it is advisable to repeat this experiment with improved quality. 

      We repeated all the co-IP and western blot analyses pointed out by the reviewer, and performed quantification and statistical analyses.

      Fig 2g had a mistake in the labeling and is replaced with new Figure 2g;

      Fig. S7a is replaced by new data in Supplementary Figure 8a and b;

      Fig. S12a and 12b are replaced by new data in Supplementary Figure 15a, a’ and b, respectively. In 15a and a’, we noticed a consistent decrease of Dvl2-Vangl2 co-IP in Xror2 morphant. The reason for this is not yet clear and will need further study in the future.

      Minor points: (1) In all the whole embryo injection assays examining morphology, no Western analysis is performed to show roughly equivalent and appropriate levels of the various proteins are being expressed. Differences will affect the data. 

      Although we did not do western analyses to examine the protein levels in various functional interaction assays, we did examine how co-expression of Vangl2, mPk2 or Dvl2 may impact each other’s protein levels in Supplementary Fig. 2, which did not reveal any significant change when co-injected in different combination.

      (2) The author's prior publication (Bimodal regulation of Dishevelled function by Vangl2 during morphogenesis, Hum Mol Genet. 2017) presented clear evidence of Vangl2 overexpression inducing Dvl2 membrane localization. However, Figure S4 in the current manuscript did not provide clear evidence of membrane localization. To strengthen the hypothesis that Vangl2-RH mutant also induces Dvl2 membrane localization, further comprehensive imaging analysis is needed. 

      We re-analyzed the imaging data and replaced old Figure S4 with a new Supplementary Fig. 5.

      (3) In Supplementary Figure 9, the authors propose that the overexpression of Vangl2/Pk induces Fz7 endocytosis, as indicated by its co-localization with FM4-64. However, it raises a question: how does the Fz7-GFP protein internalize into the cells without endocytosis, as seen in Figures S9a-c'? To enhance readers' understanding, a discussion addressing this point should be included. 

      We think that this might be a technical issue. As detailed in the Methods section, we incubated the embryos with FM4-64 only transiently, for 30 minutes, and the embryos were subsequently washed and dissected in 0.1X MMR without the dye. Therefore, only the Fz7-GFP protein endocytosed during the 30-minute incubation would be labeled by FM4-64, whereas protein endocytosed before or after the incubation would not. Alternatively, the very few Fz7-GFP puncta occasionally observed in the absence of Vangl2/Pk over-expression could be vesicles trafficking to the plasma membrane.

      (4) Statistical analyses are absent for several results, including those in Figure 2f, Figure S4d, and Figure S7b. 

      We repeated these experiments and included statistical analyses. The new data are in Figure 2f, Supplementary Fig. 5d and Supplementary Fig. 8b.

      (5) This manuscript lacks any results regarding Ck1. Therefore, it is advisable to consider removing the discussion or mention of CK1. 

      We agree, and have toned down the discussion of CK1 and removed CK1 from our model in Fig. 9.

      Reviewer #2 (Recommendations For The Authors):

      (1) In all the convergence and extension assays, the authors should report n numbers (i.e. number of animals), what statistical test is used, and what the error bars show. Ideally dot-plots would be used instead of bar charts as they give a better insight into the data distribution. It might be useful to give a section on the statistical analyses used in the M&M, including e.g. any power calculations carried out, as now required by many journals. 

      We have followed the advice to use dot-plots for all the quantification analyses in the manuscript. We include in the figure legends the statistical test used and what the error bars show. The number of embryos analyzed is included in each panel of the figures. We also provided more details in the Methods section on how the LWR quantification was carried out.

      (2) I think Figure 2g is wrongly labelled? FLAG bands are in all three lanes in the western blot, but not labelled as such in the schematic. 

      We corrected the schematic labeling in Figure 2g, and thank the reviewer for catching this mistake.

      (3) In Figure S7, the authors show that co-IP of Dvl and Vangl2 is reduced by Wnt11 and the effects of Wnt are blocked by Pk. Does Pk have any effect in the absence of Wnt? 

      We examined the effect of Pk over-expression on Dvl2-Vangl2 co-IP as advised, and did not see a significant impact in the absence of Wnt11 co-injection. The data is included in the new Supplementary Figure 8a. We interpret the data to suggest that “at least under the condition of our co-IP experiment, Pk may not directly impact the steady-state binding between Vangl and Dvl”.

      (4) In Figure 3, the authors show (as published previously) that Wnt11 induces patches of Dvl at the plasma membrane. It would be useful to see Dvl in the absence of Wnt and Vangl2/Dvl in the absence of Wnt. 

      Dvl is widely known as a cytoplasmic protein, and its localization has been published by many labs over the past 20-30 years. In our recent publication (10.1038/s41467-025-57658-0), we also re-examined Dvl localization when injected at various dosages. We therefore did not feel it was necessary to show its localization in the absence of Wnt11 again, but included a reference to our prior publication. Regarding Vangl/Dvl distribution in the absence of Wnt11, readers can see Suppl. Fig. 5b as an example, in addition to our previous publications referenced in the manuscript.

      (5) In the review figures, the difference in Fz7-GFP patch formation in d' and e' (vs e.g. a') is not very clear. Could the images be improved or (better) quantified in some way? 

      We assume that “review figures” refer to Figure 3 or 4? If so, we felt that Fz7-GFP patch formation was clear in Fig. 3d’, e’ or Fig. 4d’, e’. Nevertheless, we repeated these experiments in DMZ explants as advised by Reviewer 1, and additional examples of Fz7-EGFP patch formation can be seen in the new Suppl. Fig. 9d-f’ and Suppl. Fig. 11d-f’.

      (6) In Figure 6d, I'm concerned that the loss of flag-Dvl2 might occur via dephosphorylation in the IP reaction. Also the M&M don't include methodological details about buffers and whether phosphatase inhibitors were used. A compelling control would be anti-FLAG pulldown showing retention of phosphorylation. Also Figure 6f shows a reduced ratio of fast-to-slow migrating bands of Dvl with Vangl2/Pk - unless I have misunderstood, is this ratio the wrong way round? 

      We added co-IP buffer and protease inhibitor information in Methods.

      We agree that the concern about dephosphorylation during the IP reaction is valid, and that direct pull-down of Dvl to show the phosphorylated form is a compelling control. We note that in Suppl. Fig. 8a and 15b, direct pull-down of Flag-Dvl or Myc-Dvl (with anti-Flag or anti-Myc) did show the slower-migrating, phosphorylated form. Additional examples in which Vangl co-IPs only the faster-migrating, unphosphorylated Dvl include Suppl. Fig. 15a and a related paper we published recently (Fig. 3R and R’ in 10.1038/s41467-025-57658-0).

      Finally, we did wrongly label Figure 6f in the last submission, and the ratio should have been “slow/fast”. We have made the correction, and appreciate the reviewer's meticulousness in perusing our manuscript.

      (7) In Figure 7, what does Ror2 look like in the absence of Wnt11? 

      We included new Figure 7a-c to show that without Wnt11 co-injection, Ror2 is uniformly distributed on the plasma membrane.

      (8) Also in Figure 7, Ror2 patches are said to be slightly wider than Dvl2 patches "reminiscent of Vangl2" - I wouldn't describe them as being similar. Vangl2 shows a distinct dip in the center of the Dvl patches, Ror2 does not show a dip, and is only (at best) in a slightly wider patch, and I would want to see further examples to be convinced that the localization domain is reproducibly wider. The merge of many samples in 7d may actually be making the distribution harder to see and if the Xror2 and Dvl2 intensities were normalized I'm not sure how different the curves would appear. (i.e. the Xror2 curve looks like a flattened version of the Dvl2 curve). 

      We have added an additional panel in the new Figure 7j to compare the intensity ratio of Ror/Dvl2 along the patches, and this analysis reveals a more than two-fold increase of the ratio at the border region. This quantification may make a more convincing argument that at the patch border region, Dvl is diminished whereas Ror2 accumulates with Vangl2.
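      Why a ratio addresses the concern that the Xror2 curve might just be a flattened Dvl2 curve can be illustrated with a small sketch. This is not the authors' analysis code: it assumes two line-scan intensity profiles sampled at the same positions across a patch, with the patch center at the middle sample, and the function names and background parameters are hypothetical. If Ror2 were simply a scaled copy of Dvl2, the ratio would be flat along the profile; relative enrichment of Ror2 at the border shows up as fold changes above 1 away from the center.

```python
def ratio_profile(ror, dvl, ror_bg=0.0, dvl_bg=0.0, eps=1e-9):
    """Position-wise Ror/Dvl intensity ratio after background subtraction.

    ror, dvl: intensities sampled at the same positions along a line scan
    across a patch (assumed input format). After background subtraction the
    numerator is clipped at 0 and the denominator at eps to avoid division by zero.
    """
    if len(ror) != len(dvl):
        raise ValueError("profiles must have the same length")
    return [max(r - ror_bg, 0.0) / max(d - dvl_bg, eps) for r, d in zip(ror, dvl)]


def fold_change_vs_center(ratios):
    """Express each ratio relative to the ratio at the middle sample (assumed patch center)."""
    center = ratios[len(ratios) // 2]
    if center == 0:
        raise ValueError("ratio at the patch center is zero")
    return [r / center for r in ratios]


dvl = [2.0, 4.0, 8.0, 4.0, 2.0]
# A Ror profile that is only a scaled Dvl profile gives a flat ratio...
print(fold_change_vs_center(ratio_profile([0.5 * d for d in dvl], dvl)))  # -> [1.0, 1.0, 1.0, 1.0, 1.0]
# ...whereas relative Ror enrichment at the border raises the ratio there.
print(fold_change_vs_center(ratio_profile([2.0, 3.0, 4.0, 3.0, 2.0], dvl)))  # -> [2.0, 1.5, 1.0, 1.5, 2.0]
```

      In practice one would average such profiles over many patches before comparing border and center; normalizing to the center makes the comparison independent of overall expression levels.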

      (9) In Figure S12a, the authors suggest Wnt11 induced dissociation of Dvl from Vangl2 (by co-IP), and this is reduced after Ror2 MO. This would be more convincing with replicates and quantitation. 

      We have repeated this experiment with Vangl2 pull down and added quantification. The data is in the new Suppl. Fig. 15a.

      (10) In Figure S12b, the authors suggest Ror2 can co-IP Vangl2 but not Dvl. This is not very convincing, as the Dvl input band is very weak, and the Vangl2 co-IP band is very weak. 

      We repeated the co-IP experiment with Myc-tagged Vangl or Dvl. Using the same anti-Myc antibody and experimental condition (including the expression level of Vangl, Dvl and Ror2), we still found that Ror2 could be pulled down by Vangl but not Dvl (Suppl. Fig. 15b).

      (11) "Prickle" spelled "Prickel" in the abstract (and abbreviated to "PK" not "Pk" at one place in the abstract and several places in text) 

      We have corrected these typos.

      (12) Quite a lot of interesting observations are in supplemental figures. Normally it might be expected that extra data supporting a conclusion would be in supplemental, but here some of the supplemental data feels like it is more than simply additional evidence. For instance supplemental Figures 2 and 3 feel more than just supplemental (and Supplemental Figure 3 if merged with Figure 2 would make it easier for the reader). Moreover, for example, the description of the results in Figure 2 is punctuated by references to supplemental Figures 4 and 5 that contain key data to support the conclusions, which means the reader has to flick backwards and forwards from place to place in the manuscript to follow the argument. It is of course up to the authors, but in some cases putting supplemental data back into the main figures (for which there is no size or number limit) would increase clarity. 

      These are excellent points. In the resubmitted manuscript we have a total of 24 data figures, and we used 8 as main figures since we felt that they provide the most relevant and conclusive evidence for our model. We will consult the copy editors at eLife on how to arrange the rest as main vs. supporting figures when requesting publication as the Version of Record.

    1. eLife Assessment

      This work presents a useful investigation of functional and structural brain changes following navigation and verbal memory training. The analyses of whole-brain volumetric changes are convincing and support the study's main conclusion regarding the lack of volumetric whole-brain plasticity effects. Some analyses are compelling in demonstrating the presence of longitudinal behavioural effects, the presence of functional activation changes, and the lack of hippocampal volume changes.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for it in their statistical analyses quite thoroughly. In addition, I commend the authors on using pre-registration of the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Comments on revisions:

      All my comments have been addressed. The evidence regarding lack of a volumetric effect at the whole-brain level now seems more robust. Many details are now clearer, particularly regarding the volumetric analyses methods and the rationale and timeline of preregistration.

      Minor comment:

      I appreciate that there are limited possibilities with the available Diffusion-Weighted Imaging data. However, I would recommend the authors remove mentions of "white matter connectivity" in the Abstract and elsewhere, which are misleading if no tractography or voxel-wise analyses are performed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      This study investigates plasticity effects in brain function and structure from training in navigation and verbal memory.

      The authors used a longitudinal design with a total of 75 participants across two sites. Participants were randomised to one of three conditions: verbal memory training, navigation training, or a video control condition. The results show behavioural effects in relevant tasks following the training interventions. The central claim of the paper is that network-based measures of task-based activation are affected by the training interventions, but structural brain metrics (T2w-derived volume and diffusion-weighted imaging microstructure) are not impacted by any of the training protocols tested.

      Strengths:

      (1) This is a well-designed study which uses two training conditions, an active control, and randomisation, as appropriate. It is also notable that the authors combined data acquisition across two sites to reach the needed sample size and accounted for it in their statistical analyses quite thoroughly. In addition, I commend the authors on using pre-registration of the analysis to enhance the reproducibility of their work.

      (2) Some analyses in the paper are exhaustive and compelling in showcasing the presence of longitudinal behavioural effects, functional activation changes, and lack of hippocampal volume changes. The breadth of analysis on hippocampal volume (including hippocampal subfields) is convincing in supporting the claim regarding a lack of volumetric effect in the hippocampus.

      Weaknesses:

      (1) The rationale for the study and its relationship with previous literature is not fully clear from the paper. In particular, there is a very large literature that has already explored the longitudinal effects of different types of training on functional and structural neuroimaging. However, this literature is barely acknowledged in the Introduction, which focuses on cross-sectional studies. Studies like the one by Draganski et al. 2004 are cited but not discussed, and are clumped together with cross-sectional studies, which is confusing. As a reader, it is difficult to understand whether the study was meant to be confirmatory based on previous literature, or whether it fills a specific gap in the literature on longitudinal neuroimaging effects of training interventions.

      We thank the reviewer for these comments and feedback. 

      We want to clarify that, through our pre-registered analysis plan, our approach was confirmatory rather than exploratory (or post-hoc justified). This confirmatory approach allowed us to critically evaluate theoretically novel and important hypotheses that no previous longitudinal or intervention study like ours had proposed or tested. We have now clarified this in the introduction.

      “This allowed us to address the following novel theoretical questions: 1) what neural changes, if any, result from an intensive within-participant intervention that improves memory or navigation skills in healthy young adults; and 2) if such changes occur, what is the degree of neural overlap between the acquisition of these cognitive skills.”

      “We pre-registered three novel and specific hypotheses, which are described in more detail here (https://osf.io/etxvj) ”

      We have also attempted to better separate cross-sectional and longitudinal studies. Due to space limitations, we have focused on interventional studies that involved gray matter changes that could be relevant to navigation, episodic memory, or the hypothesized time frame we chose for the training. We also note that some of these relevant studies are discussed in more depth in the Discussion.

      “Successful cognitive interventions suggest that targeted within-participant cognitive training, even for as little as 1-2 weeks, can result in improvements to specific cognitive functions, including changes in focal gray matter [4,23-27]; but see [28].”

      We have also added some additional citations to relevant cognitive intervention work, although we agree that this is an extensive literature, only a subset of which we are able to capture here:

      “In some instances, interventions may even generalize to areas not explicitly trained but closely related to the training (termed “near transfer”)[29-33].”

      (2.1) The main claim regarding the lack of changes in brain structure seems only partially supported by the analyses provided. The limited whole-brain evidence from structural neuroimaging makes it difficult to confirm whether there is indeed no effect of training. Beyond hippocampal analyses, many whole-brain analyses of both volumetric and diffusion-weighted imaging metrics are only based on coarse ROIs (for example, 34 cortical parcellations for grey matter analyses).

      Although vertex-wise analyses in FreeSurfer are reported, it is unclear what metrics were examined (cortical thickness? area? volume?). 

      We appreciate the reviewer’s thoughtful feedback. We apologize for the lack of clarity in the original manuscript regarding the type of metric used in the vertex-wise analysis. We confirm that these analyses were based on cortical volume, not thickness or area. To clarify this, we have explicitly stated in the revised Methods that the vertex-wise analyses were conducted on cortical volume using FreeSurfer’s mri_glmfit.

      In addition, in response to the concern regarding the coarse nature of the ROI-based analyses, we have re-analyzed the volumetric data using the more fine-grained Destrieux atlas, which contains 148 cortical ROIs (74 per hemisphere), instead of the original, coarser 34-region atlas. These more detailed analyses still revealed no significant volume changes from pre- to post-training in any of the three groups. We believe this provides stronger support for the lack of training-induced volumetric changes outside the medial temporal lobe.

      Relevant revisions have been made to the Results and Methods sections. Below is the updated content added to the manuscript:

      In Results:

      “We also analyzed gray matter volume changes outside of the medial temporal lobe using FreeSurfer (see Methods) to determine if any cortical or other relevant brain areas might have been affected by the training. We applied a vertex-wise analysis of cortical volume, again finding no significant differences across the entire cortex (see Methods). This finding was further validated using the Destrieux atlas, which includes 74 cortical parcellations per hemisphere (148 ROIs in total). Paired-sample t-tests revealed that none of the ROIs exhibited significant volume changes from pre- to post-test in any of the three groups (all ps > 0.542, FDR-corrected). These findings suggest that training did not result in any measurable cortical volumetric changes.”

      In Methods:

      “Whole-brain structural analyses were conducted using FreeSurfer (version 7.4.1; https://surfer.nmr.mgh.harvard.edu). T1-weighted anatomical images were processed using the longitudinal processing pipeline. Vertex-wise analyses of cortical volume were performed using FreeSurfer’s general linear modeling tool, mri_glmfit. Group-level comparisons were corrected for multiple comparisons using mri_glmfit-sim, which implements cluster-wise correction based on Monte Carlo simulations. A vertex-wise cluster-forming threshold of 3.0 on the -log10(p) scale (corresponding to p < 0.001, two-sided) was applied to detect both positive and negative effects. Clusters were retained if they survived a cluster-wise corrected p < 0.05.

      In addition to vertex-wise analysis, cortical parcellation was performed using the Destrieux atlas (aparc.a2009s), which includes 74 cortical regions per hemisphere, yielding 148 ROIs in total. To account for variability in brain size, each ROI volume was normalized by estimated intracranial volume (ICV) and scaled by a factor of 100. Longitudinal comparisons were conducted using paired-sample t-tests. To correct for multiple comparisons, we applied FDR correction (q < 0.05).”
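
      Two conventions in the Methods above can be made concrete. FreeSurfer’s mri_glmfit-sim specifies its cluster-forming threshold (the --cache option) on the -log10(p) scale, so a value of 3.0 corresponds to p < 0.001, whereas a two-sided Z threshold of 3.0 corresponds to p ≈ 0.0027. The ICV normalization and the Benjamini-Hochberg FDR correction applied to the 148 ROI-level p-values (which could come, for example, from paired t-tests via scipy.stats.ttest_rel) can be sketched in plain Python. This is a generic sketch under those assumptions, not the authors’ pipeline, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def z_to_two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal threshold |Z| > z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def neglog10_to_p(threshold: float) -> float:
    """Convert a -log10(p) threshold (the scale used by mri_glmfit-sim --cache) to p."""
    return 10.0 ** (-threshold)


def normalize_by_icv(roi_volume_mm3: float, icv_mm3: float, scale: float = 100.0) -> float:
    """ROI volume as a fraction of estimated intracranial volume, multiplied by `scale`."""
    return roi_volume_mm3 / icv_mm3 * scale


def fdr_bh(pvals: Sequence[float], q: float = 0.05) -> Tuple[List[bool], List[float]]:
    """Benjamini-Hochberg step-up FDR correction.

    Returns (reject, adjusted_p), both in the original order of `pvals`.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted p-values at 1
    # Walk from the largest p to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    reject = [p_adj <= q for p_adj in adjusted]
    return reject, adjusted
```

      In a real analysis, `pvals` would hold one paired-test p-value per ROI (148 here), each computed on ICV-normalized volumes.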

      (2.2) Diffusion-weighted imaging seems to focus on whole-tract atlas ROIs, which can be less accurate/sensitive than tractography-defined ROIs or voxel-wise approaches.

      We appreciate the reviewer’s important point regarding diffusion-weighted imaging (DWI) analysis. We focused primarily on tract-level ROIs derived from a standard white matter tract atlas, as we did not feel that our sequences had sufficient resolution for more fine-grained analyses. While this approach has the advantages of robust anatomical correspondence and improved interpretability, we agree that it may be less sensitive than tractography-defined or voxel-wise methods for detecting subtle, localized training-related changes. Because our DWI sequence was optimized to be short and identical across scanners, we are unable to provide a more fine-grained analysis of the DWI data.

      (3) Quality control of images is only mentioned for FA images in subject space. Given that most analyses are based on atlas ROIs, visual checks following registration are fundamental and should be described in further detail.

      Thank you for your thoughtful comment. We agree that visual quality control is critical when using atlas-based ROI analyses. In our study, we implemented comprehensive quality control procedures across all structural and functional imaging analyses.

      For hippocampal segmentation using ASHS, we performed manual visual inspections of each participant's subfield segmentation to verify the accuracy of the automated outputs. This is now clearly described in the revised Methods section:

      “Each participant's subfield segmentations were manually inspected to ensure the accuracy and reliability of the segmentation protocol.”

      For FreeSurfer-based hippocampal and cortical segmentation, we also conducted detailed visual inspections and manual edits following the standard FreeSurfer longitudinal pipeline. We have added the following description to the Methods section to clarify this process:

      “Visual quality control was conducted by three trained raters who systematically inspected skull stripping, surface reconstruction, and segmentation accuracy at both the within-subject template and individual timepoints. Manual edits were primarily applied to the within-subject template to correct segmentation errors—particularly in challenging regions such as the hippocampus—since corrections to the template automatically propagate to all timepoints. Raters followed standardized FreeSurfer longitudinal editing guidelines to ensure consistent and reproducible corrections across subjects. Discrepancies were resolved via consensus discussion. This quality control approach enhanced the accuracy and consistency of segmentation across longitudinal scans, thereby improving the reliability of morphometric analyses and atlas-based ROI extractions.”

      For functional MRI preprocessing, all registration steps—including transformations from individual functional runs to MNI space—were visually checked for each participant to ensure accurate alignment with the Schaefer atlas. We have clarified this point in the revised Methods section with the following statement:

      “Prior to ROI extraction, all registration steps—from individual functional space to MNI space—were visually inspected for each participant to confirm accurate alignment between the functional images and the atlas parcellation.”

      These additions now more clearly reflect the robust quality control procedures that were employed throughout our pipeline to ensure the validity of atlas-based analyses.

      Recommendations for the authors:

      (1) As a reader, I would have appreciated a short section in the methods regarding the preregistration and power analysis. Currently, it is not too straightforward to understand which analyses were included in the preregistration, and at what point in the project the pre-registration was written. Finding all the relevant information from OSF is feasible, but it would be more accessible if a summary of the information were available inside the text.

      We thank the reviewer for this valuable suggestion. We agree that providing a concise summary within the manuscript's methods section will significantly improve accessibility for readers. 

      The full preregistration is now explicitly referenced in the Methods:

      “Preregistration and Power Analysis

      This study was preregistered on the Open Science Framework (OSF; https://osf.io/etxvj). The preregistration was completed on October 30, 2023, after approximately 80% of data collection had been completed, but prior to any analysis of the primary outcome variables. The preregistration outlines the study hypotheses, design, target sample size, and planned behavioral and neuroimaging analyses, including longitudinal ROI comparisons and statistical correction procedures.

      An a priori power analysis was conducted using G*Power 3.1 to estimate the required sample size for detecting a Group × Time interaction in a mixed-design ANOVA. Assuming a medium-to-large effect size (f = 0.35), we determined that 24 participants per group would provide 80% power to detect a significant effect at α = 0.05. To allow for potential attrition and data exclusion (e.g., due to excessive motion or incomplete datasets), we targeted recruitment of 30 participants per group across two study sites.

      All primary hypotheses, analytic plans, and inference criteria are documented in the preregistration. Exploratory analyses are clearly delineated in both the preregistration and the present manuscript.”
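
      The recruitment arithmetic in the power analysis above can be checked directly. This sketch does not reproduce the G*Power calculation itself, which also depends on the number of groups and measurements and on the assumed correlation between repeated measures. It only converts Cohen’s f to (partial) eta squared and computes how much attrition a 30-per-group target tolerates when 24 per group are required. The function names are hypothetical.

```python
import math


def f_to_partial_eta_squared(f: float) -> float:
    """Cohen's f converted to (partial) eta squared: eta^2 = f^2 / (1 + f^2)."""
    return f * f / (1.0 + f * f)


def recruitment_target(n_required: int, expected_loss: float) -> int:
    """Smallest per-group n such that n * (1 - expected_loss) >= n_required."""
    # Round before ceil so float noise (e.g. 30.000000000000004) does not add a participant.
    return math.ceil(round(n_required / (1.0 - expected_loss), 9))


def tolerated_loss(n_required: int, n_recruited: int) -> float:
    """Fraction of recruited participants that can be excluded while keeping n_required."""
    return 1.0 - n_required / n_recruited
```

      With f = 0.35, eta squared is about 0.109, and recruiting 30 per group when 24 are required leaves room for 20% exclusion.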

      (2) The relevance of the study for "disease" is mentioned in the Abstract but is absent in the Introduction. This may be worth removing?

      Thank you for pointing this out. We agree that the reference to "disease" in the Abstract was not well-supported in the Introduction. To maintain consistency and avoid overstatement, we have removed the mention of "disease" from the Abstract in the revised manuscript.

      In Abstract:

      “Training cognitive skills, such as remembering a list of words or navigating a new city, has important implications for everyday life.”

    1. eLife Assessment

      In this manuscript the authors examine correlations between intrinsic electrophysiological properties of HVC neurons projecting to Area X and the temporal structure of the birds' song. The study provides important insights into how the structure of vocalization can relate to intrinsic physiological properties of the neurons that are essential for learning the behavior. The evidence supporting the idea that song temporal structure is related to intrinsic physiology is solid and this research will be of general interest to researchers in the field and neurophysiologists.

    2. Reviewer #1 (Public Review):

      Summary:

      Previous research from the Margoliash laboratory has demonstrated that the intrinsic electrophysiological properties of one class of projection neurons in the song nucleus HVC, HVCX neurons, are similar within birds and differ between birds in a manner that relates to the bird's song. The current study builds on this research by addressing how intrinsic properties may relate to the temporal structure of the bird's song and by developing a computational model for how this can influence sequence propagation of activity within HVC during singing.

      First, the authors identify that the duration of the song motif is correlated with the duration of song syllables, particularly the length of harmonic stacks within the song. They next found positive correlations between some of the intrinsic properties (including firing frequency, sag ratio, and rebound excitation area) and the duration of each bird's longest harmonic syllable, as well as some other measures of motif duration. These results were extended by comparing firing frequency and sag ratio between two groups of birds that were experimentally raised to learn songs that differed only by the addition of a long terminal harmonic stack in one of the groups. Lastly, the authors present an HH-based model elucidating how the timing and magnitude of rebound excitation of HVCX neurons can support previously reported physiological network properties of these neurons during singing.

      Strengths:

      By trying to describe how intrinsic properties (IPs) may relate to the structure of learned behavior and providing a potentially plausible model (see below for more on this) for how differences in IPs can relate to sequence propagation in this neural network, this research is addressing an important and challenging issue. An understanding of how cell types develop IPs and how those IPs relate to the function and output of a network is a fundamental issue. Tackling this in the zebra finch HVC is an elegant approach because it provides a quantifiable and reliable behavior that is explicitly tied to the neurons that the authors are studying. Nonetheless, this is a difficult problem, and kudos to the authors for trying to unravel this.

      Correlations between harmonic stack durations and song durations are well-supported and interesting. This provides a new insight that can and will likely be used by other research groups in correlating neuronal activity patterns to song behavior and motif duration. Additionally, correlations between IPs associated with rebound excitation are also well supported in this study.

      The HH model presented is important because it meaningfully shows how high or low rebound excitation can set the integration time window for HVCX neurons. Further, the synaptic connectivity of this model provides at least one plausible mechanism that permits the bursting activity of HVCX neurons during singing (and potentially during song playback experiments in sleeping birds). Thus, this model will be useful to the field for understanding how this network activity intersects with 'learned' IPs in an important class of neurons in this circuit.

      Comments on revised version:

      The authors have adequately addressed my previous concerns.

    3. Reviewer #2 (Public Review):

      Intrinsic properties of a neuron refer to the ion channels that a neuron expresses. These ion channels determine how a neuron responds to its inputs. How intrinsic properties link to behavior remains poorly understood. Medina and Margoliash address this question using the zebra finch, a well-studied songbird. Previous studies from their lab and other labs have shown that the intrinsic properties of adult songbird basal-ganglia-projecting premotor neurons are more similar within a bird than across birds. Across birds, this similarity is related to the extent of similarity in the songs; the more similar the song between two birds, the more similar the intrinsic properties between the neurons of these two birds. Finally, the intrinsic properties of these neurons change over the course of development and are sensitive to intact auditory feedback. However, the song features that relate to these intrinsic properties and the function of the within-bird homogeneity of intrinsic properties are unclear.

      In this manuscript, the authors address these two questions by examining the intrinsic properties of basal-ganglia-projecting premotor neurons in zebra finch brain slices. Specifically, they focus on the Ih current (as this is related to rhythmic activity in many pattern-generating circuits) and correlate the properties of the Ih current with song features. They find that the sag ratio (a measure of the driving force of the Ih current) and the rebound area (a measure of the post-inhibitory depolarisation) are both correlated with the temporal features of the song. First, they show that the length of the song motif is correlated with the length of the longest syllable (most often a harmonic stack syllable). Based on this, they conclude that longer song motifs are composed of longer syllables. Second, they show that HVCX neurons within a bird have more similar sag ratios and rebound areas than neurons across birds. Third, across birds, the mean sag ratio and mean rebound area were correlated with the duration of the longest harmonic stack within the song. Together, the second and third results suggest that IPs are correlated with the temporal structure of the song. Fourth, to test this further, the authors used natural and experimental tutoring procedures to produce birds that learned two types of songs differing only in length (the longer song had an extra harmonic stack at the end), and found larger sag ratios and higher firing frequencies in birds with longer songs. Fifth, they show that the post-inhibitory rebound allows neurons to respond to excitatory inputs with spikes, and that neurons with a larger rebound area have a longer time window for responding to excitatory inputs. Based on this, they speculate that HVCX neurons with larger rebound areas integrate over longer time windows. Finally, they build a network model of HVC and show that one specific model could explain sequence-specific bursting of HVCX neurons.
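
      The sag ratio and rebound area described above are derived from current-clamp responses to hyperpolarizing current steps. Definitions differ between labs, and the manuscript’s exact formulas are not given here, so this sketch assumes one convention: sag ratio as the fraction of the peak hyperpolarization recovered by the end of the step, and rebound area as the trapezoid-rule integral of depolarization above the pre-step baseline after the step ends. The function names and the synthetic trace are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def sag_ratio(v: Sequence[float], step_on: int, step_off: int, n_steady: int = 10) -> float:
    """Assumed definition: (V_steady - V_min) / (V_baseline - V_min).

    v        -- membrane potential samples (mV)
    step_on  -- index of the first sample during the hyperpolarizing step
    step_off -- index of the first sample after the step ends
    n_steady -- number of samples at the end of the step averaged as steady state
    """
    v_base = fmean(v[:step_on])
    v_min = min(v[step_on:step_off])
    v_steady = fmean(v[step_off - n_steady:step_off])
    return (v_steady - v_min) / (v_base - v_min)


def rebound_area(v: Sequence[float], step_off: int, dt_ms: float, v_base: float) -> float:
    """Area (mV*ms) of depolarization above baseline after the step, trapezoid rule."""
    above = [max(x - v_base, 0.0) for x in v[step_off:]]
    return sum((a + b) / 2.0 * dt_ms for a, b in zip(above, above[1:]))


if __name__ == "__main__":
    # Toy trace: baseline, hyperpolarizing step with sag, then rebound depolarization.
    trace = [-70.0] * 5 + [-90.0, -86.0, -84.0, -84.0, -84.0] + [-66.0, -64.0, -66.0, -70.0, -70.0]
    print(sag_ratio(trace, 5, 10, n_steady=3), rebound_area(trace, 10, 1.0, -70.0))
```

      Under this convention, a larger sag ratio means more of the hyperpolarization is recovered during the step; other conventions (for example, V_steady/V_min relative to baseline) run in the opposite direction, so the definition should be checked before comparing values across studies.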

      Strengths:

      The question being addressed is interesting, and the authors use appropriate techniques. The authors find a new temporal structure within the song: longer songs typically have more syllables and longer syllables. As far as I know, this has not been shown before. The authors build on existing literature to suggest that IPs of HVCX neurons are correlated with the temporal structure of songs.

      Comments on revised version:

      I have read through the revised paper and I also feel that my comments have been addressed.

    4. Reviewer #3 (Public Review):

      It is rare to find systems in neuroscience where a detailed mechanistic link can be made between the biophysical properties of individual neurons and observable behaviors. In this study, Medina and Margoliash examined how the intrinsic physiological properties of a subclass of neurons in HVC, the main nucleus orchestrating the production of birdsong, might have an effect on the temporal structure of a song. This builds on prior work from this lab demonstrating that intrinsic properties of these neurons are highly consistent within individual animals and more similar between animals with similar songs, by identifying specific acoustic features of the song that covary with intrinsic properties and by setting forth a detailed biophysical network model to explain the relationship.

      The main experimental finding is that excitability, hyperpolarization-evoked sag, and rebound depolarization are correlated with song duration and the duration of long harmonic elements. This motivates the hypothesis that rebound depolarization acts as a coincidence detector for the offset of inhibition associated with the previous song element and excitation associated with the start of the next element, with the delay and other characteristics of the window determined primarily by Ih. The idea is then that the temporal sensitivity of coincidence detection, which is common to all HVCx neurons, sets a global tempo that relates to the temporal characteristics of a song. This model is supported by some experimental data showing variation in the temporal integration of rebound spiking and by a Hodgkin-Huxley-based computational model that demonstrates proof of principle, including the emergence of a narrow (~50 ms) post-inhibitory window when excitatory input from other principal neurons can effectively evoke spiking.

      Overall, the data are convincing and the model is compelling. The manuscript plays to the strengths of zebra finch song learning and the well-characterized microcircuitry and network dynamics of HVC. Of particular note, the design for the electrophysiology experiments employed both a correlational approach exploiting the natural variation in zebra finch song and a more controlled approach comparing birds that were tutored to produce songs that differed primarily along a single acoustical dimension. The modeling is based on Hodgkin-Huxley ionic conductances that have been pharmacologically validated, and the connections and functional properties of the network are consistent with prior work. This makes for a level of mechanistic detail that will likely be fruitful for future work.

      Comments on revised version:

      I read through everything and I also feel that my comments have been adequately addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The correlation between rebound excitation and song structure (e.g., harmonic stack duration) may depend on outliers, such as birds with harmonic stacks >150ms.

      If the distribution of longest-harmonic-stack durations has a long tail, whether in wild zebra finches or in domesticated zebra finches (including our birds and the birds from other labs that we evaluated), then it is not apparent that birds with long harmonic stacks should be considered outliers. The distribution of motif durations (a less derived statistic) across 33 birds (Fig. 2C) does not support the idea that birds with longer songs are outliers. We therefore view the reviewer’s question as asking whether different mechanisms operate in birds with long harmonic stacks than in other birds. Unfortunately, there are too few birds with long harmonic stacks to support a confident statistical analysis of that group. Instead, we re-analyzed the data excluding birds with harmonic stacks >150 ms (an arbitrary cutoff) to examine how these birds influence our conclusions. We conclude that the excluded birds have only a modest influence on the overall result. The updated results are presented in Supplemental Figure 6, and the Results section has been revised to state:

      “We found that while some of the p values increased above 0.05 (p = 0.058 for rebound area vs. longest harmonic stack and p = 0.082 for sag ratio vs. longest harmonic stack), the correlations remained significant for firing frequency vs. longest stack (Pearson’s R, p = 0.0017) and for sag ratio vs. motif duration (p = 0.024). However, when sag ratio was compared against the duration of the motif excluding the longest harmonic stack, there was no relationship (p = 0.85).”

      There is a disconnect between the physiological measurements and the HH model presented.

      We acknowledge that addressing this limitation would involve additional experimental and modeling assumptions. Rather than overextending our interpretations, we have clarified the limitations of the current study in the Discussion:

      “While this HH model provides a plausible framework for linking intrinsic properties to sequence propagation, it does not fully account for the observed relationship between IPs and song structure. A principal limitation constraining the current model is the absence of data from the same neurons characterizing both IPs and network activity during singing (or song playback), when HVC<sub>X</sub> neurons express activity related to song features. Addressing this gap would require additional, challenging experiments and is beyond the scope of this study.”

      Although disynaptic inhibition between HVC<sub>X</sub> neurons and between HVC<sub>RA</sub> and HVC<sub>X</sub> neurons is well established, I am not aware of any data indicating direct synaptic connections between HVC<sub>X</sub> neurons.

      This is an important theoretical point about the reliance of the interval-detecting network model on HVC<sub>X</sub> neurons, and about how the model would change if many of the HVC<sub>X</sub> neurons were swapped for HVC<sub>RA</sub> neurons. Connections from HVC<sub>RA</sub> neurons to HVC<sub>X</sub> neurons are established, whereas there is relatively little evidence for HVC<sub>X</sub>-to-HVC<sub>X</sub> connectivity. This is based on work from Prather and Mooney, 2005 (among others), which used paired sharp-electrode recordings to characterize connections in HVC and found very few HVC<sub>X</sub> - HVC<sub>X</sub> connections. However, if connected HVC<sub>X</sub> neurons are physically more distant from each other than connected HVC<sub>RA</sub> – HVC<sub>X</sub> neurons are, they would be more likely to be missed in blind paired recordings. Using different approaches, recent results from the Roberts lab (Trusel et al., eLife, 2025) support the existence of robust HVC<sub>X</sub> - HVC<sub>X</sub> connections.

      Reviewer #2 (Public Review):

      The interpretation of p-values is rigid, and near-significant results (e.g., p = 0.06) are dismissed without discussion.

      We revised the text to reflect a more nuanced and consistent interpretation of p-values and updated the reporting to include exact values. For example, the Results section now states:

      "Nonetheless, the longest syllable duration was not significantly correlated with the average sag ratio for each bird (Pearson’s R: R<sup>2</sup> = 0.12, p = 0.065, Supplemental Fig. 2, top left panel), though it is trending toward significance (see Discussion)”

      The conclusion that harmonic stacks influence intrinsic properties lacks necessary controls.

      We have attempted to further clarify that harmonic stacks were used as a representative feature of temporal song structure rather than a unique determinant of intrinsic properties. The Discussion now states:

      “Although harmonic stacks provide a useful test case for studying temporal integration, our findings suggest that IPs are broadly linked to song duration and structure, rather than specific syllable types. This is also consistent with prior results that found all HVC<sub>X</sub> ion currents that were modeled were influenced by song learning[31].”

      The relationship between rebound area and experimentally tutored birds was not fully explored.

      We expanded the analysis to include rebound area in instrumentally tutored birds, which has now been incorporated into Figure 4C. These additional analyses also robustly support our hypotheses. The Results section has been updated to state:

      “We then evaluated the IPs of HVC<sub>X</sub> in the birds from the two groups. HVC<sub>X</sub> neurons from birds who sang unmodified songs (N = 5 birds, 31 neurons), which had shorter harmonic stacks and shorter overall duration, had lower sag ratios (Mann-Whitney: p = 0.025), firing frequency (Mann-Whitney: p = 0.0051) and rebound area (Mann-Whitney: p = 0.0003).”
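
      The group comparisons above use the Mann-Whitney test. The sketch below computes the U statistic and a large-sample normal approximation to its two-sided p-value. It ignores the tie correction, and small samples are usually better served by an exact test (for example, scipy.stats.mannwhitneyu with method="exact"), so it is not expected to reproduce the reported p-values. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U for sample x: number of (xi, yj) pairs with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def mann_whitney_p_normal(x: Sequence[float], y: Sequence[float], continuity: bool = True) -> float:
    """Two-sided p from the normal approximation to U (no tie correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    diff = abs(u - mu) - (0.5 if continuity else 0.0)
    z = max(diff, 0.0) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

      The U values for the two samples always sum to n1·n2, so either one can be reported; the two-sided p-value is the same for both.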

      Reviewer #3 (Public Review):

      Limited data supports the claim that intrinsic properties influence temporal integration windows.

      While we agree that further data could strengthen this claim, we show that this can happen in principle (Figure 5); a direct test, however, requires further in vivo experiments. We emphasize this in the Discussion:

      “Our findings suggest that post-inhibitory rebound excitation in HVC<sub>X</sub> could expand temporal integration. Ultimately, experiments combining in vitro with in vivo recordings can directly quantify this effect. We hope our results motivate such experiments.”

      Technical Corrections

      (1) Fixed typographical errors (e.g., Line 177: corrected "r2 = 4" to "r2 = 0.4").

      (2) Revised figure legends for clarity (e.g., Figure 4E now includes tutoring design details).

      (3) Updated methods to specify how motifs were defined and measured.

      Revised Figures

      Figure 4: Updated to include analysis of rebound area in instrumentally tutored birds, reflecting the relationship between experimental tutoring and intrinsic properties.

      Supplemental Figure 6: Correlation analysis excluding outliers

    1. eLife Assessment

      In this important study, the authors provide compelling evidence that the likelihood of looking behaviour is predicted by the expected information gain, hence constituting an invaluable formal model and explanation of habituation. Such modelling represents a crucial advance in explanation, over and above less specified models that can be fitted post hoc to any empirical pattern. The findings would be of interest to researchers studying cognitive development, and perception and learning more broadly.

    2. Reviewer #1 (Public review):

      Summary:

      This paper proposes a new model of perceptual habituation and tests it over two experiments with both infants and adults. The model combines a neural network for visual processing with a Bayesian rational model for attention (i.e., looking time) allocation. This Bayesian framework allows the authors to elegantly measure diverse factors that might drive attention, such as expected information gain, current information gain, and surprise. The model is then fitted to infant and adult participants' data over two experiments, which systematically vary the number of habituation trials (Experiment 1) and the type of dishabituation stimulus (familiarity, pose, number, identity and animacy). Results show that a model based on (expected) information gain performs better than a model based on surprise. Additionally, while novelty preference is observed when exposure to familiar stimuli is high, no familiarity preference is observed when exposure to familiar stimuli is low or intermediate, which contrasts with past work.

      Strengths:

      There are three key strengths of this work:

      (1) It integrates a neural network model with a Bayesian rational learner, thus bridging the gap between two fields that have often been disconnected. This is rarely seen in the cognitive science field, but the advantages are very clear from this paper: It is possible to have computational models that not only process visual information, but also actively explore the environment based on overarching attentional processes.

      (2) By parametrically varying the amount of stimulus exposure and by testing the effects of multiple novel stimulus types, this work allowed the authors to put classical theories of habituation to the test at a much finer scale than previous research has done.

      (3) The Bayesian model allows the authors to test what specific aspects are different in infants and adults, showing that infants display greater values for the noise parameter.
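
      To make "expected information gain" concrete, here is a toy sketch that is not RANCH. It replaces CNN embeddings with a single Gaussian feature per stimulus, assumes a known observation-noise variance (`noise_var`, loosely analogous to a noise parameter), and uses a Luce choice rule between taking another sample (value = EIG) and looking away (a hypothetical constant `world_value`). Because the posterior variance in this conjugate model does not depend on the observed values, no observations need to be drawn. Each stimulus keeps its own posterior, so repeats of one stimulus start with less uncertainty while a new stimulus starts from the prior.

```python
import math
import random
from statistics import fmean
from typing import List, Sequence


def expected_info_gain(post_var: float, noise_var: float) -> float:
    """EIG (nats) of one more noisy sample about a Gaussian mean: 0.5 * ln(1 + v / sigma^2)."""
    return 0.5 * math.log(1.0 + post_var / noise_var)


def run_sequence(stimuli: Sequence[str], rng: random.Random, prior_var: float = 1.0,
                 noise_var: float = 1.0, world_value: float = 0.05,
                 max_samples: int = 500) -> List[int]:
    """Samples taken on each trial (a proxy for looking time) for one simulated learner."""
    post_var = {}  # per-stimulus posterior variance; unseen stimuli start at the prior
    looks = []
    for s in stimuli:
        v = post_var.get(s, prior_var)
        samples = 0
        while samples < max_samples:
            v = 1.0 / (1.0 / v + 1.0 / noise_var)  # conjugate Gaussian update after one sample
            samples += 1
            eig = expected_info_gain(v, noise_var)
            # Luce choice: keep looking with probability EIG / (EIG + world_value).
            if rng.random() >= eig / (eig + world_value):
                break
        post_var[s] = v
        looks.append(samples)
    return looks


def mean_looking_times(stimuli: Sequence[str], n_runs: int = 2000, seed: int = 0,
                       **kwargs) -> List[float]:
    """Average samples per trial across simulated learners."""
    rng = random.Random(seed)
    runs = [run_sequence(stimuli, rng, **kwargs) for _ in range(n_runs)]
    return [fmean([run[t] for run in runs]) for t in range(len(stimuli))]
```

      For example, `mean_looking_times(["A"] * 6 + ["B"])` gives a shorter mean look on the sixth "A" trial than on the first (habituation) and a longer look again on the novel "B" trial (dishabituation).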

      Weaknesses:

      This model pertains to visual habituation. What drives infants' (dis)engagement of attention more broadly, for example, when learning the probabilistic structures of the environment around them (e.g., language, action prediction), may follow different principles and dynamics.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weakness:

      Although a familiarity preference is not found, it is possible that this is related to the nature of the stimuli and the amount of learning that they offer. While infants here are exposed to the same perceptual stimulus repeatedly, infants can also be familiarised to more complex stimuli or scenarios. Classical statistical learning studies, for example, expose infants to specific pseudo-words during habituation/familiarisation, and then test their preference for familiar vs novel streams of pseudo-words. The amount of learning progress in these probabilistic learning studies is greater than in perceptual studies, and familiarity preferences may thus be more likely to emerge there. For these reasons, I think it is important to frame this as a model of perceptual habituation. This would also fit well with the neural net that was used, which processes visual stimuli rather than probabilistic structures. Limiting statements in the Discussion to perceptual paradigms would make the arguments more compelling.

      Thank you for your thoughtful feedback. We have now qualified our claims more explicitly throughout the manuscript to clarify the scope of our study. Specifically, we have made the following revisions:

      (1) Title Update: We have modified the title to “A stimulus-computable rational model of visual habituation in infants and adults” to explicitly specify the domain of our model.

      (2) Qualifying Language Throughout Introduction: We have refined our language throughout the introduction to ensure the scope of our claims is clear. Specifically, we have emphasized that our model applies to visual habituation paradigms by incorporating qualifying language where relevant. At the end of Section 1, we have revised the statement to: "Habituation and dishabituation to sequential visual stimuli are well described by a rational analysis of looking time." This clarification ensures that our model is framed within the context of visual habituation paradigms, particularly those involving structured sequences of stimuli, while acknowledging that habituation extends beyond the specific cases we study.

      (3) New Paragraph on Scope in the Introduction: We have added language in the Introduction acknowledging that while visual habituation is a fundamental mechanism for learning, it is not the only form of habituation. Specifically, we highlight that: “While habituation is a broadly studied phenomenon across cognitive domains—including language acquisition, probabilistic learning, and concept formation—our focus here is on visual habituation, where infants adjust their attention based on repeated exposure to a visual stimulus.”

      (4) New Paragraph on Scope in the General Discussion: We have also revisited this issue in the General Discussion. We added a dedicated paragraph discussing the scope: “This current work focuses on visual habituation, a fundamental but specific form of habituation that applies to sequential visual stimuli. While habituation has been studied across various domains, our model is specifically designed to account for looking time changes in response to repeated visual exposure. This focus aligns with our choice of perceptual representations derived from CNNs, which process visual inputs rather than abstract probabilistic structures. Visual habituation plays a foundational role in infant cognition, as it provides a mechanism for concept learning based on visual experience. However, it does not encompass all forms of habituation, particularly those involving complex rule learning or linguistic structures. Future work should investigate whether models like RANCH can be extended to capture habituation mechanisms in other learning contexts.”

      Reviewer #2 (Public review):

      There are no formal tests of the predictions of RANCH against other leading hypotheses or models of habituation. This makes it difficult to evaluate the degree to which RANCH provides an alternative account that makes distinct predictions from other accounts. I appreciate that because other theoretical descriptions haven't been instantiated in formal models this might be difficult, but some way of formalising them to enable comparison would be useful. 

      We appreciate the reviewer's concern regarding formal comparisons between RANCH and other leading hypotheses of habituation. A key strength of RANCH is that it provides quantitative, stimulus-computable predictions of looking behavior, something that existing theoretical accounts do not offer. Because previous models cannot generate predictions about looking behavior, we cannot directly compare them with RANCH.

      The one formal model that the reviewer might be referring to is the Goldilocks model, discussed in the introduction and shown in Figure 1. We did in fact spend considerable time in an attempt to implement a version of the Goldilocks model as a stimulus-computable framework for comparison. However, we found that it required too many free parameters, such as the precise shape of the inverted U-shape that the Goldilocks model postulates, making it difficult to generate robust predictions that we would feel confident attributing to this model specifically. This assertion may come as a surprise to a reader who expects that formal models should be able to make predictions across many situations, but prior models 1) cannot be applied to specific stimuli, and 2) do not generate dynamics of looking time within each trial. These are both innovations of our work. Instead, even prior formal proposals derive metrics (e.g., surprisal) that can only be correlated with aggregate looking time. And prior, non-formalized theories, such as the Hunter and Ames model, are simply not explicit enough to implement. 

      To clarify this point, we have now explicitly stated in the Introduction that existing models are not stimulus-computable and do not generate predictions for looking behavior at the level of individual trials: 

      “Crucially, RANCH is the first stimulus-computable model of habituation, allowing us to derive quantitative predictions from raw visual stimuli. Previous theoretical accounts have described broad principles of habituation, but they do not generate testable, trial-by-trial predictions of looking behavior. As a result, direct comparisons between RANCH and these models remain challenging: existing models do not specify how an agent decides when to continue looking or disengage, nor do they provide a mechanistic link between stimulus properties and looking time. By explicitly modeling these decision processes, RANCH moves beyond post-hoc explanations and offers a computational framework that can be empirically validated and generalized to new contexts.” 

      We also highlight that our empirical comparisons in Figure 1 evaluate theoretical predictions based on existing conceptual models using behavioral data, rather than direct model-to-model comparisons: 

      “Addressing these three challenges allowed us to empirically test competing hypotheses about habituation and dishabituation using our experimental data (Figure 1). However, because existing models do not generate quantitative predictions, we could not directly compare RANCH to alternative computational models. Instead, we evaluated whether RANCH accurately captured key behavioral patterns in looking time.”

      The justification for using the RMSE fitting approach could also be stronger - why is this the best way to compare the predictions of the formal model to the empirical data? Are there others? As always, the main issue with formal models is determining the degree to which they just match surface features of empirical data versus providing mechanistic insights, so some discussion of the level of fit necessary for strong inference would be useful.

      Thank you for recommending additional clarity on our choice of evaluation metrics. RMSE is a very standard measure (for example, it’s the error metric used in fitting standard linear regression!). On the other hand, it captures absolute rather than relative errors. Correlation-based measures (e.g., r and r²-type measures) instead capture the relative agreement between predicted and observed values. In our manuscript we reported both RMSE and R². In the revised manuscript, we have now:

      (1) Added a paragraph in the main text explaining that RMSE captures the absolute error in the same units as looking time, whereas r² reflects the relative proportion of variance explained by the model: 

      “RANCH predictions qualitatively matched habituation and dishabituation in both infants and adults. To quantitatively evaluate these predictions, we fit a linear model (adjusting model‐generated samples by an intercept and scaling factor) and then assessed two complementary metrics. First, the root mean squared error (RMSE) captures the absolute error in the same units as looking time. Second, the coefficient of determination (R²) measures the relative variation in looking time that is explained by the scaled model predictions. Since each metric relies on different assumptions and highlights distinct aspects of predictive accuracy, they together provide a more robust assessment of model performance. We minimized overfitting by employing cross‐validation—using a split‐half design for infant data and ten‐fold for adult data—to compute both RMSE and R² on held‐out samples.”

      (2) We updated Table 1 to report both RMSE and R² for each model variant and linking hypothesis across the two experiments.
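      For readers who want to see the evaluation procedure described above in concrete form, the following is a minimal Python sketch (assuming NumPy) rather than the authors' code: it fits an intercept and scaling factor that map model-generated sample counts to looking times on training folds, then computes RMSE and R² on the held-out fold. The function names, the random assignment of individual data points to folds, the use of k = 2 to mimic a split-half design, and the definition of R² as 1 - SS_res/SS_tot are illustrative assumptions; the actual analysis may, for example, split data by participant.

```python
import numpy as np


def fit_linear_scaling(model_samples, looking_times):
    """Least-squares fit of looking_time ~ intercept + scale * model_samples."""
    X = np.column_stack([np.ones(len(model_samples)), model_samples])
    (intercept, scale), *_ = np.linalg.lstsq(X, looking_times, rcond=None)
    return float(intercept), float(scale)


def rmse(pred, obs):
    """Root mean squared error, in the same units as looking time."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def r_squared(pred, obs):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - np.mean(obs)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def cross_validated_scores(model_samples, looking_times, k, seed=0):
    """Mean held-out RMSE and R^2 over k folds (k=2 mimics a split-half design).

    Assumption for illustration: folds are drawn at random over individual
    data points, which is not necessarily how the paper splits its data.
    """
    model_samples = np.asarray(model_samples, dtype=float)
    looking_times = np.asarray(looking_times, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(looking_times)), k)
    rmses, r2s = [], []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Intercept and scale are fit on the training folds only...
        intercept, scale = fit_linear_scaling(model_samples[train], looking_times[train])
        # ...and then applied unchanged to the held-out fold.
        pred = intercept + scale * model_samples[test]
        rmses.append(rmse(pred, looking_times[test]))
        r2s.append(r_squared(pred, looking_times[test]))
    return float(np.mean(rmses)), float(np.mean(r2s))


# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(1)
samples = rng.uniform(5, 50, size=40)                          # model sample counts
looking = 2.0 + 0.1 * samples + rng.normal(0, 0.5, size=40)    # looking times (s)
print(cross_validated_scores(samples, looking, k=2))   # split-half
print(cross_validated_scores(samples, looking, k=10))  # ten-fold
```

      Because the intercept and scale are estimated only on training folds, both metrics reflect out-of-sample performance, which is what allows model variants to be compared on the same footing.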

      We hope these revisions address your concerns by offering a more comprehensive and transparent assessment of our model’s predictive accuracy.

      Regarding your final question, the desired level of fit for insight, our view is that – at least in theory development – measures of fit should always be compared between alternatives (rather than striving for some absolute level of prediction). We have attempted to do this by comparing fit within- and across-samples and via various ablation studies. We now make this point explicit in the General Discussion:

      More generally, while there is no single threshold for what constitutes a “good” model fit, the strength of our approach lies in the relative comparisons across model variants, linking hypotheses, and ablation studies. In this way, we treat model fit not as an absolute benchmark, but as an empirical tool to adjudicate among alternative explanations and assess the mechanistic plausibility of the model’s components.

      The difference in model predictions for identity vs number relative to the empirical data seems important but isn't given sufficient weight in terms of evaluating whether the model is or is not providing a good explanation of infant behavior. What would falsification look like in this context? 

      We appreciate the reviewer’s observation regarding the discrepancy between model predictions and the empirical data for identity vs. number violations. We were also very interested in this particular deviation and we discuss it in detail in the General Discussion, noting that RANCH is currently a purely perceptual model, whereas infants’ behavior on number violations may reflect additional conceptual factors. Moreover, because this analysis reflects an out-of-sample prediction, we emphasize the overall match between RANCH and the data (see our global fit metrics) rather than focusing on a single data point. Infant looking time data also exhibit considerable noise, so we caution against over-interpreting small discrepancies in any one condition. In principle, a more thorough “falsification” would involve systematically testing whether larger deviations persist across multiple studies or stimulus sets, which is beyond the scope of the current work.

      For the novel image similarity analysis, it is difficult to determine whether any differences are due to differences in the way the CNN encodes images vs in the habituation model itself - there are perhaps too many free parameters to pinpoint the nature of any disparities. Would there be another way to test the model without the CNN introducing additional unknowns? 

      Thank you for raising this concern. In our framework, the CNN and the habituation model operate jointly to generate predictions, so it can be challenging to parse out whether any mismatches arise specifically from one component or the other. However, we are not worried that the specifics of our CNN procedure introduce free parameters because:

      (1) The CNN introduces no additional free parameters in our analyses, because it is a pre‐trained model not fitted to our data.

      (2) We tested multiple CNN embeddings and observed similar outcomes, indicating that the details of the CNN are unlikely to be driving performance (Figure 12).

      Moreover, the key contribution of our second study is precisely that the model can generalize to entirely novel stimuli without any parameter adjustments. By combining a stable, off‐the‐shelf CNN with our habituation model, we can make out‐of‐sample predictions—an achievement that, to our knowledge, no previous habituation model has demonstrated.

      Related to that, the model contains lots of parts - the CNN, the EIG approach, and the parameters, all of which may or may not match how the infant's brain operates. EIG is systematically compared to two other algorithms, with KL working similarly - does this then imply we can't tell the difference between an explanation based on those two mechanisms? Are there situations in which they would make distinct predictions where they could be pulled apart? Also in this section, there doesn't appear to be any formal testing of the fits, so it is hard to determine whether this is a meaningful difference. However, other parts of the model don't seem to be systematically varied, so it isn't always clear what the precise question addressed in the manuscript is (e.g. is it about the algorithm controlling learning? or just that this model in general when fitted in a certain way resembles the empirical data?) 

      Thank you for highlighting these points about the model’s components and the comparison of EIG- vs. KL-based mechanisms. Regarding the linking hypotheses (EIG, KL, and surprisal), our primary goal was to assess whether rational exploration via noisy perceptual sampling could account for habituation and dishabituation phenomena in a stimulus-computable fashion. Although RANCH contains multiple elements—including the CNN for perceptual embedding, the learning model, and the action policy (EIG or KL)—we did systematically vary the “linking hypothesis” (i.e., whether sampling is driven by EIG, KL, or surprisal). We found that EIG and KL gave very similar fits, while surprisal systematically underperformed.

      We agree that future experiments could be designed to produce diverging predictions between EIG and KL, but examining these subtle differences is beyond the scope of our current work. Here, we sought to establish that a rational model of habituation, driven by noisy perceptual sampling, can deliver strong quantitative predictions—even for out-of-sample stimuli—rather than to fully disentangle forward- vs. backward-looking information metrics.

      We disagree, however, that we did not evaluate or formally compare other aspects of the model. In Table 1 we report ablation studies of different aspects of the model architecture (e.g., removal of learning and noise components). Further, the RMSE and R² values reported in Table 1 and Section 4.2.3 can be treated as out-of-sample estimates of performance and used for direct comparison (because Table 1 uses cross-validation and Section 4.2.3 reports out of sample predictions). 

      Perhaps the reviewer is interested in statistical hypothesis tests, but we do not believe these are appropriate here. Cross-validation provides a metric of out-of-sample generalization and model selection based on the resulting numerical estimates. Significance testing is not typically recommended, except in a limited subset of cases (see e.g. Vanwinckelen & Blockeel, 2012 and Raschka, 2018).

      Reviewer #1 (Recommendations for the authors):

      "We treat the number of samples for each stimulus as being linearly related to looking time duration." Looking times were not log transformed? 

      Thank you for your question. The assumption of a linear relationship between the model’s predicted number of samples and looking time duration is intended as a measurement transformation, not a strict assumption about the underlying distribution of looking times. This linear mapping is used simply to establish a direct proportionality between model-generated samples and observed looking durations.

      However, in our statistical analyses, we do log-transform the empirical looking times to account for skewness and stabilize variance. This transformation is standard practice when analyzing infant looking time data but is independent of how we map model predictions to observed times. Since there is no a priori reason to assume that the number of model samples must relate to looking time in a strictly log-linear way, we retained a simple linear mapping while still applying a log transformation in our analytic models where appropriate.

      It would be nice to have figures showing the results of the grid search over the parameter values. For example, a heatmap with sigma on x and eta on y, and goodness of fit indicated by colour, would show the quality of the model fit as a function of the parameters' values, but also if the parameters estimates are correlated (they shouldn't be). 

      Thank you for the suggestion. We agree that visualizing the grid search results can provide a clearer picture of how different parameter values affect model fit. In the supplementary materials, we already present analyses where we systematically search over one parameter at a time to find the best-fitting values.

      We also explored alternative visualizations, including heatmaps where sigma and eta are mapped on the x and y axes, with goodness-of-fit indicated by color. However, we found that the goodness of fit was very similar across parameter settings, making the heatmaps difficult to interpret due to minimal variation in color. This lack of variation in fit reflects the observation that our model predictions are robust to changes in parameter settings, which allows us to report strong out of sample predictions in Section 4. Instead, we opted to use histograms to illustrate general trends, which provide a clearer and more interpretable summary of the model fit across different parameter settings. Please see the heatmaps below, if you are interested. 

      Author response image 1.

      Model fit (measured by RMSE) across a grid of prior values for Alpha, Beta, and V shows minimal variation. This indicates that the model’s performance is robust to changes in prior assumptions.

      Regarding section 5.4, paragraph 2: It might be interesting to notice that a potential way to decorrelate these factors is to look at finer timescales (see Poli et al., 2024, Trends in Cognitive Sciences), which the current combination of neural nets and Bayesian inference could potentially be adapted to do. 

      Thank you for this insightful suggestion. We agree that examining finer timescales of looking behavior could provide valuable insights into the dynamics of attention and learning. In response, we have incorporated language in Section 5.4 to highlight this as a potential future direction: 

      Another promising direction is to explore RANCH’s applicability to finer timescales of looking behavior, enabling a more detailed examination of within-trial fluctuations in attention. Recent work suggests that analyzing moment-by-moment dynamics can help disentangle distinct learning mechanisms (Poli et al., 2024). Since RANCH models decision-making at the level of individual perceptual samples, it is well-suited to capture these fine-grained attentional shifts.

      Previous work integrating neural networks with Bayesian (like) models could be better acknowledged: Blakeman, S., & Mareschal, D. (2022). Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning. Neural Networks, 150, 408-421. 

      Thank you for this feedback. We have now incorporated this citation into our discussion section: 

      RANCH integrates structured perceptual representations with Bayesian inference, allowing for stimulus-computable predictions of looking behavior and interpretable parameters at the same time. This integrated approach has been used to study selective attention (Blakeman & Mareschal, 2022).

      Unless I missed it, I could not find an OSF repository (although the authors refer to an OSF repository for a previous study that has not been included). In general, sharing the code would greatly help with reproducibility. 

      Thanks for this comment. We apologize that, although all of our code and data were available through GitHub, we did not provide links in the manuscript. We have now added these links at the end of the Introduction.

      Reviewer #2 (Recommendations for the authors):

      Page 7 "infants clearly dishabituated on trials with longer exposures" - what are these stats comparing? Novel presentation to last familiar? 

      Thank you for pointing out this slightly confusing passage. The reported statistics compare looking time between the novel and familiar test trials after longer exposures. We have now added the following language:

      Infants clearly dishabituated on trials with longer exposures, looking longer at the novel stimulus than the familiar stimulus after long exposure.

      Order effects were covaried in the model - does the RANCH model predict similar order effects to those observed in the empirical data, ie can it model more generic changes in attention as well as the stimulus-specific ones? 

      Thank you for this question. If we understand correctly, you are asking whether RANCH can capture order effects over the course of the experiment, such as general decreases in attention across blocks. Currently, RANCH does not model these block-level effects; it is designed to predict stimulus-driven looking behavior rather than more general attentional changes that occur over time, such as fatigue. In our empirical analysis, block number was included as a covariate to account for these effects statistically, but RANCH itself does not have a mechanism to model block-to-block attentional drift independent of stimulus properties. This is an interesting direction for future work, where a model could integrate global attentional dynamics alongside stimulus-specific learning. To address this, we have added a sentence in the General Discussion saying:

      Similarly, RANCH does not capture more global attention dynamics, such as block-to-block attentional drift independent of stimulus properties.

      "We then computed the root mean squared error (RMSE) between the scaled model results and the looking time data." Why is this the most appropriate approach to considering model fit? Would be useful to have a brief explanation. 

      Thank you for pointing this out. We believe that we have now addressed this issue in Response to Comment #2 from Reviewer 1. 

      The title of subsection 3.3 made me think that you would be comparing RANCH to alternate hypotheses or models but this seems to be a comparison of ways of fitting parameters within RANCH - I think worth explaining that. 

      We have now added a sentence in the subsection to make the content of the comparison more explicit: 

      Here we evaluated different ways of specifying RANCH's decision-making mechanism (i.e., different "linking hypotheses" within RANCH).

      3.5 would be useful to have some statistics here - does performance significantly improve? 

      As discussed above, we systematically compared model variants using cross-validated RMSE and R² values, which provide quantitative evidence of improved performance. While these differences are substantial, we do not report statistical hypothesis tests, as significance testing is not typically appropriate for model comparison based on cross-validation (see Vanwinckelen & Blockeel, 2012; Raschka, 2018). Instead, we rely on out-of-sample predictive performance as a principled basis for evaluating model variants.

      It would be very helpful to have a formal comparison of RANCH and other models - this seems to be largely descriptive at the moment (3.6).

      We believe that we have now addressed this issue in our response to the first comment.

      Does individual infant data show any nonlinearities? Sometimes the position of the peak look is very heterogeneous and so overall there appears to be no increase but on an individual level there is.

      Thank you for your question. Given our experimental design, each exposure duration appears in separate blocks rather than in a continuous sequence for each infant. Because of this, the concept of an individual-level nonlinear trajectory over exposure durations does not directly apply. Instead, each infant contributes looking time data to multiple distinct conditions, rather than following a single increasing-exposure sequence. Any observed nonlinear trend across exposure durations would therefore be a group-level effect rather than a within-subject pattern.

      In 4.1, why 8 or 9 exposures rather than a fixed number? 

      We used slightly variable exposure durations to reduce the risk that infants develop fixed expectations about when a novel stimulus will appear. We have now clarified this point in the text.

      Why do results differ for the model vs empirical data for identity? Is this to do with semantic processing in infants that isn't embedded in the model? 

      Thank you for your comment. The discrepancy between the model and empirical data for identity violations is related to the discrepancy we discussed for number violations in the General Discussion. As noted there, RANCH relies on perceptual similarity derived from CNN embeddings, which may not fully capture distinctions that infants make.

      The model suggests the learner’s prior on noise is higher in infants than adults, so produces potentially mechanistic insights. 

      We agree! One of the key strengths of RANCH is its ability to provide mechanistic insights through interpretable parameters. The finding that infants have a higher prior on perceptual noise than adults aligns with previous research suggesting that early visual processing in infants is more variable and less precise.

    1. eLife Assessment

      This important work presents an example of how genomic data can be used to improve understanding of an ongoing, long-term bacterial outbreak in a hospital with an application to multi-drug resistant Pseudomonas aeruginosa, and will be of interest to researchers concerned with the spread of drug-resistant bacteria in hospital settings. The convincing genomic analyses highlight the value of routine surveillance of patients and environmental sampling and show how such data can help in dating the origin of the outbreak and in characterising the epidemic lineages. These findings highlight the importance of understanding environmental factors contributing to the transmission of P. aeruginosa for guiding and tailoring infection control efforts.

    2. Reviewer #1 (Public review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills the data gap of a clone that could be clinically important but easily missed from microbiology data alone.

      Weaknesses:

      As the authors highlighted in the Discussion section, a limitation of this study is that there is potential sampling bias due to partial sampling of clinical P. aeruginosa isolates. However, the work is still important to showcase the potential benefits of applying genomic sequencing techniques to support infection prevention controls in hospital settings. The limitation on potential sampling bias could inspire further work to explore an optimal clinical isolate sampling framework for genomic analyses to support outbreak investigation. The other limitation that the authors have highlighted in the Discussion section is the lack of epidemiology data to support the interpretation of the inferred patient-to-patient and environment-to-patient transmissions, which emphasised the importance of metadata to complement genomic data analysis in outbreak investigation for future studies.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive description of the Pseudomonas aeruginosa ST 621 clone outbreak is a great example of how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. In addition, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provided additional evidence for the long-term persistence of Pseudomonas aeruginosa ST 621 clones, which is likely to occur in other similar settings as well.

      Comments on revisions:

      The paper would be further strengthened by an additional timeline indicating when routine surveillance was introduced, along with examples of actions or changes guided by the surveillance data that resulted in a decrease in ST 621 transmission. This additional information would be useful to support the final statement in the Abstract suggesting "Since initial identification, extensive infection control efforts guided by routine, near real-time surveillance have proved successful at slowing transmission."

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients with first sampling dates in 2011 that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to the sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes as well as convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes. Epidemiological data relevant for this analysis include information on the reason for sampling, patient admission and discharge data and underlying frequency of sampling and sampling results in relation to patient turnover.

      Comments on revisions:

      Thank you for the careful revision and consideration of my comments.

      I am pleased to confirm that all my concerns have been comprehensively addressed.

      The changes and additions made have resolved my initial feedback, and I see no need to alter my evaluation.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s and it was mainly caused by two distinct subclones SC1 and SC2. The data of this outbreak showed the emergence of antibiotic resistance to cephalosporin, carbapenems and colistin over time highlighting the emerging risk of extensively resistant infections due to P. aeruginosa and the need for ongoing surveillance.

      Strengths:

      This study, overall, is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      Comments on revisions:

      The authors have addressed my concerns adequately in the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This is a manuscript describing outbreaks of Pseudomonas aeruginosa ST 621 in a facility in the US using genomic data. The authors identified and analysed 254 P. aeruginosa ST 621 isolates collected from a facility from 2011 to 2020. The authors described the relatedness of the isolates across different locations, specimen types (sources), and sampling years. Two concurrently emerged subclones were identified from the 254 isolates. The authors predicted that the most recent common ancestor for the isolates can be dated back to approximately 1999 after the opening of the main building of the facility in 1996. Then the authors grouped the 254 isolates into two categories: 1) patient-to-patient; or 2) environment-to-patient using SNP thresholds and known epidemiological links. Finally, the authors described the changes in resistance gene profiles, virulence genes, cell wall biogenesis, and signaling pathway genes of the isolates over the sampling years.

      Strengths:

      The major strength of this study is the utilisation of genomic data to comprehensively describe the characteristics of a long-term Pseudomonas aeruginosa ST 621 outbreak in a facility. This fills a data gap for a clone that could be clinically important but is easily missed by microbiology data alone.

      Weaknesses:

      The work would further benefit from a more detailed discussion on the limitations due to the lack of data on patient clinical information, ward movement, and swabs collected from healthcare workers to verify the transmission of Pseudomonas aeruginosa ST 621, including potential healthcare worker to patient transmission, patient-to-patient transmission, patient-to-environment transmission, and environment-to-patient transmission. For instance, the definition given in the manuscript for patient-to-patient transmission could not rule out the possibility of the existence of a shared contaminated environment. Equally, as patients were not routinely swabbed, unobserved carriers of Pseudomonas aeruginosa ST 621 could not be identified and the possibility of misclassifying the environment-to-patient transmissions could not be ruled out. Moreover, reporting of changes in rates of resistance to imipenem and cefepime could be improved by showing the exact p-values (perhaps with three decimal places) rather than dichotomising the value at 0.05. By doing so, readers could interpret the strength of the evidence of changes.

      Impact of the work:

      First, the work adds to the growing evidence implicating sinks as long-term reservoirs for important MDR pathogens, with direct infection control implications. Moreover, the work could potentially motivate investments in generating and integrating genomic data into routine surveillance. The comprehensive description of the Pseudomonas aeruginosa ST 621 outbreak is a great example of how genomic data can provide additional information about long-term outbreaks that otherwise could not be detected using microbiology data alone. Moreover, identifying the changes in resistance genes and virulence genes over time would not be possible without genomic data. Finally, this work provides additional evidence for the long-term persistence of Pseudomonas aeruginosa ST 621 clones, which likely also occurs in other similar settings.

      We thank the reviewer for their thorough evaluation of our work, and for the suggested improvements. A main goal of this study was to show that integrating routine WGS in the clinic was a game changer for infection control efforts. We appreciate this aspect was highlighted as a strength by this reviewer. While some of the weaknesses identified are inherent to the data (or lack thereof) available for this study, we have revised the manuscript to include a detailed discussion of limitations (sampling, thresholds of genetic relatedness, definitions and categories, etc.) that could influence the genomic inferences. We also provided exact p-values for the changes in rates of resistance, as requested. Finally, we have addressed all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a report of a large Pseudomonas aeruginosa hospital outbreak affecting more than 80 patients, with first sampling dates in 2011, that stretched over more than 10 years and was only identified through genomic surveillance in 2020. The outbreak strain was assigned to sequence type 621, an ST that has been associated with carbapenem resistance across the globe. Ongoing transmission coincided with both increasing resistance without acquisition of carbapenemase genes and the convergence of mutations towards a host-adapted lifestyle.

      Strengths:

      The convincing genomic analyses indicate spread throughout the hospital since the beginning of the century and provide important benchmark findings for future comparison.

      The sampling was based on all organisms sent to the Multidrug-resistant Organism Repository and Surveillance Network across the U.S. Military Health System.

      Using sequencing data from patient and environmental samples for phylogenetic and transmission analyses as well as determining recurring mutations in outbreak isolates allows for insights into the evolution of potentially harmful pathogens with the ultimate aim of reducing their spread in hospitals.

      Weaknesses:

      The epidemiological information was limited and the sampling methodology was inconsistent, thus complicating the inference of exact transmission routes. Epidemiological data relevant to this analysis include information on the reason for sampling, patient admission and discharge data, and underlying frequency of sampling and sampling results in relation to patient turnover.

      We thank the reviewer for their thoughtful feedback on our manuscript and for highlighting the quality of the genomic analyses. We agree that the lack of patient epi data (e.g. date of admission and discharge) and the inconsistent sampling through the years are limitations of this study. We have revised the manuscript to acknowledge these limitations and discuss how not having this data complicates the inference of exact transmission routes. Finally, we have addressed all the specific recommendations suggested by the reviewer and modified the manuscript accordingly.

      Reviewer #3 (Public Review):

      Summary:

      This paper by Stribling and colleagues sheds light on a decade-long P. aeruginosa outbreak of the high-risk lineage ST-621 in a US Military hospital. The origins of the outbreak date back to the late 90s, and it was mainly caused by two distinct subclones, SC1 and SC2. Data from this outbreak showed the emergence of antibiotic resistance to cephalosporins, carbapenems, and colistin over time, highlighting the emerging risk of extensively resistant P. aeruginosa infections and the need for ongoing surveillance.

      Strengths:

      This study overall is well constructed and clearly written. Since detailed information on floor plans of the building and transfers between facilities was available, the authors were able to show that these two subclones emerged in two separate buildings of the hospital. The authors support their conclusions with prospective environmental sampling in 2021 and 2022 and link the role of persistent environmental contamination to sustaining nosocomial transmission. Information on resistance genes in repeat isolates for the same patients allowed the authors to detect the emergence of resistance within patients. The conclusions have broader implications for infection control at other facilities. In particular, the paper highlights the value of real-time surveillance and environmental sampling in slowing nosocomial transmission of P. aeruginosa.

      Weaknesses:

      My major concern is that the authors used fixed thresholds and definitions to classify the origin of an infection. As such, they were not able to give uncertainty measures around transmission routes nor quantify the relative contribution of persistent environmental contamination vs patient-to-patient transmission. The latter would allow the authors to quantify the impact of certain interventions. In addition, these results represent a specific US military facility and the transmission patterns might be specific to that facility. The study also lacked any data on antibiotic use that could have been used to relate to and discuss the temporal trends of antimicrobial resistance.

      We thank the reviewer for their evaluation of our work and for highlighting the broad implications of our findings regarding the application of real-time surveillance to suppress nosocomial transmission. We agree with the reviewer that fixed thresholds and definitions are imperfect for classifying the origin of an infection. The design of this study (e.g. inconsistent sampling through time) was not conducive to a comprehensive/quantitative measurement of transmission routes. Thus, we decided to apply conservative thresholds of genetic relatedness and strict conditions (e.g. time between isolate collection, shared hospital location, etc.) to favor specificity, as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed a sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original fixed-threshold predictions. This limitation is now discussed in the revised manuscript. Finally, we have addressed all the specific recommendations suggested by the reviewer and modified the manuscript accordingly, including the addition of Figure S3.

      Reviewer #1 (Recommendations For The Authors):

      The definitions used on lines 391-396 are necessarily somewhat arbitrary, but it would be helpful to have a little bit more justification for the choices made, particularly for the definition of environmental involving the "3x the number of years they were separated". It seems a little hard to square this with the more relaxed 10 SNP cutoff for a patient-to-patient designation. Are there reasons for thinking SNP differences associated with environmental transmission should be smaller than for patient-to-patient, or is the aim here just to set the bar higher for assuming an environmental source? Because these definitions are quite arbitrary, there could also be some value in exploring the sensitivity of the results to these assumptions.

      Thank you. We agree with the reviewers that SNP thresholds, although necessary, are arbitrary, and that more discussion/justification was needed to put the genomic inferences in context. We have revised the manuscript to indicate that: 1/ the 10 SNP cutoff for a patient-to-patient designation was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study and similar to previous estimates, PMID: 24039595) and the observed within-host variability (now displayed in revised Fig. 1E). We note that this SNP distance alone was not sufficient: an epi link (patients on the same ward at the same time) also had to be established. 2/ the environment-to-patient definition was indeed set to be the most conservative (nearly identical isolates in two patients from the same ward with no known temporal overlap for > 365 days). This was done to favor high specificity, as this inference relied solely on clinical isolates (i.e. the identical environmental strain in the patient-environment-patient chain was not sampled). For these clinical isolates to have acquired few or no mutations over that much time, little or no replication is expected and, although unsampled, we propose this most likely happened on hospital surfaces.
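
      To make the two rules above concrete, here is a minimal Python sketch of how a pair of isolates from two patients might be labeled. It is an illustration, not the authors' code. The field names, the 31-day window standing in for "same ward at the same time", the use of sampling dates as a proxy for ward stays, and the reading of the reviewer's "3x the number of years they were separated" as a SNP bound of 3 × years are all assumptions; the manuscript's exact wording (lines 391-396) is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Isolate:
    patient: str     # hypothetical patient identifier
    ward: str        # ward where the isolate was collected
    collected: date  # sampling date, used as a proxy for the ward stay


def classify_pair(a: Isolate, b: Isolate, snps: int,
                  p2p_max_snps: int = 10,
                  epi_window_days: int = 31,
                  env_min_gap_days: int = 365,
                  env_snps_per_year: float = 3.0) -> str:
    """Label a pair of isolates from two patients using fixed thresholds."""
    if a.patient == b.patient:
        raise ValueError("a transmission pair needs two different patients")
    if a.ward != b.ward:
        return "unresolved"  # both rules require a shared ward
    gap_days = abs((b.collected - a.collected).days)

    # Patient-to-patient: <= 10 SNPs AND an epi link (same ward, close in time).
    if snps <= p2p_max_snps and gap_days <= epi_window_days:
        return "patient-to-patient"

    # Environment-to-patient: no temporal overlap for > 365 days, yet nearly
    # identical isolates (assumed here: at most 3 SNPs per year of separation).
    years_apart = gap_days / 365.25
    if gap_days > env_min_gap_days and snps <= env_snps_per_year * years_apart:
        return "environment-to-patient"

    return "unresolved"


if __name__ == "__main__":
    first = Isolate("P1", "W20", date(2015, 1, 10))
    print(classify_pair(first, Isolate("P2", "W20", date(2015, 1, 25)), snps=4))  # patient-to-patient
    print(classify_pair(first, Isolate("P3", "W20", date(2018, 1, 10)), snps=5))  # environment-to-patient
```

      Because the cutoffs are function arguments, re-running the classification over a grid of values would be one simple way to explore the sensitivity the reviewers asked about.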

      While the term "core genome" should be familiar to most readers, "shell genome" and "cloud genome" are less widely known, and an explanation of what these terms mean here would be helpful.

      Thank you. We have revised the manuscript to define the core, shell, and cloud genomes as gene sets found in ≥ 99%, ≥ 95% and ≥ 15% of isolates, respectively.

      In the first paragraph of the discussion, it could be added that in many cases for clinically important Gram-negatives, short-read sequencing alone will fail to detect transmission events, as outbreaks can be driven by plasmid spread with only very limited clonal spread (see, for example, https://www.nature.com/articles/s41564-021-00879-y).

      Thank you. We agree this is an important/emerging aspect of surveillance. However, the goal of this discussion point was to explain why such a large outbreak was missed prior to implementing WGS (short-read) surveillance. We feel that discussing "plasmid outbreaks" (which are not at play here, and are relatively rare in P. aeruginosa compared to the Enterobacteriaceae) and the need for long-read sequencing would distract from the narrative.

      Line 599: What does "Mock" mean here? Would it be more accurate to say it is a simplified floor plan?

      Thank you. “Mock” was changed to “simplified”

      IPAC abbreviation is only used once - spelling it out in full would increase readability.

      Revised manuscript was edited as suggested.

      MHS is only used twice.

      Revised manuscript was edited to spell out Military Health System.

      Line 364: full stop missing.

      Revised manuscript was edited as suggested.

      Line 401: Bayesian rather than bayesian.

      Revised manuscript was edited as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for giving me the opportunity to review this interesting manuscript.

      The conclusions of this paper are mostly well supported by the data presented, but epidemiological information was limited and the sampling methodology was inconsistent, thus complicating inference of exact transmission routes.

      Major issues:

      What was the baseline frequency of clinical and/or screening samples of Pseudomonas aeruginosa at the hospital? Neither Figure 1D nor Table S1 allows for differentiating between clinical and screening samples. Most isolates were cultured from clinical materials, and there is no information about the patients' length of stay and their respective sampling dates. Is there any possibility of finding out whether the samples were collected for clinical or screening purposes? Would it be possible to include the patients' admission data to determine whether the strains were imported into the hospital or related to a previous stay, e.g. among known carriers? Also, the issue of sampling dates vs. patient stay on the ward should be addressed, as there may be an overlap in patients' stay on the ward but no overlap in terms of sampling dates or even missing samples (missing links).

      We have revised the manuscript to address this important point: i) 16 isolates were from surveillance swabs and are labelled "Surveillance" in Table S1. The remaining 237 were clinical isolates; ii) unfortunately, because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge dates, wards, rooms, etc.) and we cannot calculate length of stay or better identify patient overlap. These limitations are now acknowledged in the discussion of the revised manuscript.

      In order to evaluate the extent of the outbreak, more epidemiological data would be useful. What is the size of the hospital, what is the average patient turnover, and what is the average length of stay in ICU and non-ICU? Is there any specialization besides the military label?

      We have revised the manuscript to indicate that Facility A is a 425-bed medical center and is the only Level 1 trauma center in the Military Health System. Unfortunately, the data to calculate length of stay, throughout the years, in ICU and non-ICU, was not available to us. This limitation is now also acknowledged in the discussion.

      Perhaps the authors could attempt to discuss the extent to which large outbreaks like these may be considered part of unavoidable evolutionary processes within the hospital microbiome as opposed to the accumulation and transmission of potentially harmful genes/clones, and differentiate between putative community spread without any epidemiological links on the one hand, and hospital outbreaks that could be targeted by local infection prevention activities on the other.

      We respectfully disagree with the suggestion that this large outbreak "may be considered as part of unavoidable evolutionary processes within the hospital microbiome" as opposed to "transmission of potentially harmful genes/clones". As a matter of fact, our data showed that infection control staff at Facility A responded with multiple interventions, including closing sinks, replacing tubing, and using foaming detergents. This resulted in slowing the spread of the ST621 outbreak, with just 3 cases identified in 2022, 0 cases in 2023 and 1 case in 2024. This is now discussed in the revised manuscript.

      Page 5, lines 88-92 and lines 101-104: It seems as if the outbreak was identified only by means of genomic surveillance. This raises questions as to the rationale for sampling and sequencing, especially prior to 2020. Considering 11 cases per year between 2011 and 2016, one could assume such an outbreak would have been noticed without sequencing data.

      The MRSN was created in 2010, in response to the outbreak of MDR Acinetobacter baumannii in US military personnel returning from Iraq and Afghanistan. Between 2011 and 2017, the MRSN collected MDR isolates (mandate for all MDR ESKAPE but compliance varied between years and facilities) from across the Military Health System and, for select isolates (e.g. high-risk isolates carrying ESBLs or carbapenemases) performed molecular typing by PFGE. In 2017 the MRSN started to perform whole genome sequencing of its entire repository. In 2020, a routine prospective sequencing service was started and first detected the ST621 outbreak. A retrospective analysis of historical isolate genomes (2011-2019) identified additional cases. The first paragraph of the discussion lists possible factors to explain why the ST621 escaped detection by traditional approaches. We believe 11 cases per year is not a strong signal when stratified by month, wards, or both, especially for a clone lacking a carbapenemase and without a remarkable antibiotic susceptibility profile. 

      Did the infection control personnel suspect transmission? If yes, was the sampling and submission of samples to the MRSN adapted based on the epidemiologic findings?

      The ST621 outbreak was unsuspected before the initial genomic detection in 2020. Until that point, MDR isolates only (Magiorakos et al PMID: 21793988) were collected but compliance was variable through time. Quickly thereafter (starting in 2021), complete sampling of all clinical P. aeruginosa (MDR or not) from Facility A was started. The manuscript was revised to clarify those details of the sampling strategy.

      Is there any information about how many environmental sites were sampled without evidence of ST621 / screening samples were cultured without evidence of Pseudomonas aeruginosa?

      For patient isolates, only 16 isolates were from surveillance swabs; the remaining 237 were clinical isolates. No denominator data was available to calculate P. aeruginosa and ST-621 positivity rates in surveillance swabs throughout the time period. For environmental isolates, a total of 159 swabs were taken from 55 distinct locations in 8 wards/units, including the ER. This data is now included in the revised manuscript. However, a complete analysis of these swabs (positivity rates for ESKAPE pathogens and P. aeruginosa, per ward/floor/room and per swab type, such as sink drain or bed rail) is beyond the scope of this study and is being performed as a follow-up investigation.

      Page 5 lines 89 and 39 Figure S1B. Please describe how the allelic distance for the cluster threshold was selected.

      As indicated in the legend of Figure S1B, no thresholds were applied. All ST621 isolates ever sequenced by the MRSN were included. All except 3 isolates were separated by 0-23 cgMLST allelic differences; the remaining 3 were 88-89 allelic differences away. The text was revised to clarify this point.

      Page 5 lines 99-100. Could the authors please provide some distribution measures (e.g. IQR).

      Done as requested. The revised manuscript now reads “…of just 38 single nucleotide polymorphisms (SNPs), and an IQR of 19 (Fig. 1A, Table S1).”
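
      For readers reproducing this kind of summary, a median and IQR of pairwise SNP distances can be computed with Python's standard library, as in the sketch below. The distances in the example are made up, and we do not know which quartile convention the authors used; different conventions can shift the IQR slightly on small samples.

```python
import statistics


def median_and_iqr(distances):
    """Return (median, interquartile range) of a list of pairwise SNP distances."""
    # "inclusive" interpolates between observed values; other quartile
    # conventions can give slightly different Q1/Q3.
    q1, q2, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    return q2, q3 - q1


if __name__ == "__main__":
    toy_distances = [10, 20, 30, 40, 50]  # made-up values, not study data
    print(median_and_iqr(toy_distances))  # (30.0, 20.0)
```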

      Page 5 line 102. Could the authors please provide some distribution measures (e.g. IQR).

      Please see above. A chart was created and is now included as Fig. S2.

      Page 6 line 107 and page 34 figure 1c. In the text it is stated that isolates were collected in 27 wards, the figure 1C depicts 26 wards and n/a.

      Thank you for spotting this inconsistency. This has been fixed in the revised manuscript.

      Page 6 lines 117-118. Samples collected in the emergency room would imply samples collected on admission, already addressed previously. Did the authors investigate a potential import into the hospital from community reservoirs or were all these isolates collected among patients who had been previously admitted to the hospital and/or tested positive for the outbreak strain?

      We agree that samples collected in the ER imply samples collected on admission. Of the 29 ER isolates only 9 (31%) were primary isolates (first detection in a new patient) which suggests a majority were from returning patients at Facility A. Because the sampling was done under a public health surveillance framework, we do not have access to historical patient data (admission/discharge date, wards, rooms, etc.) to investigate/confirm that these 9 patients had previous visits at Facility A. This point is now discussed in the revised manuscript.

      Page 6 line 128. This could also represent increased selective pressure. However, according to Table S1, the 28 isolates collected in 2011 (the number does not match with Figure 1D) were from many different wards, thus indicating earlier spread throughout the hospital.

      Yes, we agree. Please note that Table S1 lists all isolates for 2011, whereas Figure 1D focuses on primary isolates (first isolate from each patient) only.

      Page 7 line 133. Both Figure 2 and the discussion section, page 13 line 296 suggest the year 2005 instead of 2004?

      Thank you for catching this typographical error. This was corrected to 2004 in the revised manuscript.

      Figure 1E. The figure should also depict intra-patient diversity for comparison.

      Thank you for this great suggestion. We have revised Figure 1E accordingly.

      Page 7, lines 146-147: Could the authors attempt to explain the upper part of the bimodal peaks?

      This is an all-vs-all SNP analysis for all inter-patient isolates. For each isolate, all distances to other isolates are reported, not only the smallest. The upper peaks represent comparisons to isolates from a different outbreak subclone (SC1 vs SC2).
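
      The all-vs-all comparison described here can be sketched as follows: every pair of isolates from different patients contributes one distance, and tagging each pair by whether both isolates belong to the same subclone separates the lower (within SC1 or SC2) from the upper (SC1 vs SC2) mode. The toy alignment and dictionary layout are hypothetical, and ignoring gaps and Ns is an assumption, not necessarily how the study's core SNP distances were computed.

```python
from itertools import combinations

MISSING = {"-", "N"}  # assumption: gaps and ambiguous bases are ignored


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different called bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1 for x, y in zip(seq_a, seq_b)
        if x != y and x not in MISSING and y not in MISSING
    )


def inter_patient_distances(isolates):
    """Split all inter-patient pairwise distances into within- and between-subclone lists.

    isolates: dict of isolate_id -> (patient_id, subclone, aligned_sequence).
    """
    within, between = [], []
    for a, b in combinations(sorted(isolates), 2):
        pat_a, sc_a, seq_a = isolates[a]
        pat_b, sc_b, seq_b = isolates[b]
        if pat_a == pat_b:
            continue  # same-patient pairs reflect within-host diversity
        d = snp_distance(seq_a, seq_b)
        (within if sc_a == sc_b else between).append(d)
    return within, between


if __name__ == "__main__":
    toy = {  # made-up four-isolate alignment
        "i1": ("P1", "SC1", "AAAAAAAA"),
        "i2": ("P2", "SC1", "AAAAAAAT"),
        "i3": ("P3", "SC2", "TTTTAAAA"),
        "i4": ("P3", "SC2", "TTTTAAAT"),
    }
    within, between = inter_patient_distances(toy)
    print(within, between)  # [1] [4, 5, 5, 4]
```

      In the toy data the cross-subclone distances sit well above the within-subclone one, which is the pattern that produces the upper peak in an all-vs-all histogram.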

      Page 7, line 150 This is a very small number considering the extent of the outbreak and suggests a large number of missing links. Or does this rather imply continuous import and evolution over time that does not necessarily represent transmission within the hospital?

      We believe all cases were due to transmission happening within the hospital. Based on conservative thresholds (genetic relatedness and epi link, or lack thereof) the precise origin from another patient (n=10) or a contaminated surface (n=12) can be inferred. For the remaining 60 patients, with the available sampling, the conditions we chose are not met and we simply do not conclude whether a direct patient-to-patient or an environmental origin was more likely.

      Page 8 line 155. What does the temporal overlap refer to - sampling date versus patient's stay on the ward? Please specify.

      The temporal overlap was investigated from sampling dates, as dates of patient admission/discharge were not available.

      Page 8, line 157: What does primary/serial isolate mean - first and follow-up samples of ST621 per patient?

      Yes. Primary isolate is used to designate the first isolate from a patient. Serial isolates designate follow-up samples of ST621.
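
      The primary/serial distinction also explains why yearly counts differ between Table S1 (all isolates) and Figure 1D (primary isolates only). A minimal sketch of the labeling, using made-up records and an assumed rule for breaking same-day ties by isolate ID, might look like this:

```python
from collections import Counter
from datetime import date


def label_primary_serial(isolates):
    """Map isolate_id -> "primary" or "serial".

    isolates: iterable of (isolate_id, patient_id, collection_date) tuples.
    The earliest isolate per patient is "primary"; later ones are "serial".
    Same-day ties are broken by isolate_id (an arbitrary, assumed rule).
    """
    labels = {}
    seen_patients = set()
    for iso_id, patient, collected in sorted(isolates, key=lambda r: (r[2], r[0])):
        labels[iso_id] = "serial" if patient in seen_patients else "primary"
        seen_patients.add(patient)
    return labels


def primary_counts_by_year(isolates):
    """Count primary isolates only, per collection year."""
    isolates = list(isolates)
    labels = label_primary_serial(isolates)
    return Counter(d.year for iso_id, _, d in isolates if labels[iso_id] == "primary")


if __name__ == "__main__":
    records = [  # made-up records
        ("B", "P1", date(2012, 3, 1)),
        ("A", "P1", date(2011, 5, 2)),
        ("C", "P2", date(2011, 7, 9)),
    ]
    print(label_primary_serial(records))    # {'A': 'primary', 'C': 'primary', 'B': 'serial'}
    print(primary_counts_by_year(records))  # Counter({2011: 2})
```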

      Page 8 line 165: Table S3 and Figure 3 only refer to environmental samples from three wards. Ward 20 rooms 2 and 18 as well as ward 1 rooms 1 and 6 were hotspots - is there any information on the specific infection control/disinfection measures? Addressed in discussion page 12, lines 273-275, but no information on what was actually done.

      The manuscript was revised to indicate the precise disinfection measures that were taken. A follow-up study is ongoing to assess long-term efficacy and monitor possible retrograde growth from previously contaminated sinks.

      Page 8 line 175: Evaluation of change in resistance fraction over time - There may have been a selection bias with an inconsistent number of strains sequenced per year.

      Yes, incomplete sampling and possible selection bias are now listed with other limitations of this study in the discussion of the revised manuscript.

      Page 9 line 183: The referral to Table S1 is unclear, I could not find the number and the specific isolates selected for long-read sequencing.

      Thank you. This has been added to the revised Table S1.

      Page 10 lines 217-225 and Figure 4C: Perhaps it is possible to better align what is written in the text and the caption of the figure. The caption does not clarify that only one patient develops colistin resistance (what was the reason to include the other patients?).

      Thank you. We have revised the text and the caption of the figure to clarify that only isolates from one patient developed colistin resistance. The isolates from the other patients in Fig. 4C are shown to provide context and accurately map the emergence of the PhoQE77fs mutation.

      Page 10, lines 228-229 and Table S5: How is it possible to identify those 64 genes in Table S5?

      We have revised Table S5 to facilitate the identification of the 64 genes with ≥ 2 independently acquired mutations (excluding SYN). Specifically, we have added column E labeled “Counts independent mutations per locus (excluding SYN)”. A total of 205 rows (in this table each row is a variant) have a value ≥ 2 and these represent 64 genes (upon deduplication of locus tags).  
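
      The filtering described above (keep variant rows whose count is ≥ 2, then deduplicate by locus tag) can be sketched as follows. The count column name is taken from the response; the locus-tag header "locus_tag" and the toy rows are hypothetical and would need to be matched to the actual Table S5 export.

```python
import csv
import io

# Column E header as named in the response; the locus-tag header is hypothetical.
COUNT_COL = "Counts independent mutations per locus (excluding SYN)"
LOCUS_COL = "locus_tag"


def recurrently_mutated_genes(csv_text: str, min_count: int = 2):
    """Return (number of variant rows passing the filter, sorted unique locus tags)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    passing = [
        row for row in reader
        if row[COUNT_COL].strip().isdigit() and int(row[COUNT_COL]) >= min_count
    ]
    # Several variant rows can share one locus, so deduplicate by locus tag.
    genes = sorted({row[LOCUS_COL] for row in passing})
    return len(passing), genes


if __name__ == "__main__":
    toy_table = (
        "locus_tag,Counts independent mutations per locus (excluding SYN)\n"
        "GENE_A,2\nGENE_A,2\nGENE_B,1\nGENE_C,3\nGENE_C,3\nGENE_C,3\n"
    )
    print(recurrently_mutated_genes(toy_table))  # (5, ['GENE_A', 'GENE_C'])
```

      Applied to the real table, the response reports that this kind of filter yields 205 variant rows corresponding to 64 genes.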

      Page 13, lines 280-281: Where is the information on chronic infection presented? Serial cultures would not necessarily mean chronic infection.

      Yes, we agree this was not the appropriate characterization and this was revised to "long-term" infections.

      Page 14 line 306: Emergence of colistin resistance in a single patient, correct?

      Yes. This was further clarified in the text.

      Page 14 lines 315-320: This should go to the results section. In particular disinfection, closing, and replacing of tubing should be mentioned in the results section in reference to the results presented in Table S3.

      Thank you. We have considered this suggestion and have decided to leave this discussion as the closing paragraph of this publication. A follow-up study is ongoing to assess the long-term efficacy of these interventions on ST-621 but also on other outbreak clones at Facility A.

      Methods

      Page 15 lines 330-333: Perhaps it is possible to avoid redundancy.

      Thank you. We have revised the text accordingly.

      Page 15 lines 341: Information on which isolates were subjected to long-read sequencing is missing.

      Thank you. This has been added to the revised Table S1.

      Page 16 line 345: Was there a particular reason why Newbler was chosen?

      No. At the time Newbler was the default assembler built in the MRSN bacterial genome analysis pipeline and QC processes.

      Page 16, line 357-358: What was the rationale for selecting this isolate as reference genome?

      This isolate was chosen because it was collected early in the outbreak and phylogenetic analysis revealed it had low root-to-tip divergence.

      Page 16 line 361: Why 310 isolates, if only 253 were assigned to the outbreak clone and only a subset of those were collected in facility A?

      This was a typographical error that has been corrected (it now reads "…set of 253 isolates.") in the revised manuscript.

      Page 17 lines 387-395: What is the reason that intra-patient diversity was not included in the set of criteria for SNP distances?

      The observed within host variability (now displayed in revised Fig. 1E) was taken into consideration when setting SNP thresholds for categorizing patient-to-patient transmission or environment-to-patient event. This is now clarified in the revised manuscript.

      Page 17 line 392: How was the threshold of <=10 SNPs determined?

      The 10 SNP cutoff to infer a patient-to-patient transmission event was set to account for the known evolution rate of P. aeruginosa (inferred by BEAST at 2.987E-7 subs/site/year in this study, and similar to previous estimates PMID: 24039595) and the observed within host variability (now displayed in revised Fig. 1E). We note that this SNP distance was not sufficient and that an epi link (patients on the same ward within the same month) needed to be established.

      Page 17 line 395 and Figure 2: What was the assumed average mutation rate per genome per year?

      Thank you. The mean substitution rate inferred by BEAST was 2.987E-7 subs/site/year, similar to estimates from previous studies on P. aeruginosa outbreaks (e.g. PMID: 24039595).
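
      The question asks for a per-genome rate, whereas the response gives a per-site rate. Converting between the two requires the number of sites the rate applies to, which is not stated here. The sketch below assumes, purely for illustration, that the rate applies across a whole genome of about 6.3 Mb (roughly the size of a typical P. aeruginosa genome); if the BEAST rate was estimated on a core alignment, the relevant site count, and therefore every number below, would differ.

```python
def subs_per_genome_per_year(rate_per_site: float, n_sites: int) -> float:
    """Convert a per-site substitution rate into substitutions per genome per year."""
    return rate_per_site * n_sites


def expected_pairwise_snps(rate_per_site: float, n_sites: int,
                           years_since_mrca: float) -> float:
    """Expected SNPs between two isolates whose lineages split years_since_mrca ago.

    Both descendant lineages accumulate substitutions independently, hence the factor 2.
    """
    return 2 * subs_per_genome_per_year(rate_per_site, n_sites) * years_since_mrca


if __name__ == "__main__":
    RATE = 2.987e-7            # subs/site/year, as reported in the response
    ASSUMED_SITES = 6_300_000  # assumption: ~6.3 Mb genome, not stated in the source
    per_genome = subs_per_genome_per_year(RATE, ASSUMED_SITES)
    print(f"{per_genome:.2f} substitutions/genome/year")  # 1.88
    # Years of independent divergence that 10 SNPs would represent under this model
    print(f"{10 / (2 * per_genome):.1f} years")  # 2.7
```

      Under these assumptions, 10 SNPs would correspond to roughly two to three years of divergence between two lineages; the figure scales in proportion to the site count used.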

      Reviewer #3 (Recommendations For The Authors):

      Please find (line-by-line comments) on each section of the manuscript below:

      Introduction

      Line 86: I am wondering why the authors state ">28 facilities" instead of the exact number of facilities from which these lineages were recovered.

      Thank you. Manuscript was revised to provide the exact number of facilities. It now reads “…recovered from 37 and 28 facilities, respectively.”

      Methods

      It's not clear to me which criteria were used for collecting these isolates (both prospective and retrospective). I understand that some of the data are described in more detail in Lebreton et al, but I did not find the specific criteria for the collection of the isolates, and I imagine that these might differ between facilities. Would it be possible to comment on that and add a short paragraph in the Methods section?

      Thank you. This lack of clarity was also raised by other reviewers, and we have revised the manuscript to indicate that: 1/ MDR isolates only (Magiorakos et al PMID: 21793988) were collected from 2011-2020 with the same criteria for all facilities, although compliance was variable through time and between facilities; and 2/ starting in 2021, all P. aeruginosa isolates, irrespective of their susceptibility profile, were collected from Facility A.

      The data comes from a US Military hospital. Is this related to the US Veterans Affairs Healthcare system? Is there more detailed information about the demographics of the patient population?

      Facility A is part of the Military Health System (MHS) which provides care for active service members and their families. This is distinct from the US Veterans Affairs Healthcare system. Only limited patient data was accessible to us as this study was done as part of our public health surveillance activities. Patient age (avg. 57.2 +/- 21.0) and gender (ratio male/female 1.7) are provided in the revised manuscript. 

      Line 384ff: The origin of infection was inferred based on the SNP threshold and epidemiological links. However, recombination events can complicate the interpretation of SNP data. Have the authors attempted to account for this?

      Thank you. We agree that recombination events can complicate the interpretation of SNP data. We used Gubbins v2.3.1 to filter out recombination from the core SNP alignment, as indicated in the revised manuscript.

      The authors' definition of environment-to-patient transmission seems conservative (nearly identical strain and no known temporal overlap for > 365 days). Have the authors changed the threshold, performed sensitivity analyses, and tested how this would affect their results?

      Indeed, acknowledging that fixed thresholds have limitations in their ability to accurately predict the origin of infections, we took a conservative approach to favor specificity as our goal was simply to establish that cases of environment-to-patient transmission did happen. In the absence of a truth set, we have not performed sensitivity analysis, but we are conducting a follow-up study to compare inferences from MCMC models to our original predictions. This limitation is now discussed in the revised manuscript.
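      To make the conservative fixed-threshold rule concrete, a minimal Python sketch follows. It is illustrative only and is not the authors' pipeline: the SNP cutoff (`max_snps=5`), the `Isolate` fields, and the reading of "no known temporal overlap for > 365 days" as "no genetically linked isolate from another patient collected within 365 days" are hypothetical assumptions. Collection dates stand in for overlap because only clinical isolate data were available. Any case that fails the rule is labelled "undetermined", mirroring the conservative labelling described in these responses.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    source_id: str          # patient ID or environmental sampling site (hypothetical field)
    is_environmental: bool
    collected: date


def classify_origin(case: Isolate, others: List[Isolate], snps_to_case: Dict[str, int],
                    max_snps: int = 5, min_gap_days: int = 365) -> str:
    """Label a patient isolate with a conservative fixed-threshold rule (illustrative only)."""
    # "Nearly identical" isolates: SNP distance to the case at or below the hypothetical cutoff.
    # Isolates with no recorded distance are treated as unlinked.
    linked = [
        o for o in others
        if o.isolate_id != case.isolate_id
        and snps_to_case.get(o.isolate_id, max_snps + 1) <= max_snps
    ]
    env_links = [o for o in linked if o.is_environmental]
    other_patient_links = [
        o for o in linked if not o.is_environmental and o.source_id != case.source_id
    ]
    # Collection dates are used as a proxy for overlap (no admission/ward data assumed).
    no_patient_overlap = all(
        abs((case.collected - p.collected).days) > min_gap_days for p in other_patient_links
    )
    if env_links and no_patient_overlap:
        return "environment-to-patient"
    # Anything that fails the rule stays unassigned rather than being forced into a category.
    return "undetermined"
```

      Raising `max_snps` or lowering `min_gap_days` would trade specificity for sensitivity, which is the kind of sensitivity analysis the reviewer raises.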

      The authors don't seem to incorporate the role of healthcare workers in the transmission process. Could they comment on this? I am assuming that environment-to-patient transmission could either be directly from the environment to the patient or via a healthcare worker. I think it's fine to make simplifying assumptions here but it would be great if this was explicitly described.

      Thank you for this suggestion. We have not sampled the hands of healthcare workers in this study. As a result, the reviewer is correct to say that we made the simplifying assumption that healthcare workers would be possible intermediates in either environment-to-patient or patient-to-patient transmissions, as previously described by others (PMID: 8452949). This limitation is now discussed in the revised manuscript.

      Page 5, line 100: What does "all vs all" mean? Based on the supplement, I assume it's the pairwise distance and then averaged across all of those. It would improve the readability of the manuscript if the authors could briefly define this term and then maybe refer to Table S1.

      Thank you. We have created Fig. S2 and revised the manuscript to state that ST-621 isolates from Facility A belonged to the same outbreak clone, with a mean pairwise (all vs all) distance of just 38 single nucleotide polymorphisms (SNPs) and an IQR of 19 (Fig. S2, Table S1).
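      For readers unfamiliar with the term, an "all vs all" distance can be sketched in Python as follows. The sketch assumes a recombination-filtered core SNP alignment held in memory as equal-length strings (a hypothetical input format); it compares every pair of isolates once and reports the mean and interquartile range of the pairwise SNP counts. The IQR uses the `inclusive` method of Python's `statistics.quantiles`; other quantile conventions can give slightly different values.

```python
from itertools import combinations
from statistics import mean, quantiles
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of alignment positions at which two isolates differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def all_vs_all_summary(alignment: Dict[str, str]) -> Tuple[float, float]:
    """Mean and IQR of SNP distances over every unordered pair of isolates."""
    ids = sorted(alignment)
    dists = [hamming(alignment[i], alignment[j]) for i, j in combinations(ids, 2)]
    q1, _, q3 = quantiles(dists, n=4, method="inclusive")
    return mean(dists), q3 - q1
```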

      Figure 1D: It would be interesting to see additional figures in the supplement on the percentage of sequenced isolates per year and whether it varies across the different sources/sites. Is there any information on which isolates were chosen for sequencing?

      Lack of clarity in the sampling/sequencing scheme was raised by multiple reviewers, and we have provided a thorough response to earlier comments. We have also revised the Materials and Methods section accordingly. Finally, we have created Fig. S3 to show the percentage of sequenced isolates per year across different sources/sites, as suggested by the reviewer. No noticeable patterns were observed.

      It seems like only a subset of all clinical isolates were sequenced. Would it be possible that SC2 was present already earlier but not picked up until a certain date?

      Although all isolates received by the MRSN were sequenced, compliance varied through time so it is true that not all clinical isolates were sequenced between 2011-2019. As such, we fully agree with this hypothesis and discuss this possibility as BEAST analysis placed the origin of SC2 in 2004 while the first detection of an SC2 isolate was in December 2012. This limitation is now discussed in the revised manuscript.

      Could the authors elaborate on whether the isolates resulted from single-colony picks? Is it possible that the absence of a subclone is due to the fact that only a single colony was picked?

      Yes, the isolates resulted from single-colony picks, except when different colony morphologies were noted. In those cases, representative isolates for each colony morphology were processed. We have revised the methods to make that clear.

      Figure 2: It is difficult to see which nodes belong to which patient due to the small font size. I wonder if it was possible to color the nodes for each patient, to make it more readable.

      We tried coloring the nodes but with > 60 distinct patients/colors we decided it did not improve clarity. We have revised figure 2 to increase the font size.  

      Page 7-8, lines 154-155: Did the authors check whether there were isolates of the same strain (that were found in the environment) present in other patients elsewhere in the ward?

      Yes. In rare cases, we observed virtually genetically identical isolates from two patients collected in different wards. Because we only have access to clinical isolate data (collected from patient X in ward Y) and not to patient data (admission/discharge dates, wards, rooms, etc.), we cannot exclude that these patients overlapped in a room prior to the sampling of their P. aeruginosa isolates. We designed our fixed thresholds to be conservative. As a result, in this analysis, these cases are labelled as “undetermined”.

      Page 8: Do the authors have any information on antibiotic use during this timeframe? From the discussion, it seems like there is no patient-level prescription data. Is there any data on overall trends? How were trends in antibiotic use correlated with trends in antibiotic resistance?

      Unfortunately, patient-level prescription data (or any other data not linked to the bacterial specimens) was not accessible to us as this study was done as part of our public health surveillance activities.

      To infer the origin of infection, the authors used a static method with fixed thresholds and definitions. This study does not provide any uncertainty with their estimates. Maybe the authors could add a sentence in the discussion section that MCMC methods to infer transmission trees incorporating WGS could provide these estimates. These methods have not often been applied to PA, but there are two examples where MCMC methods have been used without WGS (though the definition of environmental contamination may differ between these studies and this study):

      https://doi.org/10.1186/s13756-022-01095-x

      https://doi.org/10.1371/journal.pcbi.1006697

      Thank you for this great suggestion. We have revised the manuscript to include a discussion on the limitations of fixed thresholds to infer transmission chains/origins, and to discuss existing alternatives including MCMC methods. 

      Line 322-323: This sentence is a bit vague since not all of these HAI are due to P. aeruginosa. I would suggest citing a number that is specific to PA.

      Thank you. While our paper shows a particular example of protracted P. aeruginosa outbreak, the roll-out of routine WGS surveillance in the clinic will help prevent hospital-associated drug-resistant infections for more than this species. We believe that broadening the scope in the last sentence of the manuscript is important and we decline to revise as suggested.

    1. eLife Assessment

      This is an important study demonstrating that cholecystokinin is a key modulator of auditory thalamocortical plasticity during development and in young adult but not aged mice, though cortical application of this neuropeptide in older animals appears to go some way to restoring this age-dependent loss in plasticity. A strength of this work is the use of multiple experimental approaches, which together provide convincing support for the proposed involvement of cholecystokinin. This work is likely to be influential in opening up a new avenue of investigation into the roles of neuropeptides in sensory plasticity.

    2. Reviewer #1 (Public review):

      This report addresses a compelling topic. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be elicited by either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Notably, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored by cortical application of CCK. The authors conclude that CCK is a critical contributor to thalamocortical plasticity and may enhance this form of plasticity in aged subjects.

      The findings presented in this report are valuable and advance our understanding of thalamocortical plasticity.

    3. Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited, but may be fleshed out in future work.

    4. Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons, and CCK has been found to shape cortical plasticity dynamics. To understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction within the auditory thalamocortical (MGB - layer IV auditory cortex) circuit in mice. In these physiology experiments, which leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time alongside CCK expression, they go on to identify a window during which CCK is produced in auditory thalamocortical neurons throughout early and middle adulthood, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex can also modify frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on PPI task performance, it also seems that in mice <1.5 years old, CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, focal delivery of exogenous CCK in older mice can significantly improve learning in a PPI task, bringing their discrimination ability in line with that of young adult mice.

      Strengths:

      (1) The clarity of the results, along with the rigorous multi-angled approach, provides significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation and pharmacological blockade. Through the combination of these experimental configurations, the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      The following are weaknesses or limitations of the study that may also fall outside of the scope of this work, but which could be addressed in the future.

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This report addresses a compelling topic. However, I have significant concerns, which necessitate a reassessment of the report's overall value.

      Anatomical Specificity and Stimulation Site:

      While the authors clarify that the ventral MGB (MGv) was the intended stimulation target, the electrode track (Fig. 1A) and viral spread (Fig. 2E) suggest possible involvement of the dorsal MGB (MGd) and broader area. Given that MGv-AI and MGd-AC pathways have distinct, and sometimes opposing, effects on plasticity, the reported LTP values (with unusually small standard deviations) raise concerns about the specificity of the findings. Additional anatomical verification would help resolve this issue.

      We thank the reviewer for highlighting the importance of anatomical specificity in MGv targeting. In the revised manuscript, we have taken several steps to address these issues:

      (1) Higher-magnification histology has been added to Figure 1A, clearly identifying the electrode tip localized within the MGv.

      (2) Figure 2E has been replaced with a new image showing viral expression largely confined to MGB, with minimal spread to surrounding structures.

      (3) In the Discussion, we explicitly acknowledge that although targeting was guided by stereotaxic coordinates and histological confirmation, some viral spread throughout the MGB occurred. We also discuss the possibility that both MGv-A1 and MGd-AC pathways may contribute to the recorded responses, which could influence the observed plasticity, as previously suggested by the reviewer.

      These additions and acknowledgments are now incorporated to ensure the reader can interpret the data with full consideration of anatomical targeting limitations.

      Results section:

      “Higher-magnification histology confirmed accurate MGv targeting (Figure 1A, lower-middle panel).”

      Discussion section:

      “Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity (Lee and Sherman, 2010; Smith et al., 2012; Ohga et al., 2018; Lee, 2015; Hu, 2003). While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions.”

      Lee, C.C., and Sherman, S.M. (2010). Topography and physiology of ascending streams in the auditory tectothalamic pathway. Proceedings of the National Academy of Sciences 107, 372-377. 10.1073/pnas.0907873107.

      Smith, P.H., Uhlrich, D.J., Manning, K.A., and Banks, M.I. (2012). Thalamocortical projections to rat auditory cortex from the ventral and dorsal divisions of the medial geniculate nucleus. Journal of Comparative Neurology 520, 34-51.

      Ohga, S., Tsukano, H., Horie, M., Terashima, H., Nishio, N., Kubota, Y., Takahashi, K., Hishida, R., Takebayashi, H., and Shibuki, K. (2018). Direct Relay Pathways from Lemniscal Auditory Thalamus to Secondary Auditory Field in Mice. Cerebral Cortex 28, 4424-4439. 10.1093/cercor/bhy234.

      Lee, C.C. (2015). Exploring functions for the non-lemniscal auditory thalamus. Frontiers in Neural Circuits 9, 69.

      Hu, B. (2003). Functional organization of lemniscal and nonlemniscal auditory thalamus. Experimental Brain Research 153, 543-549. 10.1007/s00221-003-1611-5.

      Figure legend section:

      “Post-hoc histology at higher magnification (lower-middle) shows the electrode tip confined within the MGv. White lines delineate the MGv/MGd border based on cytoarchitectonic landmarks.”

      Statistical Rigor and Data Variability:

      The remarkably low standard deviations in LTP measurements are unexpected based on established variability in thalamocortical plasticity. The authors' response confirms these values are accurate, but further justification, such as methodological controls or replication, would bolster confidence in these results. Additionally, the comparison of in vivo vs. in vitro LTP variability requires more substantive support.

      We appreciate the reviewer's concern regarding the unusually small variability. We would like to clarify that the error bars in our figures represent Standard Error of the Mean (SEM) rather than Standard Deviations (SD). As SEM is derived from the SD while incorporating sample size, it is inherently smaller than SD, which may have led to the impression of unrealistically low variability. This has now been explicitly clarified in the figure legends and Methods.
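      The relationship between the two measures can be shown in a few lines of Python. The input values are hypothetical and serve only to illustrate that SEM = SD/√n, so the SEM shrinks as n grows even when the spread of the underlying data does not.

```python
import math
from statistics import stdev
from typing import Sequence, Tuple


def sd_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return the sample SD (n - 1 denominator) and the standard error of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    sd = stdev(values)
    sem = sd / math.sqrt(len(values))  # SEM = SD / sqrt(n)
    return sd, sem
```

      For example, five hypothetical normalized slopes of 100, 110, 120, 130 and 140% give an SD of about 15.8% but an SEM of about 7.1%, so error bars drawn as SEM look roughly half as wide.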

      To illustrate the raw variability, we have added Supplementary Figure S1E, which shows unaveraged fEPSP slopes alongside the mean ± SEM, corresponding to Figure S1C. This addition ensures transparency and allows readers to directly assess the quality and consistency of our recordings.

      Regarding the comparison between in vivo and in vitro LTP variability:

      We agree that clarifying the basis of our in vivo vs. in vitro variability comparison is important. For example, in Chen et al., 2019, using identical LTP induction protocols (Fig. J), the SED of in vitro slice measurements (Fig. K) was substantially larger than that of in vivo recordings (Fig. L).

      This difference likely reflects:

      (1) In vitro: neighboring data points within a single experiment are highly correlated; variability across experiments is large due to heterogeneous sensitivity to LTP induction (10–200% increase).

      (2) In vivo: lower correlation between neighboring data points, but each is averaged from 12 recordings over 2 min, reducing cross-trial variability; sensitivity to LTP induction is less variable across experiments (5–60% changes).

      We hope that these clarifications and additional data address the reviewer’s concerns regarding statistical rigor and data variability.

      Methods section:

      “The slopes of the evoked fEPSPs were calculated and normalized using a customized MATLAB script, and the group data were plotted as mean ± Standard Error of the Mean (SEM).”

      “All data are presented as mean ± SEM. Error bars and shaded areas represent SEM. Here, n represents the number of stimulation-recording sites and N represents the number of animals in each experiment. At each time point, fEPSPs were averaged across 12 consecutive trials (2 min) to reduce within-experiment fluctuation. Normalized time courses were then used for repeated-measures analyses.”
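      The normalization and averaging steps in the Methods text above can be sketched as follows. This is a hypothetical Python illustration, not the customized MATLAB script. It assumes one fEPSP slope per trial, that the first `n_baseline` trials precede induction, and a fixed trial rate (12 trials per 2-min bin implies one trial every 10 s). Because each bin is expressed as a ratio to the site's own baseline mean, negative-going fEPSP slopes still yield positive percentages.

```python
from statistics import mean
from typing import List


def normalized_time_course(slopes: List[float], n_baseline: int,
                           bin_size: int = 12) -> List[float]:
    """Average consecutive trials into bins and express each bin as % of baseline."""
    if not 0 < n_baseline <= len(slopes):
        raise ValueError("n_baseline must be between 1 and len(slopes)")
    baseline = mean(slopes[:n_baseline])  # per-site baseline (pre-induction trials)
    if baseline == 0:
        raise ValueError("baseline slope must be non-zero")
    course = []
    # Step through complete bins only; a trailing partial bin is dropped.
    for start in range(0, len(slopes) - bin_size + 1, bin_size):
        course.append(100.0 * mean(slopes[start:start + bin_size]) / baseline)
    return course
```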

      Figure legend section:

      “Data are mean ± SEM; error bars indicate SEM.”

      “(E) Unaveraged fEPSP slopes are shown for each time point, with individual data points corresponding to all sites included in Fig. 1C; mean ± SEM overlays are shown in black. Note that all individual data points are displayed in this figure, whereas in Figure S1C, only the averaged values are shown.”

      Viral Targeting and Specificity:

      The manuscript does not clearly address whether cortical neurons were inadvertently infected by AAV9. Given the potential for off-target effects, explicit confirmation (e.g., microphotograph of stimulation site) would strengthen the study's conclusions.

      We appreciate the request for quantitative confirmation of off-target cortical infection. We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, Supplementary Figure S1F). Although these observations do not constitute a formal statistical estimate, they were consistent across sampled sections and are in line with the low-level trans-synaptic transfer reported for AAV9. We have discussed their potential implications for data interpretation in the Discussion.

      We hope these clarifications and the newly presented histological evidence address the reviewer’s concerns and further strengthen the rigor of our study.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully controlled the injection volume, rate, and viral expression time, and verified expression post hoc. Histological analysis by systematic sampling detected no tdTomato-positive cortical somata in the ACx (Figure 2E lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section, Figure S1F, corresponding to Figure 2A upper-middle panel). These construct‑dependent observations align with occasional low‑level trans‑synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off‑target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      “Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm.”

      Integration of Prior Literature:

      The discussion of existing work is adequate but could be more comprehensive. A deeper engagement with contrasting findings would provide better context for the study's contributions.

      We appreciate the reviewer’s suggestion to engage more deeply with contrasting findings. In the revised Introduction and Discussion, we have:

      (1) Refocused the historical context toward adult auditory thalamocortical plasticity and explicitly contrasted it with visual and somatosensory cortices, noting that adult ACx exhibits weaker and more gated NMDAR dependence.

      (2) Positioned CCK–CCKBR signaling as a permissive/gating mechanism that can complement or partially compensate for postsynaptic NMDAR signaling, potentially reconciling variability across cortical areas and life stages.

      (3) Clarified the potential differential contributions of lemniscal (MGv) and non‑lemniscal (MGd) streams to plasticity expression and variability, acknowledging pathway-specific response properties.

      These additions are now integrated in the Introduction (paragraphs 2–3) and Discussion (sections “CCK Dependence of Thalamocortical Neuroplasticity in the ACx” and “Developmental and Age‑Dependent CCK‑Mediated Plasticity”), providing a more comprehensive and balanced context for our findings.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Discussion section:

      “These findings suggest a potential involvement of CCK in thalamocortical plasticity. Our data extend this framework by identifying CCK–CCKBR signaling as a permissive modulator of adult thalamocortical LTP.”

      “We propose that CCKBR activation may trigger intracellular calcium release and AMPAR recruitment in parallel to, or partially compensating for, postsynaptic NMDAR signaling, while the complementarity of CCKBR and NMDARs may contribute to robust thalamocortical plasticity. This complementary arrangement may reconcile differences across developmental stages and cortical areas, and highlights neuropeptidergic signaling as a lever to re-enable adult thalamocortical plasticity.

      Notably, exogenous CCK alone failed to induce LTP in the absence of accompanying stimulation (Figure S2A and S2B), emphasizing that CCK functions as a modulator rather than a direct initiator of LTP. Activation of the thalamocortical pathway is also essential for LTP induction. Although our experiment targeting the MGv was guided by stereotaxic coordinates and verified post hoc, we acknowledge potential contributions from non-lemniscal medial geniculate nucleus dorsal (MGd) projections. Anatomical and physiological evidence indicates that MGv-AC projections provide rapid, frequency‑specific, tonotopically organized excitation, whereas MGd pathways target higher‑order auditory cortex with broader tuning, less precise tonotopy, longer response latencies, and greater context‑dependence, features that can differentially shape cortical sensory integration and plasticity. While the co-recruitment of lemniscal and non-lemniscal inputs may enhance the generality of our CCK-dependent mechanism, the differing response characteristics of these pathways suggest subtle differences in their relative engagement in the observed plasticity. Future pathway-specific manipulations will help clarify their respective contributions. Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      Therapeutic Implications:

      The authors' discussion of therapeutic potential is now appropriately cautious and well-reasoned.

      Conclusion:

      While the study presents intriguing findings, the concerns outlined above must be addressed to fully establish the validity and impact of the results. I appreciate the authors' efforts thus far and hope they can provide additional data or clarification to resolve these issues. With these revisions, the manuscript could make a valuable contribution to the field.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      There are some details that should be addressed, primarily regarding potential baseline differences in comparison groups. The behavioral assessment is relatively limited, but may be fleshed out in future work.

      We appreciate the reviewer’s suggestion regarding potential baseline differences. In our study, all groups underwent harmonized procedures, including identical exposure, timing, and acquisition parameters. Group allocation and data collection were performed under standardized conditions. For electrophysiology, baseline fEPSP measures and stimulation intensities were calibrated per site using consistent input-output procedures, with analyses based on normalized slopes relative to each site’s own baseline. For behavior, animals from the same litter served as both experimental and control groups, matched for handling conditions; startle/PPI data were acquired using identical hardware and timing settings. While no additional post hoc re-processing was performed, we have clarified these controls in the Methods to enhance transparency.

      We agree that the behavioral assessment is intentionally focused and does not encompass broader auditory perceptual functions (e.g., temporal processing). We now explicitly state this limitation and propose future studies to examine temporal acuity and cell-type-specific manipulations. These experiments will clarify how CCK-dependent thalamocortical plasticity generalizes to other perceptual domains.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons, and CCK has been found to shape cortical plasticity dynamics. To understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction within the auditory thalamocortical (MGB to layer IV auditory cortex) circuit in mice. In these physiology experiments, which leverage multiple mechanisms of LTP induction and rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity and of CCK expression over time, they go on to identify a period, spanning early and middle adulthood, during which CCK is produced in auditory thalamocortical neurons; this establishes a window for plasticity from 3 weeks to 1.5 years of age in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex can also modify frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on PPI task performance, it also seems that in mice <1.5 years old, CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK modestly improves discrimination only of very similar tones, focal delivery of exogenous CCK in older mice significantly improves learning in a PPI task, bringing their discrimination ability in line with that of young adult mice.

      Strengths:

      (1) The clarity of the results, along with the rigorous multi-angled approach, provides significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations, the authors demonstrate that high-frequency stimulation-induced LTP relies on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination are not made, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      We agree with the reviewer that the relative contributions of pyramidal neurons and PV-interneurons to CCK-dependent thalamocortical plasticity remain to be determined. Our recordings primarily reflected excitatory postsynaptic activity from layer IV pyramidal neurons, given the fEPSP metrics used. As PV-interneurons are essential in shaping cortical inhibition and temporal precision, they may also be modulated by CCK release from thalamocortical inputs. We have explicitly acknowledged this limitation in the Discussion section of the manuscript and propose that future studies should employ cell-type-specific recording or manipulation approaches to dissect the respective roles of inhibitory and excitatory neuronal populations in CCK-dependent thalamocortical plasticity. We appreciate the reviewer’s suggestion and believe this is a valuable direction for ongoing research.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      We acknowledge that the current study primarily examined frequency discrimination and did not directly assess temporal processing. Enhanced network responsivity could have variable effects on temporal precision, depending on the balance between excitation and inhibition. PV-interneurons, in particular, are known to support temporal fidelity in auditory processing (Nocon et al., 2023; Cai et al., 2018). We now state in the Discussion that future work should investigate how CCK modulation influences temporal coding at both the circuit and single-cell levels, and whether such changes align with or diverge from the mechanisms underlying the improvements in frequency discrimination.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      We appreciate the reviewer’s suggestion. While we recorded single-unit activity during HFS protocols, long-term stability over >1.5 hours was less consistent than for fEPSP measurements, leading to higher variability in spike-based metrics. We therefore used fEPSPs as our primary quantitative measure for robustness. We agree, however, that single-neuron data could yield valuable complementary insights. Future experiments combining stable single-unit recording with synaptic measurements will be conducted to better link cellular excitability and network plasticity.

      (4) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      We agree that PPI involves multiple cortical and subcortical nodes. In our paradigm, layer IV neurons receive segregated MGv inputs, and high-frequency activation of thalamocortical projections induces robust synaptic plasticity in layer IV. The potentiation at these synapses could amplify the cortical representation of weak prepulses, facilitating their detection and enhancing PPI performance. This interpretation is consistent with prior work showing that local CCK infusion combined with auditory stimuli can augment cortical responses (Li et al., 2014). We have expanded the Discussion to highlight that in aged animals, where baseline PPI performance is often reduced due to degraded auditory inputs (Ouagazzal et al., 2006; Young et al., 2010), restoring thalamocortical plasticity via CCK may partially compensate for sensory gating deficits. We further note that the exact contribution of layer IV to PPI circuitry warrants future investigation using pathway-specific perturbations.

      Comments on revisions:

      The manuscript is much improved and many of the issues or questions have been addressed. Ideally, evidence for the degree of transsynaptic spread for AAV9-Syn-ChrimsonR-tdTomato would also be provided in some form, since from the authors' response it sounds like some was observed, as expected.

      We thank the reviewer for this important point and for the opportunity to clarify. As requested, we have carefully examined the possibility of transsynaptic spread in our experiments:

      We clarify that our histological verification was conducted by systematic sampling rather than exhaustive quantification. Under the same sampling procedure, we did not detect tdTomato-positive cortical somata after AAV9‑Syn‑ChrimsonR‑tdTomato injections into the MGB, whereas we observed rare EYFP-positive cortical somata after AAV9‑EF1a‑DIO‑ChETA‑EYFP (median < 1 cell per 0.4 × 0.4 mm² section, see Figure 2A and Figure S1F), consistent with occasional low-level transsynaptic spread reported in the literature.

      We have updated the Discussion section to clearly report these findings and to emphasize the potential for vector- and construct-dependent variability in transsynaptic spread. We also explicitly acknowledge this technical limitation and discuss its implications for data interpretation.

      We hope these clarifications and additions address the reviewer’s concern regarding viral specificity and transsynaptic spread.

      Discussion section:

      “Another potential limitation of our study is the trans-synaptic transfer property of AAV9 (Figure S1F). To mitigate this risk, we carefully controlled the injection volume, rate, and viral expression time, and verified expression post hoc. Histological analysis by systematic sampling detected no tdTomato-positive cortical somata in the ACx (Figure 2E lower panel), whereas rare EYFP-positive cortical somata were observed after AAV9-EF1a-DIO-ChETA-EYFP injections (median < 1 cell per 0.4 × 0.4 mm² section, Figure S1F, corresponding to Figure 2A upper-middle panel). These construct‑dependent observations align with occasional low‑level trans‑synaptic transfer reported for AAV9 (Zingg et al., 2017) and indicate that off‑target cortical infection was negligible for ChrimsonR and exceedingly rare for ChETA under our experimental conditions.”

      Zingg, B., Chou, X.L., Zhang, Z.G., Mesik, L., Liang, F., Tao, H.W., and Zhang, L.I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron 93, 33-47. 10.1016/j.neuron.2016.11.045.

      Figure legend:

      "Representative histological images demonstrating low-level transsynaptic spread following AAV9-EF1a-DIO-ChETA-EYFP injection into the MGv. Rare EYFP-positive cortical neurons were observed (median < 1 cell per 0.4 × 0.4 mm² section). Scale bar: 100 µm."

      Reviewer #1 (Recommendations for the authors):

      Thank you for your efforts in revising the manuscript. While progress has been made, I have a few remaining concerns that I hope you can address to further strengthen the study.

      Focus of the Introduction:

      Auditory thalamocortical plasticity is known to be NMDA-dependent, albeit with weaker dependence during early development. Given that this work examines thalamocortical LTP in young adult and aged mice, I recommend refining the Introduction to place greater emphasis on auditory thalamocortical plasticity in the adult brain. The current discussion of somatosensory plasticity during early development, while interesting, seems less directly relevant to the present study. A sharper focus on the auditory system would better frame your research questions.

      We thank the reviewer for this constructive suggestion. We have revised the Introduction to emphasize adult auditory thalamocortical plasticity and to streamline content less directly related to our study. Specifically:

      (1) We now foreground evidence that thalamocortical inputs retain experience-dependent plasticity beyond the critical period in adult ACx, including neuromodulatory pairing, HFS-induced LTP, and experience-dependent reinstatement.

      (2) We explicitly note that adult auditory thalamocortical plasticity is more weakly NMDAR-dependent than in other cortices, thereby motivating our focus on CCK–CCKBR signaling as a permissive mechanism for adult LTP.

      (3) We have condensed the discussion of somatosensory plasticity during early development to a brief background and shifted the focus to adult auditory mechanisms and knowledge gaps that directly frame our research questions.

      These changes appear in the revised Introduction (paragraphs 2–3), which now provide a sharper rationale for investigating CCK‑dependent thalamocortical LTP in young adult and aged mice.

      Introduction section:

      “However, converging evidence shows that thalamocortical inputs retain a capacity for experience-dependent modification in adulthood. Sensory enrichment or deprivation can gate or reinstate thalamocortical plasticity. In the adult ACx, pairing sounds with neuromodulatory drive can reshape cortical representations. In vivo high-frequency stimulation (HFS) of dorsal lateral geniculate nucleus (LGN) or medial geniculate body (MGB) induces LTP in sensory cortices and has been linked to perceptual learning beyond the critical period. Notably, auditory thalamocortical plasticity appears less dependent on NMDA receptors compared to other cortical regions. The mechanisms underlying thalamocortical plasticity in the mature brain remain poorly understood.

      Cholecystokinin (CCK) and its receptor CCK-B receptor (CCKBR) are well positioned to influence thalamocortical transmission: Cck mRNA is abundant in MGB neurons and CCKBR is enriched in layer IV of ACx, the principal thalamorecipient layer.”

      Anatomical Specificity of MGv Targeting:

      The mouse MGv is a small and deep structure, and precise targeting is critical given the functional differences between MGv and MGd pathways. In the current figures:

      Fig. 1A suggests the electrode track may have approached the MGd.

      Fig. 2E indicates some viral spread beyond the MGB.

      Since MGv-AI and MGd-AC pathways exhibit distinct (and sometimes opposing) effects on plasticity, I encourage you to provide additional clarification or verification of the stimulated/infected regions. This would greatly enhance the interpretability of your LTP data.

      Please see above.

      Data Variability and Transparency:

      The reported thalamocortical LTP values exhibit remarkably small standard deviations, which is somewhat unexpected given typical experimental variability in such measurements. To address this concern, it would be helpful to include example raw traces of the recorded LTP (e.g., in a supplementary figure). This would allow readers to better evaluate the data quality and consistency.

      Please see above.

      Reviewer #2 (Recommendations for the authors):

      Overall, the authors did an excellent job of responding to our critiques, both in their direct responses and in the modified text. The modified text is also more readable than before. There are two issues that the authors should consider addressing:

      (1) Unless I missed it, there is no commentary stated about the impact of using aged C57 mice, which lose their hearing, such that the effects seen in the older mice could be related to hearing loss rather than aging alone. Some discussion of this point should be made.

      We thank the reviewer for raising this important point. C57BL/6 mice are known to develop age-related hearing loss, which could potentially affect PPI performance in older animals. We note that in our internal screening we observed markedly reduced startle amplitudes and frequent negative PPI values in many mice older than 20 months, indicating severe auditory impairment. To minimize this confound a priori, we excluded mice older than 20 months and restricted the aged cohort to 17–19 months, which consistently exhibited robust startle responses and reliable PPI. While some degree of presbycusis may still be present in this age range in C57BL/6 mice, the improvement of PPI following CCK administration combined with acoustic exposure indicates that the auditory pathways remained sufficiently functional to support sensorimotor gating. In fact, the presence of partial hearing loss in these aged mice may have allowed us to better detect the beneficial effects of CCK, further highlighting its therapeutic potential for age-related deficits. The greater improvement in PPI observed in older mice, compared with younger mice whose control-group PPI is already high, likely reflects the combined effects of age-related hearing loss and CCK deficiency, with CCK-induced restoration of thalamocortical plasticity being the primary focus of our study. We have now added a discussion of this point in the revised manuscript.

      Discussion section:

      “In aged mice, PPI deficits are commonly observed due to impaired auditory processing. Notably, C57BL/6 mice exhibit age-related hearing loss (Johnson et al., 1997). Both age-associated changes in auditory function and CCK deficiency contribute to impaired sensory gating. The presence of partial hearing loss in aged mice may have facilitated the detection of CCK’s beneficial effects, further highlighting its therapeutic potential for age-related deficits. Our results suggest that enhanced thalamocortical plasticity mediated by CCK might partially compensate for these deficits by amplifying residual auditory signals in aged mice.”

      Johnson, K.R., Erway, L.C., Cook, S.A., Willott, J.F., and Zheng, Q.Y. (1997). A major gene affecting age-related hearing loss in C57BL/6J mice. Hearing Research 114, 83-92. https://doi.org/10.1016/S0378-5955(97)00155-X.

      (2) Minor point - I do not agree with the use of the term "ventral to bregma" to describe where the craniotomies were placed (e.g., line 599). The direction being described is more typically referred to as "lateral." If the authors prefer to use the term "ventral," perhaps additional clarification can be added.

      We thank the reviewer for pointing out this issue and apologize for any confusion. We agree that “ventral to bregma” is not the standard terminology and have revised the Methods section to use “below the temporal ridge”. We have also clarified that the craniotomy for accessing the auditory cortex was performed on the lateral aspect of the skull in rodents, just below the temporal ridge. We hope this revision resolves the ambiguity.

      Method section:

      “A craniotomy was performed over the temporal bone, as the auditory cortex is located on the lateral surface of the brain (coordinates: 1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma for mice; 2.5 to 6.5 mm below the temporal ridge and 3.0 to 5.0 mm posterior to bregma for rats) to access the auditory cortex.”

      “Six weeks after CCK-sensor virus injection, a craniotomy was performed to access the auditory cortex at the temporal bone (1.5 to 3.0 mm below the temporal ridge and 2.0 to 4.0 mm posterior to bregma), and the dura mater was opened.”

    1. eLife Assessment

      This valuable study explores the role of spatial genome organization in oncogenic transformation, addressing an ambitious and significant topic. The authors have assembled comprehensive datasets from various subtypes of localized and lung-metastatic breast cancer cells, as well as from healthy and cancerous lung cells. They identified switching patterns in the 3D genome organization of lung-metastatic breast cancer cells, revealing a reconfiguration of genome architecture that resembles that of lung cells. This provides solid evidence with significant biomedical implications for epigenetic regulation in both normal physiology and disease.

    2. Reviewer #1 (Public review):

      Summary:

      This study used publicly available Hi-C data to assemble a comprehensive set of breast cancer cell lines (luminal, Her2+, TNBC) with varying metastatic features, in order to ask whether breast cancer cells acquire organ-specific features at the 3D genome level that allow them to metastasize to a specific organ. The authors focused on lung metastasis and included several comparison controls, including normal mammary lines, normal lung epithelial lines, and lung cancer cell lines. Because of the lower resolution at a 250 kb bin size, the authors addressed only compartments (A for active and B for inactive), not other aspects of 3D genome organization. They began by performing clustering and PCA of compartment identity and discovered that this panel of cell lines could be well separated by Her2 status and epithelial-mesenchymal features. When correlating compartment changes with transcriptomic changes, the authors noticed both concordance and divergence between the two. The authors then shifted to the core question of metastatic organotropism to the lung. They discovered a set of "lung permissive compartment changes" and concluded that "lung metastatic breast cancer cell lines acquire lung-like genome architecture" and "organotropic 3D genome changes match target organ more than an unrelated organ". To prove the latter point, the authors included an additional non-breast cancer cell line (prostate cancer) in the setting of brain metastasis. This is a purely computational study without wet-bench experiments.

      Strengths:

      The authors embarked on an ambitious journey to seek the answer regarding 3D genome changes predisposing to metastatic organotropism. The authors succeeded in assembling a comprehensive panel of breast cancer cell lines and aggregating the 3D genome structure data to conduct a hypothesis-driven computational analysis. The authors also succeeded in including proper controls representing normal non-cancerous epithelium and the end organ of interest. The authors did well in citing relevant references on 3D genome organization and EMT.

    3. Reviewer #2 (Public review):

      Summary:

      This work addresses an important question of chromosome architecture changes associated with organotropic metastatic traits, showing important trends in genome reorganization. The most important observation is that 3D genome changes are consistent with adaptation to a new microenvironment: lung metastatic breast cancer cells exhibit genome architecture signatures resembling a lung cell-like conformation, and brain metastatic prostate cancer cells show compartment shifts toward a brain-like state.

      Strengths:

      This work presents interesting original results, which will be important for future studies and for the biomedical implications of epigenetic regulation in health and disease.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      The authors embarked on an ambitious journey to seek the answer regarding 3D genome changes predisposing to metastatic organotropism. The authors succeeded in assembling a comprehensive panel of breast cancer cell lines and aggregating the 3D genome structure data to conduct a hypothesis-driven computational analysis. The authors also succeeded in including proper controls representing normal non-cancerous epithelium and the end organ of interest. The authors did well in citing relevant references on 3D genome organization and EMT.

      Weaknesses:

      (1) The authors should clearly indicate how they determine the patterns of spread of the breast cancer cell lines being utilized in this manuscript. How did the authors arrive at the conclusion that certain cell lines would be determined as "localized spread" and "metastatic tropism to the lung"? This definition is crucial, and I will explain why.

      It is indeed a critical point to clearly define and explain what qualifies as metastatic potential to particular organs in our system. Here, we intentionally limited our scope to metastasis that had occurred within the human system. Our cell lines are chosen based on their sites of origin and etiological history in the patients from which they were derived. For example, the cancer cell line BT474 was classified as “localized” because these cells were derived from a solid tumor in the breast itself. Meanwhile, MCF7 and T47D cell lines are considered lung metastatic because these cells were collected from the pleural effusion from the lung. We therefore model human organotropism from the breast to the lung by using cells that originated from infiltrative ductal carcinoma (human breast) but were collected from pleural effusions (human lung). We then use as a comparison a human lung cancer-derived cell line that was itself purified from a pleural effusion. In this way, we can compare the genome structure of a lung cancer cell in the lung environment to a breast cancer cell that has metastasized to the lung environment.

      In our revised version, we further clarify this definition in the text as well as in additional annotations in our supplemental table of all cell line information.

      Todd Golub's team from the Broad Institute of MIT and Harvard published "A metastasis map of human cancer cell lines", an exhaustive first-generation metastasis map (MetMap) that reveals organ-specific patterns of metastasis. (Incidentally, this work was not cited in this manuscript.) The MetMap Explorer (https://depmap.org/metmap/vis-app/index.html) is a public resource that can be openly accessed to visualize the metastatic potential of each cell line, as determined by the in vivo barcoding approach described in the MetMap paper, in the form of petal plots. Five organs were tested in the MetMap paper: brain, lung, liver, kidney, and bone. The authors would discover that some of the organ-specific metastasis patterns defined in the MetMap Explorer differ from the authors' classification. For example, the authors defined MCF7 as lung metastatic, and, rightly so, MetMap charted a signal towards lung with low penetrance and low metastatic potential. The authors defined ZR751 as a line with localized spread; however, MetMap charted a signal towards the kidney with low penetrance and low metastatic potential, a signal strength similar to that of lung metastasis in MCF7. A similar argument could be made for T47D. The TNBC line MDA-MB-231 is indeed highly metastatic; however, in MetMap data, its metastasis is not specific to the lung but extends to all five organs with high penetrance and metastatic potential. The authors defined the two lung cancer cell lines mentioned in this study, A549 and H460, as showing localized spread to the lung. However, the MetMap data clearly indicated that A549 and H460 are highly metastatic to all five organs with high penetrance and high metastatic potential.

      We acknowledge the valuable contributions of animal models in metastatic cancer studies, but we also want to avoid the potentially confounding variable of the animal microenvironment. The MetMap Explorer contains valuable information (and as part of our clarification on this point, we now cite MetMap in the text), but the “metastatic potential of each cell line” for this tool is measured in a mouse environment. Knowing that a particular cell line, which originated from a human lung metastasis, can further metastasize to other organs in a mouse does not necessarily mean that those cells could do so in humans. The microenvironment responses to metastatic colonization recapitulate the events in wound repair, and these can differ among species (https://pubmed.ncbi.nlm.nih.gov/28916657/; https://pubmed.ncbi.nlm.nih.gov/39729995/). Further, the changes a cell needs to make to adapt to a new organ system in a mouse could be confounded by the changes needed to adapt to mouse conditions in general. Finally, migration from a site of ectopic injection may not mimic migration from an initial tumor site. These factors lead to well-known cases where MetMap does not reflect the metastatic potential of cancers in humans. As a classic example, prostate cancer frequently metastasizes to bone in humans, and the PC3 cell line was derived from a bone metastatic prostate cancer. However, MetMap shows no evidence of PC3 being able to metastasize to bone in a mouse.

      We agree that the very best data would come from matched primary and metastatic tumors in the same human patient, but those data do not currently exist and generating them would require future work beyond the scope of this study.

      Since results will vary among different experimental models testing metastatic organotropism (intracardiac injection was the metastasis model adopted in MetMap), the authors should state more clearly which experimental model system served as the basis for their definition of organ-specific metastasis. In my opinion, this is the most crucial first step for this entire study to be sound and solid.

      Taking all the above into account, in our revision, we have now included further clarification in the main text to more clearly explain how and why we chose the cell lines we did and what the advantages and limitations of this choice are.

      (2) Figure 1b: The authors found that "MDA-MB-231 cells were grouped with the lung carcinoma cells. This implies that the genome organization of this cell line is closer to that of lung cells than to other breast epithelial cell lines.". In fact, another TNBC line BT549 was also clustered under the same clade. So this clade consisted of normal-like and highly metastatic lines. Therefore, the authors should be mindful of the fact that the compartment features might not directly link to metastasis (or even metastatic organotropism).

      In figure 1b, the grouping that includes MDA-MB-231 (lung metastatic breast cancer) connected to A549 and H460 (lung cancer) occurs at a distance of about 0.2. If the clustering tree were cut at a distance of 0.26, six separate clusters would result: two clusters of Luminal subtypes (all labeled red), one that includes all healthy epithelial cells (both lung and breast, all labeled green), one that links two localized breast cancers, one that links MDA-MB-231 to lung carcinoma cell lines, and then BT549 by itself. So, while BT549 appears next to MDA-MB-231 along the horizontal axis, this is just a coincidence of the layout: the dendrogram shows it is quite distant from all the other cell lines in this cluster according to compartment profile.

      So, it is only MDA-MB-231 that is very closely linked with the lung cancer cell types.
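      The dendrogram argument above rests on a general property of hierarchical clustering: similarity is read from the height at which branches merge, not from the order of leaves along the horizontal axis. The following minimal Python sketch illustrates cutting a tree at a fixed height. It assumes average linkage (UPGMA) on a precomputed distance matrix; the linkage method and distance metric used in the study are not stated here, and `average_linkage_cut` and the toy positions are hypothetical.

```python
def average_linkage_cut(dist, t):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix,
    stopped before any merge whose height exceeds t.

    UPGMA merge heights never decrease, so stopping here gives the same flat
    clusters as cutting the full dendrogram at height t.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_dist(a, b):
        # Mean of all pairwise distances between members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best_d, best_x, best_y = None, None, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_dist(clusters[x], clusters[y])
                if best_d is None or d < best_d:
                    best_d, best_x, best_y = d, x, y
        if best_d > t:
            break  # every remaining merge lies above the cut height
        clusters[best_x] = clusters[best_x] + clusters[best_y]
        del clusters[best_y]  # best_y > best_x, so best_x is unaffected
    return [sorted(c) for c in clusters]


# Toy example: five hypothetical samples placed on a line.
positions = [0.0, 0.1, 1.0, 1.05, 3.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(average_linkage_cut(dist, 0.26))  # cut low: three groups
print(average_linkage_cut(dist, 1.0))   # cut higher: two groups
```

      In a drawn dendrogram of this toy data, sample 4 could sit immediately beside sample 3 on the horizontal axis, yet it joins the others only at a much greater height and stays a singleton at either cut, which is the situation the response describes for BT549.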

      It is true that the healthy lung cells (HTBE) are clustered separately and are more similar to normal/non tumorigenic breast epithelial cells (HMEC and MCF10A) than to any cancer cell type. This could suggest that there are aspects of the compartment pattern that represent any healthy epithelium as compared to cancer. What we find in the compartment profile, in both the clustering and the PCA analysis, is that compartment signatures contain information about cell properties on several overlapping levels: there is an aspect of the compartment profile that distinguishes healthy from cancerous cells, an aspect that distinguishes luminal cancers from other subtypes, a part that associates with organotropism, and an aspect that captures EMT status. The final compartment status is a composite of these numerous factors.

      We have clarified the text to indicate that we mean MDA-MB-231 clusters near lung cancer, not necessarily healthy lung cell models.

      (3) Figure 3: In the text, the authors stated, "To further investigate this result, we examined the transcription status of genes that changed compartment across the EMT spectrum and, conversely, the compartment status of genes that changed transcription (Fig. 3b, c, and d)". However, it was not apparent in the figure that the cell lines were arranged according to an EMT spectrum.

      To display these comparisons more clearly, we have now revised figure 3b, c, and d in two ways: First, we have defined the gene and cell line clustering by one set of data (for example, compartment identity in 3b) and then displayed the other data (gene expression) with all genes and cell lines in the same order. Therefore, for each column, genes and cell lines can be compared visually between top and bottom rows. Second, we have colored cell line names from purple to yellow according to their EMT scores as shown in Supplementary Figure 1a. This allows a visual indication of how the clustering separates cell lines by EMT status.

      Also, the clustering heatmaps did not provide sufficient information regarding the genes with concordant/divergent compartment vs. transcription changes. It would be more informative if the authors could spend more effort annotating these genes/pathways.

      We want to clarify that the genes plotted in the heatmaps in Figure 3 are also the genes whose functional enrichment we present in figures 1 and 2. So, the genes that segregate strongly based on A/B compartment (but not gene expression) in figure 3b are the same genes whose GO terms are annotated in Figure 1d. Likewise, the genes that segregate strongly based on gene expression, but not A/B compartment, in figure 3c and d are the same genes whose GO terms are annotated in Figure 2b. We have now made this connection clearer in the text.

      We also agree with the reviewer that it is important to explore the relationship between these divergent sets of genes further. Our explorations have led to several observations:

      (1) In some cases, the compartment-segregated genes and the transcription-segregated genes are different members of the same pathways. In Author response image 1 below, for example, we show interactions (according to STRING) among genes from Figure 3c that are highly expressed in the epithelial-like cell lines and are annotated as involved in epithelial development (green). We then added to the network genes from Figure 3b that are in the A compartment specifically in the epithelial-like (but not the mesenchymal-like) cell lines and that are also annotated as involved in epithelial development (red). Most of the epithelial development genes that change expression are in the A compartment in all cell lines and therefore do not rely on spatial compartment changes for their regulation. But some additional epithelial development genes, interconnected in this same network, change compartments across the EMT spectrum. One example, FOXA1, is a key hub in the network and is a known pioneer transcription factor involved in development and differentiation. Controlling this gene at the level of spatial genome organization, rather than by local transcriptional control, could be important for the stable cell fate changes that can accompany EMT.

      Author response image 1.

      (2) Overall, the set of genes that change compartments does not have as strong functional enrichment as the transcription change set of genes. This could indicate that some of the compartment changes that occur with EMT are not directly gene regulatory but rather enable an overall conformational change of the chromatin that is needed for the alterations in physical cell state or to accomplish long distance gene regulation changes.

      (3) Related to long distance gene regulation changes, we also see cases in which the gene that changes transcription but not compartment across EMT is adjacent to regions that switch compartments.

      A good example is TFF3 (yellow, Supplementary Figure 1C). TFF3 is one of the genes that most strongly segregate across EMT by transcription: it is more highly expressed in epithelial-like (bottom 4 tracks) than in mesenchymal-like (top 4 tracks) cancers. Despite this differential expression, it is in the A compartment in almost all cell lines. However, it is adjacent to regions that show strong EMT-associated compartment changes. So, even though this specific gene region does not change compartment, its regulation may be influenced by the entire region being A-associated in epithelial-like cancers while neighboring regions become B-associated in mesenchymal-like cancers.

      TFF3 is expressed in normal breast epithelium and has been implicated as a biomarker for endocrine therapy response in breast cancer.

      Meanwhile, many genes that are in these compartment switching regions (BACE2, DSCAM, PDE9A) are not among the strongest expression signature genes.

      (4) Interestingly, some of the regions (such as the region shown in Supplementary figure 1C) that change compartment across the breast cancer spectrum overlap with regions that we found change compartment in the progression of prostate cancer, as shown in the string.db enrichment analysis below.

      Author response image 2.

      In our revised manuscript, we now include more of these explanations in the text and include, as panel c of Supplementary Figure 1, the example region described above in which compartment and transcription changes are offset.

      (4) Figure 4: The subheading of this section was "Lung metastatic breast cancer cell lines acquire lung-like genome architecture". Echoing my comments in point 1, I am a bit hesitant to call these cell lines "lung metastatic" rather than "metastatic" in general, since cell lines such as MDA-MB-231 metastasize to other organs as well. However, I do get the point that the definition of "lung metastasis" is derived from the common metastasis features among the cell lines here (MCF7, T47D, SKBR3, MDA-MB-231). One could also question whether the "lung" carcinoma cell lines can be considered "localized", since they too are capable of metastasizing to other organs.

      Rather than classifying cells on metastatic “potential” (as measured in a mouse), we chose our cell lines based on their sites of origin and etiological history in the patients from whom they were derived. Cancer cell lines called “lung metastasis” were collected from pleural effusions surrounding the patients' lungs. Likewise, we call a cancer “localized” because it was taken from the tissue where the cancer originated, even if it might, if placed into a different context, be able to metastasize. We would argue that the genome structure features of the “localized” cancers reflect cancers that have not yet metastasized (even if they could in the future), while the “metastatic” cancers have already gone to a certain location (even if they could in theory have gone to a different location).

      In a way, what the authors probably were trying to leverage here is the "tissue" identity of that organ.

      Having said this, in addition to showing the "lung permissive changes", the authors should show "breast identity conservation" as well. Because this section begins to deal with the concept of "tissue/lineage identity", the authors should also clarify whether the breast cancer cell lines capable of forming lung metastases also preserve their original tissue identity in their compartment features (which would most likely be the case).

      This is a great question. We have now more explicitly checked the proportions of genomic regions that change compartments to match lung vs. maintaining breast-specific compartment identity. The graphs in Supplementary Figure 2 begin with all genomic bins that have distinctive compartment identity between non-cancerous breast and lung epithelial cells. Then, the plots show what fraction of these tissue-specific bins change compartment to match lung vs. maintaining breast identity in each breast cancer cell line category. As we have shown in other graphs, particularly for switches to the A compartment, more bins change to match lung in the metastatic vs. primary site cell lines. In most cases, more than 50% of the tissue-specific bins shift to look more like lung.
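The per-line fraction described above could be computed along the following lines. This is a hedged sketch, not the authors' code: it assumes each sample's compartment calls are stored as one value per genomic bin on a shared bin grid, encoded +1 for A and -1 for B (for example, the sign of a consistently oriented eigenvector). The function name and toy arrays are hypothetical.

```python
import numpy as np


def fraction_matching_lung(breast, lung, cancer, to_compartment=None):
    """Fraction of breast/lung tissue-specific bins in which `cancer` matches lung.

    Compartments are encoded per bin as +1 (A) or -1 (B). If `to_compartment`
    is given (+1 or -1), only bins whose lung state equals it are counted,
    e.g. +1 to look specifically at switches into the A compartment.
    """
    breast, lung, cancer = (np.asarray(x) for x in (breast, lung, cancer))
    specific = breast != lung  # bins with distinct breast vs. lung identity
    if to_compartment is not None:
        specific &= lung == to_compartment
    if not specific.any():
        return float("nan")
    # With binary calls, each remaining bin either matches lung or keeps breast identity.
    return float(np.mean(cancer[specific] == lung[specific]))


# Toy calls for six bins (illustrative only).
breast = [1, 1, -1, -1, 1, -1]
lung = [1, -1, 1, -1, -1, 1]
cancer = [1, -1, 1, -1, 1, 1]
print(fraction_matching_lung(breast, lung, cancer))                    # 0.75
print(fraction_matching_lung(breast, lung, cancer, to_compartment=1))  # 1.0
```

Because the calls are binary, one minus the returned value is the share of tissue-specific bins that keep breast identity.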

      (5) Rest of the sections: The authors go on to claim that the organ-specific, metastasis-permissive compartment features mimic the destination organ. To strengthen this claim, the authors used additional non-breast cancer cell lines in a brain metastasis comparison (prostate cancer cell lines LNCaP as localized and DU145 as brain metastatic). (DU145 in MetMap, again, is highly metastatic to lung, brain, and kidney.) However, this raises the question of whether cell lines capable of metastasizing to multiple organ sites (e.g., MDA-MB-231, DU145, A549, H460) acquire permissive features for all of these organs. This scenario is clinically relevant in stage 4 patients, who often present not with a single metastatic lesion in one organ but with multiple lesions in more than one organ (e.g., concomitant liver and lung metastases). Do the authors think that different clones might have different tropism-permissive 3D genome features, or that there might be an evolutionary trajectory in this?

      In my opinion, to further prove this point, the authors might need to consider doing in vivo experiments to collect paired primary and organ-specific metastatic samples to look at the 3D genome changes.

      We agree that an ideal experimental follow up to this study would be to collect paired metastatic and primary tumors, either in mouse xenograft or, even better, from patients. This is beyond the scope of what we can do for our current paper, but we have added a statement to the discussion of further experiments that would be required to clarify this point.

      (6) Technically, the study used public Hi-C data without generating new Hi-C data. The compartment analysis used a bin size of 250 kb, indicating that the Hi-C data were of relatively low resolution and might not be ideal for addressing other 3D genome architecture changes such as TADs or long-range loops. It is therefore unknown whether there might be permissive TAD/loop changes associated with organotropism, and this is a limitation of this study.

      Our decision to focus on A/B compartmentalization rather than TAD or loop structure in this analysis was intentional and biologically motivated, rather than solely a reflection of data resolution. Both compartments and topologically associated domains (TADs) are key parts of genome organization, and numerous studies have shown that disrupting these structures can alter downstream gene regulation. However, compartments have been found, more so than TADs, to be strongly associated with cell type and cell fate. Therefore, in this manuscript, we focused on compartment organization changes between healthy and cancerous cells, as they are more likely to represent stable alterations of genome organization during malignant transformation.

      (7) In the final sentence of the discussion, the authors stated "Overall, our results suggest that genome spatial compartment changes can help encode a cell state that favors metastasis (EMT)". However, the link between metastasis and EMT was not clearly established within the manuscript; the authors did not provide strong evidence connecting the two in their description of the results. It is also unclear whether the EMT-associated compartment identity would correlate with the organotropic compartment identity.

      We agree that this statement rests on too strong an assumption. The literature on this topic is vast and complex, and while there is abundant evidence that EMT pathways can play important roles in facilitating metastasis, other pathways are at play in the metastatic process as well (https://journals.plos.org/Plosbiology/article?id=10.1371/journal.pbio.3002487). We have made a clearer statement about this in the text now.

      To address whether the organotropic changes are related to the EMT changes, we calculated the overlap between the genomic bins that strongly segregated cell lines in the compartment principal component analysis (PC1) and those that showed “organotropic” changes. As shown in Supplementary Table 3, this overlap is very small: only 3% of bins are important both for the EMT segregation of cell lines and for organotropism.

      We have now included this overlap information as supplementary table 3 and have addressed this in the text.
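Because an overlap percentage depends on its denominator, a small sketch may help readers see what such a number could mean. The text does not specify the denominator used for the 3% figure; the hypothetical helper below assumes bins are identified by (chromosome, start) pairs and reports the shared count as a fraction of the union (the Jaccard index) and as a fraction of each input set. The toy bins are invented for illustration.

```python
def bin_overlap(set_a, set_b):
    """Count shared bins and express the overlap relative to several denominators."""
    a, b = set(set_a), set(set_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "of_union": len(shared) / len(union) if union else 0.0,  # Jaccard index
        "of_a": len(shared) / len(a) if a else 0.0,
        "of_b": len(shared) / len(b) if b else 0.0,
    }


# Toy bins keyed by (chromosome, start); illustrative only.
emt_bins = {("chr1", 0), ("chr1", 250000), ("chr2", 0)}
organotropic_bins = {("chr2", 0), ("chr3", 500000)}
result = bin_overlap(emt_bins, organotropic_bins)
print(result["shared"], result["of_union"])  # 1 0.25
```

Reporting which denominator is used (union, EMT set, or organotropic set) makes a small overlap figure easier to interpret.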

      Reviewer #2 (Public review):

      Summary:

      This work addresses an important question of chromosome architecture changes associated with organotropic metastatic traits, showing important trends in genome reorganization. The most important observation is that the 3D genome changes are consistent with adaptations to new microenvironments: lung metastatic breast cancer cells exhibit signatures of a lung cell-like genome architecture, and brain metastatic prostate cancer cells show compartment shifts toward a brain-like state.

      Strengths:

      This work presents interesting original results, which will be important for future studies and for the biomedical implications of epigenetic regulation in health and disease.

      Weaknesses:

      The authors used publicly available data for 15 cell types. They should show how many different sources the data were obtained from and demonstrate that obtained results are consistent if the data from different sources were used.

      In our revised version, we have provided a clarified table of information about all the publicly available data used for all the cell lines, indicating the sources of the data. The 17 datasets used come from 8 different studies. So, indeed, the reviewer is correct that many different sources of data were used. To address whether our results would be consistent if data from different sources were used, we created a comparison map of the A/B compartment profiles for data from multiple sources when available. As shown below, Hi-C data from different sources for the same cell line cluster closely, are highly correlated, and are well separated from other cell lines. So, we do not think that source batch effects play a major role in our results.

      Author response image 3.
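A cross-source comparison of this kind can be sketched as a Pearson correlation matrix over compartment eigenvector tracks. This is an illustration under assumptions, not the exact analysis behind the comparison map: it assumes all tracks are on the same bin grid (e.g., the 250 kb bins mentioned above), that eigenvector signs have been oriented consistently across datasets so that positive values mean A, and that missing bins are NaN. Dataset names and values are hypothetical.

```python
import numpy as np


def compartment_correlation(tracks):
    """Pearson correlation between compartment eigenvector tracks.

    `tracks` maps a dataset name to a 1-D array of per-bin values on a shared
    bin grid; bins that are NaN in any dataset are dropped before correlating.
    """
    names = list(tracks)
    mat = np.vstack([np.asarray(tracks[n], dtype=float) for n in names])
    keep = ~np.isnan(mat).any(axis=0)  # bins observed in every dataset
    return names, np.corrcoef(mat[:, keep])


# Toy tracks: two sources for the same line and one different line (illustrative only).
tracks = {
    "lineX_sourceA": [1.0, 2.0, -1.0, -2.0, 0.5, np.nan],
    "lineX_sourceB": [1.1, 1.9, -0.9, -2.1, 0.4, 0.3],
    "lineY_sourceA": [-1.0, 1.0, 2.0, -1.0, -2.0, 0.0],
}
names, corr = compartment_correlation(tracks)
print(np.round(corr, 2))
```

Feeding `1 - corr` into a hierarchical clustering would then show whether same-line datasets from different sources group together before joining other lines.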

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1a: This figure could be reformatted without the arrows. Arrows usually indicate upstream-to-downstream relationships along certain processes. Using arrows here could mislead readers into thinking that the cell lines were derived from one another. The same applies to the supplementary figures.

      We have now edited figure 1a to include lines linking cell lines, indicating conceptual relationships, rather than arrows, which would imply direct derivation.

      (2) Figure 1c: The PCA (PC2 axis) indeed seemed to separate the HER2 status quite well. One concern is MCF7, it is labeled as ERpos/HER2neg in MetMap but seems to be clustered as HER2pos in this study. Are they the same? (This again highlights the importance of cell line definition and annotation).

      It is a good point that MCF7, while generally considered HER2 negative (we indicate this negative status in Supplementary Table 1), falls near HER2 positive cells in PCA space. This indicates that PCA captures tendencies but is not a perfect classifier. In a high dimensional, complex system, it is expected that an unsupervised analysis such as this will not capture just one biological feature in a given principal component, and therefore something like HER2 status may not segregate perfectly. However, this analysis does suggest that MCF7 3D genome structure has features that are more similar to other HER2+ cell lines. This raises the interesting possibility that it may actually behave like HER2+ cells in some ways even while being HER2- itself. We have more clearly stated the MCF7 discrepancy in the text.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of results can be shortened, to make it easier to read and understand.

      In our revision, we have tried to clarify where possible, but it was difficult to shorten without losing important caveats and context (especially to make important points emphasized by reviewer 1).

      (2) "100 most positive and negative eigenvalues for PC1" - please provide the correct description.

      We have altered this to make it clearer and more correct: “using the genes from the regions with the top 100 most positive and 100 most negative eigenvector loadings for this PC1”
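To illustrate the distinction the reviewer raised (eigenvalues give the variance captured by each component, whereas eigenvector loadings give one weight per feature), here is a minimal NumPy sketch of selecting the k most positive and k most negative PC1 loadings. It assumes a samples-by-bins matrix with no missing values, PCA computed by SVD of the column-centered matrix, and k = 100 as in the quoted text; function names are hypothetical, and mapping bins to genes is not shown. The sign of a principal component is arbitrary, so the "positive" and "negative" sets can swap between implementations.

```python
import numpy as np


def pc1_loadings(X):
    """PC1 loadings (one weight per feature) of a samples x features matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (column)
    # Rows of vt are the principal axes; vt[0] is the PC1 eigenvector of the covariance matrix.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def extreme_loadings(loadings, k=100):
    """Indices of the k most positive and k most negative non-NaN loadings."""
    loadings = np.asarray(loadings, dtype=float)
    valid = np.flatnonzero(~np.isnan(loadings))
    order = valid[np.argsort(loadings[valid])]  # ascending by loading
    return order[::-1][:k], order[:k]


# Toy example (illustrative only): k=2 instead of 100.
pos, neg = extreme_loadings([0.5, -0.2, 0.9, np.nan, -0.7, 0.1], k=2)
print(pos.tolist(), neg.tolist())  # [2, 0] [4, 1]
```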

    1. eLife Assessment

      This valuable study presents a framework for a shareable data analysis pipeline aimed at improving reproducibility in neuroscience. The evidence for robustness and inter-laboratory operability is convincing. However, aspects such as accessibility for new users, flexibility for custom analyses, and plans for long-term maintenance remain incomplete. Overall, this work will be of interest to neuroscientists engaged in the analysis of large-scale neuronal recordings.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript by K.H. Lee et al. presents Spyglass, a new open-source framework for building reproducible pipelines in systems neuroscience. The framework integrates the NWB (Neurodata Without Borders) data standard with the DataJoint relational database system to organize and manage analysis workflows. It enables the construction of complete pipelines, from raw data acquisition to final figures. The authors demonstrate its capabilities through examples, including spike sorting, LFP filtering, and sharp-wave ripple (SWR) detection. Additionally, the framework supports interactive visualizations via integration with Figurl, a platform for sharing neuroscience figures online.

      Strengths:

      Reproducibility in data analysis remains a significant challenge within the neuroscience community, posing a barrier to scientific progress. While many journals now require authors to share their data and code upon publication, this alone does not ensure that the code will execute properly or reproduce the original results. Recognizing this gap, the authors aim to address the community's need for a robust tool to build reproducible pipelines in systems neuroscience.

      Weaknesses:

      The issues identified here may serve as a foundation for future development efforts.

      (1) User-friendliness:

      The primary concern is usability. The manuscript does not clearly define the intended user base within a modern systems neuroscience lab. Improving user experience and lowering the barrier to entry would significantly enhance the framework's potential for broad adoption. The authors provide an online example notebook and a local setup notebook. However, the local setup process is overly complex, with many restrictive steps that could discourage new users. A more streamlined and clearly documented onboarding process is essential. Additionally, the lack of Windows support represents a practical limitation, particularly if the goal is widespread adoption across diverse research environments.

      (2) Dependency management and long-term sustainability:

      The framework depends on numerous external libraries and tools for data processing. This raises concerns about long-term maintainability, especially given the short lifespan of many academic software projects and the instability often associated with Python's backward compatibility. It would be helpful for the authors to clarify how flexible and modular the pipeline is, and whether it can remain functional if upstream dependencies become deprecated or change substantially.

      (3) Extensibility for custom pipelines:

      A further limitation is the insufficient documentation regarding the creation of custom pipelines. It is unclear how a user could adapt Spyglass to implement their own analysis workflows, especially if these differ from the provided examples (e.g., spike sorting and LFP analyses, which are very specific to the hippocampal field). A clearer explanation or example of how to extend the framework for unrelated or novel analyses would greatly improve its utility and encourage community contributions.

      (4) Flexibility vs. Standardization:

      The authors may benefit from more explicitly defining the intended role of the framework: is Spyglass designed as a flexible, general-purpose tool for developing custom data analysis pipelines, or is its primary goal to provide a standardized framework for freezing and preserving pipelines post-publication to ensure reproducibility? While both goals are valuable, attempting to fully support both may introduce unnecessary complexity and result in a tool that is not well-suited for either purpose. The manuscript briefly touches on this tradeoff in the introduction, and the latter (pipeline preservation) may be the more natural fit for the package. If so, this intended use should be clearly communicated in the documentation to help users understand its scope and strengths.

      Impact:

      This work represents a significant milestone in advancing reproducible data analysis pipelines in neuroscience. Beyond reproducibility, the integration of cloud-based execution and shareable, interactive figures has the potential to transform how scientific collaboration and data dissemination are conducted. The authors are at the forefront of this shift, contributing valuable tools that push the field toward more transparent and accessible research practices.

    3. Reviewer #2 (Public review):

      Summary:

      This valuable paper presents Spyglass, a comprehensive software framework designed to address the critical challenges of reproducibility and data sharing in neuroscience. The authors have developed a robust ecosystem built on community standards such as NWB and DataJoint, and demonstrate its utility by applying it to datasets from two independent labs, successfully validating the framework's ability to reproduce and extend published findings. While the framework offers a powerful blueprint for modern, reproducible research, its immediate broad impact may be tempered by the significant upfront investment required for adoption and its current focus on electrophysiological data. Nevertheless, Spyglass stands as an important and practical contribution, providing a well-documented and thoughtfully designed path toward more transparent and collaborative science.

      Strengths:

      (1) Principled solution to a foundational challenge:

      The work offers a concrete and comprehensive framework for reproducibility in neuroscience, moving beyond abstract principles to provide an implemented, end-to-end ecosystem.

      (2) Pragmatic and robust architectural design:

      Features such as the "cyclic iteration" motif for spike-sorting curation and the "merge" motif for pipeline consolidation demonstrate deep, practical experience with neurophysiological analysis and address real-world challenges.

      (3) Cross-laboratory validation:

      The successful replication and extension of published hippocampal decoding findings across independent datasets strongly support the framework's utility and underscore its potential for enabling reproducible science.

      (4) Accessibility through documentation and demos:

      Extensive tutorials and the availability of a public demo environment lower some of the barriers to adoption.

      Weaknesses:

      (1) High barrier to adoption:

      The requirement to convert all data into NWB, maintain a relational database, and train users in structured workflows is a significant hurdle, particularly for smaller labs.

      (2) Limited tool integration:

      The current pipelines, while useful, still resemble proof-of-principle demonstrations. Closer integration with established analysis libraries such as Pynapple and others could broaden the toolkit and reduce duplication of effort.

      (3) Experimental metadata support:

      While NWB provides a solid foundation for storing neurophysiology data streams, it still lacks broad and standardized support for experimental metadata, including descriptions of conditions, subject details, and procedures, as well as links across datasets. This limitation constrains one of Spyglass's key promises: enabling reproducible, cross-laboratory science. The authors should clarify how Spyglass plans to address or mitigate this gap - for example, by adopting or contributing to metadata extensions, providing templates for experimental conditions, or integrating with complementary systems that manage metadata across datasets.

      (4) Cross-laboratory interoperability:

      While demonstrated across two datasets, the manuscript does not fully address how Spyglass will handle the diversity of metadata standards, acquisition systems, and lab-specific practices that remain major obstacles to reproducibility.

      (5) Visualization limitations:

      Beyond the export system and Figurl, NWB offers relatively few options for interactive data exploration. The ability to explore data flexibly and discover new phenomena remains limited, which constrains one of the potential strengths of standardized pipelines.

      Spyglass is well-positioned to become a community framework for reproducible neuroscience workflows, with the potential to set new standards for transparency and data sharing. With expanded modality coverage, tighter integration of existing community tools, stronger solutions for cross-lab interoperability, and richer visualization capabilities, it could have a transformative impact on the field.

    1. eLife Assessment

      The study provides valuable insights into the role of thalamic nuclei in associative threat and extinction learning, supported by a large dataset and multipronged analyses. However, aspects of the evidence remain incomplete, particularly regarding the statistical methods, the claims of plasticity, and the network modeling framework. With these issues addressed, this manuscript will be of interest to those studying learning and memory, fear, thalamic circuitry, and related mental health conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGN, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

    3. Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data have a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated thalamic activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar. This is potentially a novel discovery, but one that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      (4) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      (5) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      (6) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      (7) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      (8) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings; this is not a criticism, as these experiments and subject matter are extremely complex.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm and extends it to investigating fear learning beyond the traditional limbic-system pathways. It is possible that the authors' models of pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

    4. Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, the authors examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning (automatic survival mechanisms versus more deliberate processes) and to propose a hierarchical pulvinar model of fear conditioning. They also aim to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in them.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      a) The fMRI data have low spatial resolution (3 mm isotropic voxels), which certainly limits the examination of small nuclei such as the ones investigated here, especially the LGN and inferior pulvinar.

      b) The fMRI data were normalized to standard space. Analyzing the data in individual-subject space would have avoided warping each participant's brain and would have allowed the use of a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      c) On top of the two previous points, the authors smoothed the data with a 6 mm kernel, which means that every voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (assuming they followed the standard SPM12 normalization default, which resamples, or in this case upsamples, the data to 2 × 2 × 2 mm). Given that structural connectivity and function can change sharply over distances of this size, especially in the thalamus, this and the previous two decisions do not favor anatomical precision.

      d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      (3) It is also unclear why, given the large sample size, some of the analyses were not tested with reproducibility strategies, such as dividing the sample into 2 or 3 subsamples, or with other cross-validation approaches.

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. For example, in the analysis reported in Figures 1-2, are there other correlations between the activation of the anterior pulvinar or MD and other pulvinar nuclei? Only the MD-anterior pulvinar correlation is reported.

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      (6) Data and code should be made available to the scientific community. Even if the authors used only standard fMRI toolboxes, any code used to run the analyses would help the community, including anyone who decides to try to replicate the findings.
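      The arithmetic behind points (1c) and (1d) above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' pipeline: it assumes a 6 mm FWHM Gaussian kernel applied to data resampled to 2 mm voxels, and it computes framewise displacement (FD) in the form popularized by Power et al. (2012), assuming six realignment parameters per volume ordered as three translations (mm) followed by three rotations (radians), with rotations converted to arc length on a 50 mm sphere. All function names and condition labels are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical helpers illustrating points (1c) and (1d); not taken from the manuscript.

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's full width at half maximum (FWHM) to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def kernel_weights_1d(fwhm_mm, voxel_mm, radius_vox=4):
    """Normalised 1-D Gaussian weights for a voxel and its neighbours along one axis."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    raw = [math.exp(-0.5 * (k / sigma_vox) ** 2) for k in range(-radius_vox, radius_vox + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def framewise_displacement(params, radius_mm=50.0):
    """FD per volume from rows of (dx, dy, dz in mm, rx, ry, rz in radians); volume 0 gets 0."""
    fd = [0.0]
    for prev, cur in zip(params, params[1:]):
        translation = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        rotation = sum(abs(c - p) for c, p in zip(cur[3:], prev[3:]))
        fd.append(translation + radius_mm * rotation)  # rotations as arc length on the sphere
    return fd


def mean_fd_by_condition(fd, labels):
    """Average FD separately for each (hypothetical) condition label attached to a volume."""
    groups = {}
    for value, label in zip(fd, labels):
        groups.setdefault(label, []).append(value)
    return {label: fmean(values) for label, values in groups.items()}


weights = kernel_weights_1d(fwhm_mm=6.0, voxel_mm=2.0)
centre = weights[len(weights) // 2]
print(f"sigma = {fwhm_to_sigma(6.0):.2f} mm")  # sigma = 2.55 mm
print(f"own-voxel weight: {centre:.2f} per axis, {centre ** 3:.3f} in 3-D")  # 0.31 per axis, 0.031 in 3-D
```

      Under these assumptions, a voxel's own signal keeps only about 31% of the kernel weight along each axis (about 3% in 3-D, because the Gaussian kernel is separable). Comparing mean FD between CS+ and CS- volumes is one simple check of whether motion differs across conditions.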

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field, as a starting point for characterizing and conceptualizing the involvement of thalamic nuclei, and their interactions with other brain regions, in the circuitry of associative threat learning. It also paves the way for further testing of the functional dynamics among these regions and circuits, and for model testing.

    1. eLife Assessment

      This study presents valuable findings on the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence. While the authors present solid evidence to support their claims, a more exhaustive characterisation of how the different signals interact could further strengthen their case. The work will be of interest to cognitive and systems neuroscientists working on decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      This paper aims to characterise the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence.

      Strengths:

      (1) Elegant combination of electroencephalography and computational modelling.

      (2) The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      (3) Innovative task design, including different gap durations.

      Weaknesses:

      (1) The authors introduce the CPP as tracking an intermediary (motor-independent) evidence integration process, and the MBL as motor preparation that maintains a sustained representation of the decision variable. It would help if the authors could more directly and quantitatively assess whether their current data are in line with this. That is, do these signals exhibit key features of evidence accumulation (a slope proportional to evidence strength, and termination at a common amplitude that reflects the bound)? Additionally, plotting these signals report-locked (aligned to the button press) would help here. What do the results mean for the narrative of this paper?

      (2) The novelty of this work lies partly in the aim to characterize how the CPP and MBL interact (page 5, lines 3-5). However, this analysis seems to be missing. For example, at the single-trial level, do relatively strong CPP pulses predict a faster or larger MBL? The simulations in Figure 5 are interesting, but more could be done with the measured physiology.

      (3) The focus on the CPP and MBL is hypothesis-driven but also narrow. Since little is known about the physiology during this "gaps" task, have the authors considered computing time-frequency representations (TFRs) from different sensor groupings (perhaps in a supplementary figure)?

      (4) The idea of a potential bound crossing during P1 is elegant, albeit a little simplistic. I wonder whether the authors could show a physiological signature of this more directly, for example by examining the MBL or occipital alpha split by the LL, LH, HL and HH conditions, both pulse-locked and report-locked. Relatedly, a primacy effect can also be produced by modelling (i) self-excitation of the current one-dimensional accumulator, or (ii) two competing accumulators that produce winner-take-all dynamics. Is it possible to distinguish between these models, either with formal model comparison or with diagnostic physiological signatures?

      (5) The random-effects structure of the linear mixed models should be specified in more detail. Currently, the authors write: "Where possible, we included all main effects of interest as random effects to control for interindividual variability." This sounds as if they started with a model with a full random-effects structure and dropped random components when the model did not converge. This might not be sufficiently principled, as random components could be dropped in many different orders, which would affect the results. Do all main results hold when using classical random-effects statistics on subject-wise regression coefficients?
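      The classical two-stage ("summary statistics") approach mentioned in point (5) can be sketched as follows: fit the same regression separately for each participant, then test the per-participant coefficients against zero across participants. This is a minimal sketch with a single continuous predictor and ordinary least squares; it assumes nothing about the authors' actual models, and for binary choices a per-participant logistic regression would replace the OLS step. Function names and the toy data are hypothetical.

```python
from statistics import fmean, stdev

# Hypothetical two-stage random-effects sketch; not the authors' analysis code.

def ols_slope(x, y):
    """First level: least-squares slope of y on x for one participant (single predictor)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_stage_random_effects(subjects):
    """Second level: one-sample t statistic of the per-participant slopes against zero.

    `subjects` is a list of (x, y) pairs, one pair of per-trial sequences per participant.
    Returns (slopes, t statistic, degrees of freedom).
    """
    slopes = [ols_slope(x, y) for x, y in subjects]
    n = len(slopes)
    t_stat = fmean(slopes) / (stdev(slopes) / n ** 0.5)
    return slopes, t_stat, n - 1


subjects = [
    ([1, 2, 3, 4], [1, 2, 3, 4]),    # toy participant with slope 1
    ([1, 2, 3, 4], [2, 4, 6, 8]),    # toy participant with slope 2
    ([1, 2, 3, 4], [3, 6, 9, 12]),   # toy participant with slope 3
]
slopes, t_stat, df = two_stage_random_effects(subjects)
print(slopes, f"t = {t_stat:.2f}", f"df = {df}")  # [1.0, 2.0, 3.0] t = 3.46 df = 2
```

      A p-value would then come from a t distribution with n - 1 degrees of freedom. Because each participant contributes exactly one coefficient per effect, this approach requires no decisions about which random slopes to drop.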

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.
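      The early-commitment account can be illustrated with a toy bounded accumulator in which a bound crossing during the first pulse ends accumulation, so the second pulse cannot influence the choice. This is a hypothetical sketch, not the authors' model: the step counts, drift values, noise level and bound are arbitrary, and the gap is modelled as zero-mean noise, loosely standing in for the 0%-coherence motion.

```python
import random

# Hypothetical toy model; parameter values are illustrative, not fitted to any data.

def simulate_trial(p1, p2, bound, pulse_steps=10, gap_steps=5, noise_sd=1.0, rng=None):
    """Accumulate pulse 1, a zero-mean gap, then pulse 2, stopping at the first bound crossing.

    Returns (choice, committed_during_p1): choice is +1 or -1 (+1 is correct when the drifts
    are positive), and the flag is True when the bound was reached while pulse 1 was on.
    """
    if rng is None:
        rng = random.Random(0)
    drifts = [p1] * pulse_steps + [0.0] * gap_steps + [p2] * pulse_steps
    dv = 0.0
    for step, mu in enumerate(drifts):
        dv += mu + rng.gauss(0.0, noise_sd)
        if abs(dv) >= bound:
            return (1 if dv > 0 else -1), step < pulse_steps
    # No bound crossing: the choice follows the sign of the total accumulated evidence.
    return (1 if dv > 0 else -1), False


def accuracy(p1, p2, bound, n_trials=2000, seed=1):
    """Proportion of trials choosing +1 (correct when p1 and p2 are positive)."""
    rng = random.Random(seed)
    correct = sum(simulate_trial(p1, p2, bound, rng=rng)[0] == 1 for _ in range(n_trials))
    return correct / n_trials


print("HL:", accuracy(0.6, 0.2, bound=4.0), "LH:", accuracy(0.2, 0.6, bound=4.0))
```

      With a low bound, the high-low (HL) order should tend to yield higher accuracy than the low-high (LH) order, because commitment is often reached during the first pulse. Raising the bound toward perfect integration should shrink this order effect, since the total evidence is then the same for both orders.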

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments that used almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). The authors suggest this may be due to the staircase procedure used at the start of the experiment to calibrate the coherence of (single-pulse) dot motion stimuli for each individual. But it remains unclear why overall performance in this experiment is so poor. Participants achieved ~85% correct following 400 ms of 33-45% coherent motion, whereas in previous work, performance was ~90% correct following 240 ms of 12.8% coherent motion. It seems odd that adding 0% coherent motion in the temporal gaps would impair performance so greatly, given that it was clearly colour-coded. The manuscript lacks the details about stimulus presentation parameters needed to judge whether visual processing explains the reduced performance, or whether there is a more cognitive or motivational explanation.

    1. eLife Assessment

      This study uses a valuable combination of functional magnetic resonance imaging and electroencephalography (EEG) to study brain activity related to prediction errors in relation to both sensorimotor and more complex cognitive functions. It provides incomplete evidence to suggest that prediction error minimisation drives brain activity across both types of processing and that elevated inter-regional functional coupling along a superior-inferior axis is associated with high prediction error, whereas coupling along a posterior-anterior axis is associated with low prediction error. The manuscript will be of interest to neuroscientists working on predictive coding and decision-making, but would benefit from more precise localisation of EEG sources and more rigorous statistical controls.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates whether prediction error extends beyond lower-order sensory-motor processes to include higher-order cognitive functions. Evidence is drawn from both task-based and resting-state fMRI, with the addition of resting-state EEG-fMRI to examine power spectral correlates. The results partially support the existence of dissociable connectivity patterns: stronger ventral-dorsal connectivity is associated with high prediction error, while posterior-anterior connectivity is linked to low prediction error. Furthermore, spontaneous switching between these connectivity patterns was observed at rest and correlated with subtle intersubject behavioral variability.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Weaknesses:

      Major:

      (1) Lack of multiple comparisons correction for edge-wise contrast:

      The analysis of connectivity differences across three levels of prediction error was conducted separately for approximately 22,000 edges (derived from 210 regions), yet no correction for multiple comparisons appears to have been applied. Modularity analysis was then applied to the top 5% of these edges. I do not believe that this approach is viable without correction. The fact that a completely separate SVM-based analysis was FDR-corrected across 210 regions does not address this problem.

      (2) Lack of spatial information in EEG:

      The EEG data were not source-localized, and no connectivity analysis was performed. Instead, power fluctuations were averaged across a predefined set of electrodes based on a single prior study (reference 27), as well as across a broader set of electrodes. While the study correlates these EEG power fluctuations with fMRI network connectivity over time, such temporal correlations do not establish that the EEG oscillations originate from the corresponding network regions. For instance, the observed fronto-central theta power increases could plausibly originate from the dorsal anterior cingulate cortex (dACC), as consistently reported in the literature, rather than from a distributed network. The spatially agnostic nature of the EEG-fMRI correlation approach used here does not support interpretations tied to specific dorsal-ventral or anterior-posterior networks. Nonetheless, such interpretations are made throughout the manuscript, which overextends the conclusions that can be drawn from the data.
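      To illustrate the scale of the multiple-comparisons problem in point (1): with 210 regions there are 210 × 209 / 2 = 21,945 unique edges, so about 1,100 edges would be expected to pass an uncorrected p < 0.05 threshold even if no edge truly differed. The sketch below applies the Benjamini-Hochberg false discovery rate (FDR) procedure, one standard correction and not necessarily the one the authors would choose, to hypothetical null p-values. The function name is illustrative.

```python
import numpy as np

# Hypothetical illustration; the p-values are simulated, not taken from the manuscript.

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests rejected by the Benjamini-Hochberg step-up procedure at FDR q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m  # (i / m) * q for the i-th smallest p-value
    passes = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True    # reject that test and every smaller p-value
    return reject


n_regions = 210
n_edges = n_regions * (n_regions - 1) // 2  # unique undirected edges
rng = np.random.default_rng(0)
null_p = rng.uniform(size=n_edges)  # hypothetical p-values with no true effects
print(n_edges, "edges;",
      int((null_p < 0.05).sum()), "uncorrected hits;",
      int(benjamini_hochberg(null_p).sum()), "FDR-corrected hits")
```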

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates putative networks associated with prediction errors in task-based and resting-state fMRI. It attempts to test the idea that prediction error minimisation extends to abstract cognitive functions, referred to as the global prediction error hypothesis, by establishing a parallel between networks found in task-based fMRI, where prediction errors are elicited in a controlled manner, and networks that emerge during "resting state".

      Strengths:

      Clearly, a lot of work and data went into this paper, including two task-based fMRI experiments and resting-state data for the same participants, as well as a third, EEG-fMRI dataset. The paper is well written overall, with a couple of exceptions regarding clarity noted below, and the methodology appears sound, with a few exceptions listed below that require further justification. It does a good job of acknowledging its own weaknesses.

      Weaknesses:

      (1) The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      (2) Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed for that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      (3) The quantitative comparison between networks during task and rest was performed on a small subset of the ROIs rather than on the full networks. Why? Given how small the correlation between task and rest is (r = 0.021), and that it covers only part of the networks, the evidence is somewhat tenuous. Running the analysis on the full networks could strengthen the argument.

      (4) Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded, and what is the evidence for exclusion?

      (5) The EEG-fMRI analysis linking 3-6 Hz fluctuations to PE is hard to reconcile with the fact that fMRI captures much slower activity, whereas some PEs occur within 150 ms. The discussion acknowledges this but does not seem to resolve it, and would benefit from a more comprehensive argument.

    4. Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward PE minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      However, the current manuscript would benefit from further clarification and empirical grounding, especially with regard to its theoretical framing (that PE-like state fluctuations are intrinsic and help us minimize PE), interpretation of results, and broader functional significance. Below, I outline a few major comments and suggestions that I think would strengthen the contribution.

      (1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis ("PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions") is more suitable for the discussion than as the main claim in the abstract, the introduction, and throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      (2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance

      It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      Or, more broadly, does variability in resting-state PE state fluctuations predict general cognitive abilities such as WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. Such an analysis would test whether PE state fluctuations reflect something about general cognition, and also whether they explain behavior above and beyond the attentional and WM processes that we know also fluctuate.

      b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      (3) A Priori Hypothesis for EEG Frequency Analysis

      It is unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal delta/theta power, specifically in the 3-6 Hz range. In the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      (4) Significance Assessment

      The significance of the correlation above, and of all other correlation analyses, should be assessed with a permutation test rather than a single parametric t-test against zero, for two reasons:

      a) EEG and fMRI time series are autocorrelated, which violates the independence assumption of parametric tests;

      b) standard t-tests can underestimate the variance of the true null distribution, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false-positive rates unless they are tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.
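      The phase-randomization test suggested above can be sketched as follows. This is a minimal illustration, assuming two equal-length, evenly sampled and approximately stationary time series (for example, an EEG power time course resampled to the fMRI TR and a network-coupling time course); the AR(1) series are synthetic stand-ins, and the function names are hypothetical. Randomizing the Fourier phases of one series preserves its power spectrum, and hence its autocorrelation, while destroying any true temporal alignment with the other series.

```python
import numpy as np

# Hypothetical helpers; synthetic data stand in for the EEG and fMRI time courses.

def phase_randomize(x, rng):
    """Surrogate with the same amplitude spectrum as x but uniformly random Fourier phases."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.size)
    phases[0] = 0.0  # leave the mean (DC term) untouched
    if n % 2 == 0:
        phases[-1] = 0.0  # the Nyquist term must stay real for even-length signals
    return np.fft.irfft(spectrum * np.exp(1j * phases), n=n)


def phase_randomization_test(x, y, n_surrogates=1000, seed=0):
    """Observed Pearson r between x and y, and a two-sided p-value from phase-randomized x."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    null = np.array([np.corrcoef(phase_randomize(x, rng), y)[0, 1] for _ in range(n_surrogates)])
    # The +1 terms count the observed statistic as one draw, so p is never exactly zero.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_surrogates + 1)
    return observed, p


def ar1(n, phi, rng):
    """Synthetic autocorrelated series: x[t] = phi * x[t-1] + white noise."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x


rng = np.random.default_rng(1)
eeg_power = ar1(300, 0.9, rng)      # hypothetical EEG power time course
fmri_coupling = ar1(300, 0.9, rng)  # independent, equally autocorrelated series
r, p = phase_randomization_test(eeg_power, fmri_coupling)
print(f"r = {r:.3f}, permutation p = {p:.3f}")
```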

      (5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

    1. eLife Assessment

      This important study concerns the propagation of waves in bacterial biofilms, bridging active matter physics and bacterial biophysics. While the experimental observations are solid, the theoretical interpretation and model validation are currently incomplete and require further refinement. This work will be of interest to microbiologists, biophysicists, and researchers studying collective behavior in biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting paper. The authors have found multiple experimental knobs to perturb a mechanical wave behavior driven by pili feedback. They frame this behavior in terms of nonreciprocal interactions; while I can see how nonreciprocity could play a role, what about mechanical feedback? Phenomenological models are fine, but the lack of mechanistic understanding is a weakness. I think it would be more interesting to frame the model in terms of potential mechanochemical feedback, in order to understand the microscopic mechanisms. Regardless, more can be done to constrain the model by identifying knobs that explain the experimental observations in Figures 3, 4, 5, and 7.

      Strengths:

      The report of mechanical waves in bacterial collectives. The mechanism has potential application in a multicellular context, such as morphogenesis.

      Weaknesses:

      My most serious concern is about left-right (LR) symmetry breaking. I fail to see how the data in Figure 6 show LR symmetry breaking. All they show is in-out directionality, which is set by a boundary condition. LR symmetry breaking means breaking of mirror symmetry: the pattern cannot be superimposed on its mirror image using only rigid-body transformations (translation and rotation). As far as I am aware, this condition is not satisfied in this pattern-forming system.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Altin et al. examines the dynamics of bacterial assemblies, building on previously published work documenting mechanical spiral waves. The authors show that the emergent dynamics can be influenced by various factors, including the strain of bacteria and water content in the sample. While the topic of this paper would be of broad interest, and the preliminary results are certainly interesting, various aspects of this paper are underdeveloped and require further exploration.

      Strengths:

      One of the nice features of this system is the ability to transition between the different states based on the addition or withdrawal of water. The authors use an experimental model system and mathematical model similar to those in previously published work (Reference 49), but extend that work by showing that the behaviour can be modified through simple interventions. Specifically, the authors show that adding water droplets or drying the sample through heating can result in changes in the observed wave structure. This represents a possible way of controlling active matter.

      The mathematical model proposed in this paper involves a phase-oscillator model of Kuramoto-style coupling (similar to previously reported models). A non-reciprocal phase lag is introduced in order to facilitate the patterns seen in experiments. The qualitative agreement in the behaviour is quite striking, showing both spiral waves and travelling waves.

      Weaknesses:

      The principal observation of the paper - that spiral waves emerge in these systems and can be controlled in various ways - is not linked to microscale dynamics at the cell level. It is recognised that hydrodynamics can introduce non-reciprocity, an essential ingredient of this model. However, in this work the authors have not identified a physical mechanism for the lag, e.g., either through steric interactions or hydrodynamic disturbances. This is also relevant in the phase oscillator modelling section. In low Reynolds number flows, dynamics are instantaneously determined. In this light, what does the phase lag term represent? What is the origin of the coupling term, b? Can this be varied systematically or derived from experimental measurements or parameters?

      Classification of wave properties is an important aspect of this paper, but is not accomplished in a quantitative sense. What is the method for distinguishing between travelling and spiral waves? There is a range of quantitative tools that could be used to investigate these dynamics (and also compare quantitatively with the models). For example, examining the correlation functions and order parameters could assist with the extraction of wave features (see extensive literature on oscillator models).
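      As one example of such a tool (not necessarily what the authors would use), spiral waves can be distinguished from planar and target waves by counting phase singularities: the phase winds by ±2π around a spiral core, but by zero around any loop in a planar or target wave. The sketch below assumes a phase field θ(x, y) has already been extracted from the imaging data (for example, from the Hilbert transform of each pixel's intensity time series); the three fields are synthetic, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; the phase fields below are synthetic, not experimental data.

def wrap_phase(d):
    """Wrap phase differences into [-pi, pi)."""
    return (d + np.pi) % (2 * np.pi) - np.pi


def topological_charges(theta):
    """Winding number of the phase field around each 2x2 plaquette of grid points.

    A spiral core inside a plaquette gives +1 or -1 (the sign reflects the spiral's
    handedness); planar and target waves give 0 everywhere.
    """
    a, b = theta[:-1, :-1], theta[:-1, 1:]
    c, d = theta[1:, 1:], theta[1:, :-1]
    loop = wrap_phase(b - a) + wrap_phase(c - b) + wrap_phase(d - c) + wrap_phase(a - d)
    return np.rint(loop / (2 * np.pi)).astype(int)


rows, cols = np.mgrid[0:40, 0:40]
spiral = np.arctan2(rows - 19.5, cols - 19.5)  # one phase singularity between grid points
planar = (0.5 * cols) % (2 * np.pi)            # plane wave travelling along the columns
target = 0.8 * np.hypot(rows - 19.5, cols - 19.5)  # concentric rings, no singularity
for name, field in [("spiral", spiral), ("planar", planar), ("target", target)]:
    q = topological_charges(field)
    print(name, "net charge:", int(q.sum()), "defects:", int(np.abs(q).sum()))
```

      Tracking the number and net sign of such defects over time would give a quantitative, frame-by-frame classification that could be compared directly with the outputs of the phase-oscillator and continuum models.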

      The methodology of changing the dynamics through moisture content appears to be slightly underdeveloped, e.g., adding water involves a droplet, and removing water is accomplished by heating (which presumably could cause other effects). Could the dynamics not be controlled more directly by varying the humidity? At the same time, the authors also mention that temperature itself plays a role in shaping the behaviour. What is the mechanism for this? Is it just through evaporation? Since the frequency increases with temperature, could it just be that activity increases with temperature?

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a novel investigation into unidirectionally propagating waves observed on the surface of Pseudomonas nitroreducens bacterial biofilms. The authors explore how these waves, initially spiral in form, transition into combinations of spiral, target, and planar patterns. The study identifies the periodic extension-retraction cycles of type IV pili as the driving mechanism for wave propagation, which preferentially moves from the colony's edge to its center. Furthermore, the manuscript proposes two theoretical models, a phase-oscillator model and a continuum active solid model, to reproduce these phenomena, and demonstrates how external manipulations (e.g., water droplets, temperature, PEG) can control wave patterns and direction, often correlating with oscillation frequency gradients. The work aims to bridge the fields of active-matter physics and bacterial biophysics by providing both experimental observations and theoretical frameworks for understanding these complex biological wave phenomena.

      Strengths:

      The experimental discovery of unidirectionally propagating waves on bacterial biofilms is highly intriguing and represents a significant contribution to both microbiology and active-matter physics. The detailed observations of wave pattern transitions (spiral to target to planar) and their response to various environmental perturbations (water, temperature, PEG) provide valuable empirical data. The identification of type IV pili as the driving force offers a concrete biological mechanism. The observed correlation between frequency gradients and wave direction is a compelling finding with potential for broader implications in understanding biological pattern formation. This work has the potential to stimulate further research in the collective behavior of living systems and the physical principles underlying biological organization.

      Weaknesses:

      The manuscript attempts to link unidirectional wave propagation to non-reciprocal couplings but ultimately shows that the wave direction is determined by the gradient of the oscillation frequency. The couplings in the two theoretical models are both isotropic and thus cannot dictate the wave direction. A clear distinction should be made between non-reciprocity as a source of wave generation and non-uniformity as a controlling factor of wave direction.
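      The distinction the reviewer draws can be seen in a deliberately minimal toy model, which is not the authors' 2D model: an open 1D chain of oscillators with symmetric, lag-free (reciprocal) nearest-neighbour coupling, whose natural frequencies increase linearly along the chain. The gradient value, coupling strength and function names are illustrative assumptions. Once the chain locks to a common frequency, each oscillator leads its lower-frequency neighbour in phase, so lines of constant phase move towards the low-frequency end even though the coupling itself has no preferred direction.

```python
import numpy as np


def simulate_chain(n=20, gradient=0.01, coupling=1.0, dt=0.05, steps=10_000):
    """Euler-integrate an open 1D chain of phase oscillators.

    Natural frequencies rise linearly with index (omega_i = 1 + gradient*i).
    Coupling is nearest-neighbour, symmetric and lag-free, so it carries no
    preferred direction of its own.
    """
    omega = 1.0 + gradient * np.arange(n)
    theta = np.zeros(n)
    for _ in range(steps):
        rate = omega.copy()
        rate[:-1] += coupling * np.sin(theta[1:] - theta[:-1])  # right neighbour
        rate[1:] += coupling * np.sin(theta[:-1] - theta[1:])   # left neighbour
        theta = theta + dt * rate
    return theta, omega


def locked_phase_differences(omega, coupling):
    """Analytic theta[i+1] - theta[i] in the frequency-locked state.

    Balancing each oscillator against the common frequency mean(omega) gives
    coupling * sin(dphi_i) = -sum_{j<=i} (omega_j - mean(omega)).
    """
    cumulative = np.cumsum(omega - omega.mean())[:-1]
    return np.arcsin(-cumulative / coupling)


theta, omega = simulate_chain()
print(np.all(np.diff(theta) > 0))  # True: higher-frequency sites lead in phase
```

      With θ(x, t) = Ωt + ψ(x) and ψ increasing towards the high-frequency end, a point of constant phase satisfies ψ(x) = const − Ωt, so it drifts towards decreasing ψ, i.e. towards the low-frequency side. The direction is set entirely by the frequency gradient (non-uniformity), not by any asymmetry of the coupling.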

      The relationship between the phase oscillator model and the active solid model is unclear. Given that U and P are both dynamical variables evolving in three-dimensional space, defining the phase Φ precisely in the phase space spanned by U and P could be challenging. A graphical illustration of the definition of Φ would be beneficial. To ensure reproducibility of the numerical results, the parameter values used in the numerical simulations and an explicit definition of the elastic force in the active solid model should be provided.

      The link between the theoretical models and experimental results is weak. For example, the propagation of the kink from the lower to the higher part of the surface (Figure 1e) could be addressed within the framework of the active solid model. The mechanism of transition from spiral to target waves (Figure 3a, b) requires clarification, identifying which model parameter is crucial for inducing this transition. The wave propagation toward the lower frequency side is numerically demonstrated using the phase oscillator model, but a physical or intuitive explanation for this phenomenon is missing. Also, the wave transitions induced by the addition of water droplets and temperature rise are not linked to specific parameters in the theoretical models.

    1. eLife Assessment

      This paper reports a useful low-cost platform for studying mosquito behaviors such as flight activity, sugar feeding, and host-seeking responses over the course of several weeks, and demonstrates key applications of this platform. While the authors provide a biological proof of principle, the evidence that supports the validation of the tracking algorithm is incomplete; it lacks biological replicates, independent confirmation of the tracking algorithm, and data on mosquito survival.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a behavioral platform "BuzzWatch" and its application in long-term behavioral monitoring. The study tested the system with different mosquito species and Aedes aegypti colonies and monitored behavioral response to blood feeding, change in photoperiod, and host-cue application at different times of the day.

      Strengths:

      BuzzWatch is a novel, custom-built behavioral system that can be used to monitor time-of-day-specific and long-term mosquito behaviors. The authors provide detailed documentation of the construction of the assay and custom flight tracking algorithm on a dedicated website, making them accessible to other researchers in the field. The authors performed a wide range of experiments using the BuzzWatch system and discovered differences in midday activity level among Aedes aegypti colonies, and a reversible change in the daily activity profile post-blood-feeding.

      Weaknesses:

      The authors report the population metric "fraction flying" as their main readout of the daily activity profile. It is worth explaining why conventional metrics like travel distance/activity level are not reported. Alternatively, these metrics could be shown, considering the development and implementation of a flight trajectory tracking pipeline in this paper.
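      Because the pipeline already produces trajectories, both readouts can in principle be derived from the same data. The sketch below assumes a hypothetical array layout (frames × mosquitoes × 2 coordinates, NaN where a mosquito is not detected) and defines "flying" as exceeding a speed threshold; BuzzWatch's actual definition of "fraction flying" may differ.

```python
import numpy as np


def activity_metrics(tracks, fps, speed_threshold):
    """Two population activity readouts from tracked positions.

    tracks: array (n_frames, n_mosquitoes, 2) of x, y positions, NaN where a
    mosquito is not detected (hypothetical format).
    fps: frame rate; speed_threshold: position units per second.
    Returns (fraction flying per frame interval, total distance per mosquito).
    """
    steps = np.diff(np.asarray(tracks, dtype=float), axis=0)
    step_len = np.linalg.norm(steps, axis=2)   # NaN if either frame is missing
    speed = step_len * fps
    detected = ~np.isnan(speed)
    moving = detected & (np.nan_to_num(speed, nan=0.0) > speed_threshold)
    fraction_flying = moving.sum(axis=1) / np.maximum(detected.sum(axis=1), 1)
    distance = np.nansum(step_len, axis=0)
    return fraction_flying, distance
```

      Note that per-individual travel distance relies on consistent identities across frames and is therefore sensitive to identity swaps, whereas a population fraction is much less so.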

      The authors defined the sugar-feeding index using occupancy on the sugar feeder. However, the correlation between landing on the sugar feeder and active sugar feeding is not mentioned or tested in this paper. Is sugar feeding always observed when mosquitoes land on the sugar feeder? Do they leave the sugar feeding surface once sugar feeding is complete? One can imagine that texture preference and prolonged occupancy may lead to inaccurate reporting of sugar feeding. While occupancy on the sugar feeder is an informative behavioral readout, its link with sugar feeding activity (consumption) needs to be evaluated. Otherwise, the authors should discuss the caveats that this method presents explicitly to avoid overinterpretation of their results.

      Throughout the manuscript, the authors mention existing mosquito activity monitoring systems and their drawbacks. However, many of these statements are misleading and sometimes incorrect. The authors claim that beam-break monitors are "limited to counting active versus inactive states". Though these systems provide indirect readouts that may underreport activity, the number of beam-breaks in a time interval correlates with activity level; this measure is commonly used and reported in Drosophila and mosquitoes, and a number of mosquito studies have used an updated LAM system with larger behavioral arenas and multiple infrared beams. The authors also mention newer, camera-based alternatives to beam-break monitors, but again describe these systems as "only detecting activity when a moving insect blocks a light beam"; however, these systems actually use video tracking (e.g., Araujo et al. 2020).

      The fold change in behavior presented in Figure 4D is rather confusing. Under the two different photoperiods, it is not clear how an hourly comparison is justified (i.e., comparing the light-on activity in the 20L4D condition with scotophase activity in the 12L12D condition). The same point applies to Figure 4H.
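      One way to make cross-photoperiod comparisons explicit is to align time from dawn to dusk rather than by clock hour. The sketch below is a hypothetical piecewise-linear mapping, offered as an assumption rather than the authors' procedure: the photophase [0, L) is stretched onto the first half of a normalised day and the scotophase onto the second half, so lights-on and lights-off coincide across, for example, 12L12D and 20L4D.

```python
def warp_zeitgeber_time(zt, photophase_hours):
    """Map ZT (hours, 0 = lights on) onto a normalised day in [0, 1).

    Photophase [0, L) -> [0, 0.5); scotophase [L, 24) -> [0.5, 1).
    """
    zt = zt % 24.0
    length = float(photophase_hours)
    if zt < length:
        return 0.5 * zt / length
    return 0.5 + 0.5 * (zt - length) / (24.0 - length)


# Mid-photophase and mid-scotophase line up across photoperiods:
print(warp_zeitgeber_time(6, 12), warp_zeitgeber_time(10, 20))   # 0.25 0.25
print(warp_zeitgeber_time(18, 12), warp_zeitgeber_time(22, 20))  # 0.75 0.75
```

      Whether linear stretching is biologically appropriate (rather than aligning only at dawn, or only at dusk) is itself a modelling choice that would need justification.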

      The behavioral changes after changing photoperiod (Figure 4) require a control group (12L12D throughout) to account for age-related effects. This is controlled for the experiment in Figure 3 but not for Figure 4.

    3. Reviewer #2 (Public review):

      Summary:

      This study establishes a platform for studying mosquito flight activity over the course of several weeks and demonstrates key applications of such a paradigm: the comparison of daily activity profiles across different Aedes aegypti populations and the quantification of responses to physiological and environmental perturbations.

      Strengths:

      (1) Overall, the authors succeed in setting up a low-cost, scalable tracking system that stably records mosquito flight activity for several weeks and uses it to demonstrate compelling use cases.

      (2) The text is organized well, is easy to read, and is understandable for a broad audience.

      (3) Instructions for constructing housing and for performing tracking with a dedicated GUI are available on an accompanying website, with open-source (and well-organized) code.

      (4) A complementary pair of methods (one testing for activity signals at specific times of the day, and the other capturing broader daily patterns) is used effectively.

      Weaknesses:

      (1) In the interval-based GLMM results, since each time interval is tested independently, p-values should be corrected for multiple hypotheses (for instance, through controlling the false discovery rate).

      (2) The accompanying GUI application needs some modifications to fully work out of the box on a sample video.
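      The false-discovery-rate correction suggested in point (1) is typically the Benjamini-Hochberg procedure. A minimal sketch applied to a vector of per-interval p-values (the function name is illustrative) is:

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

      In Python the same correction is available as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`.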

    4. Reviewer #3 (Public review):

      Summary:

      The authors in this paper introduce BuzzWatch, an open-source, low-cost (200-300 Euros) platform for long-term monitoring of mosquito flight and behavior. They use a Raspberry Pi with a NoIR v2 camera, set up under laboratory conditions, to observe 3 different species of mosquitoes. The system captures a variety of multimodal data, such as flight activity, sugar feeding, and host-seeking responses, with the help of external modules such as CO2 and fructose-soaked cotton. Alongside automated tracking and behaviour analysis, they also release a GUI, which runs on a personal laptop rather than on the Pi.

      Four main use cases are demonstrated:

      (1) Characterizing diel rhythms in various Aedes aegypti populations.

      (2) Differentiating behaviors of native African vs. invasive human-adapted subspecies.

      (3) Assessing physiological (blood-feeding) and environmental (light regime) perturbations.

      (4) Testing time-of-day variation in responses to host-associated cues like CO₂ and heat.

      Description (Strengths):

      (1) The authors introduce a low-cost, scalable system that uses flight tracking in 2D as an alternative to 3D multi-camera systems.

      (2) Because the system requires only low image resolution, they can record for weeks at a time, capturing long-term temporal and behavioral patterns.

      (3) They also integrate external modules such as lights, CO2, and heat as a way to measure responses to a variety of stimuli.

      (4) They also provide a wiki that guides replication of the build and use of the GUI module.

      (5) They implement both hourly GLMMs and PCA of the behavior data.

      Limitations - Major Comments:

      (1) Most experiments are only done with single replicates per colony. If the setup is claimed to be cheap and replicable, there should be clearer replicates across experiments.

      (2) No external validation for the flight tracking algorithm using manual annotation or comparison with field data. The authors focus early on biological proof of principle, but the validity of the tracking algorithm is not presented. How accurate is the algorithm at classifying behaviours (e.g., vs human ground truth)? How reliable is tracking?

      (3) Why develop a custom GUI instead of using established packages such as rethomics (https://rethomics.github.io/) that are already available for behavioral analysis?

      (4) Why use RGB light strips, given that white light as perceived by humans is not relevant for mosquitoes? The choice of lighting should be based on the mosquito's visual perception. - https://pmc.ncbi.nlm.nih.gov/articles/PMC12077400/ .

      (5) Why use GLMMs instead of GAMs (with explicit periodic components)? With GLMMs, you do not account for temporal structure, which is highly relevant and autocorrelated in behavioral time series data.

      (6) What is the proportion of mosquitoes that stay alive throughout the experiments? How do you address dead animals in tracking? No data are available on whether all mosquitoes made it through the monitoring period. No survival data is mentioned in the paper, and in the wiki, it is not clear how it is used or how it affects the analyses - https://theomaire.github.io/buzzwatch/analyze.html#diff-cond .

      (7) The sugar feeding behavior is not manually validated.

      (8) Figure 4d is difficult to understand - how did you align time? Why is ZT4 aligning with ZT0? Should you "warp" the time series to compare them (e.g., from dawn to dusk)?

      (9) No video recordings are made available for demonstration or validation purposes.
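      Regarding point (5), a full GAM with a cyclic smooth of hour (e.g. `s(hour, bs = "cc")` in R's mgcv) is beyond a short sketch. Its simplest parametric relative is single-harmonic (cosinor) regression, sketched below under the assumption of hourly activity values; it models periodicity explicitly but, like a GLMM, still assumes independent errors. The lag-1 autocorrelation of the residuals is one quick check of whether that assumption is violated. Function names are illustrative.

```python
import numpy as np


def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y = mesor + amplitude*cos(2*pi*(t - acrophase)/period)."""
    t = np.asarray(t_hours, dtype=float)
    w = 2 * np.pi / period
    design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    mesor, b_cos, b_sin = coef
    amplitude = np.hypot(b_cos, b_sin)
    acrophase = (np.arctan2(b_sin, b_cos) / w) % period  # time of the peak
    fitted = design @ coef
    return mesor, amplitude, acrophase, fitted


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation; values far from 0 suggest non-independent errors."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))
```

      A GAM generalises this by replacing the single cosine with a flexible cyclic smooth, and autocorrelated errors can additionally be modelled (for example with an AR(1) residual structure).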

      Appraisal

      (1) The core conclusions, that BuzzWatch can capture multiscale mosquito behavioral rhythms and quantify the effect of genetic, environmental, and physiological variation, show promise but require stronger validation.

      (2) Statistical approaches (GLMM, PCA) are chosen but may not be optimal for temporal data with autocorrelation.

      (3) The host-seeking module shows a differential response, which is a potentially valuable feature.

    5. Author response:

      We were pleased to read the positive comments regarding our manuscript and thank the reviewers and editors for the constructive feedback which we believe will be very helpful to improve the current version of the manuscript.

      Ahead of a full response to all comments, we use this provisional revision plan to address three issues raised by the reviewers: validation of the tracking algorithm, biological replicates, and mosquito survival.

      (1) Validation of the tracking algorithm:

      Reviewer 2 mentions that there is "No external validation for the flight tracking algorithm using manual annotation". We will address this comment in our full response by creating a manually labelled dataset to validate our detection algorithm.

      However, we would like to point out two important points:

      i) Quantifying the accuracy of a detection algorithm using a manually annotated set is indeed common practice in deep/machine learning algorithms in which manually annotated data are used to train the algorithm, and another set of manually annotated data is used to validate it. However, our detection and tracking algorithm is based on conventional computer vision techniques (not using any deep learning) that have been in use for several decades. Given that these algorithms are completely transparent and deterministic (as opposed to deep learning algorithms that are difficult to dissect and are created using partly stochastic processes) it is not common practice to use human annotations for validation. However, to address Reviewer 2's comment we will provide validation metrics in our full response.

      ii) We would furthermore like to note that our main metrics of interest (e.g. fraction of mosquitoes flying) depend only on accurately detecting mosquitoes and quantifying movement; their accuracy is therefore not affected by potential identity swaps (the typical bottleneck in tracking algorithms).
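      Validating detections against a manually labelled set usually reduces to matching detected positions to annotated positions within a distance tolerance and reporting precision and recall. The sketch below is a hypothetical illustration, not the authors' validation code; it uses greedy nearest-first matching for simplicity, whereas an optimal one-to-one assignment (e.g. `scipy.optimize.linear_sum_assignment`) may be preferable when mosquitoes are close together.

```python
import numpy as np


def match_detections(detections, annotations, max_dist):
    """Greedily match detections to annotations in one frame.

    Pairs are accepted in order of increasing distance, each point used at
    most once, and only if closer than max_dist.
    Returns (true positives, false positives, false negatives).
    """
    det = np.asarray(detections, dtype=float).reshape(-1, 2)
    ann = np.asarray(annotations, dtype=float).reshape(-1, 2)
    if len(det) == 0 or len(ann) == 0:
        return 0, len(det), len(ann)
    dist = np.linalg.norm(det[:, None, :] - ann[None, :, :], axis=2)
    used_det, used_ann = set(), set()
    for flat in np.argsort(dist, axis=None):
        i, j = np.unravel_index(flat, dist.shape)
        if dist[i, j] > max_dist:
            break
        if i in used_det or j in used_ann:
            continue
        used_det.add(i)
        used_ann.add(j)
    tp = len(used_det)
    return tp, len(det) - tp, len(ann) - tp


def precision_recall(frames, max_dist=5.0):
    """frames: iterable of (detections, annotations) pairs, one per frame."""
    tp = fp = fn = 0
    for detections, annotations in frames:
        a, b, c = match_detections(detections, annotations, max_dist)
        tp, fp, fn = tp + a, fp + b, fn + c
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```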

      (2) Replicates:

      Reviewer 3 states that "Most experiments are only done with single replicates". This statement is not accurate: in Figure 2 we used 3 independent biological replicates for 4 colonies, 2 of which are Aaa and 2 are Aaf. We indeed provide additional data for 6 more colonies using a single replicate. Combined, this data set comprises 588 days of continuous recordings. For Figures 3 and 4 we have 2 replicates for each perturbation experiment. For Figure 5 we provided 3 replicates for the host-seeking experiments. As outlined, the vast majority of our experiments have multiple replicates. We realize this may not have been described clearly enough in the manuscript; we will clarify this in the revised manuscript.

      (3) Mosquito survival:

      Below we provide survival data for the experiments shown in Figures 1-4; we will include these data as supplementary material. Overall, mortality in all experiments was similar to the 'baseline' mortality we observe in our standard colony maintenance procedures. After three weeks, we typically observed that 70% of mosquitoes were still alive.

      Author response image 1.

      Survival curves for the data presented in Figures 1-4 of the main text. Day 0 indicates the day on which the BuzzWatch experiment started.

    1. eLife Assessment

      This study uses the Drosophila mushroom body as a model to understand the molecular machinery that controls the temporal specification of neuronal cell types. With convincing experimental evidence, the authors made fundamental findings that the Pipsqueak domain-containing transcription factor Eip93F is central to the specification of a later-born neuronal subtype and in inhibiting gene expression for earlier subtypes.

    2. Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

    3. Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Considerable effort has therefore been devoted to understanding how these neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While these results might reflect an intrinsic property of KC types in flies, the data should be interpreted with more caution, and the authors should also mention this in their main text.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

      We will expand the Discussion section to compare the temporal mechanisms identified here with those in other nervous systems.

      Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells. These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Considerable effort has therefore been devoted to understanding how these neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      We do plan to conduct multi-omics experiments to provide a more comprehensive assessment of neuronal identity upon loss-of-function of E93. However, these results will be reported in a separate manuscript rather than in the revised version of this one.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While these results might reflect an intrinsic property of KC types in flies, the data should be interpreted with more caution, and the authors should also mention this in their main text.

      We will describe and interpret these results more carefully in the main text.

    1. eLife Assessment

      This valuable study presents findings on the developmental roles of Nup107, a key nucleoporin, in regulating the larval-to-pupal transition in Drosophila melanogaster through its involvement in ecdysone signaling. The evidence supporting the authors' claims is solid, with robust experimental approaches including RNAi knockdown and rescue experiments. The authors propose that Nup107 influences EcR localization indirectly: impaired Torso signaling reduces the expression of Halloween genes, and this disruption ultimately leads to decreased ecdysone production. However, it remains uncertain whether Torso is the sole receptor tyrosine kinase involved. In addition, identifying the mechanism would strengthen the findings, as the currently proposed mechanism is not fully supported by the data.

    2. Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      The authors have addressed most of the concerns raised in my initial review, particularly those outlined in the public comments. However, I note that they have not directly responded to several specific points raised in the "Author Recommendations" section. In addition, a key mechanistic question remains unresolved and deserves further experimental or at least conceptual clarification.

      It has been previously shown that Nup107 regulates the nuclear translocation of dpERK (Kim et al., 2010). This observation may provide a mechanistic explanation for the developmental arrest observed upon Nup107 depletion in the prothoracic gland (PG). Given that PG growth and ecdysone biosynthesis are driven by several receptor tyrosine kinases, it is plausible that loss of Nup107 impairs dpERK nuclear translocation, thereby functionally shutting down RTK-dependent transcriptional responses, including those activating Halloween gene expression. This model is supported by the finding that activated Ras (rasV12) can rescue the arrest, likely by generating sufficiently high levels of dpERK such that some fraction enters the nucleus despite impaired translocation. This hypothesis may explain the discrepancy between the complete developmental arrest observed upon Nup107 depletion and the developmental delay seen in Torso mutants.

      Similarly, the rescue by Torso, but not EGFR, may reflect differences in receptor activation thresholds. It has been proposed that Torso overexpression might lead to ligand-independent dimerization and constitutive activity, whereas EGFR overexpression may remain ligand-dependent and thus insufficient under compromised dpERK transport conditions. A critical experiment to validate this model would be to examine dpERK localization in PG cells upon Nup107 depletion. This would help establish whether defective nuclear import of dpERK underlies the observed developmental arrest. Even if technically challenging, the authors should at least discuss this hypothesis explicitly in the revised manuscript.
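      A common way to quantify the proposed dpERK localization readout is the nuclear-to-cytoplasmic (N/C) intensity ratio per cell. The sketch below assumes that background-subtracted images and boolean nuclear and whole-cell masks (for example from a nuclear counterstain) are already available; the function name is hypothetical and segmentation is not shown. A reduced N/C ratio in Nup107-depleted PG cells would be consistent with impaired nuclear import.

```python
import numpy as np


def nuclear_cytoplasmic_ratio(image, nuclear_mask, cell_mask):
    """Mean nuclear intensity divided by mean cytoplasmic intensity.

    The cytoplasm is taken as cell_mask minus nuclear_mask. The image is
    assumed to be background-subtracted already.
    """
    image = np.asarray(image, dtype=float)
    nucleus = np.asarray(nuclear_mask, dtype=bool)
    cytoplasm = np.asarray(cell_mask, dtype=bool) & ~nucleus
    if not nucleus.any() or not cytoplasm.any():
        raise ValueError("empty nuclear or cytoplasmic region")
    return float(image[nucleus].mean() / image[cytoplasm].mean())
```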

      In addition, TGFβ/Activin signaling has been shown to regulate Torso expression in the prothoracic gland (PG). Therefore, this pathway may also be affected by impaired nuclear translocation of downstream effectors upon Nup107 depletion. This raises the possibility that Nup107 plays a broad regulatory role, impacting multiple signaling cascades, such as the RTK and TGFβ/Activin pathways, by controlling the nuclear import of their key effectors.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole body RNAi depletion in Drosophila development - developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mis-localization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mis-localization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mis-localization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of Nup107 KD phenotype can also be achieved by over-expression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of torso are also shown to be downregulated in the Nup107KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is, however, not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      (3) Additional experiments on how Nup107 regulates torso would provide further insight. Does Nup107 regulate transcription of torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind torso?

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      Comments on revisions:

      The revised manuscript addresses several outstanding issues, most importantly the question of whether the developmental delay phenotype of Nup107 is exhibited by other Nups.

      I recommend that the authors include the data they provide in the rebuttal letter on Nup153 KD not showing the delay phenotype (Figure R1) in the actual manuscript. It's an important mechanistic question raised by multiple reviewers, and would strengthen the authors' conclusions. Ideally, knockdowns of other Nups of the Nup107 complex should be investigated, especially given that all those RNAi lines are publicly available.

      Figure 6B should also specify whether the torso transcript being measured is mRNA or nascent, as it would help understand whether it's transcription or mRNA stability that is affected by Nup107 KD.

    4. Reviewer #3 (Public review):

      These findings suggest that Nup107 is involved in regulating ecdysone signaling during developmental transitions, with depletion of Nup107 disrupting hormone-regulated processes. Moreover, the rescue experiments hint that Nup107 might directly influence EcR signaling and ecdysone biosynthesis, though the precise molecular mechanism remains unclear.

      Overall, the manuscript presents compelling data supporting Nup107's role in regulating developmental transitions.

      Comments on revisions:

      RNAi specificity: The authors now provide a more thorough discussion of off-target effects and justify their reliance on the Nup107KK RNAi line. The explanation regarding the predicted off-target for the GD line and their choice to use the KK line with a known insertion site is appropriate and adequately addresses the original concern.

      NPC component specificity: The authors clarify that among the Nup107 complex members tested, only Nup107 knockdown induced developmental arrest. Their inclusion of Nup153 as a control helps to support the specificity of the phenotype, although expanding this analysis beyond a single additional Nup would further strengthen the claim.

      Mechanistic clarity: The authors now distinguish between Nup107's upstream role in regulating torso and ecdysone biosynthetic genes versus direct control of EcR translocation. The clarification that EcR nuclear localization is 20E-dependent but Nup107-independent improves interpretive clarity.

      The molecular mechanism linking Nup107 to torso regulation remains somewhat speculative. A deeper exploration of whether Nup107 influences transcriptional regulation through chromatin association (as the authors suggest) would strengthen the mechanistic narrative.

      Conclusion:

      Overall, the authors have addressed the major concerns raised in the initial review, and the revised manuscript presents a more coherent and compelling case for Nup107 as a regulator of developmental timing via the ecdysone signaling axis. While some mechanistic questions remain, the core findings are supported by the data, and the work provides novel insights into NPC function in development.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels.

      We appreciate the concern raised in this public review. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm to the nucleus; rather, Nup107 regulates synthesis of the ecdysone hormone (20E), which in turn affects EcR translocation. In the manuscript, we posed whether Nup107 regulates EcR nuclear translocation as a hypothesis to be tested (9th line of 2nd paragraph on page 6). We have spelled this out more clearly in the 3rd subsection title of the Results section and in the Discussion (8th line of 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this autoregulatory loop of EcR transcription, which can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      We understand that the developmental phenotypes differ between Torso and Nup107 depletion. However, the two molecules being compared here are very different, and variability in their depletion could contribute to the observed phenotypic differences (2). Even if there is no variability in the depletion of Torso and Nup107, we believe that Nup107, being more widely expressed and involved in the regulation of various cellular processes, induces stronger defects.

      Further, we think that RNAi-mediated depletion of Nup107 in prothoracic glands (PG) causes a significant reduction in PG size, which may produce a pronounced defect in 20E biosynthesis through the Halloween genes and thereby a stronger developmental arrest.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

      We agree with the reviewer’s suggestion. As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in PG function and in controlling pupariation and body size (3). We have examined torso and EGFR signaling. Torso overexpression rescued the Nup107 defects; however, constitutively active EGFR (BL-59843) did not rescue the phenotype (data not shown). Nonetheless, we plan to examine EGFR pathway activation by measuring pERK levels in Nup107-depleted PGs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole-body RNAi depletion in Drosophila development - developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mis-localization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mis-localization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of the Nup107 KD phenotype can also be achieved by over-expression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of torso are also shown to be downregulated in the Nup107KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is, however, not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      We thank the reviewer for raising this concern. With a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, except for Nup107, none of the other tested Nup107-complex members could induce larval developmental arrest.

      In this study, we primarily focused on the Nup107 complex (outer ring complex) of the NPC. However, previous studies have reported that Nup98 and Nup153 interact with chromatin, with these investigations conducted in Drosophila S2 cells (4, 5, 6). We have now examined other nucleoporins outside of this complex, such as Nup153.

      We ubiquitously depleted Nup153 using the Actin5C-Gal4 driver and compared the pupariation profile of the knockdown larvae with that of control larvae. In contrast to Nup107 knockdown, depleting Nup153 to less than 50% of control levels had no impact on pupariation (Author response image 1).

      Author response image 1.

      Nup153 depletion does not affect Drosophila metamorphosis. Actin5C-Gal4 is used as a ubiquitous driver. (A) Comparison of pupariation profiles of control and Nup153 knockdown organisms. (B) Quantification of Nup153 knockdown efficiency. Data are represented from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. ***p < 0.001.

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      We agree with this concern. To address it, we stained control and Nup107-depleted salivary glands with the mAb414 antibody, which exclusively recognizes FG-repeat Nups. While Nup107 intensity at the nuclear envelope is significantly reduced in Nup107-depleted salivary glands, mAb414 staining appears unperturbed (Author response image 2).

      Author response image 2.

      Nup107 depletion does not perturb overall NPC composition. Comparison of salivary gland nuclei upon control and Nup107 knockdown. Nup107 is shown in green, and mAb414, which stains other FG-repeat-containing nucleoporins, is shown in red. Scale bars, 5µm.

      (3) Additional experiments on how Nup107 regulates torso would provide further insight. Does Nup107 regulate transcription of torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind torso?

      While the concern regarding torso transcript level is genuine, we have already reported in the manuscript that Nup107 directly regulates torso expression. When Nup107 is depleted, torso levels go down, which in turn controls ecdysone production and subsequent EcR signaling (Figure 6B of the manuscript).

      However, the exact nature of Nup107's regulation of torso expression is still unclear. Since Nup107 is known to interact with chromatin (7), it may affect torso transcription. A stable and physiologically relevant interaction between Nup107 and torso in a cellular context is unlikely, largely due to their distinct subcellular localizations. Investigating this further would require substantial time to generate reagents and perform experiments, and is currently beyond the scope of this manuscript.

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      Although we knew that the Nup107 protein signal is reduced in SG upon knockdown (Figure 3B), we had not compared Nup107 transcript levels in these two tissues (SG and PG) upon RNAi. As suggested, we evaluated the knockdown efficiency of Nup107 using the salivary gland-specific driver AB1-Gal4 and the prothoracic gland-specific driver Phm-Gal4. Our results indicate a significant reduction in Nup107 transcript levels upon Nup107 RNAi in both SG and PG compared to their respective controls (Author response image 3).

      Author response image 3.

      Nup107 levels are significantly reduced upon Nup107<sup>KK</sup> RNAi. Quantification of Nup107 transcript levels from control and Nup107-depleted larvae [tissue-specific depletion using AB1-Gal4 (A) and Phm-Gal4 (B)]. Data are represented from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. **p < 0.004.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      This is a very valid point, and we took it into account while planning the experiment. In such cases, GAL4 dilution can be critical. We have demonstrated in Figure S7 that GAL4 dilution is not confounding our observations. We used Nup107<sup>KK</sup>; UAS-GFP as a control alongside Nup107<sup>KK</sup>; UAS-torso. We conclude that the presence of GFP signal in the prothoracic glands, together with their reduced size, indicates that genes downstream of both UAS sequences are transcribed and that GAL4 dilution does not play a role here.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      We have revisited all figures and their corresponding legends to ensure appropriate and explicit details are provided.

      Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al., the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      This is a very valid point, and we are aware of the potential consequences of RNAi off-target effects. To confirm the specificity of the RNAi effects and reduce off-target effects, we used two RNAi lines against Nup107 (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>). Both lines induced comparable reductions in Nup107, and with both lines, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, the Nup107<sup>GD</sup> line has a predicted off-target: a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. Thus, the Nup107<sup>KK</sup> line was ultimately used in our experiments for its clearer and more reliable genetic background.

      We are also investigating Nup107 knockdown in the prothoracic gland, which exhibits polyteny. Additionally, the number of cells in the prothoracic gland is quite limited, approximately 50-60 cells (8). Given this, there is a possibility that a clonal study may not yield the phenotype.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      We thank the reviewer for raising this concern. With a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, except for Nup107, none of the other Nup107-complex members could induce larval developmental arrest. Since the study is primarily focused on the Nup107 complex (outer ring complex) of the NPC, we have not examined many nucleoporins outside of this complex. However, knockdown of Nup153, a nuclear basket nucleoporin, produced a phenotype comparable to the control, with no developmental delay (Author response image 1).

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      We appreciate the concern raised. Based on our observations, we have proposed an upstream effect of Nup107 on the PTTH-torso-20E-EcR axis that regulates developmental transitions. We know that Nup107 regulates torso levels, but we do not know whether Nup107 directly interacts with torso. We would also like to address whether Nup107 controls PTTH levels.

      However, we must emphasize that Nup107 does not directly regulate the translocation of EcR. On the contrary, we have demonstrated that when Nup107 is depleted only in the salivary gland, EcR translocates into the nucleus. Thus, we conclude that EcR translocation is 20E-dependent and Nup107-independent. Further, we have argued that Nup107 regulates the expression of Halloween genes required for ecdysone biosynthesis. We are interested in determining whether Nup107 associates with chromatin, directly or through another protein, to bring about the changes in gene expression required for normal development.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

      Response: We thank the reviewer for this observation. We have made every effort to remove all typographical errors and have now made more measured statements based on our conclusions.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents compelling evidence that Nup107 plays a role in regulating ecdysone production. However, significant concerns remain regarding the effects on EcR localization and expression, as well as the claimed link between PTTH/Torso signaling and Nup107's function, as the evidence provided is not conclusive.

      The hypothesis that Nup107 mediates EcR translocation from the cytoplasm to the nucleus appears to have been misinterpreted by the authors. Based on the presented images, particularly for the prothoracic gland (PG; Figure 3C), Nup107 depletion seems to impact EcR protein levels rather than its localization. This conclusion is supported by data showing that EcR transcripts are autonomously downregulated in the absence of Nup107. Furthermore, the restoration of nuclear EcR levels upon exogenous 20E supplementation suggests that (1) Nup107 is dispensable for EcR activation and function, and (2) its primary role lies in regulating ecdysone production.

      We appreciate the concern raised by the reviewer. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm to the nucleus; rather, Nup107 regulates synthesis of the ecdysone hormone (20E), which in turn affects EcR translocation. In the manuscript, we posed whether Nup107 regulates EcR nuclear translocation as a hypothesis to be tested (9th line of 2nd paragraph on page 6). We have spelled this out more clearly in the 3rd subsection title of the Results section and in the Discussion (8th line of 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this autoregulatory loop of EcR transcription, which can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Given that nucleoporins are known to influence mRNA transport (for instance, Nup107 has been shown to control Scn5a mRNA transport; Guan et al., 2019), the observed effects on Halloween gene and EcR expression may stem from disruptions in mRNA transport to the cytoplasm. The downregulation of Shade further supports this hypothesis, as restricted ecdysone biosynthesis typically induces Shade upregulation in peripheral tissues. Quantifying potential mRNA accumulation in the nuclei of PG cells in Nup107-depleted animals would clarify this.

      The reviewer raises a valid point, and we fully agree that Nup107 has been shown to control Scn5a mRNA transport (Guan et al., 2019). The observed effects on Halloween gene and EcR expression could indeed stem from disruptions in efficient mRNA export to the cytoplasm. However, if Nup107 were regulating the mRNA export of Halloween genes and EcR, we would not expect torso overexpression to rescue the Nup107 developmental delay phenotype. Instead, overexpressing torso in the Nup107-depleted background activates torso pathway-dependent Halloween gene expression and rescues the developmental delay phenotype of Nup107 depletion.

      With the current data, it is difficult to conclusively claim a role for Nup107 in EcR translocation or expression. Additional experiments, such as EcR overexpression in Nup107-depleted animals or Nup107 overexpression, would help determine its precise role.

      We appreciate the concern raised by the reviewer. We did attempt to rescue the Nup107 depletion phenotype by overexpressing EcR (BL-6868) in the Nup107-RNAi background. However, this approach did not rescue the Nup107 depletion-dependent developmental delay phenotype. This further suggests that the phenotype is not merely due to low EcR levels, but to low availability of ecdysone hormone and impaired EcR signaling.

      The second major issue is the proposed link between Nup107 and PTTH/Torso signaling. The authors suggest that Nup107 regulates ecdysone production through Torso expression based on rescue experiments. However, this is inconsistent with the distinct phenotypes observed when Nup107 or Torso signaling is disrupted. While PTTH/Torso signaling causes only a modest developmental delay (12 hours to 2 days, depending on the mutant), Nup107 depletion results in a complete developmental arrest at the larval stage. This discrepancy raises doubts about the assertion that Torso overexpression alone rescues such a severe phenotype. One possibility is that PTTH levels are upregulated in Nup107-depleted animals, leading to overactivation of the pathway when Torso is overexpressed. Quantifying PTTH levels in Nup107-depleted animals could address this.

      The reviewer raises a valid point, and we fully acknowledge this concern. While we do not completely agree with the idea of PTTH upregulation in Nup107-depleted larvae, as suggested here, we believe that quantifying PTTH levels upon Nup107 depletion can provide useful insight. We therefore quantified PTTH levels in Nup107-depleted larvae and found no significant change in PTTH expression compared to controls (Author response image 4).

      Author response image 4.

      Nup107 knockdown does not affect PTTH levels. Quantitation of PTTH transcript levels from control and Nup107-depleted larvae (prothoracic gland-specific depletion using Phm-Gal4). Data represent at least three independent experiments. Statistical significance was assessed using Student's t-test; ns, not significant.

      Another possibility is that the stock used for Torso overexpression, which includes a trk mutant, may introduce genetic interactions that overactivate the pathway. Using a clean UAS-Torso stock would resolve this issue.

      We appreciate the reviewer’s observation regarding the use of the Torso overexpression line (BL-92604), which carries the trk null allele on the second chromosome. The cleaved form of Trk serves as a ligand for the Torso receptor. Because Trk may act as a ligand for Torso, it is not clear to us how a line carrying the trk null allele would overactivate the pathway when used for Torso overexpression studies.

      We recognized this concern, and the fly line used in this study and reported in the manuscript was generated from the BL-92604 line using the following genetic strategy. First, a double balancer stock (Sco/CyO; MKRS/TM6.Tb) was used to generate the Sco/CyO; UAS-torso/UAS-torso genotype. This line was subsequently combined with the Nup107<sup>KK</sup> line. Through this double balancer strategy, the second chromosome in the final stock carries the Nup107 RNAi construct in place of the BL-92604 second chromosome, thereby ensuring that our final experimental setup is free from any trk mutant contamination.

      Moreover, the rescue of Nup107 depletion phenotypes by RasV12 overexpression suggests that multiple RTKs, not just Torso, are affected. EGFR signaling, the primary regulator of ecdysone biosynthesis in the PG during the last larval stage, is notably absent from the authors' analysis. EGFR inactivation is known to arrest development, and previous studies indicate that Nup107 can reduce EGFR pathway activity (Kim et al, 2010). The authors should analyze EGFR pathway activity in the absence of Nup107. Overexpressing EGF ligands like Vein or Spitz in the PG (rather than the receptor) in a Nup107-depleted background would provide more relevant insights.

      Ras GTPase is one of the common effector molecules downstream of activated receptor tyrosine kinases. Rescue with a constitutively active form of Ras (RasV12) indicates one of the routes activated downstream of the Torso receptor. It does not directly suggest that multiple different RTKs are affected and involved. Our aim in performing this rescue experiment was to test whether the pathway activated downstream of Torso involves Ras GTPase.

      As noted in the literature, five RTKs (Torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in the PG in controlling pupariation and body size (3). Although EGFR signaling is important, PTTH/Torso signaling is considered the primary mediator of metamorphic timing. In response to the suggestion to analyze EGFR pathway activity in the absence of Nup107, we attempted to rescue the phenotype by overexpressing constitutively active EGFR (BL-59843) in the Nup107-depleted background (data not shown). We used constitutively active EGFR to bypass the requirement for its ligands (Vein and Spitz). This approach did not rescue the phenotype, which further suggests that EGFR is not the RTK pathway targeted in this context. Together with the Torso rescue, these results indicate that Nup107 regulates Torso-mediated Ras/Erk signaling to control metamorphosis.

      Additional issues require clarification:

      (1) RNAi Efficiency: In Figure 1C, the Nup107GD line shows a stronger knockdown effect than Nup107KK, yet most experiments were conducted with the weaker line. This might explain the residual Nup107 protein observed in Figure 2. Could the authors justify this choice?

      This is a very valid point, and we are aware of the potential off-target effects of RNAi. To confirm RNAi specificity and reduce off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable reductions in Nup107, and ubiquitous and PG-specific knockdown with either line produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, the Nup107<sup>GD</sup> line has a predicted off-target: a 19-bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and the regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. We therefore used the Nup107<sup>KK</sup> line for most experiments because of its better-defined and more reliable genetic background.

      (2) Control Comparisons: In Figure 3, the effects of Nup107 depletion on EcR expression in salivary glands (SG) and PG are shown, but only SG controls are provided. Including PG controls would enable proper comparisons. These controls should also be added to Figures 5, 6, and S5.

      As suggested by the reviewer, we also examined EcR localization in the prothoracic gland (Author response image 5). When PGs isolated from control larvae, Nup107-RNAi larvae, and Nup107-RNAi larvae overexpressing torso were stained for EcR, the observations were indistinguishable from those made in SGs of the corresponding genotypes. This indicates that Nup107 regulates EcR signaling by regulating 20E biosynthesis.

      Author response image 5.

      Prothoracic gland-specific torso expression rescues EcR nuclear translocation defects. Immunofluorescence-based detection of the nucleocytoplasmic distribution of EcR (EcR antibody, red) in third instar larval prothoracic gland nuclei of control, prothoracic gland-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and torso-overexpressing PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso) larvae. DNA is stained with DAPI. Scale bars, 20 μm.

      (3) Clarify the function of Torso in the text: The authors must revise their description of Torso signaling as the primary regulator of ecdysone production in both the results and discussion sections. Specifically, in the results section, the claim that Torso depletion induces developmental arrest is inaccurate. Instead, available evidence, including Rewitz et al. 2009, demonstrates that Torso depletion causes a delay of approximately five days rather than a complete developmental arrest. This discrepancy should be corrected to avoid overstating the role of Torso signaling in ecdysone regulation and to align the manuscript with established findings.

      We agree with the reviewer. We have incorporated the suggestion at the relevant place in the main manuscript.

      Reviewer #3 (Recommendations for the authors):

      These findings suggest that Nup107 is involved in regulating ecdysone signaling during developmental transitions, with depletion of Nup107 disrupting hormone-regulated processes. Moreover, the rescue experiments hint that Nup107 might directly influence EcR signaling and ecdysone biosynthesis, though the precise molecular mechanism remains unclear.

      Overall, the manuscript presents compelling data supporting Nup107's role in regulating developmental transitions. However, I have a few comments for consideration:

      Major Comments:

      (1) RNAi Specificity: While RNAi is a powerful tool, the authors do not sufficiently address potential off-target effects, which could undermine the conclusions. Although a Nup107 mutant is described, it is lethal; are heterozygous or clonal studies possible to validate the findings more robustly?

      This is a very valid point, and we are aware of the potential off-target effects of RNAi. To confirm RNAi specificity and reduce off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable reductions in Nup107, and ubiquitous and PG-specific knockdown with either line produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, the Nup107<sup>GD</sup> line has a predicted off-target: a 19-bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and the regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. We therefore used the Nup107<sup>KK</sup> line for most experiments because of its better-defined and more reliable genetic background.

      Following the reviewer's suggestion, we considered conducting heterozygous and clonal analyses using the Nup107 mutant. However, our Nup107 knockdown studies were carried out in the prothoracic gland, which has a limited number of cells (50-60) and is known to exhibit polyteny (8). Given these features of the prothoracic gland, a clonal study is unlikely to reveal the phenotype. Nevertheless, we will consider pursuing this approach.

      (2) NPC Complex Specificity: It remains unclear whether the observed defects are specific to Nup107 or if other NPC components also cause similar defects. If the authors are unable to use Nup107 mutants, they could demonstrate similar defects with other critical NPC members to bolster their claim.

      We thank the reviewer for raising this concern. This concern is expected when working with a Nup complex such as the Nup107 complex, but it is difficult to address because many Nups function beyond their complex identity. Our analysis of Nup153-depleted organisms indicates no developmental delay or defect. We have also assessed the effects of knocking down all other members of the Nup107 complex, including dELYS; apart from Nup107, no other member of the complex induced developmental arrest at the third instar stage with a failure to pupariate. Notably, the null mutant of Nup133, the direct interactor of Nup107 within the Nup107 complex, delays pupariation (unpublished data).

      (3) Molecular Mechanism of EcR Signaling: The manuscript shows that Nup107 depletion affects EcR signaling and ecdysone biosynthesis, but the molecular basis of this regulation is not fully explored. Does phosphorylated ERK (p-ERK) fail to enter the nucleus? Clarifying this mechanism would strengthen the study's impact.

      We appreciate the reviewer’s insightful comment and fully agree with the concern. To address this, we examined the subcellular localization of phosphorylated ERK (p-ERK) in the prothoracic glands of control larvae, Nup107-depleted larvae, and Nup107-depleted larvae with torso overexpression. In control larvae, p-ERK was predominantly nuclear. However, in Nup107-depleted larvae, p-ERK was largely retained in the cytoplasm, indicating impaired pathway activation and nuclear translocation. Notably, torso overexpression in the Nup107-depleted background restored nuclear localization of p-ERK in the prothoracic gland (Author response image 6). These findings suggest that Nup107 regulates Drosophila metamorphosis, in part, through modulation of torso-mediated MAPK signaling.

      Author response image 6.

      Nup107 regulates torso activation-dependent p-ERK localization. Detection of the nucleocytoplasmic distribution of p-ERK (anti-p-ERK antibody, green) in third instar larval prothoracic glands of control, PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and PG-specific torso overexpression in the Nup107 knockdown background (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso). DNA is stained with DAPI. Scale bars, 20 µm.

      Minor Comments:

      (1) The manuscript contains typographical errors that may hinder readability. Additionally, some phrases (e.g., "unequivocally demonstrate") may be overly strong. Consider adjusting language to reflect the nature of the data more accurately.

      We agree with the reviewer and have edited the manuscript accordingly, correcting such typographical errors at the relevant places in the main manuscript.

      (2) The data presentation could be improved by eliminating redundancy. Some sections repeat similar findings in different tissues, which could be consolidated to improve clarity and flow.

      While we agree with the comment, some tissue redundancy was unavoidable in presenting our EcR translocation data; ideally, we would have used an additional tissue. However, we have included EcR localization and p-ERK translocation data from the prothoracic gland in these responses to provide a non-redundant tissue perspective (Author response images 5 and 6).

      References:

      (1) Varghese, Jishy, and Stephen M Cohen. “microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila.” Genes & development vol. 21,18 (2007): 2277-82. doi:10.1101/gad.439807

      (2) Rewitz, Kim F et al. “The insect neuropeptide PTTH activates receptor tyrosine kinase torso to initiate metamorphosis.” Science (New York, N.Y.) vol. 326,5958 (2009): 1403-5. doi:10.1126/science.1176450

      (3) Pan, Xueyang, and Michael B O'Connor. “Coordination among multiple receptor tyrosine kinase signals controls Drosophila developmental timing and body size.” Cell reports vol. 36,9 (2021): 109644. doi:10.1016/j.celrep.2021.109644

      (4) Pascual-Garcia, Pau et al. “Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts.” Molecular cell vol. 66,1 (2017): 63-76.e6. doi:10.1016/j.molcel.2017.02.020

      (5) Pascual-Garcia, Pau et al. “Nup98-dependent transcriptional memory is established independently of transcription.” eLife vol. 11 e63404. 15 Mar. 2022, doi:10.7554/eLife.63404

      (6) Kadota, Shinichi et al. “Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding.” Nature communications vol. 11,1 2606. 25 May. 2020, doi:10.1038/s41467-020-16394-3

      (7) Gozalo, Alejandro et al. “Core Components of the Nuclear Pore Bind Distinct States of Chromatin and Contribute to Polycomb Repression.” Molecular cell vol. 77,1 (2020): 67-81.e7. doi:10.1016/j.molcel.2019.10.017

      (8) Shimell, MaryJane, and Michael B O'Connor. “Endoreplication in the Drosophila melanogaster prothoracic gland is dispensable for the critical weight checkpoint.” microPublication biology vol. 2023 10.17912/micropub.biology.000741. 21 Feb. 2023, doi:10.17912/micropub.biology.000741

    1. eLife Assessment

      This study employed state-of-the-art quantitative imaging and genomics approaches to address a fundamental question regarding the establishment of Polycomb domains during Drosophila embryogenesis. The critical developmental stage was pinpointed to the maternal-to-zygotic transition, rather than earlier stages, providing clarification for the field. The roles of two factors, Zelda and GAGA-factor, were investigated, which reveal that Zelda, but not GAGA-factor, contributes to this process. These compelling findings have implications for chromatin and developmental biology.

    2. Reviewer #1 (Public review):

      This well-conceived manuscript investigates the mechanisms that shape the chromatin landscape following fertilization, using the Drosophila embryo as a model system. Importantly, the authors revisit conflicting data using new approaches and analysis to show that the silent H3K27me3 mark deposited by PRC2 is established de novo in the embryo in coordination with the slowing of the nuclear division cycle and activation of zygotic transcription. Unexpectedly, they demonstrate that the transcription factor GAF is not required for the deposition of this mark, but that the well-studied pioneer factor Zelda, which is required for widespread gene expression, is required for H3K27me3 deposition at a subset of regions. The experiments are rigorously performed, and interpretations are clear. Strengths of this manuscript include the rigor of the experimental design, careful analysis, and well-supported conclusions. Some additional citations, analysis, and broadening of the Discussion section to include additional models and data would further strengthen this manuscript.

    3. Reviewer #2 (Public review):

      Epigenetic silencing of target genes by the Polycomb pathway is central to maintenance of cell fates during development and depends on repressive chromatin states involving Polycomb complexes and histone modifications. However, the mechanisms by which these chromatin states are built at the earliest stages of development are unclear. Here, Gonzaga-Saavedra and colleagues use the premier experimental system for studying Polycomb gene regulation, Drosophila development, to investigate when Polycomb domains emerge and how they are assembled. Using a combination of CRISPR gene editing, imaging, and genomic profiling, they determine that while H3K27me3 is initially present in the first nuclear cycles, it quickly dissipates and does not re-emerge until mid-nuclear cycle 14, during the major wave of zygotic genome activation (ZGA). This finding helps resolve current discrepancies in the field, informs potential mechanisms of transgenerational inheritance, and indicates that repressive Polycomb domains are built de novo on target genes in embryogenesis. The authors then set out to examine how Polycomb domains are built. Through live imaging and immunofluorescence, they determine that the histone H3K27 methyltransferase, E(z), is present in nuclei at high levels throughout cleavage and blastoderm stages. By contrast, they determine that several Polycomb proteins that bind PREs (cis elements that demarcate Polycomb targets in the genome) are absent from early cleavage nuclei and progressively increase following nuclear cycle 10. These findings suggest that the absence of H3K27me3 in early embryos may be due to failure to assemble functional Polycomb complexes at target genes. Lastly, the authors test the requirement of two transcription factors with important roles in ZGA: GAF and ZLD. Although GAF binds many PREs and regulates chromatin accessibility in early embryos, they find that it is largely dispensable for the emergence of H3K27me3 domains. On the other hand, they find that the pioneer factor ZLD is required for proper H3K27me3 emergence; in its absence, some Polycomb domains accumulate greater levels of H3K27me3, whereas other Polycomb domains accumulate less H3K27me3.

      Strengths:

      The strengths of this study are manifold. It studies an important topic with broad interest to the chromatin and epigenetics fields. It is well-written with detailed method descriptions. In addition, the experimental design and rigor of execution are exceptional despite working with very small amounts of biological material. Example strengths include that the Polycomb proteins studied were tagged with the same epitope, permitting direct quantitative comparisons in imaging and in genomics experiments. Microscopy studies are quantified and performed both via live imaging and via immunofluorescence. The microscopy studies reinforce and extend conclusions made via ChIP. Sophisticated loss-of-function analyses allow for direct mechanistic tests of Polycomb domain emergence.

      Weaknesses:

      Overall, the study is quite strong already, but it can be further strengthened in several ways. First, several conclusions should be refined based on the data presented. Second, the extent to which ZLD is important for initiating Polycomb domain formation should be made clearer. Third, additional genomic profiling experiments are needed to provide insight into models explaining why H3K27me3 is absent prior to NC14.

    4. Reviewer #3 (Public review):

      Gonzaga-Saavedra et al report an analysis on genomic binding of Polycomb group proteins, and of H2Aub1 and H3K27me3 domain formation in the early Drosophila embryo. Using carefully staged embryos during the nuclear cycles (NC) leading up to the cellular blastoderm stage, the authors provide compelling evidence that H3K27me3 domains at PcG target genes are only established during NC14 and do not exist in NC13. In contrast, H2Aub1 domains already start to appear during NC13. The authors show that E(z), the catalytic subunit of the H3K27 histone methyltransferase PRC2, is readily detected in interphase nuclei during the rapid nuclear divisions in pre-blastoderm embryos. In contrast, the DNA-binding proteins Pho, Cg, and GAF that are known (Pho) or have been postulated (Cg, GAF) to anchor PRC2 and PRC1 to Polycomb Response Elements (PREs) in Polycomb target genes only start to show nuclear localization from NC10 onwards with gradually increasing nuclear concentrations, reaching a maximum during NC14. These data strongly corroborate the simple, straightforward view that targeting of PRC2 and PRC1 to PREs by sequence-specific DNA-binding proteins is a prerequisite for the formation of H3K27me3 and H2Aub1 domains at Polycomb target genes.

      The authors then explore the potential role of GAF/Trl in this process. They find that in embryos depleted of GAF/Trl, H3K27me3 domain formation is largely unperturbed.

      The authors also depleted the pioneer factor Zelda (Zld) and found that removal of Zld results in a more complex outcome. Zelda appears to counteract the accumulation of H3K27me3 at the Polycomb targets eve and zen, but also appears to be required for effective H3K27me3 domain formation at Polycomb targets such as amos or atonal.

      This is a very thorough study that reports data of superior technical quality that are highly relevant for the field. The study by Gonzaga-Saavedra et al extends and strengthens previous work from the labs of Eisen (Li et al, eLife 2014) and Zeitlinger (Chen et al, eLife 2013) to convincingly demonstrate that Polycomb domain formation in the early embryo occurs during ZGA but that such domains do not exist prior to ZGA. This should now finally put to rest earlier claims by the Iovino lab (Zenk et al, Science 2017) that H3K27me3 domains present in the zygote nucleus would be propagated and partially maintained during the rapid nuclear cleavage cycles and serve as seeds for H3K27me3 domain formation during ZGA.

      The experiments analyzing H3K27me3 domain formation in embryos depleted of GAF/Trl or Zelda will be of great interest to the field.

    5. Author response:

      Reviewer 1:

      We appreciate the reviewer’s positive assessment and in revision will expand the Discussion to clarify some of the mechanistic insights of this work, as well as to include expanded treatment of related studies in other model systems.

      Reviewer 2:

      We are grateful for the reviewer’s thorough and supportive comments. We will carefully revise assertions and conclusions for objectivity. Additional analysis of the Zelda experiments will be performed and experimental data tables will be updated to report these results. For the point about providing “insight into models explaining why H3K27me3 is absent prior to NC14,” we have recently submitted a related preprint that addresses this issue directly (Degen, Gonzaga-Saavedra, and Blythe, bioRxiv 2025). In summary, we find evidence that a maternal PcG imprint is indeed maintained through cleavage divisions, albeit through lower-order methylation states (maximally H3K27me2). We chose not to include these additional results in this manuscript to maintain the focus of this study on ZGA. Our revision of this manuscript will include a section in the Discussion that synthesizes the conclusions of the two studies.

      Reviewer 3:

      We thank the reviewer for recognizing the strength of our data and conclusions, and we agree that our results help settle conflicting claims in the field. We will emphasize Zelda’s context-dependent effects more clearly in the revised manuscript.

      References:

      Degen EA, Gonzaga-Saavedra N, Blythe SA. Lower-order methylation states underlie the maintenance and re-establishment of Polycomb modifications in Drosophila embryogenesis. bioRxiv [Preprint]. 2025 Jul 29:2025.07.25.666882. doi: 10.1101/2025.07.25.666882. PMID: 40766521; PMCID: PMC12324246.

    1. eLife Assessment

      This is a useful paper regarding the roles of brown adipose tissue and skeletal muscle in thermogenesis in mice, with potential significance for the field. The overall approach is innovative but on balance the evidence for the claim is incomplete, as cast immobilization, while innovative, is likely stressful, may impact muscle and BAT directly, and imposes an energetic cost of motion on the animal that is not accounted for. Further experiments are also needed to directly assess the role of adipose-derived BCAAs in thermogenesis. The authors have done a good job of textually editing their manuscript to clarify the findings and limitations of the study.

    2. Reviewer #1 (Public review):

      Summary:

      Heat production mechanisms are flexible, depending on a wide variety of genetic, dietary and environmental factors. The physiology associated with each mechanism is important to understand, since loss of flexibility is associated with metabolic decline and disease.

      The phenomenon of compensatory heat production has been described in some detail in publications and reviews, notably by modifying BAT-dependent thermogenesis (for example by deleting UCP1 or impairing lipolysis, cited in this paper).

      These authors chose to eliminate exercise as an alternative means for maintaining body temperature. To do this, they cast either one or both mouse hindlimbs.

      This paper is set up as an evaluation of a loss of function of muscle on the functionality of BAT. However, the authors show that cast immobilization (CI) does not work as a (passive) loss of function; instead, this procedure produces a dramatic gain of function.

      It does not test the hypothesis as stated; instead, it adds an extraneous variable, which is that the animal is put under enormous stress, inducing β-adrenergic effectors, increased oxygen consumption, and IL6 expression in a variety of tissues, together with commensurate cachectic effects on muscle and fat. The BAT is stressed by this procedure, becoming super-induced but functioning relatively poorly. This is an inaccurate experimental construct, and the paper is therefore full of wrong conclusions.

      Within hours and days of CI, there is massive muscle loss (leading to high circulating BCAAs), and loss of lipid reserves in adipose and liver. The lipid cycle that maintains BAT thermogenesis is depleted and the mouse is unable to maintain body temperature.

      I cannot agree with these statements in the Discussion -

      "We have here shown that cast immobilization suppressed skeletal muscle thermogenesis, resulting in failure to maintain core body temperature in a cold environment."

      • This result could also be attributed to high stress and decreased calorie reserves. Note also: CI suppresses locomotor activity by 50%, but the actual work done by the mouse carrying bilateral casts is not taken into account (how heavy are they?). Presumably other muscles in the mouse body are compensating to allow the mouse to drag itself to the food source, to maintain food consumption, which, remarkably, is unchanged. Is the demand for heat even the same when the mouse is wrapped in gypsum?

      I cannot be convinced that this approach (CI) can be interpreted at all in terms of organ communication during thermogenic challenge. This paper describes instead the resilience and adaptation of mouse physiology in the face of dragging around hind limb casts.

      From Rebuttal:

      "On the other hand, the experiment shown in Fig.1C involved acute cold exposure of mice 2 h after cast immobilization. This result suggests that, even before the depletion of energy stores by immobilization of skeletal muscle, cast immobilization may cause cold intolerance in mice."

      Since the mice are in acute recovery from the anesthetic, there can be no conclusions drawn about thermogenesis. Isoflurane is a great way to depress body temperature (http://www.ncbi.nlm.nih.gov/pubmed/12552204), and the recovery time is not known.

      "In addition, as the reviewer suggests, cast immobilization may result in BAT thermogenesis and cachectic effects on muscle and fat. However, circulating corticosterone concentrations and hypothalamic CRH gene expression are not significantly altered after cast immobilization (Figure 2_figure supplement 2D-F)."

      The absence of positive results from your stress assays does not exclude stress as the primary source of the results. These mice are not proceeding as normal with their lives - they are learning whole new behaviors in order to stay fed and watered.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identified a previously unrecognized organ interaction where limb immobilization induces thermogenesis in BAT. They showed that limb immobilization by cast fixation enhances the expression of UCP1 as well as amino acid transporters in BAT, and amino acids are supplied from skeletal muscle to BAT during this process, likely contributing to increased thermogenesis in BAT. Furthermore, the experiments with IL-6 knockout mice and IL-6 administration to these mice suggest that this cytokine is likely involved in the supply of amino acids from skeletal muscle to BAT during limb immobilization.

      Strengths:

      The function of BAT plays a crucial role in the regulation of an individual's energy and body weight. Therefore, identifying new interventions that can control BAT function is not only scientifically significant but also holds substantial promise for medical applications. The authors have thoroughly and comprehensively examined the changes in skeletal muscle and BAT under these conditions, convincingly demonstrating the significance of this organ interaction.

      Weaknesses:

      Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, the impact of immobilization on the function of skeletal muscle and BAT during cold exposure has not been thoroughly analyzed.

      Comments on revisions:

      The authors appropriately responded to the reviewers' recommendations made during the previous round of peer review.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Heat production mechanisms are flexible, depending on a wide variety of genetic, dietary, and environmental factors. The physiology associated with each mechanism is important to understand since loss of flexibility is associated with metabolic decline and disease. The phenomenon of compensatory heat production has been described in some detail in publications and reviews, notably by modifying BAT-dependent thermogenesis (for example by deleting UCP1 or impairing lipolysis, cited in this paper). These authors chose to eliminate exercise as an alternative means of maintaining body temperature. To do this, they cast either one or both mouse hindlimbs. This paper is set up as an evaluation of a loss of function of muscle on the functionality of BAT.

      Strengths:

      The study is supported by a variety of modern techniques and procedures.

      Weaknesses:

      The authors show that cast immobilization (CI) does not work as a (passive) loss of function; instead, this procedure produces a dramatic gain of function, putting the animal under considerable stress, inducing β-adrenergic effectors, increased oxygen consumption, and IL6 expression in a variety of tissues, together with commensurate cachectic effects on muscle and fat. The BAT is put under considerable stress: it is super-induced but functions relatively poorly. Thus within hours and days of CI, there is massive muscle loss (leading to high circulating BCAAs), and loss of lipid reserves in adipose and liver. The lipid cycle that maintains BAT thermogenesis is depleted and the mouse is unable to maintain body temperature.

      I cannot agree with these statements in the Discussion:  

      "We have here shown that cast immobilization suppressed skeletal muscle thermogenesis, resulting in failure to maintain core body temperature in a cold environment."

      This result could also be attributed to high stress and decreased calorie reserves. Note also: CI suppresses 50% of locomotor activity, but the actual work done by the mouse carrying bilateral casts is not taken into account.

      We thank the reviewer for raising this issue. As the reviewer suggests, we also consider that cold intolerance resulting from cast immobilization may be attributed to high stress levels, decreased calorie reserves, or reduced systemic locomotor activity. Indeed, reductions in visceral adipose tissue weight and increases in lipid utilization were observed in the early phase of cast immobilization (Fig. 2F and 2G). This suggests that the depletion of calorie reserves induced by stress may affect cold intolerance in cast-immobilized mice (Fig. 1A-1B). On the other hand, the experiment shown in Fig. 1C involved acute cold exposure of mice 2 h after cast immobilization. This result suggests that cast immobilization may cause cold intolerance in mice even before energy stores are depleted by skeletal muscle immobilization. In addition, as the reviewer suggests, cast immobilization may result in BAT thermogenesis and cachectic effects on muscle and fat. However, circulating corticosterone concentrations and hypothalamic CRH gene expression are not significantly altered after cast immobilization (Figure 2_figure supplement 2D-F). This raises questions about the contribution of stress to the changes in systemic energy metabolism in this model. We have therefore revised this statement at the beginning of the 'Discussion' section and added a discussion on page 16, in addition to the existing discussion on pages 17–18.

      Furthermore, to respond as fully as we could to the reviewer's comments, we performed additional experiments using the restraint stress model (Figure 7). We found that short-term restraint stress may recruit substrate supply from skeletal muscle for BAT thermogenesis via Il6 gene expression. Based on these data, we speculate that the interaction between BAT and skeletal muscle amino acid metabolism may operate under various physiological stress conditions, including infection and exercise, as well as skeletal muscle immobilization, stress, and cold exposure. This interaction may play a significant role in regulating body temperature and energy metabolism. We are currently investigating the effects of sympathetic activation on skeletal muscle amino acid metabolism and systemic thermoregulation via IL-6 secretion from skeletal muscle using a new model. These data will be presented in a future publication.

      "Thermoregulatory system in endotherms cannot be explained by thermogenesis based on muscle contraction alone, with nonshivering thermogenesis being required as a component of the ability to tolerate cold temperatures in the long term."

      This statement is correct, and it clearly showcases how difficult it is to interpret results using this CI strategy. The question to the authors is: which components of muscle thermogenesis are actually inhibited by CI, and what is their relative heat contribution?

      We appreciate the reviewer raising this important issue. Addressing it would require measurements of skeletal muscle temperature and electromyography in mice with cast immobilization, but we were unable to perform these measurements. We have therefore described the reviewer's point as a limitation of this study on page 18.

      In our additional experiments, we found that several genes that are usually activated in skeletal muscle during cold exposure are repressed in mice with cast immobilization (Figure 1_figure supplement 1_G-1K). Skeletal muscle is an important thermogenic organ. Although the role of the sarcolipin gene in non-shivering thermogenesis is well understood, the primary regulator of thermogenesis in metabolism and shivering remains unclear. In the future, we would like to use models in which key signals for energy metabolism are inhibited, such as muscle-specific PGC-1α-deficient mice and muscle-specific AMPK-deficient mice, to clarify important factors in skeletal muscle thermogenesis. We expect this approach to enable us to analyze the relative thermal contributions of each component of the heat production process in skeletal muscle, which has proven difficult in immobilized muscle models.

      This conclusion is overinterpreted:

      "In conclusion, we have shown that cast immobilization induced thermogenesis in BAT that was dependent on the utilization of free amino acids derived from skeletal muscle, and that muscle-derived IL-6 stimulated BCAA metabolism in skeletal muscle. Our findings may provide new insights into the significance of skeletal muscle as a large reservoir of amino acids in the regulation of body temperature".

      In terms of the production of the article, the data shown in the heat maps have oddly obscure log10 dimensions. The changes are minimal, approx. 1.5x increase/decrease, and therefore significance would be key to reporting these data. The heatmap in Fig. 3C is not suitable. What are the 6 lanes for each condition? Overall, it conveys little or no information.

      Rather than cherry-picking for a few genes, the results could be made more rigorous using RNA-seq profiling of BAT and muscle tissues.

      We agree that this is an important point. Indeed, our model of skeletal muscle immobilization reveals only modest changes in metabolomic and gene expression analyses. We consider this to be a weakness of the study. However, the interactive thermogenic system that we discovered between skeletal muscle and BAT may also function under other conditions, such as acute stress and cold exposure. We should investigate this further in future models involving such dramatic metabolic changes. In fact, it has been shown that the levels of several metabolites are significantly altered in BAT after acute cold exposure.[1] Therefore, we have revised the conclusion accordingly on page 18. We also performed an enrichment analysis on the metabolomics data in BAT following cast immobilization and included the results in Figure 2_figure supplement 1A. In addition, we removed the heatmap in Fig. 3C of the pre-revision manuscript, as advised by the reviewer. Although we excluded the results in Figure 3C, we consider Figure 3_figure supplement_1 to be sufficient to support the text.

      In addition, we agree with the reviewer's remarks on our gene expression analysis. In this study, we were unable to perform RNA-seq profiling of BAT and muscle tissue. Therefore, we have described this as a limitation of the study on page 20. However, we are interested in investigating the effect of skeletal muscle-derived IL-6 on the RNA-seq profiles of skeletal muscle and BAT. We will conduct future RNA-seq analyses of BAT and skeletal muscle, using models of skeletal muscle immobilization, acute cold exposure and restraint stress.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors identified a previously unrecognized organ interaction where limb immobilization induces thermogenesis in BAT. They showed that limb immobilization by cast fixation enhances the expression of UCP1 as well as amino acid transporters in BAT, and amino acids are supplied from skeletal muscle to BAT during this process, likely contributing to increased thermogenesis in BAT. Furthermore, the experiments with IL-6 knockout mice and IL-6 administration to these mice suggest that this cytokine is likely involved in the supply of amino acids from skeletal muscle to BAT during limb immobilization.

      Strengths:

      The function of BAT plays a crucial role in the regulation of an individual's energy and body weight. Therefore, identifying new interventions that can control BAT function is not only scientifically significant but also holds substantial promise for medical applications. The authors have thoroughly and comprehensively examined the changes in skeletal muscle and BAT under these conditions, convincingly demonstrating the significance of this organ interaction.

      Weaknesses:

      Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, the impact of immobilization on the function of skeletal muscle and BAT during cold exposure has not been thoroughly analyzed.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors show that impairment of hind limb muscle contraction by cast immobilization suppresses skeletal muscle thermogenesis and activates thermogenesis in brown fat. They also propose that free BCAAs derived from skeletal muscle are used for BAT thermogenesis, and identify IL-6 as a potential regulator.

      Strengths:

      The data support the conclusions for the most part.

      Weaknesses:

      The data provided in this manuscript are largely descriptive. It is therefore difficult to assess the potential significance of the work. Moreover, many of the described effects are modest in magnitude, questioning the overall functional relevance of this pathway. There are no experiments that directly test whether BCAAs derived from adipose tissue are used for thermogenesis, which would require more robust tracing experiments. In addition, the rigor of the work should be improved. It is also recommended to put the current work in the context of the literature.

      We appreciate the reviewer's valuable feedback. As the reviewer pointed out, many of the effects described in this study are modest in magnitude. This reflects a limitation of our study, which used skeletal muscle immobilization as a model. To clarify the overall functional relevance of this pathway, we therefore plan to use alternative models in which BAT thermogenesis and systemic cachectic effects are more strongly induced. We have added this point to the 'Conclusion' section on page 18.

      In addition, previous work has reported that mitochondrial BCAA catabolism in brown adipocytes promotes systemic BCAA clearance, suggesting that BCAAs may be supplied to BAT from other organs during BAT thermogenesis.[5] However, as the reviewer rightly pointed out, the current study did not directly investigate whether BCAAs derived from adipose tissue contribute to thermogenic processes. In light of this, we have added a statement addressing this point to the limitations section on page 20.

      Metabolomic analysis of white adipose tissue (WAT) following skeletal muscle immobilization revealed alterations in amino acid concentrations in WAT in response to cast immobilization (Author response image 1A). Notably, levels of BCAAs in WAT remained largely unchanged at 24 hours after cast immobilization, but increased significantly by day 7 (Author response image 1B). At the 24-hour time point, when BAT thermogenesis is known to be activated, WAT weight was found to be reduced (Fig. 2H). Gene expression analysis of amino acid metabolism-related genes in WAT at this time point revealed a modest upregulation of several genes (Author response image 1C). Furthermore, a slight increase in the uptake of [<sup>3</sup>H] leucine into WAT was observed following immobilization (Fig. 3C). Collectively, these findings suggest that BCAAs within WAT may be primarily metabolized locally rather than being mobilized and supplied to BAT. In addition, given the relatively low levels of BCAAs per tissue mass and the limited capacity for BCAA uptake in WAT compared to other tissues, we consider it unlikely that WAT serves as a major reservoir of BCAAs.

      Author response image 1.

      (A) Amino acids in epididymal white adipose tissue (eWAT) of IL-6 KO (–/–) and WT (+/+) mice without (control) or with bilateral cast immobilization for the indicated times. Results are presented as heat maps of the log10 value of the fold change relative to control WT mice and are means of four mice in each group. (B) BCAA concentrations in eWAT of IL-6 KO and WT mice without (control) or with bilateral cast immobilization for 1 or 7 days. (n = 4 per group) (C) RT and real-time PCR analysis of the expression of SLC1A5, SLC7A1, SLC38A2, SLC43A1, BCAT2 and BCKDHA genes in eWAT of mice without (control) or with bilateral cast immobilization for 24 h. (n = 6 per group). All data other than in (A) are means ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 as determined by Dunnett's test (B) or by the unpaired t test (C).

      Reviewer #1 (Recommendations for the authors): 

      • Gypsum is an irrelevant label. Label consistently, with a procedure acronym, like CI or Imm.

      We apologize for any confusion that our notation may have caused. We have changed all labels relating to the skeletal muscle immobilization model in mice to 'Imm'.

      • There are many grammatical errors and typos. Search for an example on Fudure1. The errors in some sentences are enough to obscure their meaning.

      We appreciate the reviewer's points. We have checked the article for grammatical and typographical errors, correcting them where necessary.

      • Figures 6E and F need to be re-annotated in the legend and on figures.

      Following the peer reviewer's advice, we have re-annotated the Figure legends of this result.

      Reviewer #2 (Recommendations for the authors): 

      (1) It is difficult to understand how the data presented in Supplemental Table 1 were obtained. This appears to be data showing that the skeletal muscle weight of the hind limbs in mice accounts for 40 to 50% of the total skeletal muscle weight. How did the authors calculate the muscle weight? Specifically, how did they measure the weight of muscles that are neither in the hind limbs nor in the forelimbs ("Other")? Was this estimated from whole-body CT or MRI data?

      In the legend, it mentions "the posterior cervical region," but what exactly was measured in the posterior cervical region? The methods for this data should be clearly described.

      We appreciate the reviewer's comments. We apologize for any confusion caused by inadequate explanation of these data. These data were obtained by removing skeletal muscle from the posterior cervical region and measuring the weight of the wet tissue. We took care to remove most of the skeletal muscle, but some may have remained. However, we do not believe that these errors are significant enough to alter the interpretation of the results. This has now been added to the 'Methods' section on page 21.

      (2) Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, it remains unclear why limb-immobilized mice have reduced tolerance to cold exposure. Was there any change in the abundance of energy metabolism-related genes during cold exposure between the immobilized and control mice? For example, if the gene expression of UCP1 and UCP2, which are typically upregulated in brown adipose tissue (BAT) and skeletal muscle during cold exposure, was suppressed in the immobilized mice, it might explain their reduced cold tolerance. Thus, the changes in the response of skeletal muscle and BAT to cold exposure between immobilized and control mice should also be analyzed.

      We thank the reviewer for the constructive comments. We consider the main weakness of this study to be that we were unable to measure the temperature and electromyography (EMG) of the skeletal muscles of the cast-immobilized mice. Following the reviewer's advice, we analyzed the expression levels of several genes related to heat production or energy metabolism (Ucp1, Ucp2, Ucp3, Sln and Ppargc1a) in BAT and skeletal muscle of cast-immobilized mice after acute cold exposure (Figure 1_figure supplement 1G-1K). The results showed that the expression of several genes that are usually increased in BAT and skeletal muscle during cold exposure was repressed in cast-immobilized mice. Notably, cast immobilization did not induce the UCP2 and PGC-1α genes at room temperature, and their upregulation during cold exposure was also suppressed in cast-immobilized mice. UCP2 expression is known to change in relation to energy metabolism, but it is unclear whether UCP2 regulates energy metabolism.[2] Additionally, UCP2 is not thought to play a role in thermogenesis, and its function in skeletal muscle remains unclear.[3] On the other hand, PGC-1α is widely recognized as a transcriptional coactivator that regulates various metabolic processes, including thermogenesis.[4] In our study, we found that the amounts of TCA cycle metabolites and the expression of the PGC-1α gene decreased rapidly in immobilized skeletal muscle, suggesting that the metabolic rate is reduced in immobilized skeletal muscle (Figure 1_figure supplement 2A and 2F). In endothermic animals, energy expenditure in skeletal muscle plays a significant role in maintaining body temperature during both activity and rest. Hence, we assume that the reduced metabolic rate in skeletal muscle substantially impairs the maintenance of body temperature in cold conditions. Further investigation into the function of these genes in skeletal muscle thermogenesis is required, but we believe these additional data suggest that the loss of muscle function due to immobilization affects the maintenance of body temperature in the cold. These results are discussed further on page 15.

      Reviewer #3 (Recommendations for the authors): 

      There are also more specific concerns related to the data supporting the claims.

      (1) The relevance of increasing thermogenesis in BAT after cast immobilization is unclear, as adult humans have very little BAT. Thermogenesis gene and protein expression should be measured in white adipose tissue.

      We would like to thank the reviewer for highlighting this important issue. We agree with the reviewer's comments. We did not observe significant changes in UCP1 expression in the subcutaneous adipose tissue of the inguinal region following skeletal muscle immobilization. We suspect that this is because skeletal muscle immobilization in mice did not exert a strong enough effect to induce browning of white adipose tissue. Whether skeletal muscle immobilization can activate thermogenesis in brown or beige adipocytes in adults remains unclear. We have therefore noted this limitation in our study in line 6.

      Additionally, in this study, we aimed to clarify the role of skeletal muscle as an amino acid reservoir under metabolic stress conditions that increase BAT thermogenesis. To this end, we employed models of skeletal muscle immobilization, acute cold exposure, and restraint stress. We also intend to analyze the metabolic interactions between beige adipose tissue and skeletal muscle in more detail using models that induce browning, such as exercise or cold acclimation.

      (2) In Figures 1E-G, there is no significant difference in UCP1 levels relative to the control, but body temperature is lowered from day 2 to day 7. How do the authors explain this?

      This is an important point. We consider the decrease in body temperature of mice following cast immobilization at room temperature to be the result of a reduction in systemic locomotor activity.

      (3) The small induction of PGC1a seen at 10 hours goes away after day 3. Why is this?

      This is an important point. Our investigation showed that the norepinephrine concentration in BAT and blood of cast-immobilized mice tends to increase, peaking at 24 hours of immobilization (Fig. 1H and Figure 2_figure supplement 2D), and then gradually returns to baseline. We speculate that this transient activation of the sympathetic nervous system may affect the expression of PGC1α in BAT. Additionally, although thermogenesis in BAT temporarily increases after skeletal muscle immobilization, studies from other research groups suggest that long-term skeletal muscle immobilization (two weeks) may increase non-shivering thermogenesis in skeletal muscle via high expression of SLN.[6] Therefore, we hypothesize that thermogenic mechanisms other than BAT might be involved during prolonged cast immobilization. We have added a discussion of these topics on page 16.

      (4) The metabolic cage data are marked in multiple places as significant, but the effect size is extremely small. Please describe how significance was calculated (Figure 5 supplement 1B, E, F).

      This is a valid point. These data were statistically analyzed using daily averages, and the results were then compiled. However, we have amended the figure because the original presentation was not appropriate for demonstrating significant differences.

      (5) How does IL-6 increase BCAA levels in muscle?

      This is an important point, and we are investigating this issue with great interest. In the future, we will use RNA-seq profiling to investigate the mechanism by which IL-6 regulates amino acid metabolism in skeletal muscle. This point was added as a limitation of the study on page 19.

      (6) What is the mechanism behind the elevated il6 levels after cast immobilization?

      We appreciate the reviewer's points. Since IL-6 gene expression in skeletal muscle increases in response to acute cold exposure and acute stress, we hypothesize that IL-6 is regulated by β-adrenergic effectors. In our preliminary experiments, stimulation with norepinephrine or with clenbuterol, a β2-adrenergic receptor agonist, appeared to increase IL-6 gene expression and the intracellular free BCAA concentration in cultured mouse muscle cells (Author response image 2A-2D). Going forward, we plan to conduct further studies using a mouse model in which the sympathetic nervous system is activated by intracerebroventricular administration of LPS, as well as using muscle-specific β2-adrenergic receptor knockout mice.

      References:

      (1) Okamatsu-Ogura, Y., et al. UCP1-dependent and UCP1-independent metabolic changes induced by acute cold exposure in brown adipose tissue of mice. Metabolism. 2020;113:154396. doi: 10.1016/j.metabol.2020.154396.

      (2) Schrauwen, P., Hesselink, M. UCP2 and UCP3 in muscle controlling body metabolism. J Exp Biol. 2002;205(Pt 15):2275-85. doi: 10.1242/jeb.205.15.2275.

      (3) Zhang, C.Y., et al. Uncoupling protein-2 negatively regulates insulin secretion and is a major link between obesity, beta cell dysfunction, and type 2 diabetes. Cell. 2001;105(6):745-55. doi: 10.1016/s0092-8674(01)00378-6.

      (4) Handschin, C., Spiegelman, B.M. Peroxisome proliferator-activated receptor gamma coactivator 1 coactivators, energy homeostasis, and metabolism. Endocr Rev. 2006;27(7):728-35. doi: 10.1210/er.2006-0037.

      (5) Yoneshiro, et al. BCAA catabolism in brown fat controls energy homeostasis through SLC25A44. Nature. 2019;572(7771):614-619. doi: 10.1038/s41586-019-1503-x.

      (6) Tomiya, S., et al. Cast immobilization of hindlimb upregulates sarcolipin expression in atrophied skeletal muscles and increases thermogenesis in C57BL/6J mice. Am J Physiol Regul Integr Comp Physiol. 2019;317(5):R649-R661. doi: 10.1152/ajpregu.00118.2019.

    1. eLife Assessment

      This work presents a valuable resource combining scRNA-seq and spatial transcriptomics studies to map mouse pre-clinical models of colorectal cancer, identifying distinct cellular programs and microenvironments that could enhance patient stratification and therapeutic approaches in colorectal cancer. While the evidence provided in the manuscript is not fully validated, these solid data were collected and analyzed using a validated methodology that will be of interest to the community in future studies.

    2. Reviewer #2 (Public review):

      In their study, Avraham-Davidi et al. combined scRNA-seq and spatial mapping studies to profile two preclinical mouse models of colorectal cancer: Apcfl/fl VillinCreERT2 (AV) and Apcfl/fl LSL-KrasG12D Trp53fl/fl Rosa26LSL-tdTomato/+ VillinCreERT2 (AKPV). In the first part of the manuscript, the authors describe the analysis of the normal colon and dysplastic lesions induced in these models following tamoxifen injection. They highlight broad variations in immune and stromal cell composition within dysplastic lesions, emphasizing the infiltration of monocytes and granulocytes, the accumulation of IL-17+ γδ T cells and the presence of a distinct group of endothelial cells. A major focus of the study is the remodeling of the epithelial compartment, where the most significant changes are observed. Using non-negative matrix factorization, the authors identify molecular programs of epithelial cell functions, emphasizing stemness, Wnt signaling, angiogenesis and inflammation as major features associated with dysplastic cells. They conclude that findings from scRNA-seq analyses in mouse models are transposable to human CRC. In the second part of the manuscript, the authors aim to provide the spatial context for their scRNA-seq findings using Slide-seq and TACCO. They demonstrate that dysplastic lesions are disorganized and contain tumor-specific regions, which contextualize the spatial proximity between specific cell states and gene programs. Finally, they claim that these spatial organizations are conserved in human tumors and associate region-based gene signatures with patient outcomes in public datasets. Overall, the data were collected and analyzed using solid and validated methodology to offer a useful resource to the community.

      Main comments:

      (1) Clarity. The manuscript would benefit from a substantial reorganization to improve clarity and accessibility for a broad readership. The text could be shortened and the number of figure panels reduced to emphasize the novel contributions of this work while minimizing extensive discussions on general and expected findings, such as tissue disorganization in dysplastic lesions. Additionally, figure panels are not consistently introduced in the correct order, and some are not discussed at all (e.g., Fig. S1D; Fig. 3C is introduced before Fig. 3A; several panels in Fig. 4 are not discussed). The annotation of scRNA-seq cell states is insufficiently explained, with no corresponding information about associated genes provided in the figures or tables. Multiple annotations are used to describe cell groups (e.g., TKN01 = γδ T and CD8 T, TKN05 = γδT_IL17+), but these are not jointly accessible in the figures, making the manuscript challenging to follow. It is also not clear what the respective value of the two mouse models and of the timepoints of tissue collection is for the analysis.

      (2) Novelty. While the study is of interest, it does not present major findings that significantly advance the field or motivate new directions and hypotheses. Many conclusions related to tissue composition and patient outcomes, such as the epithelial programs of Wnt signaling, angiogenesis, and stem cells, are well-established and not particularly novel. Greater exploration of the scRNA-seq data beyond cell type composition could enhance the novelty of the findings. For instance, several tumor microenvironment clusters uniquely detected in dysplastic lesions (e.g., Mono2, Mono3, Gran01, Gran02) are identified, but no further investigation is conducted to understand their biological programs, such as applying nNMF as was done for epithelial cells. Additional efforts to explore precise tissue localization and cellular interactions within tissue niches would provide deeper insights and go beyond the limited analyses currently displayed in the manuscript.

      (3) Validation. Several statements made by the authors are insufficiently supported by the data presented in the manuscript and should be nuanced in the absence of proper validation. For example: 1.) RNA velocity analyses: The conclusions drawn from these analyses are speculative and need further support. 2.) Annotations of epithelial clusters as dysplastic: These annotations could have been validated through morphological analyses and staining on FFPE slides. 3.) Conservation of mouse epithelial programs in human tumors: The data in Figure S5B does not convincingly demonstrate enrichment of stem cell program 16 in human samples. This should be more explicitly stated in the text, given the emphasis placed on this program by the authors. 4.) Figure S6E: Cluster Epi06 is significantly overrepresented in spatial data compared to scRNA-seq, yet the authors claim that cell type composition is largely recapitulated without further discussion, which reduces confidence in other conclusions drawn.

      Furthermore, stronger validation of key dysplastic regions (regions 6, 8, and 11) in mouse and human tissues using antibody-based imaging with markers identified in the analyses would have considerably strengthened the study. Such validation would better contextualize the distribution, composition, and relative abundance of these regions within human tumors, increasing the significance of the findings and aiding the generation of new pathophysiological hypotheses.

      Comments on revisions:

      The authors have improved the clarity of the manuscript and responded adequately to all my initial comments. I don't have any other comments. Congratulations to the authors on this work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors conducted a spatial analysis of dysplastic colon tissue using the Slide-seq method. Their main objective is to build a detailed spatial atlas that identifies distinct cellular programs and microenvironments within dysplastic lesions. Next, they correlated this observation with clinical outcomes in human colorectal cancer.

      Strengths:

      The work is a good example of utilising spatial methods to study different tumour models. The authors identified a unique stem cell program to understand tumours gently and improve patient stratification strategies.

      Weaknesses:

      However, the study's predominantly descriptive nature is a significant limitation. Although the spatial maps and correlations between cell states are interesting observations, the lack of functional validation, primarily through experiments in mouse models, weakens the causal inferences regarding the roles these cellular programs play in tumour progression and therapy resistance.

      We thank the reviewer for this comment. Indeed, functional validation to pin down causal dependencies and a more thorough investigation of tumor progression and therapy resistance both in mouse model as well as human patients and/or patient derived samples would broaden the insights to be gained from this work. Unfortunately, this is beyond the scope of this study.

      The authors also missed an opportunity to link the mutational status of malignant cells with the cellular neighbourhoods.

      The data reported in this study contain spatial data for only one mouse model (AV). As spatial data for the other model (AKPV) are missing, it is not possible to link the mutational type of the model with the cellular neighborhoods. We did investigate whether there is additional somatic mutational heterogeneity in the AV data, regarding both single nucleotide variations (SNVs) and copy number variations (CNVs). However, at the time the mice were sacrificed (after 3 weeks), no significant mutational heterogeneity was detectable.

      Overall, the study contributes to profiling the dysplastic colon landscape. The methodologies and data will benefit the research community, but further functional validation is crucial to validate the biological and clinical implications of the described cellular interactions.

      Reviewer #2 (Public review):

      In their study, Avraham-Davidi et al. combined scRNA-seq and spatial mapping studies to profile two preclinical mouse models of colorectal cancer: Apcfl/fl VillinCreERT2 (AV) and Apcfl/fl LSL-KrasG12D Trp53fl/fl Rosa26LSL-tdTomato/+ VillinCreERT2 (AKPV). In the first part of the manuscript, the authors describe the analysis of the normal colon and dysplastic lesions induced in these models following tamoxifen injection. They highlight broad variations in immune and stromal cell composition within dysplastic lesions, emphasizing the infiltration of monocytes and granulocytes, the accumulation of IL-17+ γδ T cells, and the presence of a distinct group of endothelial cells. A major focus of the study is the remodeling of the epithelial compartment, where the most significant changes are observed. Using non-negative matrix factorization, the authors identify molecular programs of epithelial cell functions, emphasizing stemness, Wnt signaling, angiogenesis, and inflammation as major features associated with dysplastic cells. They conclude that findings from scRNA-seq analyses in mouse models are transferable to human CRC. In the second part of the manuscript, the authors aim to provide the spatial context for their scRNA-seq findings using Slide-seq and TACCO. They demonstrate that dysplastic lesions are disorganized and contain tumor-specific regions, which contextualize the spatial proximity between specific cell states and gene programs. Finally, they claim that these spatial organizations are conserved in human tumors and associate region-based gene signatures with patient outcomes in public datasets. Overall, the data were collected and analyzed using solid and validated methodology to offer a useful resource to the community.

      Main comments:

      (1) Clarity

      The manuscript would benefit from a substantial reorganization to improve clarity and accessibility for a broad readership. The text could be shortened and the number of figure panels reduced to emphasize the novel contributions of this work while minimizing extensive discussions on general and expected findings, such as tissue disorganization in dysplastic lesions. Additionally, figure panels are not consistently introduced in the correct order, and some are not discussed at all (e.g., Figure S1D; Figure 3C is introduced before Figure 3A; several panels in Figure 4 are not discussed). The annotation of scRNA-seq cell states is insufficiently explained, with no corresponding information about associated genes provided in the figures or tables. Multiple annotations are used to describe cell groups (e.g., TKN01 = γδ T and CD8 T, TKN05 = γδT_IL17+), but these are not jointly accessible in the figures, making the manuscript challenging to follow. It is also unclear what each of the two mouse models and each tissue-collection time point contributes to the analysis.

      We thank the reviewer for this suggestion. We clarified and simplified the revised manuscript; however, we believe that the current discussions are an important part of the manuscript and would be useful to readers. We reordered panels in Figures S1 and 3 to align with their appearance in the manuscript. We kept the order of other panels unchanged to keep both the context and coherence of those figures intact. We changed the way we reference cell clusters in the manuscript to better align with the naming scheme introduced in Figure 1B. The respective value of the two mouse models, as well as of the time points of tissue collection, is described in lines 108-120 of the manuscript.

      (2) Novelty

      While the study is of interest, it does not present major findings that significantly advance the field or motivate new directions and hypotheses. Many conclusions related to tissue composition and patient outcomes, such as the epithelial programs of Wnt signaling, angiogenesis, and stem cells, are well-established and not particularly novel. Greater exploration of the scRNA-seq data beyond cell type composition could enhance the novelty of the findings. For instance, several tumor microenvironment clusters uniquely detected in dysplastic lesions (e.g., Mono2, Mono3, Gran01, Gran02) are identified, but no further investigation is conducted to understand their biological programs, such as applying nNMF as was done for epithelial cells. Additional efforts to explore precise tissue localization and cellular interactions within tissue niches would provide deeper insights and go beyond the limited analyses currently displayed in the manuscript.

      We thank the reviewer for this comment. Our study aimed to spatially characterize the tumor microenvironment, with scRNA-seq analysis serving to support this spatial characterization.

      Due to technical limitations—such as the number of samples and the limited capture efficiency of Slide-seq—the resolution of immune cell identification in our spatial analysis is constrained. Additionally, while immune and stromal cells formed distinct clusters, epithelial cells exhibited a continuum that was better captured using nNMF.

      Lastly, our manuscript provides a general characterization of monocyte and granulocyte populations in scRNA-seq (line 144) and their spatial microenvironments (line 400). We believe that additional analyses of these populations would be beyond the scope of this study and could place an unnecessary burden on the reader. Instead, we suggest that such analyses be explored in future studies.

      We remark that we analyzed tissue localization for two entirely different spatial transcriptomics assays (Slide-seq and Cartana) at the resolution of cell types and programs, which was feasible within the constraints of the sparsity, gene panel and sample size in the experiments. One potential future path to further increase the resolution of investigation in this dataset is to integrate other datasets, e.g., using emerging transformer-based spatial transcriptomics integration methods.

      We also remark that the manuscript already includes an investigation of cellular interactions within tissue niches based on COMMOT (Fig 4k, Fig S8i, Supp Item 4).

      (3) Validation

      Several statements made by the authors are insufficiently supported by the data presented in the manuscript and should be nuanced in the absence of proper validation. For example:

      (a) RNA velocity analyses: The conclusions drawn from these analyses are speculative and need further support.

      We thank the reviewer for this comment. We clarified that our conclusions from the RNA velocity analysis need further support by experimental validation (lines 223-225), which is outside the scope of the current study.

      (b) Annotations of epithelial clusters as dysplastic: These annotations could have been validated through morphological analyses and staining on FFPE slides.

      We thank the reviewer for this comment. While this could have been a possible approach, our study primarily relies on scRNA-seq, which does not preserve tissue morphology, and Slide-seq of fresh tissue, where such an analysis is particularly challenging.

      (c) Conservation of mouse epithelial programs in human tumors: The data in Figure S5B does not convincingly demonstrate the enrichment of stem cell program 16 in human samples. This should be more explicitly stated in the text, given the emphasis placed on this program by the authors.

      We thank the reviewer for pointing this out. We clarified the section about the stem cell program 16 and references to Figures S5A and S5B (lines 269-274): while we do see correlation in the definition of human programs with the mouse stem cell program (Figure S5A), we do not see a correlated expression of the stem cell program across human and mouse (Figure S5B).

      (d) Figure S6E: Cluster Epi06 is significantly overrepresented in spatial data compared to scRNA-seq, yet the authors claim that cell type composition is largely recapitulated without further discussion, which reduces confidence in other conclusions drawn.

      We thank the reviewer for this remark. Indeed, Epi06 was a cluster which drew our attention during early analyses for its mixed expression profiles with contributions of vastly different cell types. We concluded that this is best explained by doublets, but we cannot rule out (partial) non-doublet explanations (e.g., undifferentiated cells). As doublet detection with Scrublet did not flag those cells as doublets, we kept these cells in the workflow but excluded them from further interpretation. While in the previous version of the manuscript we only briefly alluded to this in the legend of Figure 2A ("Cluster Epi06: doublets (not called by Scrublet)"), we expanded on this in the methods section of the revised manuscript (lines 863-869). Given the doublet interpretation, the observation that this cluster is significantly overrepresented in the annotation of the spatial data is not surprising: this annotation comes from the decomposition of compositional data in which each Slide-seq bead contains contributions from multiple cells, which is structurally very similar to doublets. While Epi06 appears enriched in Figure S6E when comparing Slide-seq to scRNA-seq, there are multiple technical cross-platform differences, including different per-gene sensitivities and capture biases for certain cell types (e.g., stromal cells suffering more from dissociation in scRNA-seq than in Slide-seq). We believe that comparisons between disease states within a single platform, such as the comparison between normal and premalignant tissue presented in Figure S6G, are more biologically meaningful. To increase confidence in the analysis and to assess whether intra-platform biological conclusions are affected by the inclusion or exclusion of Epi06, we recreated Figure S6G for a Slide-seq cell type annotation without Epi06 in the reference (see Author response image 1). Even though Epi06 is missing in that annotation, the strong enrichments are consistently preserved between the two analysis variants, while, as expected, some less significant enrichments (with larger FDR values) are not preserved.

      Author response image 1.

      Significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of cell clusters (rows) in normal (N) or AV (AV) tissues based on Slide-seq (“spatial”) data or scRNA-seq (“sc”) including (A) or excluding (B) Epi06 in the reference for annotating the Slide-seq data (A is identical to Figure S6G in the manuscript).
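      The test named in this caption can be illustrated with a minimal sketch, assuming each sample's cluster composition is a vector of proportions and adding a small pseudocount (a hypothetical choice) to avoid taking the log of zero. It is not the authors' pipeline; the function names `clr` and `welch_t` and the example compositions are made up for illustration. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would come from the t distribution (e.g., `scipy.stats.ttest_ind(a, b, equal_var=False)`), followed by FDR correction across clusters.

```python
import math
from statistics import mean, variance

def clr(composition, pseudocount=1e-6):
    """Centered log-ratio transform: log of each part minus the mean log of all parts."""
    logs = [math.log(x + pseudocount) for x in composition]
    centre = mean(logs)
    return [v - centre for v in logs]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical compositions (fractions of three clusters per sample); not data from the study.
normal = [[0.50, 0.30, 0.20], [0.55, 0.25, 0.20], [0.45, 0.35, 0.20]]
tumour = [[0.30, 0.30, 0.40], [0.25, 0.35, 0.40], [0.35, 0.25, 0.40]]

normal_clr = [clr(s) for s in normal]
tumour_clr = [clr(s) for s in tumour]

# Test each cluster (column) separately; positive t means higher in the tumour group.
for k in range(3):
    t, df = welch_t([s[k] for s in tumour_clr], [s[k] for s in normal_clr])
    print(f"cluster {k}: t = {t:.2f}, df = {df:.1f}")
```

      Working in CLR space rather than on raw proportions removes the constant-sum constraint of compositional data before a per-cluster test is applied.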

      Furthermore, stronger validation of key dysplastic regions (regions 6, 8, and 11) in mouse and human tissues using antibody-based imaging with markers identified in the analyses would have considerably strengthened the study. Such validation would better contextualize the distribution, composition, and relative abundance of these regions within human tumors, increasing the significance of the findings and aiding the generation of new pathophysiological hypotheses.

      We agree with the reviewer's assessment that validation by antibody-based imaging (or other spatial proteomics data) would have been a useful follow-up experiment, yet this is beyond the scope of the current study.

      Reviewer #1 (Recommendations for the authors):

      AV and AKPV have different oncogenic mutations, and their impact on spatial neighbourhoods is unclear. Can the authors perform an analysis to understand the contribution of oncogenic mutations to the spatial landscape of CRC?

      The data reported in this study only contains spatial data for one mouse model (AV). As spatial data for the other model (AKPV) is missing, it is not possible to comparatively link the mutational type of the model with the spatial landscape.

    1. eLife Assessment

      In this valuable study, Taber et al. used a battery of biophysical and structural approaches to characterize the impact of erythrocytosis-related mutations in prolyl hydroxylase domain protein 2 (PHD2). The authors show that PHD2 mutant proteins are destabilized, thus supporting the tenet that dysregulation of the PHD2/hypoxia-inducible factor (HIF) axis underpins erythrocytosis, while providing solid evidence that N-terminal ODD prolyl hydroxylation of HIF is indispensable for these phenotypes. These findings will be of interest to researchers focusing on oxygen sensing in homeostasis and pathological states.

    2. Reviewer #1 (Public review):

      Summary:

      Taber et al. report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen-dependent degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them onto the crystal structure. They then focus on 7 mutations, which they clone into a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse-chase turnover assays (all except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD) and, to a lesser extent, of P564 (HIF1a CODD). Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not for NODD.

      Strengths:

      (1) Simple, easy-to-follow manuscript. Generally well-written.

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action.

      (3) Good, well-researched background section.

      Weaknesses:

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as the analysis by Chowdhury et al., 2016) on the structure and on the interactions between PHD2 and HIF1a peptides shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, so it is not obvious how it would interfere with the stability of the protein.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation.

      (4) The interpretation of the NMR prolyl hydroxylation data is tainted by the high concentrations used here. First of all, there is likely a typo in the methods section; the final concentration of ODD is likely 0.18 mM, not 0.18 µM (a 2024 PNAS paper by the same group reports using a final concentration of 230 µM). Here, I will assume the concentration is 180 µM. Flashman et al. (JBC 2008) showed that the affinity of the NODD site (P402; Kd around 10 µM) for PHD2 is 10-fold weaker than that of CODD (P564; around 1 µM). This likely explains the much faster kinetics of hydroxylation of the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the Kd becomes 400 µM for NODD (above the 180 µM ODD concentration) and 40 µM for CODD (below it). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.
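      The reviewer's arithmetic can be made concrete with the standard single-site occupancy relation, fraction bound = [S] / (Kd + [S]). The sketch below assumes the ODD substrate is in large excess over enzyme (so free and total [S] are roughly equal), ignores catalytic turnover, and uses the approximate Kd values and the hypothetical 40-fold affinity loss quoted above. `fraction_bound` is an illustrative helper, not an analysis from the manuscript.

```python
def fraction_bound(substrate_uM, kd_uM):
    """Fraction of enzyme occupied by substrate for single-site binding, assuming
    the substrate is in large excess over enzyme (free [S] ~ total [S])."""
    return substrate_uM / (kd_uM + substrate_uM)

ODD_UM = 180.0     # ODD concentration assumed in the reviewer's reasoning
FOLD_LOSS = 40.0   # hypothetical affinity loss for P317R, as posited by the reviewer

# Approximate WT Kd values (µM) cited from Flashman et al. (2008) in the review.
for site, kd_wt in (("NODD", 10.0), ("CODD", 1.0)):
    kd_mut = kd_wt * FOLD_LOSS
    print(f"{site}: WT {fraction_bound(ODD_UM, kd_wt):.2f}, "
          f"P317R {fraction_bound(ODD_UM, kd_mut):.2f} (Kd ~{kd_mut:.0f} µM)")
```

      Under these assumptions, occupancy at 180 µM falls from about 0.95 to about 0.31 for NODD but only from about 0.99 to about 0.82 for CODD, which is the asymmetry the reviewer describes.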

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation.

      Comments on revision:

      While the authors have addressed my concerns regarding the SEC experiments and the structural interpretation of most mutants, I remain unconvinced by their interpretation of the P317R mutant and the affinity measurements. The BLI and MST data remain inconsistent for P317R binding to CODD, and the authors' response is essentially that the fluorescent labeling of P317R (but not of other mutants) uniquely interferes with binding to the NODD/CODD peptides, which does not make a lot of sense. The fluorescent labeling targets lysine residues; while there are lysines in PHD2 in proximity to the peptide-binding site, labeling these sites would affect binding for all mutants, not only P317R (which does not introduce any new labeling site). Furthermore, the authors did not really address the discrepancy with the observation by Flashman et al. (2008) that NODD binds more weakly than CODD, which is inconsistent with their BLI results. Another point that makes me doubt the validity of the BLI results is the poor fit of the sensorgrams and the slow dissociation kinetics, which are inconsistent with the relatively low affinity in the 2-6 µM range.
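      The kinetic objection rests on the relation Kd = koff / kon. As a rough check, assuming association rate constants between 1e4 and 1e6 M^-1 s^-1 (an assumed range, not a value from the manuscript), a Kd of 2-6 µM implies the dissociation half-lives computed below. `dissociation_half_life` is a hypothetical helper for illustration and assumes simple 1:1 binding.

```python
import math

def dissociation_half_life(kd_molar, kon_per_molar_per_s):
    """Half-life (s) of a 1:1 complex implied by Kd = koff / kon, i.e. koff = Kd * kon."""
    koff = kd_molar * kon_per_molar_per_s  # dissociation rate constant, s^-1
    return math.log(2) / koff

# Kd range quoted in the review; the kon values are assumptions spanning a broad range.
for kd_uM in (2.0, 6.0):
    for kon in (1e4, 1e5, 1e6):
        t_half = dissociation_half_life(kd_uM * 1e-6, kon)
        print(f"Kd {kd_uM:.0f} µM, kon {kon:.0e} /M/s -> t1/2 ~ {t_half:.2f} s")
```

      Across this assumed range, the implied half-lives run from roughly 0.1 s to roughly 35 s, so a dissociation phase much slower than that would be hard to reconcile with a micromolar Kd under simple 1:1 binding.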

    3. Reviewer #2 (Public review):

      Summary:

      Mutations in the prolyl hydroxylase PHD2 cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD but not the N-terminal ODD, which suggests that the N-terminal ODD is biologically important. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, no justification for using neighboring residues as a surrogate for monitoring prolyl hydroxylation directly, and no reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the descriptions of several experiments lack controls or validation, which limits their strength in supporting the authors' claims.

      Strengths:

      (1) This manuscript is well-written and clear.

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims.

      (3) The authors identify P317R as a mutant that specifically loses activity against NODD, which could be a useful tool for further studies in cells.

      Weaknesses:

      Major:

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods.

      (2) The NMR hydroxylation assay.

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl hydroxylation over time, and they refer to this manuscript multiple times as the method used for the studies under review. However, it appears the current study uses 15N-HSQC-based experiments to track the CSPs of residues neighboring the target prolines, not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      B. The authors are using neighboring residues as reporters for proline hydroxylation without validating this approach. How well do the CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec?

      C. Peak intensities. In some cases, the peak intensities of the end-point residue look weaker than those of the starting residue (5B, PHD2 WT I566, 6 contour lines vs. 4 contour lines). Is this because of sample dilution (i.e., should it happen globally)? Can the authors comment on this?

      (3) Data validating the CRISPR KO HEK293A cells is missing.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data.

      Minor:

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1 prolyl hydroxylation; however, no follow-up studies have been performed to test this in cells or to validate the mutant by another method. Is it the position of the proline within the catalytic core, or the identity of the substituted residue, that accounts for the selectivity?

      Comments on revision:

      The revised manuscript addresses most of my concerns, i.e., performing SEC experiments with matched sample concentrations and incorporating additional data to justify the use of surrogate residues to monitor proline hydroxylation. I appreciate the improvements in the text to clarify the NMR experiments, but I still find their description confusing. Although the authors are using neighboring residues to monitor proline hydroxylation (which they justify convincingly using supplementary data), the language in the text suggests that they are monitoring the prolines directly, e.g., by referring to proline cross-peaks in a 15N-HSQC spectrum, although prolines lack an amide proton and do not give cross-peaks in a standard 1H-15N HSQC. The axis labels in Figure 5B also seem to have become mislabeled in this revised version.

    4. Reviewer #3 (Public review):

      Summary:

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway.

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl hydroxylation is not dispensable, contrary to what was previously thought.

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases.

      Strengths:

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis.

      Weaknesses:

      There is a lack of in vivo/ex vivo validation. Such validation is required to confirm whether the observed defects in hydroxylation (especially the selective NODD impairment in P317R) are sufficient to drive disease phenotypes such as erythrocytosis.

      HRE-luciferase reporter assays may not reliably reflect PHD2 function, and reliance on them highlights a limitation in the assessment of downstream hypoxic signaling.

      The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM).

      Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions.

      Comments on revision:

      The revised manuscript by Taber et al. addresses the key points raised during the review process in a comprehensive and appropriate manner. While some limitations remain, such as the lack of in vivo validation or direct HIF2α assessment, I agree with the authors that these are beyond the scope of the current in vitro-focused study. The authors' primary goal was to define the structural and functional defects caused by disease-associated PHD2 mutations. In this respect, the evidence they present is largely convincing and methodologically appropriate. Additional clarifications and an expanded discussion of the luciferase assay's limitations and the P317R structural context strengthen the manuscript further.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Taber et al. report the biochemical characterization of 7 mutations in PHD2 that induce erythrocytosis. Their goal is to provide a mechanism for how these mutations cause the disease. PHD2 hydroxylates HIF1a in the presence of oxygen at two distinct proline residues (P564 and P402) in the "oxygen-dependent degradation domain" (ODD). This leads to the ubiquitylation of HIF1a by the VHL E3 ligase and its subsequent degradation. Multiple mutations have been reported in the EGLN1 gene (coding for PHD2), which are associated with pseudohypoxic diseases that include erythrocytosis. Furthermore, 3 mutations in PHD2 also cause pheochromocytoma and paraganglioma (PPGL), a neuroendocrine tumour. These mutations likely cause elevated levels of HIF1a, but their mechanisms are unclear. Here, the authors analyze mutations from 152 case reports and map them onto the crystal structure. They then focus on 7 mutations, which they clone into a plasmid and transfect into PHD2-KO cells to monitor HIF1a transcriptional activity via a luciferase assay. All mutants show impaired activation. Some mutants also show impaired stability in pulse-chase turnover assays (all except A228S, P317R, and F366L). In vitro purified PHD2 mutants display a minor loss in thermal stability and some propensity to aggregate. Using MST technology, they show that P317R is strongly impaired in binding to HIF1a and HIF2a, whereas other mutants are only slightly affected. Using NMR, they show that the PHD2 P317R mutation greatly reduces hydroxylation of P402 (HIF1a NODD) and, to a lesser extent, of P564 (HIF1a CODD). Finally, BLI shows that the P317R mutation reduces affinity for CODD by 3-fold, but not for NODD.

      Strengths: 

      (1) Simple, easy-to-follow manuscript. Generally well-written. 

      (2) Disease-relevant mutations are studied in PHD2 that provide insights into its mechanism of action. 

      (3) Good, well-researched background section. 

      Weaknesses: 

      (1) Poor use of existing structural data on the complexes of PHD2 with HIF1a peptides and various metals and substrates. A quick survey of the impact of these mutations (as well as the analysis by Chowdhury et al., 2016) on the structure and on the interactions between PHD2 and HIF1a peptides shows that the P317R mutation interferes with peptide binding. By contrast, F366L will affect the hydrophobic core, and A228S is on the surface, so it is not obvious how it would interfere with the stability of the protein.

      Thank you for the comment.  We have further analyzed the mutations on the available PHD2 crystal structures in complex with HIFα to discern how these substitution mutations may impact PHD2 structure and function.  This analysis has been added into the discussion.

      (2) To determine aggregation and monodispersity of the PHD2 mutants using size-exclusion chromatography (SEC), equal quantities of the protein must be loaded on the column. This is not what was done. As an aside, the colors used for the SEC are very similar and nearly indistinguishable. 

      Agreed. We have performed an additional experiment as suggested by the reviewer to further assess aggregation and hydrodynamic size.  The colors used in the graph were changed for clearer differentiation between samples.

      (3) The interpretation of some mutants remains incomplete. For A228S, what is the explanation for its reduced activity? It is not substantially less stable than WT and does not seem to affect peptide hydroxylation. 

      We agree with the reviewer that the causal mechanism for some of the tested disease-causing mutants remains unclear. The negative findings also raise the notion, perhaps controversial, that certain mutations may impact other substrates of PHD2 that contribute to disease pathogenesis. A brief paragraph discussing this has been included in the discussion.

      (4) The interpretation of the NMR prolyl hydroxylation data is tainted by the high concentrations used here. First of all, there is likely a typo in the methods section; the final concentration of ODD is likely 0.18 mM, not 0.18 µM (a 2024 PNAS paper by the same group reports using a final concentration of 230 µM). Here, I will assume the concentration is 180 µM. Flashman et al. (JBC 2008) showed that the affinity of the NODD site (P402; Kd around 10 µM) for PHD2 is 10-fold weaker than that of CODD (P564; around 1 µM). This likely explains the much faster kinetics of hydroxylation of the latter. Now, using the MST data, let's say the P317R mutation reduces the affinity by 40-fold; the Kd becomes 400 µM for NODD (above the 180 µM ODD concentration) and 40 µM for CODD (below it). Thus, CODD would still be hydroxylated by the P317R mutant, but not NODD.

      The HIF1α concentration was indeed an oversight, which will be corrected to 0.18 mM. The study by Flashman et al.[1] showing that PHD2 has a lower affinity for NODD than for CODD likely contributes to the differential hydroxylation rates by PHD2 WT. We showed here via MST that PHD2 P317R had a Kd of 320 ± 20 uM for HIF1α CODD, which should have led to a severe enzymatic defect, even at the high concentration used for NMR (180 uM). However, we observed only a subtle reduction in hydroxylation efficiency compared with PHD2 WT. We therefore measured binding with an orthogonal method, BLI, which showed a mild CODD binding defect for PHD2 P317R, consistent with the NMR data. The perplexing result is the WT-like binding of PHD2 P317R to NODD, which appears inconsistent with the severe defect in NODD hydroxylation by PHD2 P317R measured by NMR. These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon the P317R mutation.
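      The concentration argument in this exchange can be made concrete with a rough fractional-saturation estimate, θ = [S]/(Kd + [S]). This is a minimal sketch under stated assumptions, not the authors' analysis: it assumes simple 1:1 binding, substrate in large excess over PHD2 (so free substrate is approximately the 180 uM total), and treats occupancy as a crude proxy for turnover, ignoring any differences in catalytic rate. The Kd values are those quoted above; the function name is illustrative only.

```python
def fractional_occupancy(substrate_um: float, kd_um: float) -> float:
    """Fraction of enzyme with substrate bound.

    Assumes (hypothetically) simple 1:1 binding with free substrate
    approximately equal to total substrate.
    """
    return substrate_um / (kd_um + substrate_um)


SUBSTRATE_UM = 180.0  # final ODD concentration in the NMR assay

# Kd values (uM) taken from the review and response text above
cases = {
    "NODD, WT (~10 uM, Flashman et al.)": 10.0,
    "NODD, 40-fold weaker (~400 uM)": 400.0,
    "CODD, WT (~1 uM, Flashman et al.)": 1.0,
    "CODD, 40-fold weaker (~40 uM)": 40.0,
    "CODD, P317R by MST (320 uM)": 320.0,
}

for label, kd in cases.items():
    print(f"{label}: {fractional_occupancy(SUBSTRATE_UM, kd):.2f}")
```

Under these assumptions, a 40-fold loss of affinity lowers the predicted NODD occupancy from about 0.95 to 0.31, while CODD stays near 0.82. The MST Kd of 320 uM for CODD would predict an occupancy of only about 0.36, which is why the modest CODD defect observed by NMR prompted the additional BLI measurements.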

      (5) The discrepancy between the MST and BLI results does not make sense, especially regarding the P317R mutant. Based on the crystal structures of PHD2 in complex with the ODD peptides, the P317R mutation should have a major impact on the affinity, which is what is reported by MST. This suggests that the MST is more likely to be valid than BLI, and the latter is subject to some kind of artefact. Furthermore, the BLI results are inconsistent with previous results showing that PHD2 has a 10-fold lower affinity for NODD compared to CODD. 

      The reviewer’s structural prediction that the P317R mutation should cause a major binding defect, while in agreement with our MST data, is incongruent with our NMR data and with the data from Chowdhury et al.[2] showing efficient hydroxylation of CODD by PHD2 P317R. Moreover, we modeled NODD and CODD on the apo PHD2 P317R structure and found that the mutation had no major impact on CODD, whereas the mutated residue could clash with NODD, shifting the position of the peptide on the protein. However, these modeling predictions, like any in silico projections, would need experimental validation. As mentioned in our preceding response, we also performed BLI, which showed that PHD2 P317R had a minor binding defect for CODD, consistent with the NMR results and the findings by Chowdhury et al.[2]. NODD binding was measured only by BLI, because purified NODD peptides were not amenable to the solution-based MST assay; BLI showed similar Kd values for PHD2 WT and P317R. Considering the absence of NODD hydroxylation by PHD2 P317R measured by NMR and the modeling on apo PHD2 P317R, we posit that P317R shifts NODD from its original orientation; this may not affect binding, owing to other interactions with the surrounding elements, but prevents NODD turnover. Further study would be required to validate this notion, which we feel is beyond the scope of this manuscript.

      (6) Overall, the study provides some insights into mutants inducing erythrocytosis, but the impact is limited. Most insights are provided on the P317R mutant, but this mutant had already been characterized by Chowdhury et al (2016). Some mutants affect the stability of the protein in cells, but then no mechanism is provided for A228S or F366L, which have stabilities similar to WT, yet have impaired HIF1a activation. 

      We thank the reviewer for raising these and other limitations.  We have expanded on the shortcomings of the present study but would like to underscore that the current work using the recently described NMR assay along with other biophysical analyses suggests a previously under-appreciated role of NODD hydroxylation in the normal oxygen-sensing pathway.  

      Reviewer #2 (Public review): 

      Summary: 

      Mutations in the prolyl hydroxylase PHD2 cause erythrocytosis and, in some cases, can result in tumorigenesis. Taber and colleagues test the structural and functional consequences of seven patient-derived missense mutations in PHD2 using cell-based reporter and stability assays and multiple biophysical assays, and find that most mutations are destabilizing. Interestingly, they discover a PHD2 mutant that can hydroxylate the C-terminal ODD but not the N-terminal ODD, which suggests that the N-terminal ODD is biologically important. A major strength of the manuscript is the multidisciplinary approach used by the authors to characterize the functional and structural consequences of the mutations. However, the manuscript had several major weaknesses, such as an incomplete description of how the NMR was performed, the lack of a justification for using neighboring residues as a surrogate for monitoring prolyl hydroxylation directly, and the lack of a reference to the clinical case studies describing the phenotypes of patient mutations. Additionally, the descriptions of several experiments omit controls or validation, which limits their strength in supporting the authors' claims.

      Strengths: 

      (1) This manuscript is well-written and clear. 

      (2) The authors use multiple assays to look at the effects of several disease-associated mutations, which support the claims. 

      (3) The authors identify P317R as a mutant that specifically loses activity against NODD, which could be a useful tool for further studies in cells.

      Weaknesses: 

      Major: 

      (1) The source data for the patient mutations (Figure 1) in PHD2 is not referenced, and it's not clear where this data came from or if it's publicly available. There is no section describing this in the methods. 

      Clinical and patient information on disease-causing PHD2 mutants was compiled from various case reports and summarized in an Excel spreadsheet provided in the Supplementary Information. The case reports are cited in this file. A reference to the supplementary data has been added to the Figure 1 legend and to the Introduction.

      (2) The NMR hydroxylation assay. 

      A. The description of these experiments is really confusing. The authors have published a recent paper describing a method using 13C-NMR to directly detect prolyl-hydroxylation over time, and they refer to this paper multiple times as the method used for the studies under review. However, the current study appears to use 15N-HSQC-based experiments to track the chemical shift perturbations (CSPs) of residues neighboring the target prolines, not the target prolines themselves. The authors should make this clear in the text, especially on page 9, 5th line, where they describe proline cross-peaks and refer to the 15N-HSQC data in Figure 5B.

      As the reviewer mentioned, the assay that we developed directly measures the target proline residues. This assay is ideal when mutations near the prolines, such as A403 and Y565, are studied (He et al.[3]). In that previous work, we observed that hydroxylation-induced shifts of the target proline cross-peaks, caused by the change in electronegativity on the proline pyrrolidine ring, in turn perturbed the neighboring residues[3]; the neighboring residues can therefore serve as reporters for certain purposes. In this study, we focused on the mutations in PHD2 while leaving the HIF-1α sequence unchanged, and so used only 15N-HSQC-based experiments, without the need for double-labeled samples. Nonetheless, we thank the reviewer for pointing out the confusion in the text, and we have corrected and clarified our description of this assay.

      B. The authors are using neighboring residues as reporters for proline hydroxylation, without validating this approach. How well do CSPs of A403 and I566 track with proline hydroxylation? Have the authors confirmed this using their 13C-NMR data or mass spec? 

      In previous studies, we performed interleaved 15N-HSQC and 13C-CON experiments for kinetic measurements of wild-type HIF-1α and mutants. We observed that the shifts of A403 and I566 in the 15N-HSQC spectra aligned well with those of P402 and P564, respectively, in the 13C-CON spectra. Representative data have been added to the Supplemental Data.

      C. Peak intensities. In some cases, the peak intensities of the end point residue look weaker than the peak intensities of the starting residue (5B, PHD2 WT I566, 6 ct lines vs. 4 ct lines). Is this because of sample dilution (i.e., should happen globally)? Can the authors comment on this? 

      This is an astute observation by the reviewer. We checked and confirmed that, in all kinetic datasets, the peak intensities of the end-point residue are always slightly lower than those of the starting residue. This includes PHD2 A228S and P317R in Figure 5B, although the effect is less obvious than for PHD2 WT. We agree with the reviewer that sample dilution is a factor, as a total volume of 16 microliters of reaction components was added to the solution to trigger the reaction after the first spectrum was acquired. It is also likely that the rate of prolyl hydroxylation becomes extremely slow once only a small amount of substrate remains. The reaction would therefore not reach 100% completion, and the residual unhydroxylated substrate would be detected by the sensitive NMR measurement.

      (3) Data validating the CRISPR KO HEK293A cells is missing. 

      We thank the reviewer for noting this oversight.  Western blots validating PHD2 KO in HEK293A cells have been added to the Supplementary Data file.

      (4) The interpretation of the SEC data for the PHD2 mutants is a little problematic. Subtle alterations in the elution profiles may hint at different hydrodynamic radii, but as the samples were not loaded at equal concentrations or volumes, these data seem more anecdotal, rather than definitive. Repeating this multiple times, using matched samples, followed by comparison with standards loaded under identical buffer conditions, would significantly strengthen the conclusions one could make from the data. 

      Agreed.  We have performed an additional experiment as suggested with equal volume and concentration of each PHD2 construct loaded onto the SEC column for better assessment of aggregation.  Notably, our conclusion remained unchanged.

      Minor: 

      (1) Justification for picking the seven residues is not clearly articulated. The authors say they picked 7 mutants with "distinct residue changes", but no further rationale is provided. 

      Additional justification for the selection of the mutants has been added to the ‘Mutations across the PHD2 enzyme induce erythrocytosis’ section. Briefly, some mutants were chosen based on their frequency in the clinical data and their presence in potential mutational hot spots. Various mutations were noted at W334 and R371, while F366L was identified in multiple individuals. Additionally, 9 cases of PHD2-driven disease were reported to be caused by mutations located between residues 200 and 210, and 13 cases by mutations between residues 369 and 379, so G206C and R371H were chosen to represent these potential hot spots. To examine a potential genotype-phenotype relationship, two of the mutants responsible for neuroendocrine tumor development, A228S and H374R, were also selected. Finally, mutations located at or close to catalytic core residues (P317R, R371H, and H374R) were chosen to test for suspected defects.

      (2) A major finding of the paper is that a disease-associated mutation, P317R, can differentially affect HIF1α prolyl-hydroxylation; however, no follow-up studies have been performed to test this in cells or to validate the mutant with another method. Is it the position of the proline within the catalytic core, or the identity of the substituted residue, that accounts for the selectivity?

      This is the very question we are currently addressing in a follow-up study. Indeed, one thought is that the preferential defect could result from the loss of proline, an exceptionally rigid amino acid whose side chain is bonded to the backbone at two points, or from the introduction of arginine, a flexible amino acid that adds a charge at this site. Although this is beyond the scope of this manuscript, we will investigate whether these and other characteristics of this region of the PHD2/HIF1α interface contribute to the differential hydroxylation.

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting and clinically relevant in vitro study by Taber et al., exploring how mutations in PHD2 contribute to erythrocytosis and/or neuroendocrine tumors. PHD2 regulates HIFα degradation through prolyl-hydroxylation, a key step in the cellular oxygen-sensing pathway. 

      Using a time-resolved NMR-based assay, the authors systematically analyze seven patient-derived PHD2 mutants and demonstrate that all exhibit structural and/or catalytic defects. Strikingly, the P317R variant retains normal activity toward the C-terminal proline but fails to hydroxylate the N-terminal site. This provides the first direct evidence that N-terminal prolyl-hydroxylation is not dispensable, contrary to what was previously thought.

      The findings offer valuable mechanistic insight into PHD2-driven effects and refine our understanding of HIF regulation in hypoxia-related diseases. 

      Strengths: 

      The manuscript has several notable strengths. By applying a novel time-resolved NMR approach, the authors directly assess hydroxylation at both HIF1α ODD sites, offering a clear functional readout. This method allows them to identify the P317R variant as uniquely defective in NODD hydroxylation, despite retaining normal activity toward CODD, thereby challenging the long-held view that the N-terminal proline is biologically dispensable. The work significantly advances our understanding of PHD2 function and its role in oxygen sensing, and might help in the future interpretation and clinical management of associated erythrocytosis. 

      Weaknesses: 

      (1) There is a lack of in vivo/ex vivo validation. This is required to confirm whether the observed defects in hydroxylation, especially the selective NODD impairment in P317R, are sufficient to drive disease phenotypes such as erythrocytosis.

      We thank the reviewer for this comment. While we agree with this statement, the objective of this study per se was to elucidate the structural and/or functional defects caused by the various disease-associated mutations in PHD2. A subsequent study would be to validate whether the identified defects, in particular the selective NODD impairment, lead to erythrocytosis in vivo. However, we feel that such a study would be beyond the scope of this manuscript.

      (2) The reliance on HRE-luciferase reporter assays may not reliably reflect the PHD2 function and highlights a limitation in the assessment of downstream hypoxic signaling. 

      Agreed. All experimental assays and systems have limitations. The HRE-luciferase assay used in the present manuscript has limitations such as the continuous expression of exogenous PHD2 mutants driven by the CMV promoter. We therefore used several additional biophysical methods to interrogate the disease-causing PHD2 mutants. The limitations of the luciferase assay have been expanded upon in the revised manuscript.

      (3) The study clearly documents the selective defect of the P317R mutant, but the structural basis for this selectivity is not addressed through high-resolution structural analysis (e.g., cryo-EM). 

      We thank the reviewer for the comment. While solving the structure of PHD2 P317R in complex with a HIFα substrate is beyond the scope of this study, a structure of PHD2 P317R in complex with a clinically used inhibitor has been solved (PDB: 5LAT). In analyzing this structure and that of PHD2 WT in complex with NODD, Chowdhury et al.[2] stated that P317 makes hydrophobic contacts with the LXXLAP motif on HIFα and that R317 is predicted to interact differently with this motif. While this analysis does not directly explain the preferential NODD defect, it supports the possibility that the P317R substitution is more detrimental to enzymatic activity on NODD than on CODD. We have discussed this notion in the revised manuscript.

      (4) Given the proposed central role of HIF2α in erythrocytosis, direct assessment of HIF2α hydroxylation by the mutants would have strengthened the conclusions. 

      We thank the reviewer for this comment, but we feel that such a study would be beyond the scope of the present work. We observed that the binding patterns of PHD2 to HIF1α and HIF2α were similar, and we have previously assigned >95% of the amino acids in the HIF1α ODD for NMR study[3]. We therefore first focused on elucidating possible defects of disease-associated PHD2 mutants using HIF1α as the substrate, with the supposition that any deregulation identified for HIF1α could be extended to the HIF2α paralog. However, we agree with the reviewer that future studies should examine the impact of PHD2 mutants directly on HIF2α.

      References:

      (1) Flashman, E. et al. Kinetic rationale for selectivity toward N- and C-terminal oxygen-dependent degradation domain substrates mediated by a loop region of hypoxia-inducible factor prolyl hydroxylases. J Biol Chem 283, 3808-3815 (2008).

      (2) Chowdhury, R. et al. Structural basis for oxygen degradation domain selectivity of the HIF prolyl hydroxylases. Nat Commun 7, 12673 (2016).

      (3) He, W., Gasmi-Seabrook, G.M.C., Ikura, M., Lee, J.E. & Ohh, M. Time-resolved NMR detection of prolyl-hydroxylation in intrinsically disordered region of HIF-1alpha. Proc Natl Acad Sci U S A 121, e2408104121 (2024).

      Reviewer #1 (Recommendations for the authors): 

      (1) To increase the impact and significance of this work, I would recommend determining the mechanism by which A228S and F366L impair PHD2. Are these mutations affecting interactions with proteins other than HIF1a? Furthermore, does the F366L mutation affect the hydroxylation rate? This should be measured. The authors should also perform a more in-depth structural analysis of these mutations and perhaps use AlphaFold to identify how these sites may be involved in other interactions. 

      We thank the reviewer for the recommendations. A paragraph discussing the quandary of A228S and F366L has been added to the Discussion, as well as an in-depth structural analysis of each selected mutant. While AlphaFold is excellent at predicting protein structures overall, its ability to predict the effects of single point mutations, such as those in this study, is limited. It was therefore not used in this paper.

      (2) For the aggregation assay, I recommended injecting the same quantity of protein on the SEC. If the aggregation-prone mutants' yields were too low, then reduced amounts of the other mutants should be injected. 

      Agreed. An additional experiment was performed in which similar concentrations of each mutant protein were loaded onto the SEC column, and the chromatograms were normalized according to the molecular concentration. Results from this experiment replace the previously performed aggregation assay. Notably, the data from the revised experiment did not change the outcome or conclusion of the study.

      (3) For the NMR kinetics data, the authors should discuss the impact of affinities and concentrations on the reaction rate and incorporate this analysis framework to interpret their data. 

      Done. As discussed in depth in our response to Public Reviewer 1’s fourth comment, we observed only a subtle reduction in the efficiency of HIF1α CODD hydroxylation by PHD2 P317R compared with PHD2 WT. Using BLI, we found that PHD2 P317R displays only a mild binding defect for CODD and NODD. The WT-like binding of PHD2 P317R to NODD appears inconsistent with the severe defect in NODD hydroxylation by PHD2 P317R measured by NMR. These results suggest that there are supporting residues within the PHD2/NODD interface that help maintain binding to NODD but compromise the efficiency of NODD hydroxylation upon the P317R mutation.

      Reviewer #2 (Recommendations for the authors): 

      It is unclear where the source data describing the patient mutations came from, or whether they are publicly available. Several minor issues were noted with some of the figures and methods:

      (1) Figure 2C. It is not clear what data are being compared for significance. The lines don't seem to clearly distinguish this. 

      Done.  The significance lines have been adjusted in the figure to better convey which data are being compared.

      (2) Please incorporate the calculated biophysical constants (Kd, Tm, etc.; mean ± SD) from the tables into the figures or figure legends that show the data from which they are calculated.

      Done.  References to the corresponding tables have been added to the appropriate figure legends.

      (3) Figure 3C, the data for F366L do not appear normalized in the same way as the other constructs. 

      CD melt values for F366L were normalized in the same way as for the other constructs, but because of noisier data acquired between 25 and 37 °C, the top value of the sigmoidal curve is slightly higher than for the other constructs (F366L: 1.066, WT: 1.007, A228S: 1.000, P317R: 1.015, R371H: 1.005).

      (4) For Figure 1B, it would be helpful to highlight the mutants characterized in the current study with a different color/symbol to help show the number of cases. 

      Done.  Dots representing the selected mutants have been highlighted in red in Figure 1B.

      (5) A description of the isotopic labeling of PHD2 is missing from the methods.

      Due to the nature of the NMR assay, no isotopic labeling was required for PHD2.

      Reviewer #3 (Recommendations for the authors): 

      (1) To further strengthen the manuscript, the authors could consider exploring the relevance of their in vitro findings in a more physiological context. 

      We thank the reviewer for the suggestion, and we will certainly consider furthering our investigation in a more physiological context for future studies.

      (2) If technically feasible, integrating direct analyses of HIF2α regulation by the PHD2 mutants would better reflect the clinical phenotype, given the known importance of HIF2α in erythrocytosis. 

      We agree that HIF2α is important in the context of erythrocytosis, but by MST we observed no difference between the binding patterns of the selected PHD2 mutants to HIF1α and to HIF2α. As we had previously assigned >95% of the residues of the HIF1α ODD for the NMR assay, we analyzed HIF1α with the supposition that any defects observed would likely apply to HIF2α. However, we agree that future studies of the impact of PHD2 mutants directly on HIF2α would be beneficial to supplement our understanding of pseudohypoxic disease.

      (3) Additionally, although perhaps more suitable for future work or discussion, structural modeling or high-resolution structural studies of the P317R variant could offer valuable insight into the observed NODD selectivity defect.

      We thank the reviewer for the suggestion. While solving the structure of PHD2 P317R in complex with NODD is beyond the scope of this manuscript, a crystal structure of PHD2 P317R in complex with an inhibitor has been solved and insights from this structure have been added to the discussion. 

      (4) Finally, a brief clarification or discussion of the limitations of the luciferase reporter assay, especially in the context of aggregation-prone mutants, would help readers better interpret the functional data.

      We thank the reviewer for the suggestion.  The limitations of the luciferase reporter assay in regard to its inability to detect defects with aggregation-prone mutants have been elaborated on in the discussion.

    1. eLife Assessment

      This study presents a valuable and interesting finding that a combination of arginine methyltransferase inhibitors synergizes with PARP inhibitors to eliminate ovarian and triple-negative breast cancer cell lines in vitro and in vivo using preclinical mouse models. The data were collected and analyzed using solid and validated methodology and can be used as a starting point for the development of novel therapeutics. The work will be of broad interest to scientists working in the fields of breast cancer and ovarian cancer.

    2. Reviewer #2 (Public review):

      Summary:

      The authors show that a combination of arginine methyltransferase inhibitors synergize with PARP inhibitors to kill ovarian and triple negative cancer cell lines in vitro and in vivo using preclinical mouse models.

      Strengths and weaknesses

      The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.

      They identify the DNA damage protein ERCC1 as reduced in expression upon treatment with PRMT inhibitors. As ERCC1 loss is known to be synthetic lethal with PARPi, this provides a mechanism for the synergy. They use only cell lines for their study, both in 2D culture and in xenograft models.

      Comments on revisions:

      The authors have addressed my final concerns.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public Review):

      Summary:

      The authors show that a combination of arginine methyltransferase inhibitors synergize with PARP inhibitors to kill ovarian and triple negative cancer cell lines in vitro and in vivo using preclinical mouse models.

      Strengths and weaknesses

      The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.

      They identify the DNA damage protein ERCC1 as reduced in expression upon treatment with PRMT inhibitors. As ERCC1 loss is known to be synthetic lethal with PARPi, this provides a mechanism for the synergy. They use only cell lines for their study, both in 2D culture and in xenograft models.

      We sincerely thank Reviewer #2 for the insightful and constructive feedback, as well as for the kind recognition of the scientific quality of our work: “The experiments are well-performed, convincing and have the appropriate controls (using inhibitors and genetic deletions) and use statistics.” We are also grateful for the reviewer’s thoughtful comments during both rounds of review, which have significantly improved the quality of our manuscript. In response, we have incorporated new results from additional experiments into the figures (Figures 6M and 6N) and made comprehensive revisions throughout the text, figures, and supplementary materials. Following the reviewer’s valuable suggestions, we also revised the Discussion section. In the “Recommendations for the authors” sections, we have provided detailed point-by-point responses to each comment, which were instrumental in guiding our revisions. We believe these updates have substantially strengthened the manuscript and fully addressed all reviewer concerns.

      Reviewer #2 (Recommendations for the authors): 

      Although the authors have addressed each recommendation from the reviewer, further revision of the manuscript is still necessary, as outlined below.

      Add these additional comments in the text to further enhance the comprehension and clarity of the data.

      (1) If the authors kept the tumors of various sizes in Figure 7I, it would be important to assess the protein and/or mRNA level of ERCC1 to further support their mechanism.

      Question (1): Please add the figures of new experiments (treatment diagram, curves for tumor volume and qRT-PCR data) to Figure 6.

      We thank the reviewers for their constructive suggestions. In response to the reviewers’ comments, we have added the treatment diagram and qPCR results to Figure 6. In this experiment, we shortened the treatment duration to seven days to assess early molecular responses to therapy rather than downstream effects. As expected, such short-term treatment did not result in significant differences in tumor growth among groups. The new results are now presented in Figure 6, panels M and N. The corresponding results and figure legends will also be included in the revised version of the manuscript.

      (2) Figure 2G: please explain why two bands remain for sgPRMT1.

      Question (2): In the answer, the authors stated, "Upon knockdown of the major isoforms by CRISPR/Cas9, expression of this minor isoform may have increased as part of a compensatory feedback mechanism, rendering it detectable by immunoblotting." Please put the statement into the discussion section.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (3) (Previously point 5) What is the link with ERCC1 splicing, given that the reduction in overall ERCC1 expression is clear?

      Question (5): Please add the explanation you provide of links between ERCC1 splicing and PRMTi into the discussion section.

      "Furthermore, as shown in Figure 4G, we observed a reduction in the total ERCC1 mRNA reads following PRMTi treatment. This decrease may be attributed, at least in part, to the instability of the alternatively spliced ERCC1 transcripts, which could be more prone to degradation. In combination with the transcriptional downregulation of ERCC1 induced by PRMT inhibition, these alternative splicing events may lead to a further reduction in functional ERCC1 protein levels. This dual impact on ERCC1 expression, through both decreased transcription and the generation of unstable or nonfunctional isoforms, likely contributes to the enhanced cellular sensitivity to PARP inhibitors observed in our study."

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the Discussion section to provide greater clarity and context for our findings.

      (4) (Previously 6) Figure 7J: From the graph, it seems like Olaparib+G715 and G715+G025 have a similar effect on tumor volume (two curves overlap). Please discuss.

      Question (6): In the answer, the authors stated, "Our in vitro and in vivo findings, together with previously published data, consistently demonstrate that GSK715 is more potent than both GSK025 and Olaparib. Notably, treatment with GSK715 alone led to significantly greater inhibition of tumor growth compared to either GSK025 or Olaparib administered individually. This higher potency of GSK715 also explains the comparable levels of tumor suppression observed in the combination groups, including GSK715 plus Olaparib and GSK715 plus GSK025. These results suggest that GSK715 is likely the primary driver of efficacy in the two-drug combination settings." Please put the statement in the corresponding Results section for Figure 6J.

      We sincerely thank the reviewers for their thoughtful and constructive suggestions. In response to these comments, we have carefully revised the manuscript and incorporated the corresponding information into the result section for Figure 6J to provide greater clarity and context for our findings.

    1. eLife Assessment

      This valuable study provides a comprehensive description of the Nematostella vectensis matrisome - the genes encoding the proteins of the extracellular matrix. The authors combine new mass spectrometry data with bioinformatic analyses of previously published genomic and single-cell RNAseq data. The analysis is thorough, and the discussion and conclusions are convincing. This work will be of interest to biologists working on the evolution of the matrisome, as well as more broadly those working with non-bilaterian animals.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components at different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage.

      Strengths:

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance.

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and better understand the evolution of the matrisome and of ECM proteins, an important endeavor to better understand structure/function relationships. Toward this goal, the authors provide a comprehensive list with extensive annotations and cross-referencing of the 551 genes encoding matrisome proteins in the sea anemone genome.

      This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origins of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics.

      Weaknesses:

      - Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications during database search to ensure maximizing peptide-to-spectrum matching and accurately evaluating protein quantification. Such PTMs include the hydroxylation of lysines and prolines that are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs while analyzing LC-MS/MS data would lead to underestimating the abundance of collagens and the misidentification of certain collagens. While the authors in their response state that the inclusion of these PTMs only led to a modest increase in protein identification, they do not comment on the impact of including these PTMs on PSMs or protein abundance (precursor ion intensity).

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript entitled "Molecular dynamics of the matrisome across sea anemone life history", Bergheim and colleagues report the prediction, using an established sequence analysis pipeline, of the "matrisome" - that is, the compendium of genes encoding constituents of the extracellular matrix - of the starlet sea anemone Nematostella vectensis. Re-analysis of an existing scRNA-Seq dataset allowed the authors to identify the cell types expressing matrisome components at different developmental stages. Last, the authors apply time-resolved proteomics to provide experimental evidence of the presence of the extracellular matrix proteins at three different stages of the life cycle of the sea anemone (larva, primary polyp, adult) and show that different subsets of matrisome components are present in the ECM at different life stages with, for example, basement membrane components accompanying the transition from larva to primary polyp and elastic fiber components and matricellular proteins accompanying the transition from primary polyp to the adult stage. 

      Strengths: 

      The ECM is a structure that has evolved to support the emergence of multicellularity and different transitions that have accompanied the complexification of multicellular organisms. Understanding the molecular makeup of structures that are conserved throughout evolution is thus of paramount importance. 

      The in-silico predicted matrisome of the sea anemone has the potential to become an essential resource for the scientific community to support big data annotation efforts and understand better the evolution of the matrisome and of ECM proteins, an important endeavor to better understand structure/function relationships. This study is also an excellent example of how integrating datasets generated using different -omic modalities can shed light on various aspects of ECM metabolism, from identifying the cell types of origins of matrisome components using scRNA-Seq to studying ECM dynamics using proteomics. 

      We greatly appreciate the positive feedback regarding the design of our study and the evolutionary significance of our findings.

      Weaknesses: 

      My concerns pertain to the three following areas of the manuscript: 

      (1) In-silico definition of the anemone matrisome using sequence analysis: 

      a) While a similar computational pipeline has been applied to predict the matrisome of several model organisms, the authors fail to provide a comprehensive definition of the anemone matrisome. In the text, the authors state that the anemone matrisome is composed of "551 proteins, constituting approximately 3% of its proteome" (see page 6, line 14), but Figure 1 lists 829 entries as part of the "curated" matrisome, Supplementary Table S1 lists the same 829 entries, and the authors state that "Here, we identified 829 ECM proteins that comprise the matrisome of the sea anemone Nematostella vectensis" (see page 17, line 10). Is the sea anemone matrisome composed of 551 or 829 genes? According to the text, the additional 278 entries should not be considered part of the matrisome. Confusingly, however, some of them are listed as glycoproteins, and the "new_manual_annotation" proposed by the authors, which refers to the protein domains found in these additional proteins, suggests that some could or should in fact be classified as matrisome proteins. For example, shouldn't the two lectins encoded by NV2.3951 and NV2.3157 be classified as matrisome-affiliated proteins? In the matrisomes of other model organisms, receptors have typically been excluded from the "matrisome" and included instead in the "adhesome". For consistency with these previously published matrisomes, the reviewer wonders whether the components classified as "Other" / "Receptor" should be excluded from the matrisome and moved to a separate "adhesome" list. 

      In addition to receptors, the authors identify nearly 70 glycoproteins classified as "Other". Here, does other mean "non-matrisome" or "another matrisome division" that is not core or associated? If the latter, could the authors try to propose a unifying term for these proteins? Unfortunately, since the authors do not provide the reasons for excluding these entries from the bona fide matrisome (list of excluding domains present, localization data), the reader is left wondering how to treat these entries. 

      Overall, the study would gain in strength if the authors could be more definitive and, if needed, even propose novel additional matrisome annotations to include the components for now listed as "Other" (as was done, for example, for the Drosophila or C. elegans matrisomes). 

      The reviewer is correct to point out the confusing terminology used throughout our manuscript, where both the total of 829 proteins constituting the curated list of ECM domain proteins and the actual matrisome (excluding "others") were referred to as "matrisomes". In general, we followed the example set by Naba & Hynes in their 2012 paper (Mol Cell Proteomics. 2012 Apr;11(4):M111.014647. doi: 10.1074/mcp.M111.014647), where they define the "matrisome" as encompassing all components of the extracellular matrix ("core matrisome") and those associated with it ("matrisome-associated" proteins). This corresponds to our group of 551 proteins, comprising both core matrisome and matrisome-associated proteins. The Naba & Hynes paper also contains the inclusive and exclusive domain lists for the matrisome that we applied to our dataset. In the revised manuscript, we have now labelled the group of 829 proteins as "curated ECM domain proteins/genes", which includes all proteins positively selected for containing a bona fide ECM domain. After excluding non-matrisomal proteins such as receptors, we arrive at the 551 proteins that constitute the "Nematostella matrisome". We have maintained this terminology throughout the revised manuscript and have revised Figures 1B and 4B accordingly.

      Regarding the category of "other" proteins, which by definition are not part of the matrisome although they contain ECM domains, we have taken the reviewer's advice and classified these in more detail. We categorized all receptors as "adhesome" (202 proteins). The remaining "other" secreted ECM domain proteins were then further subcategorized. Those exhibiting significant matches in the ToxProt database were subclassified as "putative venoms" (15 proteins). This group also includes the two lectins (NV2.3951 and NV2.3157), which had originally been moved to the "other" category because of their classification as venoms. We categorized as "adhesive proteins" (28 proteins) factors such as coadhesins that, due to their domain architecture, resemble bioadhesive proteins described in proteomic studies of other invertebrate species, such as corals or sponges (see also https://doi.org/10.1016/j.jprot.2022.104506). Further subcategories are stress/injury response proteins (9 proteins) and ion channels (6 proteins). The remaining 17 proteins were categorized as "uncharacterized ECM domain proteins". These include highly diverse proteins possessing either single ECM domains or novel domain combinations. We decided to retain them in our dataset as candidates for future functional characterization.
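
      The tiered subclassification described above can be summarized as an ordered set of rules. The sketch below is a minimal Python illustration, not the authors' pipeline: the `Candidate` fields and the curation flags (`bioadhesive_like`, `stress_injury`, `ion_channel`) are hypothetical stand-ins for the manual domain-architecture inspection and ToxProt searches described in the response. It assumes that receptor status is decided first, so that only the secreted "other" proteins are subcategorized further.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A curated ECM domain protein that failed the matrisome criteria (hypothetical model)."""
    gene_id: str
    is_receptor: bool = False   # receptor architecture, e.g. a transmembrane anchor
    toxprot_hit: bool = False   # significant match in a ToxProt search
    flags: set = field(default_factory=set)  # hypothetical manual curation flags


def classify_other(c: Candidate) -> str:
    """Assign a non-matrisome ECM domain protein to one subcategory.

    Rules are checked in order: receptors are removed first ("adhesome"),
    then the remaining secreted proteins are split by evidence type.
    """
    if c.is_receptor:
        return "adhesome"
    if c.toxprot_hit:
        return "putative venom"
    if "bioadhesive_like" in c.flags:
        return "adhesive protein"
    if "stress_injury" in c.flags:
        return "stress/injury response protein"
    if "ion_channel" in c.flags:
        return "ion channel"
    return "uncharacterized ECM domain protein"


if __name__ == "__main__":
    # One of the lectins mentioned in the response, flagged here by a ToxProt hit.
    print(classify_other(Candidate("NV2.3951", toxprot_hit=True)))
```

      Making the rule order explicit documents which category wins when a protein carries more than one kind of evidence, which a flat domain list alone does not capture.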

      b) It is surprising that the authors are not providing the full currently accepted protein names to the entries listed in Supplementary Table S1 and have used instead "new_manual_annotation" that resembles formal protein names. This liberty is misleading. In fact, the "new_manual_annotation" seems biased toward describing the reason the proteins were positively screened for through sequence analysis, but many are misleading because there is, in fact, more known about them, including evidence that they are not ECM proteins. The authors should at least provide the current protein names in addition to their "new_manual_annotations". 

      c) To truly serve as a resource, the Table should provide links to each gene entry in the Stowers Institute for Medical Research genome database used and some sort of versioning (this could be added to columns A, B, or D). Such enhancements would facilitate the assessment of the rigor of the list beyond the manual QC of just a few entries. 

      d) Since UniProt is the reference protein knowledge database, providing the UniProt IDs associated with the predicted matrisome entries would also be helpful, giving easy access to information on protein domains, protein structures, orthology information, etc. 

      e) In conclusion, at present, the study only provides a preliminary draft that should be more rigorously curated and enriched with more comprehensive and authoritative annotations if the authors aspire for the list to become the reference anemone matrisome and serve the community. 

      Table S1 has been updated to include links to the respective Stowers Institute IDs (first two columns), as well as SwissProt IDs and current descriptions from both the Stowers Institute (SI) and SwissProt.

      We gave our manual annotations priority over automated ones because they are based on a careful examination of each individual sequence. The cnidaria-specific minicollagens and NOWA proteins may serve as examples. According to the SI descriptions, the minicollagens are annotated as “keratin-associated protein, predicted or hypothetical protein, collagen-like protein and pericardin”. We classified these as minicollagens on the basis of their overall domain architecture and of signature domains and sequence motifs, such as minicollagen cysteine-rich domains (CRDs) and polyproline stretches (doi: 10.1016/j.tig.2008.07.001). NOWA is a CTLD/CRD-containing protein that is part of nematocyst tubules (doi: 10.1016/j.isci.2023.106291). According to the SI descriptions, the first two NOWA isoforms were annotated as aggrecan and brevican core proteins, which is very misleading. We therefore feel that our manual annotations better serve the cnidarian research community in classifying these proteins.

      Automated annotations of ECM proteins often rely on similarities between individual domains, neglecting overall domain composition. For example, SwissProt descriptions annotate 31 TSP1 domain-containing proteins in our list as "Hemicentin-1", but closer inspection reveals that only one sequence (NV2.24790) qualifies as Hemicentin-1 due to its characteristic vWFA, Ig-like, TSP1, G2 nidogen, and EGF-like domain architecture. Regarding novel protein annotations, NV2.650 may serve as an example. While SI descriptions annotate this protein as "epidermal growth factor" based on the presence of several EGF-like domains, our analysis reveals two integrin alpha N-terminal domains that classify this sequence as integrin-related. We have therefore assigned a description (Secreted integrin-N-related protein) that references this defining domain and avoids misclassification within the EGF family.

      In cases where the automated annotation (including those in Genbank) matched our own findings, we adopted the existing description, as seen with netrin-1 (NV2.7734). We acknowledge that our manual annotations are not flawless and will be refined by future research. Nonetheless, we offer them as an approximation to a more accurate definition of the identified protein list.

      (2) Proteomic analysis of the composition of the mesoglea during the sea anemone life cycle: 

      a) The product of 287 of the 829 genes proposed to encode matrisome components was detected by proteomics. What about the other ~550 matrisome genes? When and where are they expressed? The wording employed by the authors (see line 11, page 13) implies that only these 287 components are "validated" matrisome components. Is that to say that the other ~550 predicted genes do not encode components of the ECM? This should be discussed. 

      Admittedly, our wording was not sufficiently accurate here. In the revised Fig. 1B, we indicate that 210 of the 551 matrisome (core and associated) proteins were confirmed by mass spectrometry. In total, 287 proteins were identified by mass spectrometry, meaning that 77 of them are non-matrisomal proteins belonging to the “adhesome” (47) and “other” (30) groups. There are two major reasons why the remaining 542 curated ECM domain proteins predicted by our in silico analysis could not be identified: (1) Our study focused on the molecular dynamics of the mesoglea. Therefore, only mesogleas were isolated for the mass spectrometry analysis, and nematocysts were largely removed by extensive washing steps. As nematocyst proteins contribute significantly to the predicted matrisome, this group of proteins is underrepresented in the mass spectrometry analysis. (2) A significant fraction of the predicted ECM proteins are soluble factors and transmembrane receptors, which are not necessarily part of the mesoglea isolates. In addition, the isolation and solubilization method we applied might have technical limitations. Although we used harsh conditions for solubilizing the mesoglea samples (90°C and high DTT concentrations), we cannot exclude that we missed proteins that resisted solubilization and thus trypsinization. We confirmed that all genes predicted by the in silico analysis are expressed at the transcript level, as shown in supplementary table S4. We have clarified these points in the revised Results section (p. 6) and also revised the statement in line 16, page 13.
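
      The counts given in this response can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; it shows that 542 equals all 829 curated ECM domain proteins minus the 287 detected by mass spectrometry, whereas the undetected part of the 551-protein matrisome (core and associated) amounts to 341 proteins.

```python
# Counts taken from the response text.
CURATED_ECM_DOMAIN = 829   # all proteins with a bona fide ECM domain
MATRISOME = 551            # core matrisome + matrisome-associated
DETECTED_TOTAL = 287       # identified by mass spectrometry
DETECTED_MATRISOME = 210
DETECTED_ADHESOME = 47
DETECTED_OTHER = 30

# Detected proteins that are not part of the matrisome.
detected_non_matrisome = DETECTED_TOTAL - DETECTED_MATRISOME
assert detected_non_matrisome == DETECTED_ADHESOME + DETECTED_OTHER

# Undetected proteins, counted against the two different reference sets.
undetected_curated = CURATED_ECM_DOMAIN - DETECTED_TOTAL
undetected_matrisome = MATRISOME - DETECTED_MATRISOME

print(detected_non_matrisome, undetected_curated, undetected_matrisome)
```

      Keeping the two denominators separate avoids mixing the 829-protein curated list with the 551-protein matrisome when reporting detection rates.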

      b) Can the authors comment on how they have treated zero TMT values or proteins for which a TMT ratio could not be calculated because unique to one life stage, for example? 

      We did not include these proteins in the respective statistical comparison. This affected only very few proteins (about 10).
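
      The exclusion rule described above can be sketched as a pairwise log2-ratio calculation that skips proteins without a computable ratio. This is a minimal illustration under stated assumptions: the protein names and intensity values are hypothetical, and a simple ratio of two stage-level abundances stands in for whatever statistical workflow the authors actually used.

```python
import math
from typing import Dict, Optional, Tuple


def log2_ratios(
    abundances: Dict[str, Tuple[Optional[float], Optional[float]]]
) -> Dict[str, float]:
    """Return log2(stage_b / stage_a) for each protein.

    Proteins with a zero or missing reporter intensity in either stage
    (e.g. detected in only one life stage) have no defined ratio and are
    left out of the comparison.
    """
    ratios = {}
    for protein, (a, b) in abundances.items():
        if a is None or b is None or a <= 0 or b <= 0:
            continue  # ratio undefined: exclude from this comparison
        ratios[protein] = math.log2(b / a)
    return ratios


if __name__ == "__main__":
    # Hypothetical (larva, primary polyp) intensities.
    example = {
        "protein_A": (100.0, 200.0),  # kept: log2 ratio = 1.0
        "protein_B": (0.0, 50.0),     # excluded: zero in one stage
        "protein_C": (80.0, None),    # excluded: missing value
    }
    print(log2_ratios(example))
```

      An alternative would be to impute a small floor value instead of excluding such proteins; exclusion keeps the ratios free of arbitrary imputed values at the cost of losing stage-specific proteins from that comparison.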

      c) Could the authors provide a plot showing the distribution of protein abundances for each matrisome category in the main figure 4? In mammals, the bulk of the ECM is composed of collagens, followed by fibrillar ECM glycoproteins, the other matrisome components being more minor. Is a similar distribution observed in the sea anemone mesoglea? 

      We have included such a plot showing protein abundances across life stages and protein categories (Fig. 4A). Collagens and basement membrane proteoglycans (perlecan) are the most abundant protein categories in the core matrisome while secreted factors dominate in the matrisome-associated group.

      d) Prior proteomic studies on the ECM of vertebrate organisms have shown the importance of allowing certain post-translational modifications during database search to ensure maximizing peptide-to-spectrum matching. Such PTMs include the hydroxylation of lysines and prolines that are collagen-specific PTMs. Multiple reports have shown that omitting these PTMs while analyzing LC-MS/MS data would lead to underestimating the abundance of collagens and the misidentification of certain collagens. The authors may want to reanalyze their dataset and include these PTMs as part of their search criteria to ensure capturing all collagen-derived peptides. 

      Thank you for this suggestion. We have re-analyzed our dataset including lysine and proline hydroxylation as PTMs. While this approach yielded 70 additional proteins in total, this additional group did not contain any large collagen or minicollagen that we had not detected before. We only obtained two additional collagen-like proteins with very short triple-helical domains (V2t013973001.1, NV2t024002001.1), one of them being a fragment. We therefore do not feel that this justifies a re-analysis of the proteome in our study.
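
      Allowing hydroxylation as a variable modification means the search engine also considers peptide forms that are heavier by one oxygen atom (+15.994915 Da monoisotopic) per hydroxylated proline or lysine. The Python sketch below enumerates these candidate precursor masses for a short, hypothetical collagen-like peptide; it illustrates the search-space expansion only and is not the configuration of any particular search engine. The `max_mods` cap is an assumed parameter of the kind search tools typically expose.

```python
from itertools import combinations
from typing import Dict, Tuple

# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE_MASS = {"G": 57.021464, "P": 97.052764, "K": 128.094963}
WATER = 18.010565          # added once per peptide (termini)
HYDROXYLATION = 15.994915  # +O on Pro or Lys
MODIFIABLE = {"P", "K"}


def neutral_mass(peptide: str) -> float:
    """Unmodified monoisotopic neutral mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def hydroxylation_variants(peptide: str, max_mods: int = 3) -> Dict[Tuple[int, ...], float]:
    """Map each set of hydroxylated positions (0-based) to its neutral mass."""
    sites = [i for i, aa in enumerate(peptide) if aa in MODIFIABLE]
    base = neutral_mass(peptide)
    variants = {}
    for n in range(min(max_mods, len(sites)) + 1):
        for combo in combinations(sites, n):
            variants[combo] = base + n * HYDROXYLATION
    return variants


if __name__ == "__main__":
    for positions, mass in hydroxylation_variants("GPPGPK").items():
        print(positions, round(mass, 4))
```

      Distinct precursor masses depend only on the number of hydroxylations, so site localization has to come from fragment ions; this is one reason why omitting the modification can leave collagen spectra unmatched rather than merely mislocalized.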

      e) The authors should ensure that reviewers are provided with access to the private PRIDE repository so the data deposited can also be evaluated. They should also ensure that sufficient metadata is provided using the SDRF format to allow the re-use of their LC-MS/MS datasets. 

      We apologize for not providing reviewer access in our initial submission and asked the editorial office to forward the PRIDE repository link to all reviewers immediately after receiving the reviews. We did upload a metadata.csv file with the proteomics dataset. This file annotates all TMT labels with the samples, conditions and replicates used in the manuscript, and thus contains information similar to an SDRF format file. In addition, the search output files at the protein and PSM level have been provided. We therefore believe that we have provided all the information necessary to reproduce the analysis.

      (3) Supplementary tables: 

      The supplementary tables are very difficult to navigate. They would become more accessible to readers and non-specialists if they were accompanied by brief legends or "README" tabs and if the headers were more detailed (see, for example, Table S2, what does "ctrl.ratio_Larvae_rep2" exactly refer to? Or Table S6 whose column headers using extensive abbreviations are quite obscure). Similarly, what do columns K to BX in Supplementary Table S1 correspond to? Without more substantial explanations, readers have no way of assessing these data points. 

      We have revised the tables and removed any redundant data columns. We also included detailed explanations of the used abbreviations, both in the headers and in a separate README file. Some of the information was apparently lost during the conversion to pdf files. We will therefore upload the original .xls files when submitting the revised manuscript.

      Reviewer #2 (Public review): 

      This work set out to identify all extracellular matrix proteins and associated factors present within the starlet sea anemone Nematostella vectensis at different life stages. Combining existing genomic and transcriptomic datasets, alongside new mass spectrometry data, the authors provide a comprehensive description of the Nematostella matrisome. In addition, immunohistochemistry and electron microscopy were used to image whole mount and decellularized mesoglea from all life stages. This served to validate the de-cellularization methods used for proteomic analyses, but also resulted in a very nice description of mesoglea structure at different life stages. A previously published developmental cell type atlas was used to identify the cell type specificity of the matrisome, indicating that the core matrisome is predominantly expressed in the gastrodermis, as well as cnidocytes. The analyses performed were rigorous and the results were clear, supporting the conclusions made by the authors. 

      Thank you. We greatly appreciate the positive assessment of our study.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript by Bergheim et al investigates the molecular and developmental dynamics of the matrisome, a set of gene products that comprise the extracellular matrix, in the sea anemone Nematostella vectensis using transcriptomic and proteomic approaches. Previous work has examined the matrisome of the hydra, a medusozoan, but this is the first study to characterize the matrisome in an anthozoan. The major finding of this work is a description of the components of the matrisome in Nematostella, which turns out to be more complex than that previously observed in hydra. The authors also describe the remodeling of the extracellular matrix that occurs in the transition from larva to primary polyp, and from primary polyp to adult. The authors interpret these data to support previously proposed (Steinmetz et al. 2017) homology between the cnidarian endoderm with the bilaterian mesoderm. 

      Strengths: 

      The data described in this work are robust, combining both transcriptome and proteomic interrogation of key stages in the life history of Nematostella, and are of value to the community. 

      Thank you for your positive assessment of our dataset. 

      Weaknesses: 

      The authors offer numerous evolutionary interpretations of their results that I believe are unfounded. The main problem with extending these results, together with previous results from hydra, into an evolutionary synthesis that aims to reconstruct the matrisome of the ancestral cnidarian is that we are considering data from only two species. I agree with the authors' depiction of hydra as "derived" relative to other medusozoans and see it as potentially misleading to consider the hydra matrisome as an exemplar for the medusozoan matrisome. Given the organismal and morphological diversity of the phylum, a more thorough comparative study that compares matrisome components across a selection of anthozoan and medusozoan species using formal comparative methods to examine hypotheses is required. 

      Specifically, I question the authors' interpretation of the evolutionary events depicted in this statement: 

      "The observation that in Hydra both germ layers contribute to the synthesis of core matrisome proteins (Epp et al. 1986; Zhang et al. 2007) might be related to a secondary loss of the anthozoan-specific mesenteries, which represent extensions of the mesoglea into the body cavity sandwiched by two endodermal layers." 

      Anthozoans and medusozoans are evolutionary sisters. Therefore, the secondary loss of "anthozoan-like mesenteries" in hydrozoans is at least as likely as the gain of this character state in anthozoans. By extension, there is no reason to prefer the hypothesis that the state observed in Nematostella, where gastroderm is responsible for the synthesis of the core matrisome components, is the ancestral state of the phylum. Moreover, the fossil evidence provided in support of this hypothesis (Ou et al. 2022) is not relevant here because the material described in that work is of a crown group anthozoan, which diversified well after the origin of Anthozoa. The phylogenetic structure of Cnidaria has been extensively studied using phylogenomic approaches and is generally well supported (Kayal et al. 2018; DeBiasse et al. 2024). Based on these analyses, anthozoans are not on a "basal" branch, as the authors suggest. The structure of cnidarian phylogeny bifurcates with Anthozoa forming one clade and Medusozoa forming the other. From the data reported by Bergheim and coworkers, it is not possible to infer the evolutionary events that gave rise to the different matrisome states observed in Nematostella (an anthozoan) and hydra (a medusozoan). Furthermore, I take the observation in Fig 5 that anthozoan matrisomes generally exhibit a higher complexity than other cnidarian species to be more supportive of a lineage-specific expansion of matrisome components in the Anthozoa, rather than those components being representative of an ancestral state for Cnidaria. Whatever the implication, I take strong issue with the statement that "the acquisition of complex life cycles in medusozoa, that are distinguished by the pelagic medusa stage, led to a secondary reduction in the matrisome repertoire." 
There is no causal link in any of the data or analyses reported by Bergheim and co-workers to support this statement and, as stated above, while we are dealing with limited data, insufficient to address this question, it seems more likely to me that the matrisome expanded in anthozoans, contrasting with the authors' conclusions. While the discussion raises many interesting evolutionary hypotheses related to the origin of the cnidarian matrisome, which is of vital interest if we are to understand the origin of the bilaterian matrisome, a more thorough comparative analysis, inclusive of a much greater cnidarian species diversity, is required if we are to evaluate these hypotheses. 

      DeBiasse MB, Buckenmeyer A, Macrander J, Babonis LS, Bentlage B, Cartwright P, Prada C, Reitzel AM, Stampar SN, Collins A, et al. 2024. A Cnidarian Phylogenomic Tree Fitted With Hundreds of 18S Leaves. Bulletin of the Society of Systematic Biologists 3. Available from: https://ssbbulletin.org/index.php/bssb/article/view/9267

      Epp L, Smid I, Tardent P. 1986. Synthesis of the mesoglea by ectoderm and endoderm in reassembled hydra. J Morphol 189:271-279. Available from: https://pubmed.ncbi.nlm.nih.gov/29954165/

      Kayal E, Bentlage B, Sabrina Pankey M, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. 2018. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol 18:1-18. Available from: https://bmcecolevol.biomedcentral.com/articles/10.1186/s12862-018-1142-0

      Ou Q, Shu D, Zhang Z, Han J, Van Iten H, Cheng M, Sun J, Yao X, Wang R, Mayer G. 2022. Dawn of complex animal food webs: A new predatory anthozoan (Cnidaria) from Cambrian. The Innovation 3:100195.

      Steinmetz PRH, Aman A, Kraus JEM, Technau U. 2017. Gut-like ectodermal tissue in a sea anemone challenges germ layer homology. Nat Ecol Evol 1:1535-1542. Available from: https://www.nature.com/articles/s41559-017-0285-5

      Zhang X, Boot-Handford RP, Huxley-Jones J, Forse LN, Mould AP, Robertson DL, Li L, Athiyal M, Sarras MP. 2007. The collagens of hydra provide insight into the evolution of metazoan extracellular matrices. J Biol Chem 282:6792-6802. Available from: https://pubmed.ncbi.nlm.nih.gov/17204477/

      We agree with the reviewer that only the analysis of several additional anthozoan and medusozoan representatives will provide a valid basis for reconstructing the ancestral cnidarian matrisome and for identifying ancestral or novel features within the phylum. We have therefore revised the corresponding statements in the Discussion, incorporating the cited literature as well as findings from medusozoan genome analyses (e.g. Gold et al., 2018) demonstrating that changes in gene content are as common in anthozoans as in medusozoans, which calls into question the previously proposed “basal” position of Nematostella, or of anthozoans in general.

      Reviewer #1 (Recommendations for the authors): 

      (1) In Figure 2A, an "o" is missing in the labeling of the "developing cnidcytes" population. 

      Thank you, we have corrected the typo.

      (2) It would be helpful to have the different life stages indicated as headers of the heat maps presented in Figure 4. 

      We have included symbolic representations for the different life stages on top of the heat maps in addition to the respective labels at the bottom.

      Reviewer #2 (Recommendations for the authors): 

      Important changes: 

      (1) Figure 2B The x-axis tissue names should be changed to something more easily readable/understandable - some are clear, but others are not. Perhaps abbreviations could be expanded in the legend. 

      We have expanded the legend in Fig. 2B to make it more easily readable. We have also rotated the maps in Fig. 2A to align them with those in Fig. 3B.

      (2) Figure 3B This figure would be improved by the inclusion of cluster names, to understand better the mapping. 

      We have added relevant cluster names to Fig. 3B and as stated above aligned the orientation of the maps in Fig. 2B and Fig. 3B.

      (3) Figure 3C As with 2B, I find the y-axis cnidocyte cell state names to be unclear at times. Perhaps abbreviations could be expanded in the legend. 

      All abbreviations were expanded in the Fig. 3C axis labels.

      (4) Many of the supplementary tables are not well exported or easily readable as is (gene names are truncated, headers truncated, etc), which means that they may not be easily usable by researchers in the field interested in following up on this work in other contexts. Indeed, to be more usable, please consider sharing these supplementary data as .csv files, for example, instead of as .pdfs. 

      We are sorry for this inconvenience, which was obviously caused by the conversion to pdf files. We will upload the original csv files when submitting the revised manuscript.

      Smaller nitpicky comments: 

      (5) Page 2 line 4 & page 3 line 7: Please consider a term other than "pre-bilaterian". The drawing/ordering of a phylogeny of extant species is not meaningful in terms of more or less ancestral. e.g. if the tips are flipped in the drawing of the tree, can we say that bilaterians are pre-cnidarians? What does that mean? 

      We had used that term on the basis that cnidarians existed before the appearance of bilaterians according to the fossil record and molecular phylogenies (McFadden et al., 2021; Adoutte et al., 2000; Cavalier-Smith et al., 1996; Collins, 1998; Kim et al., 1999; Medina et al., 2001; Wainright et al., 1993). To acknowledge remaining uncertainties in the timing of the origin of animals, we now use the term “early-diverging metazoans” instead, which is widely accepted in the cnidarian community. 

      (6) Page 3 line 9 I was confused by the use of "gastrula-shaped body" to describe cnidarians, which are on the whole very morphologically diverse and don't all resemble gastrulae (that can also be quite diverse). 

      This term is sometimes used to refer to the diploblastic cnidarian body plan (outer ectoderm, inner endoderm) with a mouth that corresponds to the blastopore. To avoid misunderstandings, we changed it in the revised manuscript to “Cnidarians, the sister group to bilaterians, are characterized by a simple body plan with a central body cavity and a mouth opening surrounded by tentacles.”

      Reviewer #3 (Recommendations for the authors): 

      (1) In general, I felt there was a lot of discussion about protein structure and diversity that is difficult to follow without a figure. I think some of the information in Supplementary Figures S5, S9, and S11 should be in the main figures. 

      Following the reviewer’s suggestion, we have integrated Fig. S5 (collagens) into the main Fig. 2 and Fig. S9 (polydoms) into Fig. 4. As metalloproteases are not extensively discussed in the manuscript (and also due to the large size of the figure) we have kept Fig. S11 as a supplementary figure.

      (2) Page 3, Line 7: The use of the term "pre-bilaterian" is inappropriate. Cnidarians and bilaterians are evolutionary sisters. Therefore, each lineage derives from the same split and is the same age. The cnidarian lineage is not older than the bilaterian lineage. 

      Following a similar request by Reviewer 2, we have replaced this term with “early-diverging metazoans”.

      (3) Page 5, Line 10. How were in silico matrisomes from early-branching metazoan species predicted? 

      We applied the same bioinformatic pipeline as for the Nematostella matrisome. We clarified this in the respective methods part.

      (4) Page 16, Line 8: This should be Thus. 

      Obviously, the wording of this sentence was ambiguous. We changed it to “In contrast, the adult mesoglea is significantly enriched in elastic fiber components, such as fibrillins and fibulin. This compositional shift likely adds to the visco-elastic properties (Gosline 1971a, b) of the growing body column (Fig. 4B,D, supplementary table S7).”

    1. eLife Assessment

      This fundamental work demonstrates that compartmentalized cellular metabolism is a dominant input into cell size control in a variety of mammalian cell types and in Drosophila. The authors show that increased pyruvate import into the mitochondria in liver-like cells and in primary hepatocytes drives gluconeogenesis but reduces cellular amino acid production, suppressing protein synthesis. The evidence supporting the conclusions is compelling, with a variety of genetic and pharmacologic assays rigorously testing each step of the proposed mechanism. This work will be of interest to cell biologists, physiologists, and researchers interested in cell metabolism, and is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that over-expression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis.

      Strengths:

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk.

      Comments on revised version:

      The study remains an important dissection of how metabolic compartmentalization can influence cell size. It nicely uses Drosophila and a variety of metabolic approaches. The various pathway analyses and rigorous quantitation are strengths.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      Comments on revised version:

      I have had a chance to review the manuscript and response to reviewer comments. I was extremely positive about this manuscript at first submission, and thought that the manuscript rigorously reported a surprising observation with broad implications across fields. The notion that intracellular metabolic networks can be dominant determinants of cell size, even over traditional signaling inputs, is surprising and important. The authors also provide convincing mechanistic insights into how the observed metabolic changes could affect cell size regulation. I think my previous comments and summary remain applicable for the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth.

      Strengths:

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that overexpression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis. 

      Strengths: 

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk. 

      Weaknesses: 

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study. 

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNAseq analyses in whole fat body tissues expressing MPC. These methods revealed surprising hyperactivation of mTORC1 and Myc signaling in Drosophila fat body cells expressing MPC, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses an inappropriate reduction in cell size and activates signaling pathways to promote cell growth. The best-characterized “sizer” pathway for mammalian cells is the Cyclin D/CDK4 complex, which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTORC1 pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those overexpressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc is epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed modestly increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTORC1 promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance is likely to play a prominent causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review): 

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts. 

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review): 

      Summary: 

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth. 

      Strengths: 

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses: 

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention. 

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We have carefully revised the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes. 

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation. 

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree. Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes? 

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impact nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole-fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author Response Figure 1A). In contrast, nuclear size was only modestly reduced at 24 hours and significantly reduced only at 36 hours (Author Response Figure 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author Response Figure 1C and 1D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This figure highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.
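      For readers who want to reproduce this kind of analysis on their own clone measurements, the sketch below shows one plausible way to compute the quantities discussed above. It assumes (this is not stated in the response) that the R² values in panels C and D come from an ordinary least-squares linear fit of nuclear area on cell area, and it splits cells at the 1000 µm² cell-area threshold mentioned in the text. The function names and the treatment of cells at exactly 1000 µm² are hypothetical choices for illustration.

```python
from statistics import mean


def linear_fit_r2(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). For a simple linear fit with an
    intercept, R^2 equals the squared Pearson correlation: Sxy^2 / (Sxx * Syy).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


def mean_nuclear_area_by_cell_size(cells, threshold_um2=1000.0):
    """Split (cell_area_um2, nuclear_area_um2) pairs at a cell-area threshold.

    Hypothetical convention: cells strictly above the threshold count as
    "large"; cells at or below it count as "small".
    Returns (mean nuclear area of large cells, mean nuclear area of small cells).
    """
    cells = list(cells)  # allow a generator to be traversed twice
    large = [nuc for cell, nuc in cells if cell > threshold_um2]
    small = [nuc for cell, nuc in cells if cell <= threshold_um2]
    if not large or not small:
        raise ValueError("both size classes need at least one cell")
    return mean(large), mean(small)
```

      A per-genotype comparison would call these functions separately on the ay-GFP and ay-MPC measurements; any statistical test between groups is outside the scope of this sketch.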

      In Figure 4d, oxygen consumption rates are measured in control cells and those overexpressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis? 

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR to protein content to account for variations in cellular size and (probably) mitochondrial mass. 
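      The reviewer's question turns on a simple product: per-cell OCR equals protein-normalized OCR multiplied by protein content per cell. The sketch below makes that arithmetic explicit. It assumes OCR is expressed per µg protein and that protein per cell has been measured separately. The units, function names, and example numbers are hypothetical and are not values from the study. As a consequence, an unchanged protein-normalized OCR would imply a proportionally lower per-cell OCR in cells that contain less protein.

```python
def ocr_per_cell(ocr_per_ug_protein: float, ug_protein_per_cell: float) -> float:
    """Convert a protein-normalized oxygen consumption rate to a per-cell rate.

    Assumed units: OCR in pmol O2/min per ug protein, protein in ug per cell,
    so the result is in pmol O2/min per cell.
    """
    if ug_protein_per_cell <= 0:
        raise ValueError("protein per cell must be positive")
    return ocr_per_ug_protein * ug_protein_per_cell


def per_cell_fold_change(normalized_ocr_fold: float, protein_per_cell_fold: float) -> float:
    """Fold change (e.g. MPC+ / control) in per-cell OCR.

    Because per-cell OCR is a product of two factors, its fold change is the
    product of the fold change in protein-normalized OCR and the fold change
    in protein per cell.
    """
    if protein_per_cell_fold <= 0:
        raise ValueError("protein fold change must be positive")
    return normalized_ocr_fold * protein_per_cell_fold
```

      For example, with hypothetical inputs, an unchanged normalized OCR (fold change 1.0) combined with 30% less protein per cell (fold change 0.7) would correspond to a per-cell OCR fold change of about 0.7.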

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are. 

      We appreciate this valuable suggestion. In the revised manuscript, we have quantified trehalose abundance in circulation and within fat bodies. As described in the Methods section and following the approach outlined in Ugrankar-Banerjee et al. (2023), we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. The hemolymph was treated with trehalase overnight, and the resulting glucose derived from trehalose was measured. We observed that trehalose levels were also elevated in the hemolymph of larvae expressing MPC specifically in the fat body, further supporting our conclusion that MPC expression in the fat body induces a hyperglycemic state. These data are now included in Figure 4h of the revised manuscript, and the details are described in the revised Materials and Methods.
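      The arithmetic behind a trehalase-based assay, and behind expressing sugars relative to hemolymph volume as the reviewer requested, can be sketched as follows. This is a minimal illustration and not the authors' protocol. It assumes, beyond what is stated above, that free glucose is read from a parallel aliquot without trehalase, that both readings come from a glucose standard curve in µg, and that trehalose is reported in glucose equivalents. Names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HemolymphSugars:
    free_glucose_ug_per_ul: float
    trehalose_glucose_eq_ug_per_ul: float  # trehalose expressed as glucose equivalents


def hemolymph_sugars(glucose_without_trehalase_ug: float,
                     glucose_with_trehalase_ug: float,
                     hemolymph_ul: float = 2.0) -> HemolymphSugars:
    """Derive free glucose and trehalose (as glucose equivalents) per uL hemolymph.

    Trehalase hydrolyzes each trehalose into two glucose molecules, so the
    trehalose-derived glucose is taken as the increase in glucose after
    trehalase treatment (assumed parallel aliquots of equal volume).
    """
    if hemolymph_ul <= 0:
        raise ValueError("hemolymph volume must be positive")
    trehalose_derived_ug = glucose_with_trehalase_ug - glucose_without_trehalase_ug
    if trehalose_derived_ug < 0:
        raise ValueError("glucose after trehalase should not be lower than free glucose")
    return HemolymphSugars(
        free_glucose_ug_per_ul=glucose_without_trehalase_ug / hemolymph_ul,
        trehalose_glucose_eq_ug_per_ul=trehalose_derived_ug / hemolymph_ul,
    )
```

      Dividing by the sampled volume (2 µL here) places both sugars on a per-µL hemolymph basis. Converting glucose equivalents to trehalose mass would additionally require the two-glucose-per-trehalose stoichiometry and the molar masses.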

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8). 

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we have quantified the expression of the Got2 manipulations shown in Figure 6j using RT-qPCR; these data are presented in revised Supplementary Figure 8f-h. In addition, Supplementary Figure 8i has been updated with western blot analysis of Got2 expression in the knockout cells used for the size analysis in HepG2 cells.

      Additionally, we have analysed the efficiency of the pcb (Supplementary Figure 6a-c), pdha (Supplementary Figure 6f-h), dlat (Supplementary Figure 6f, g and i), pepck2 (Supplementary Figure 6n-p), and fbp (Supplementary Figure 6n, m, q) manipulations used to modulate the expression of these genes. These validations support the robustness of our findings and strengthen the conclusions of our study.

      Reviewer #1 (Recommendations for the authors): 

      General questions: 

      (1) MPC over-expression in HepG2 cells altered the redox balance and the NADH/NAD+ ratio. This is suggested to help drive the metabolic rewiring from protein to carbohydrate biosynthesis. In line with this, overexpression of Nmnat (which makes NAD+) or NDX rescues cell size and elevates protein biosynthesis. However, mechanistically it is unclear exactly how these redox NAD+ changes directly impact protein biosynthesis. Some additional explanations will strengthen this portion of the study. 

      Our data indicate that the altered redox state of the cell, particularly elevated NADH levels, affects the rate of protein synthesis. A similar relationship between redox balance and protein synthesis has been observed during embryonic development (PMID: 39879975), although the underlying mechanism remains uncharacterized. Our study suggests that increased NADH levels reprogram cellular carbohydrate metabolism, shifting it from glycolysis toward gluconeogenesis. This metabolic shift necessitates the use of oxaloacetate by PEPCK2, instead of its diversion toward GTP-mediated aspartate synthesis. Aspartate, which can be anaplerotically converted into glutamate and proline, plays a critical role in protein biosynthesis. Thus, the conversion of oxaloacetate to phosphoenolpyruvate represents a key metabolic node influencing protein synthesis under altered redox conditions. Additionally, since aspartate serves as a precursor for NAD biosynthesis, this may suggest a feedforward loop reinforcing the metabolic rewiring. Nonetheless, the precise relationship between NADH concentration and redox status and the regulation of protein synthesis warrants further investigation in future studies.

      (2) In the MPC1/2 (MPC+) over-expression background, can blocking of gluconeogenesis downstream in the carbohydrate synthesis pathway rescue the phenotype? 

      We knocked down FBPase (Drosophila fbp) using an RNAi construct, achieving approximately 60% reduction in FBPase expression in Drosophila. Notably, FBPase knockdown in fat body cells overexpressing MPC rescued the reduced cell size phenotype. These findings are presented in Figure 4o and Supplementary Figures 6n–q.

      (3) Biomass accrual and cell size are also influenced by lipogenesis. The study suggests mTORC and Myc are uncoupled to cell size determination per se, but how lipogenesis regulatory pathways like SREBP are impacted by MPC overexpression is not really explored. How lipid membrane synthesis inter-relates to this protein/carbohydrate crosstalk would add to the understanding of the system. 

      As mentioned above, when analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Recommendations for the authors): 

      I have only minor suggestions for the authors to consider. 

      Minor points 

      (1) Wherever possible, scale bars should be labeled with units or indicated comparisons (e.g., Supplementary Fig. 1). To make the data as accessible as possible, it would be helpful for the authors to include the data presented in Supplementary Figure 1 as an associated table as well. 

      We have corrected this in the revised manuscript and included the table. 

      (2) To support the conclusions about TCA cycle flux (lines 280-284), it will be helpful for the authors to consider relative metabolite pool sizes (which they should have on hand) in addition to labeling rate and fraction. 

      We thank the reviewer for this suggestion. We have included the metabolite counts with fractional abundance changes side by side in Supplementary Figure 5. 

      (3) I believe (?) there is a typo in lines 326-328; PEPCK KO increases (not decreases) the size of spheroids/cells. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Supplementary Figure 7b: PDH has three phosphosites that are independently regulated; the specific phosphosite queried should be indicated on the figure, and unless all three sites are probed, the claims about lack of change in phosphorylation (line 337) should be removed. 

      We thank the reviewer for bringing this to our attention. We have included this in the revised manuscript.

      (5) (Optional) I appreciate the effort the authors undertook to acquire cytoplasmic and mitochondrial ratios of NADH/NAD. While I recognize that many labs perform this assay, it is difficult for this reviewer to envision how accurately these values reflect the ratios present in the intact cell given how quickly these redox couples interconvert and significant post-harvest metabolic flux (see, for example, PMID: 31767181), even with the extremely rapid fractionation protocol described in the methods. The present data certainly support the notion that MPC+ cells are more reduced, but these ratios may reflect a capacity for reductive metabolism rather than a bona fide NADH/NAD ratio; for example, Figure 7f shows almost identical NADH/NAD ratios in the cytoplasm and mitochondria, even though these compartments are frequently considered to have (sometimes vastly) different redox states. If the authors are willing, I would support them by including a brief discussion of the caveat of this method for new readers in the field. 

      We agree with this point from the reviewer; it is an important caveat of the technique we used for these analyses. We have added a description of this caveat to the manuscript (revised manuscript, lines 393 to 395).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) Line 327 - "smaller" should be "bigger". 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (2) For Figure 7 - references to panels e and f in the text, and descriptions of e and f in the Figure Legend are switched with regard to the Figure itself. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (3) Line 449 - "reduced" is missing its R 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Some additional, careful proofreading is needed - several other punctuation errors were found. 

      We thank the reviewer for pointing out these errors. We have carefully proofread the manuscript and corrected them.

    1. eLife Assessment

      This study presents a screen for small-molecule activators of the kinase GCN2 that phosphorylates the eukaryotic translation initiation factor 2 alpha (eIF2α) in response to diverse stress stimuli. Among the compounds identified, one stands out as a potent activator that functions independently of GCN1, which is important for probing mechanisms of Integrated Stress Response regulation and may have translational relevance in the context of pathogenic GCN2 mutations. While some reviewers found the biochemical analyses convincing, others viewed the cellular evidence as limited, particularly with respect to time points, endogenous readouts, and broader cell-type validation, which prevents a clear assessment of the compound's potential potency in a physiological context.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes a chemical screen for activators of the eIF2 kinase GCN2 (EIF2AK4) in the integrated stress response (ISR). Recently reported inhibitors of GCN2 and other protein kinases have been shown, at certain concentrations, to paradoxically activate GCN2. The study uses CHO cells and ISR reporter screens to identify a number of GCN2 activator compounds, including a potent "compound 20." These activators have implications for the development of new therapies for ISR-related diseases. For example, although not directly pursued in this study, these GCN2 activators could be helpful for the treatment of PVOD, which has been reported in patients with certain GCN2 loss-of-function mutations. The identified activators are also suggested to engage GCN2 directly and to function in the absence of GCN1, a co-activator of GCN2.

      Strengths:

      The manuscript appears to be a largely rigorous study that flows in a logical manner. The topic is interesting and significant.

      Weaknesses:

      Portions of the manuscript are not fully clear. There are some experimental presentation and design concerns that should be addressed to support the stated conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhu, Emanuelli, and colleagues describe a novel pharmacological activator of the Integrated Stress Response kinase GCN2. The work is conclusive and biochemically solid. This work significantly adds to the pharmacological arsenal targeting the ISR and, in particular, GCN2.

      Strengths:

      Strong biochemistry, novel molecular activator of GCN2 (GCN1 independent).

      Weaknesses:

      The rationale for the screen is not exploited in the results (e.g., pathogenic GCN2 mutants), and lots of cell-based read-outs are not endogenous.

      Major points

      (1) Regarding the justification of the work: since the authors justify the screen for GCN2 activators with loss-of-function mutants associated with diseases, it would be of interest to evaluate whether the best compounds identified in the study are indeed able to activate those mutants (or at least the most prevalent ones). This approach could go in parallel with the docking experiments carried out in the last figure of the manuscript, where the mutants could be modeled as well.

      (2) The compounds are tested only using "artificial" proximal signaling outputs. It would be interesting to evaluate whether the best compounds identified are capable of inducing endogenous eIF2alpha phosphorylation in cellular models.

      (3) Other GCN2 activators (other than GCN2iB, e.g., HC-7366) were recently identified. In this context, it would be of interest to carry out a small benchmarking study to evaluate how the compounds identified in the current study perform against the previously identified molecules.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors describe the results of a high-throughput screen for small-molecule activators of GCN2. Ultimately, they find 3 promising compounds. One of these three, compound 20 (C20), is of the most interest both for its potency and specificity. The major new finding is that this molecule appears to activate GCN2 independent of GCN1, which suggests that it works by a potentially novel mechanism. Biochemical analysis suggests that each binds in the ATP-binding pocket of GCN2, and that at least in vitro, C20 is a potent agonist. Structural modeling provides insight into how the three compounds might dock in the pocket and generates testable hypotheses as to why C20 perhaps acts through a different mechanism than other molecules.

      Strengths:

      Of the 3 compounds identified by the authors, C20 is the most interesting, not just for its intriguing mechanistic distinction as being GCN1-independent (shown genetically in two distinct cell lines, CHO and 293T in Figure 4, and in contrast to other GCN2 activators) but also for its potency. In in cellulo assays, compound 21 appears to be more of an ISR enhancer than an activator per se, and although compound 18 and compound 21 lead to upregulation of the ISR targets (Figure 2), that degree of upregulation is probably not significantly different from that induced by those compounds in Gcn2-/- cells. For C20, the effect appears stronger (although it is unclear whether the authors performed statistical analysis comparing the two genotypes in Figure 2D). In Figure 3, only C20 activates the ISR robustly in both CHO and 293T. Ultimately, C20 might be a tool for providing mechanistic insight into the details of GCN2 activation and regulation, and could be exploited therapeutically.

      Weaknesses:

      There are some limitations to the existing work. As the authors acknowledge, they do not use any of the compounds in animals; their in vivo efficacy, toxicity, and pharmacokinetics are unknown. But even in the context of the in cellulo experiments, it is puzzling that none of the three compounds, including C20, has any effect in HeLa cells, whereas neratinib does. It's beyond the scope of this paper to address definitively why that is, but it would at least be reassuring to know that C20 activates the ISR in a wider range of cells, ideally including some primary, non-immortalized cells. In addition, the ISR is a complex, feedback-regulated response whose output varies depending on the time point examined. The in cellulo analysis in this paper is limited to reporter assays at 18 hours and qRT-PCR assays at 4 and 8 hours. A more extensive examination of the behavior of the relevant ISR mRNAs and proteins (eIF2, ATF4, CHOP, cell viability, etc.) for C20 across a longer time course would give the reader a clearer sense of how this molecule affects ISR output. I also find it a bit strange that the authors describe C20 as "demonstrat(ing) weak inhibition of ... PKR": the measured IC50 is ~4 μM, which is right around its EC50 for GCN2 activation. This raises the confounding possibility that C20 would simultaneously activate GCN2 while inhibiting PKR. While inhibition of PKR may not be relevant under the conditions in which GCN2 would be activated, either experimentally or therapeutically, examining the effects of C20 on GCN2 and PKR in cells across a dose range would shed light on whether this cross-reactivity is likely to be of concern.

    5. Author response:

      We thank the editors and reviewers for their encouraging comments and constructive feedback. We will revise the text to enhance clarity as suggested. New experiments are planned to address questions raised regarding the time course of responses to the hit compounds. We also intend to examine additional endogenous readouts of the integrated stress response, including effects on translation. The effects of lead compound 20 will be examined in a wider range of cells, including primary cells.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of tissue deformation and growth patterns during the earliest stages of mammalian heart development. One of the strengths of the work is the compelling quantitative approach to analyzing time-lapse imaging data using an original computational pipeline, which goes beyond the current state of the art and provides new insights into heart tube formation. Overall, this rigorous study will be of broad interest to computational and developmental biologists studying tissue dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Raiola et al. conducted a quantitative analysis of tissue deformation during the formation of the primitive heart tube from the cardiac crescent in mouse embryos. Using the tools developed to analyze growth, anisotropy, strain, and cell fate from time-lapse imaging data of mouse embryos, the authors elucidated the compartmentalization of tissue deformation during heart tube formation and ventricular expansion. This paper describes how each region of the cardiac tissue changes to form the heart tube and ventricular chamber, contributing to our understanding of the earliest stages of cardiac development.

      Strengths:

      In order to understand tissue deformation in cardiac formation, it is commendable that the authors effectively utilized time-lapse imaging data, a data pipeline, and in silico fate mapping.

      The study clarifies the compartmentalization of tissue deformation by integrating growth, anisotropy, and strain patterns in each region of the heart.

      Weaknesses:

      The significance of the compartmentalization of tissue deformation for the heart tube formation remains unclear.

    3. Reviewer #2 (Public review):

      The authors address an important challenge in developmental biology: the quantitative description of tissue deformation during organogenesis. They have developed a new pipeline to quantify early heart tube morphogenesis in the mouse, with cellular resolution. They adopt an elegant approach by integrating multiple 3D time-lapse datasets into a dynamic atlas of cardiac morphogenesis in order to compute spatio-temporal deformation patterns. The main findings highlight a strong compartmentalization of cell behaviors, with tissue growth and anisotropy exhibiting complementary and spatially segregated patterns. Using these data, the authors developed an in-silico fate mapping tool to interrogate cell displacement within the myocardium. This virtual model provides new mechanistic insights into how the bilateral cardiac primordia converge and transform into a three-dimensional heart tube. The authors identify "belt-like" constraints at the arterial and venous poles that prevent tissue expansion and thus shape the ventricular barrel morphology.

      The computational framework is highly innovative and impressive, providing an unprecedented 3D model of tissue deformation during heart morphogenesis. It also opens avenues for testing hypotheses regarding tissue growth and the forces that cause cell motion. However, the proposed model of ventricular chamber formation with the two constraining belts remains hypothetical, lacking biological validation and requiring strengthening or modulation.

      Overall, this carefully performed study provides a new model for exploring tissue deformation during organogenesis and will be of broad interest to computational and developmental biologists.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Raiola and colleagues entitled "Quantitative computerized analysis demonstrates strongly compartmentalized tissue deformation patterns underlying mammalian heart tube formation" takes a highly quantitative approach to interrogating the earliest stages of cardiogenesis (12 hours, from early cardiac crescent to early heart tube) in a new and innovative way. The paper presents a new computational framework to help identify both regional and temporal patterns of tissue deformation at cellular resolution. The method is applied to live embryo imaging data (newly generated and from the group's previous pioneering work). In the initial setup, the new model was applied directly to raw time-lapse data, and the results were compared to actual cell tracks identified manually, showing close correlation of the model with the manual tracking. Next, they integrated spatial and temporal information from different embryos to generate a new model for tissue movement, driven by parameters such as tissue growth and anisotropy. Key findings from their model suggest that there are distinct compartments of tissue deformation patterns as the bilateral cardiac crescent develops into the linear heart tube, and that the ventricular chamber forms by a defined expansion pattern, as a 'hemi-barrel shape', with the arterial and venous poles (OFT and IFT, respectively) acting as harnessing belts that constrain further expansion of the chamber. Lastly, the model is tested for its ability to predict the future residence of cardiac crescent cells in the heart tube, which it appears to do successfully based on fate tracking validation experiments.

      Strengths:

      The manuscript provides an exceptionally careful analysis of a critical stage during heart development - that of the earliest stages of morphogenesis, when the heart forms its first tube and chamber structures. While numerous studies have interrogated this stage of heart development, few studies have performed time-lapse imaging, and, to my knowledge, no other report has performed such an in-depth quantitative analysis and modeling of this complex process. The computational model applied to normal heart development of the myocardium (labelled by Nkx2-5) has revealed multiple new and interesting concepts, such as the distinct compartments of tissue deformation patterns and the growth trajectories of the emerging ventricle. The fact that the model operates at cellular resolution and over a nearly continuous time period of approximately 12 hours allows for unprecedented depth of analysis in a largely unbiased manner. Going forward, one can imagine such models revealing additional information on these processes, enabling analyses of subpopulations that form the heart, and, maybe most importantly, being applied to various perturbation models (genetic or otherwise). The manuscript is very well written, and the data display is accessible and transparent.

      Weaknesses:

      No major weaknesses are noted with the study. It would have been very exciting to see the model applied to any kind of perturbation, for example, a left-right defect model, or a model with compromised cardiac progenitor populations. However, the amount of live imaging required for such analyses renders this out of scope for the current study.

    5. Author response:

      We are going to modify the text following the reviewers' comments and perform direct embryo labelling experiments to experimentally address the contraction of the two “belts” proposed in our model. We feel that this aspect is feasible in a reasonable time and important for the proposed model. We appreciate the relevance of using this framework to identify molecular drivers of the regionalized tissue behaviours uncovered and how these might be altered in mutant models, but feel that these aspects demand efforts beyond a reasonable revision period.

    1. eLife Assessment

      This work presents valuable new data on the role of D-serine and how it competes with its stereoisomer L-serine to influence metabolism. The work presents a variety of solid experimental data combined with simulations to investigate mechanisms centered on one-carbon metabolism, which is relevant to several research fields. However, some claims are only partially supported by the data, and critical areas, including comparisons of L- vs D-serine and further mechanistic studies, are incomplete. Furthermore, while the work has potential for various fields, it has been conducted in only a limited range of cell types and contexts.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate the stereoselective role of D-serine in 1C metabolism, showing that D-serine competes with L-serine and inhibits mitochondrial L-serine transport. Using a metabolomics approach, they measure 1C metabolites in primary cortical neurons treated with L-serine, D-serine, or a mixture of both. Their conclusions are based on reduced levels of glycine, formate, and polyamines and their intermediates. Single-cell RNA sequencing of N2a cells showed that D-serine treatment enhanced expression of genes associated with mitochondrial functions, such as respiratory chain complex assembly, and downregulated genes related to amino acid transport, cellular growth, and neuron projection extension. Their work demonstrates that D-serine inhibits tumor cell proliferation and induces apoptosis in neural progenitor cells, highlighting the importance of D-serine in neurodevelopment.

      Strengths:

      D-amino acids are a marvel of nature. It is fascinating that nature decided to make two versions of the same molecule, in this case, an amino acid. While the L-stereoisomer plays well-known roles in biology, the D-stereoisomer seems to function in obscurity. Research into these novel signaling molecules is gathering momentum, with new stereoisomers being discovered. D-serine is the best studied among the different stereoisomers, and we still continue to learn about this novel neurotransmitter. The roles of these molecules in metabolism are not well studied. The authors aim to elucidate the metabolic role of D-serine in the context of neuronal maturation, with implications for 1C metabolism and cell proliferation. The metabolic role of these molecules is just beginning to be uncovered, especially in mammalian biology. This is the strength of the manuscript. The authors have done important work in prior publications elucidating the role of D-amino acids. Advancing the field of D-amino acids in mammalian biology is significant, as not much is known. The RNA-seq data are a valuable resource for the community, albeit with the caveats noted below.

      Weaknesses:

      The following are some of the issues that come out in a critical reading of the manuscript. Addressing these would only strengthen and clarify the work.

      (1) Kinetic assessment of D-serine versus L-serine: While the authors state that D-serine is a poor substrate for SHMT2 compared with L-serine, kinetic data are presented only for D-serine. To draw conclusions about substrate specificity and affinity, data must also be presented for L-serine. Since the authors compare one substrate with the other, a kinetic comparison of both, including Km (affinity), is needed. (Ref Figure 2 panel).

      (2) Molecular dynamics simulations, while a good first step in modeling interactions at the active site, rely on force fields. These force fields are approximations and do not represent all interactions occurring in real systems. The initial conditions chosen for the simulations can affect the final results in non-equilibrium scenarios. The basic question here is this: is the simulated trajectory long enough for the system to reach thermodynamic equilibrium and for the measured properties to converge? Prior studies have shown mixed results, concluding that properties of biological systems tend to converge in multi-microsecond trajectories (not the nanosecond timescales reported by the authors) and that transitions to low-probability conformations require more time. (Ref Figure 2C).

      (3) The authors use the N2a cell line to model the D-serine burden on primary cortical neurons. N2a is an immortalized cell line, and its properties are very different from those of primary neurons. The authors need to provide a rationale for using an immortalized cell line rather than primary neurons. Because the transcriptomic profile of an immortalized cell line differs from that of a primary cell, the response to D-serine may vary between the two cell types.

      (4) In Figure 4D, the authors state that D-serine induces caspase 3 cleavage, but Figure 4D shows only cleaved caspase 3 as a single band. They need to show the full blot containing the cleaved fragments along with the major full-length caspase 3 band.

      (5) In Figure 4, the authors use neural progenitor cells (NPCs). They need to demonstrate that the population they are working with consists of NPCs and not primary neurons; a figure panel showing staining for NPC markers such as SOX2 and PAX6 should be included. Figure S5 also needs to be properly labeled: from the legend, it is unclear what panels B-E refer to, and scale bars are not indicated.

      (6) In Supplementary Figure 7F, the authors refer to phosphatidyl L-serine and phosphatidyl D-serine. These two species are not distinguishable on an MS platform, so, since the authors used 2D-HPLC, including a chromatogram of the two species would confirm their presence and would be helpful to readers.

      (7) The authors discuss an enantiomeric shift in serine metabolism during neural development, which appears to be a discussion of previously published data from Hubbard et al., 2013; Burk et al., 2020; and Bella et al., 2021, presented in Supplementary Figure 8A-E. This should not be presented as a figure panel, as it gives the false impression that the authors performed the experiments, which is clearly not the case. However, this material could well serve as part of the discussion section.

      (8) The entire presentation of the section on enantiomeric shift of serine metabolism during neural development (lines 274-312) is a discussion and should be part of the discussion section and not in the results section. This is misleading.

      (9) The discussion section is not well written. There is no mention of recent work related to D-serine that has a direct bearing on its metabolic properties. In the discussion section, paragraph 1, the authors mention that their work demonstrates the selective synthesis of D-serine in mature neurons as opposed to neural progenitor cells. This concept has been referred to in prior publications:

      (a) Spatiotemporal relationships among D-serine, serine racemase, and D-amino acid oxidase during mouse postnatal development. PMID:14531937.

      (b) D-cysteine is an endogenous regulator of neural progenitor cell dynamics in the mammalian brain. PMID:34556581.

      (10) In the abstract (lines 101 and 102), the authors state that "how D-serine contributes to cellular metabolism beyond neurotransmission remains largely unknown". In 2023, a paper in Stem Cell Reports by Roychaudhuri et al. (PMID:37352848) used a comprehensive lipidomics approach to show that D- and L-serine availability affects lipid metabolism in the subventricular zone of mice, altering the proliferative properties of stem-cell-derived neurons. This work is not mentioned, even in the discussion section, although it bears directly on L- and D-serine availability in neurons, which the authors are investigating. In lines 410-411 of the discussion, the authors mention the role of D-serine in neurogenesis but, surprisingly, do not cite the above reference. The role of D-serine in neurogenesis has been demonstrated in the Sultan et al. (lines 855-857) and Roychaudhuri et al. references.

      (11) Both D-serine and the structurally similar D-amino acid D-cysteine (sulfur versus oxygen atom) have a bearing on 1C metabolism and the folate cycle. With reference to the folate cycle, Roychaudhuri et al. in 2024 (PMID:39368613) showed in rescue experiments in mice that a high-methionine diet provides folate cycle precursors that rescue the high-insulin phenotype in SR-deficient mice. Since 1C metabolism is being discussed in this manuscript, the authors appear to overlook prior work in the field by not including it in their discussion, even though the same enzyme (SR) synthesizes both D-serine and D-cysteine. Since the field of D-amino acid research is in its infancy, the authors must make a point of including prior work related at least to D-serine, rather than claiming that it is not known. The known D-stereoisomers are few, so any progress in the area should include at least a discussion of the other structurally related stereoisomers.

      (12) Racemases (serine and aspartate) are in general promiscuous enzymes, known to synthesize other stereoisomers in addition to D-serine, D-cysteine, and D-aspartate. A few controls, such as D-aspartate, D-cysteine, or even D-alanine, must be included to demonstrate that the observed actions are specific to D-serine, especially in the N2a cell treatment experiments. D-cysteine and D-serine are almost identical in structure (sulfur versus oxygen atom), and both are synthesized by serine racemase (published). D-cysteine has also very recently been shown to inhibit tumor growth and neural progenitor cell proliferation (PMIDs: 40797101 and 34556581). How the authors' work relates to these existing findings must be discussed, as this would put things in perspective for the reader.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Suzuki et al. reports an interesting stereoselective role of D-serine in regulating one-carbon metabolism during neurodevelopment to adapt to the functional transition, probably through competition with the mitochondrial transport of L-serine. The authors provide a multi-layered set of evidence, including metabolomics, enzyme assays, mitochondrial transport competition, and functional assays in immature/neural progenitor cells, to build a conceptual integration of D-serine as both a neurotransmitter and a metabolic regulator in the central nervous system, which should be of broad interest to the neuroscience and metabolism communities.

      Strengths:

      This work provides a conceptual advance: D-serine not only serves as a traditional neurotransmitter in the central nervous system but also contributes critically to the metabolic regulation of neural cells. The authors performed solid metabolomic assays to validate the suppressive effect of D-serine on the one-carbon metabolic pathway, providing some evidence that D-serine competitively inhibits mitochondrial serine transport rather than directly impairing SHMT2 enzymatic activity. Together, these data indicate a critical role for D-serine synthesis during neural maturation and suggest a potential translational strategy for targeting serine metabolism in neural tumors.

      Weaknesses:

      (1) The detailed mechanism by which D-serine competes with L-serine for mitochondrial transport is not investigated. For example, although the authors discuss this point, they do not provide direct genetic or biochemical evidence linking these effects to specific transporters, such as SFXN1.

      (2) Unlike tumor cells, in which SHMT2 usually plays the predominant role in catalyzing serine/THF-derived one-carbon metabolism, normal cells may employ both SHMT1 and SHMT2. Even under conditions in which SHMT2-mediated one-carbon metabolism is suppressed, SHMT1 activity could be elevated to compensate. Thus, it is important to investigate whether D-serine affects SHMT1 activity or alters the balance between SHMT1- and SHMT2-mediated one-carbon metabolism. To this end, the authors are strongly encouraged to perform metabolic flux analysis (MFA) using 13C-labeled L-serine in the model cells in the presence and absence of D-serine.

      (3) A defect in serine-derived one-carbon metabolism may trigger multiple cellular stress responses. It would be valuable to determine whether cellular NADPH/NADH, GSH, or ROS levels are altered before and after D-serine treatment.

      (4) The physiological relevance of the link between D-serine and neural cell maturation/death should be further tested and discussed, since the D-serine doses used in the in vitro assays are much higher than physiological levels.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a comprehensive and well-executed investigation into the metabolic role of D-serine in the central nervous system. The authors provide solid evidence that D-serine competitively inhibits mitochondrial L-serine transport, thereby impairing one-carbon metabolism. This stereoselective mechanism reduces glycine and formate production, suppresses cellular proliferation, and induces apoptosis in immature neural cells and glioblastoma stem cells. Developmental analyses further reveal a physiological enantiomeric shift in serine metabolism during neurogenesis, aligning with the transition from proliferation to maturation. Overall, the study bridges developmental neurobiology, cancer metabolism, and amino acid transport, uncovering a previously unrecognized metabolic function of D-serine beyond its role in neurotransmission.

      Strengths:

      (1) The discovery that D-serine inhibits one-carbon metabolism by competing for mitochondrial L-serine transport, rather than through enzymatic inhibition or receptor-mediated signaling, represents a significant and previously underappreciated mechanism. This finding has broad implications for understanding metabolic regulation during neurodevelopment and offers potential relevance for targeting metabolic vulnerabilities in cancer.

      (2) The authors integrate metabolomics, mitochondrial transport assays, molecular dynamics simulations, genetic and pharmacologic perturbations, transcriptomics, and both in vitro and ex vivo models. The breadth of experimental approaches, combined with the coherence of the findings across systems, provides strong support for the central conclusions and enhances the overall impact of the study.

      (3) The temporal shift in D-/L-serine levels during neurodevelopment is elegantly linked to the transition from proliferative to mature neuronal states. The selective vulnerability of neural progenitors and tumor cells, contrasted with the resistance of mature neurons, highlights a biologically meaningful and potentially targetable metabolic distinction.

      Weaknesses:

      (1) While the authors attribute D-serine's metabolic effects to competition with mitochondrial L-serine transport, the specific identity of the transporter(s) mediating this process remains undefined. This represents a meaningful mechanistic gap, as the central conclusion depends on D-serine limiting mitochondrial L-serine availability to inhibit one-carbon metabolism.

      (2) The effective concentrations of D-serine used in vitro (IC₅₀ ≈ 1-2 mM) exceed typical brain levels (~0.3 mM). While the authors acknowledge this, a more focused discussion of whether higher local D-serine concentrations could arise in specific microenvironments (such as synaptic compartments, tumor niches, or pathological states) would help contextualize the in vitro findings and strengthen their physiological relevance. For example, disrupted D-serine clearance or altered expression of serine racemase and transporters in disease contexts could lead to localized accumulation. Moreover, differences between extracellular and intracellular D-serine pools, and the mechanisms governing their regulation, may further influence its metabolic impact in vivo.

      (3) While the manuscript focuses on neural stem/progenitor cells and neural tumors, it remains unclear whether the anti-proliferative effects of D-serine are specific to neural lineages or extend to other highly proliferative non-neural cell types. A brief discussion addressing this point would help clarify the scope of D-serine's metabolic impact and whether its mechanism of action reflects a unique vulnerability in neural cells or a more general feature of proliferative metabolism. This distinction is particularly relevant for assessing the broader therapeutic potential of targeting mitochondrial L-serine transport.

    1. eLife Assessment

      Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. If supported, this would transform our understanding of cell-to-cell communication in plants. The authors localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression; however, the data are incomplete since key controls for localization, functionality, and expression level of fluorescent protein fusions are absent.

    2. Reviewer #1 (Public review):

      Summary:

      Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. In this manuscript, they localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression. They also document a possible plasmodesmata transport defect in a mutant affecting one nuclear pore complex protein.

      Strengths:

      The main strength of this manuscript is the interesting and novel hypothesis. This work could open exciting new directions in our understanding of plasmodesmata function and cell-cell communication in plants. The authors also localized many NUPs (12 of the 35 Arabidopsis NUPs).

      Weaknesses:

      The main weakness of this manuscript is that the data are incomplete. While the authors appropriately and frequently acknowledge caveats to their data, two controls are essential to interpret the finding that fluorescently tagged NUPs localize to plasmodesmata: (1) assessment of the expression level of these fluorescently tagged NUPs, to determine whether the plasmodesmata localization might be an overexpression artefact; and (2) assessment of the function of the fluorescently tagged NUPs, either by molecular complementation of a knockout mutant phenotype or by biochemical methods to test whether the fluorescently tagged NUP incorporates into nuclear pore complexes. Conducting these experiments for even one fluorescently tagged NUP would substantially strengthen this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to address whether nuclear pore complex components localize and function at PD in plant cells to mediate cell-to-cell communication.

      Strengths:

      (1) Novelty and Significance: The core hypothesis, drawing parallels between PD and NPC transport, is highly original and addresses a critical gap in understanding plant intercellular communication. The idea that phase-separated domains formed by FG-NUPs could act as diffusion barriers at PD offers a plausible and sophisticated explanation for their complex transport properties, including size exclusion and facilitated translocation. This could fundamentally change how we view PD function.

      (2) Comprehensive Evidence: The study employs a rigorous and diverse set of experimental approaches, including a comprehensive bioinformatic analysis of both moss and Arabidopsis NUPs in available PD proteomic datasets, extensive imaging analysis of NUP localization in vivo, and functional transport assays using a loss-of-function nup mutant (cpr5). The transport assay is particularly important because it provides functional evidence linking CPR5 to PD-mediated transport. The finding that callose levels were not significantly different in cpr5 mutants under these conditions is helpful and supports a distinct, callose-independent mechanism of transport regulation.

      (3) Objectivity: The authors are forthright in discussing the limitations and potential artifacts of their own data, clearly distinguishing between observations and definitive conclusions.

      Weaknesses:

      While the claims are generally justified as hypotheses or consistent observations, the authors themselves extensively detail the caveats, which are worth reiterating for clarity:

      (1) Potential Overexpression Artifacts in Localization: Although efforts were made to control expression levels, the authors acknowledge that transient overexpression could still lead to NUP accumulation at PD, either as a physiologically relevant accumulation under excess conditions, through mis-targeting, or even as storage depots. The resolution of confocal microscopy also does not allow a definitive conclusion about the precise nature of this localization.

      (2) Proteomics Purity: The authors note that the presence of NUPs in PD fractions cannot definitively rule out contamination, as PD fractions cannot currently be purified to homogeneity and are often contaminated with other organelles, including the nucleus.

      (3) CPR5 Mutant Interpretation: While cpr5 mutants exhibited reduced macromolecular transport, the authors state that they cannot exclude that the reduced transport is due to secondary effects in the cpr5 mutants, which show rather severe phenotypic defects. This is an important distinction, as CPR5 has known roles in defense responses and hormone signaling that could indirectly influence PD integrity, independent of callose deposition. The lack of effect on small molecule transport is a good control, but the broader pleiotropic effects of cpr5 mutants remain a consideration.

      (4) Conceptual Distinction between NPC and PD: The authors correctly point out that while similarities exist, the physical assembly of NUPs at PD must differ from that at the NPC due to the presence of the desmotubule and smaller cytoplasmic sleeve width at PD. Moreover, nucleocytoplasmic transport depends on karyopherin proteins that interact with the NPC central channel to complete the transport. Yet the role of karyopherins in this case is not clear. Therefore, the proposed "PD pore complex" may bear some NPC features, but not be identical.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a step towards testing the hypothesis that plasmodesmata have homology to nuclear pores. The similarities between the two structures have long been noted as both structures allow the transport of proteins and nucleic acids, and both structures are composed of curved membranes. The manuscript has identified nuclear pore proteins (NUPs) in plasmodesmal protein fractions and uses live imaging in a non-endogenous system and functional assays of a mutant to propose that this might be a bona fide association.

      The conclusions the authors seek to draw are that: NUPs are present in plasmodesmal protein fractions; NUPs localise at plasmodesmata; and NUPs might form a pore-gating complex at plasmodesmata, regulating non-specific (2xGFP) and specific (SHR) transport through plasmodesmata.

      The authors then use these conclusions to propose the possibility that phase separation mediates transport through plasmodesmata. If there is phase separation at plasmodesmata or a nuclear pore-like complex, it would revolutionise the field. However, these data are insufficient to act as a cornerstone for such a discovery.

      Strengths:

      The strength of the manuscript lies in the boldness and novelty of the idea.

      Weaknesses:

      The weaknesses lie in the lack of informative controls. The authors' own assessments of their data suggest they agree with this - in their abstract alone, they point out that the transport defects they observe might be off-target effects, and suggest there is a requirement in the future to determine whether the NUPs are bona fide PD components.

      Across the proteomic and live imaging experiments, the conclusions could be stronger if they compared the NUP localisation and accumulation with ER proteins - the question of whether NUPs behave like other ER proteins is not addressed. As NUPs reside in the nuclear envelope, continuous with the ER, and the ER traverses plasmodesmata, a comparison between the NUPs and ER proteins would be extremely informative.

      Regarding the proteomic identification of NUPs in plasmodesmal fractions, the authors place significant weight on their own metric for PD enrichment, the PD score. As I understand it, this is a metric derived by adding two factors: (i) a two-component enrichment score, which is the difference between the peptide intensity of a given protein in the PD fraction and in the cell wall fraction, added to the difference between its peptide intensity in the PD fraction and in the total cell fraction; and (ii) a feature score, which describes how strongly the protein domains contained in that protein are represented in the plasmodesmal fraction relative to their representation across the whole proteome. The features chosen for analysis are not indicated, and the feature score, as I understand it, is common to all proteins sharing a given feature. While each factor carries some meaning and information, I do not understand how adding them is mathematically or biologically meaningful.
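      Written out, the reading of the PD score described above is roughly the following. The symbols are introduced here only for illustration and are not taken from the manuscript: $I_{\mathrm{PD}}(p)$, $I_{\mathrm{CW}}(p)$ and $I_{\mathrm{total}}(p)$ stand for the peptide intensity of protein $p$ in the PD, cell wall and total cell fractions, and $F(\mathcal{D}(p))$ for the feature score of the set of domains $\mathcal{D}(p)$ that $p$ carries.

      $$
      \mathrm{PD\ score}(p) =
      \underbrace{\big(I_{\mathrm{PD}}(p) - I_{\mathrm{CW}}(p)\big) + \big(I_{\mathrm{PD}}(p) - I_{\mathrm{total}}(p)\big)}_{\text{enrichment score (protein-specific)}}
      \; + \;
      \underbrace{F\big(\mathcal{D}(p)\big)}_{\text{feature score (shared by all proteins with the same domains)}}
      $$

      Laid out this way, the concern is that a protein-specific intensity difference and a domain-level representation factor, which need not share units or scale, are summed directly.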

    1. eLife Assessment

      This important study demonstrates the potential of synthetic gene circuits to detect and target aberrant RAS activity in cancer cell lines. The circuit design is novel and the evidence supporting the claims is convincing. As a proof-of-concept, this will be of broad interest to researchers in synthetic biology and therapeutics development, while future work will be required to help translate this technology toward clinical applications in cancer therapeutics and address potential limitations of the strategy.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. The aim of this study is to use these RAS-targeting circuits as cancer cell classifiers and enable the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain is fused to a NarX mutant defective either in the ATP-binding site (N509A) or in the phosphorylation site (H399Q). Nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL, thus leading to the expression of an output protein. The integration of RAS-dependent MAPK responsive elements to express the RAS sensor components generates RAS circuits with an extended dynamic range between mutant and wild-type RAS. The selectivity of the RAS circuits is confirmed in a set of cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream. Expression of the suicide gene HSV thymidine kinase as an output protein kills RAS-driven cancer cells, demonstrating the functionality of the system.

      Strengths:

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines, act as RAS mutant cell classifier, and induce the killing of RAS-driven cells.

      Weaknesses:

      A therapeutic strategy based on this four-plasmid system may be difficult to implement in RAS-driven solid cancers. However, potential solutions are discussed.

    3. Reviewer #2 (Public review):

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given that the only demonstrated output is fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision.

      Major comments:

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133, which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]").

      The therapeutic index of the authors' systems would be better characterized by a functional payload other than fluorescent proteins, for example one that induces cell death, immune responses, etc.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply to varied expression levels of RBDCRD-NarX itself (2a) that are nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D-responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a constitutively dimerized NarX control that only relies on transcriptional amplification of the Ras-dependent promoters? It is also possible that these RAS variants could affect protein production at the post-transcriptional or even post-translational level, which was not adequately considered.

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the 'AND-gate' RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116KO." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figures 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Fig. 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while HCT-116KO is a knockout cell line and does not have mutant or WT RAS. Therefore, the improved dynamic range, or "decreased leakiness in HCT-116KO", relative to Figure 5d is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text.

      Comments on revisions:

      Now that the authors have extensively addressed my comments through text and additional experiments, I am supportive of its conclusions. I thank them for the rigorous updates and congratulate them on an important piece demonstrating the potential of synthetic biology circuits.

    4. Reviewer #3 (Public review):

      Summary:

      Mutations that cause constitutive RAS activation are a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP. They characterized and optimized individual circuit components, such as RAS-GTP sensors, dimerization domains, and linkers. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprising MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modifications were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting of aberrant RAS activity. I therefore recommend accepting this paper.

      Strengths:

      Novel circuit design, thorough optimization and characterization of the circuit components, and solid data.

      Weaknesses:

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims.

      Summary:

      Given the revisions made, I would recommend a minor revision that discusses the specificity limitations of this experimental setup.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript by Senn and colleagues presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. This study aims to exploit these RAS-targeting circuits as cancer cell classifiers, enabling the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain, the RBDCRD domain of the RAS effector protein CRAF, is fused to the histidine kinase domain, which carries an inactivating amino acid exchange either in its ATP-binding site (N509A) or in its phosphorylation site (H399Q). Dimerization or nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL. The phosphorylated DNA-binding protein NarL, fused to the transcription activator domain VP48, binds its responsive element and induces the expression of the output protein. In addition to mutated RAS, the effects of the RAS activator SOS-1 and the RAS inhibitor NF1 on the sensing ability, as well as the tunability of the RAS sensor, were examined. A RAS-targeting circuit with an AND gate was designed by expressing the RAS sensor proteins under the control of defined MAPK response elements, resulting in a large increase in the dynamic range between mutant and wild-type RAS. Finally, the RAS-targeting circuits were evaluated in detail in a set of twelve cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream. 

      Strengths: 

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines and to function, at least in part, as a RAS mutant cell classifier. 

      Weaknesses: 

      The use of an appropriate "therapeutic gene" might revert the oncogenic properties of RAS mutant cell lines. However, a therapeutic strategy based on this four-plasmid-based system might be difficult to implement in RAS-driven solid cancers. 

      Thank you for the insightful comments. We agree that the delivery of a four-plasmid system represents a major challenge for translating RAS-targeting circuits into therapeutic applications. Reducing the number of plasmids (ideally consolidating all components onto a single vector) will be critical for clinical implementation.

      Viral delivery is generally the most efficient strategy for DNA-based therapies, but viral vectors have limited packaging capacities, which differ by virus type [1]. The RAS_sensor_F.L.T. circuit under the EF1α promoter requires ~7.7 kb for the sensing components alone, excluding the output gene. This exceeds the packaging limit of adeno-associated virus (AAV) and is at the upper boundary for lentiviral vectors, but could potentially be accommodated by larger vectors such as γ-retroviruses, poxviruses, or herpesviruses [1]. Co-transduction with dual AAVs [2] or ongoing engineering to expand packaging capacity [3] may also offer future solutions. An additional route to reduce construct size could be alternative splicing, especially given the redundancy between the two NarX fusion proteins [4]. 

      An advantage of our current architecture is that synthetic response elements replace constitutive promoters, reducing construct size. For example, the MAPK-driven PY2_NarX&NarL circuits range between 4.9 and 5.2 kb depending on the transactivation domain, bringing them within AAV packaging limits for the sensor module [5], though co-delivery of the output gene would still be necessary. For lentiviruses, this is within the packaging capacity of 8 kb [1] and would allow the inclusion of output genes of ~3 kb.
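      As a rough check of these figures, the remaining lentiviral capacity can be computed from the sizes stated above. This sketch assumes that the 8 kb figure refers to the insert space available for the circuit and output gene together, and it ignores any other elements the final construct might need:

      $$
      \begin{aligned}
      \text{EF1}\alpha\text{-driven sensor:}\quad & 8\ \text{kb} - 7.7\ \text{kb} \approx 0.3\ \text{kb} \\
      \text{MAPK-driven sensor:}\quad & 8\ \text{kb} - (4.9\ \text{to}\ 5.2)\ \text{kb} \approx 2.8\ \text{to}\ 3.1\ \text{kb}
      \end{aligned}
      $$

      The first line is consistent with the EF1α-driven circuit sitting at the upper boundary for lentiviral vectors; the second is consistent with room for an output gene of roughly 3 kb.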

      Still, assembling multiple modules onto a single vector introduces new challenges, including possible crosstalk or interference between neighboring promoters [6]. For example, placing the output gene too close to MAPK response elements may trigger unwanted MAPK-dependent expression, potentially bypassing the intended AND-gate logic. Moreover, expressing three genes under separate response elements may shift expression ratios and reduce circuit functionality. Nonetheless, the absence of constitutive promoters and the RAS-dependence of MAPK response elements could provide partial robustness, since even unintended activation would still reflect RAS signaling to some extent. Further, our data (Fig. 1d) show that some deviation in component levels can be tolerated, provided all parts are sufficiently expressed. Even so, assembling the circuit on a single vector will require careful design and rigorous validation to ensure optimal performance. 

      While addressing this is beyond the scope of the current study, we agree that future efforts should focus on vector consolidation and delivery strategies. We now include a paragraph discussing these challenges in the revised manuscript.

      Reviewer #2 (Public review): 

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given that the only demonstrated output is fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision. 

      Thank you very much for the thoughtful evaluation, precise critique, and constructive suggestions.

      As correctly noted, our study initially focused on developing and optimizing input sensors and processing units for synthetic gene circuits targeting mutated RAS. To address the concern regarding therapeutic relevance, we have now incorporated functional validation using a clinically relevant output protein: herpes simplex virus thymidine kinase (HSV-TK), which converts ganciclovir into a cytotoxic compound. We replaced the mCerulean reporter with HSV-TK and tested the resulting RAS-targeting circuits in both RAS-mutant and wild-type cancer cell lines. The results, now presented in a new chapter (Figure 8 and Supplementary Fig. 14), demonstrate robust killing of RAS-mutant cells and support the potential therapeutic utility of these circuits.

      Major comments: 

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133, which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs. 

      Thank you for this helpful suggestion. We have updated the introduction to reflect the rapidly evolving landscape of RAS-targeting therapies, including the development of inhibitors for non-G12C mutations such as KRAS<sup>G12D</sup> (e.g., MRTX1133). Given the pace and breadth of these advances, we also refer readers to a recent comprehensive review that provides an in-depth overview of current RAS-targeting strategies.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented. 

      To further support the generalizability of our RAS sensor, we titrated plasmid doses for a panel of oncogenic RAS variants, including multiple KRAS mutants as well as HRAS<sup>G12D</sup> and NRAS<sup>G12D</sup>. Across all tested variants, we observed concentration-dependent activation of the RAS sensor. At 1.67 ng/well, the sensor output for all oncogenic RAS variants was at least as high as that for KRAS<sup>G12D</sup>, suggesting that the behavior observed in our initial design and optimization is representative of a broader set of RAS mutations.

      We also noted that high overexpression of wild-type HRAS and NRAS can lead to substantial activation of the sensor, exceeding that observed with wild-type KRAS. This underscores the importance of considering all RAS isoforms when assessing circuit specificity and avoiding potential off-target activation in healthy cells.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]"). 

      Thank you for pointing this out. We repeated the experiment to reassess the effect of NF1 on RBDCRD-NarX-SYFP2 expression and were able to confirm statistical significance. Accordingly, we have replaced Figure 2a with updated data. To facilitate better visual comparison across conditions, we also standardized the y-axis range across all relevant flow cytometry plots.

      The therapeutic index of the authors' systems would be better characterized by a functional payload other than fluorescent proteins, for example one that induces cell death, immune responses, etc. 

      Thank you for this insightful comment. We agree that fluorescent reporters are limited to approximating expression levels, and that a functional output protein is more appropriate for assessing therapeutic potential. To address this, we replaced mCerulean with the therapeutic suicide gene HSV-TK and tested the circuits in RAS-mutant and wild-type cancer cell lines. These experiments demonstrate that our circuits can express functional proteins and induce cell death in two RAS-mutant cell lines while showing low toxicity in a RAS wild-type cell line (new chapter including Fig. 8 and Supplementary Fig. 14). 

      Comparing the confluence of cells transfected with the RAS-targeting circuits to that of cells transfected with a non-toxic GFP-output negative control or the constitutively expressed EF1α-HSV-TK positive control allowed us to estimate the killing strength of the circuits in each cell line. In RAS-mutant HCT-116, the confluence curves were similar to those of the positive control, indicating effective killing (Fig. 8b). At a lower DNA dose in HCT-116, or in SW620 with lower transfection efficiency, the killing of transfected RAS-driven cancer cells was less pronounced, falling approximately midway between the controls (Fig. 8g and 8j). In the RAS wild-type cell line Igrov-1, cells transfected with the RAS circuits continued to grow similarly to the non-toxic negative control (Fig. 8d), suggesting low toxicity. 

      While this may indicate low circuit activation in Igrov-1, the low toxicity could alternatively reflect insufficient transfection efficiency. Testing in SW620, which had a transfection efficiency similar to that of Igrov-1 (Supplementary Fig. 14a), showed that this moderate transfection efficiency was sufficient for RAS-circuit-dependent killing (Fig. 8d & 8g), supporting the notion of low activation in Igrov-1 and selective cytotoxicity in RAS-driven cancer cells.

      Nonetheless, it is important to note that comparisons between the cell lines need to be interpreted cautiously because of inter-cell line differences in transfection, growth, and HSV-TK/ganciclovir (GCV)-sensitivity (Supplementary Fig. 14) and further validation will be essential. 

      A conclusive assessment will require more efficient delivery strategies, such as viral vectors (as discussed above). Efficient delivery would allow us to investigate selectivity in a more realistic setting with patient-derived RAS-mutant cancer cells and healthy cells, as well as in an in vivo model. While this is beyond the scope of the current study, we view it as a critical direction for future work and have therefore added a paragraph about it to our discussion.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that are nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic standpoint, the claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a constitutively dimerized NarX control that relies only on transcriptional amplification by the Ras-dependent promoters?

      This is a great point. We agree that the observed differences in output levels (Fig. 2) could arise from non-linear amplification due to increased expression of RBDCRD-NarX, rather than RAS binding or dimerization alone. To further investigate this possibility, we performed titrations of KRAS<sup>G12D</sup> in combination with the functional RAS sensor and a series of constitutively active and inactive control constructs (Supplementary Fig. 4).

      Inactive controls lacking NarX dimerization showed only a modest increase in output expression, similar to direct mCerulean expression under the EF1α promoter. Transfection of the output plasmid alone, with NarL, or with NarL and the non-RAS-binding RBD<sup>R89L</sup>CRD<sup>C168S</sup>-NarX resulted in minimal RAS-dependent increases (Supplementary Fig. 4a). Importantly, after normalization to the EF1α-driven mCherry transfection control, these effects were fully or even slightly over-compensated (Supplementary Fig. 4b). This shows that the data presented throughout the manuscript do not include the effect of EF1α-dependent increased leakiness, but also that, because of the normalization, we potentially underestimate the dynamic range of the RAS-targeting circuits.

      In contrast, constitutively dimerizing NarX controls (both membrane-bound and cytosolic, dimerized via the FKBP–FRB system) exhibited a more pronounced RAS-dependent increase in output, even after normalization, confirming the presence of non-linear amplification (up to 3–4-fold). However, this effect was still lower than that achieved with the functional RAS-binding sensor (8-fold at 1.67 ng/well KRAS<sup>G12D</sup>; 14-fold at 5–15 ng/well), indicating that increased expression of the sensor parts does not fully explain the effect. Instead, RAS binding and dimerization further amplify the response and are necessary for full activation (Supplementary Fig. 4b).
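      The over-compensation seen with the inactive controls follows from how a ratio to a transfection-control channel behaves when RAS raises EF1α-driven expression of both channels. The sketch below uses invented mean fluorescence values (not data from Supplementary Fig. 4), and the function name is hypothetical; it assumes normalization is a simple per-condition ratio of output to mCherry. If mCherry rises slightly more than the leaky output, the normalized fold change drops just below 1.

```python
def fold_change(output_g12d, output_wt, control_g12d=None, control_wt=None):
    """KRAS-G12D / KRAS-WT fold change of the output reporter.

    If transfection-control values (e.g. EF1a-driven mCherry) are given,
    each output is first divided by its control before taking the ratio.
    """
    if control_g12d is None or control_wt is None:
        return output_g12d / output_wt
    return (output_g12d / control_g12d) / (output_wt / control_wt)


# Invented values: RAS raises leaky output by 20 % and mCherry by 25 %.
raw = fold_change(120.0, 100.0)
normalized = fold_change(120.0, 100.0, control_g12d=1250.0, control_wt=1000.0)
print(round(raw, 2), round(normalized, 2))  # 1.2 0.96
```

      The same arithmetic applied to a circuit with genuine RAS-dependent activation would still divide out part of the EF1α-driven increase, which is why normalization may underestimate the dynamic range.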

      We also addressed the reviewer’s suggestion by testing the MAPK response elements used in Fig. 4f with constitutively dimerizing NarX. These controls generally showed lower fold changes between KRAS<sup>G12D</sup> and KRAS<sup>WT</sup> than the corresponding RAS-binding circuits (Supplementary Fig. 7), with one exception: the combination of SRE_NarX and PY2_NarL-VP48.

      Together, these data show that non-linear amplification via increased expression and dimerization contributes to output activation. However, RAS binding and induced dimerization of the NarX sensor are required for full functionality and enhanced signal strength. This underscores that integrating the MAPK response elements with the binding-based RAS sensor into RAS-targeting circuits generally improves the distinction between cells with KRAS<sup>G12D</sup> and KRAS<sup>WT</sup>, and that the combination was needed to reach maximal fold changes.

      It's also possible that these Ras variants could affect protein production at the post-transcriptional or even post-translational level, which was not adequately considered.

      Thank you for this comment. We now mention in the manuscript the potential mechanisms by which (over-)activated RAS or MAPK signaling can increase protein synthesis, citing relevant reports on upregulation of translation initiation and the translational machinery[10] and of ribosome biogenesis[11].

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Fig 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while HCT-116KO is a knock-out cell line and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" in comparison to Figure 5d is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      Thank you for this input. We understand that comparing the results from HEK293 cells transfected with KRAS<sup>G12D</sup> or KRAS<sup>WT</sup> (Fig. 5d) to those from HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> cells (Fig. 6b–d) may be misleading if interpreted as a direct comparison of RAS signaling levels. Our intent was not to compare HEK293 with KRAS<sup>WT</sup> directly to HCT-116<sup>k.o.</sup>, but rather to contrast the behavior of the EF1α-driven RAS sensor and the MAPK-responsive RAS-targeting circuits within each cell line context.

      Specifically, we observed that in HEK293 cells expressing KRAS<sup>G12D</sup>, the MAPK-based RAS-targeting circuits produced higher output than the EF1α-expressed RAS sensor. In contrast, in HCT-116<sup>WT</sup> cells, the EF1α-expressed RAS sensor resulted in higher output levels than the RAS-targeting circuits. Despite this, the MAPK-driven circuits showed an improved dynamic range compared to the EF1α-expressed RAS sensor in HCT-116, due to the reduced background expression in HCT-116<sup>k.o.</sup> cells. We have revised the manuscript text to clarify this distinction.

      We agree that an HCT-116<sup>k.o.</sup> cell line with stable integration of KRAS<sup>WT</sup> would provide a more direct comparison. Nonetheless, HCT-116<sup>k.o.</sup> cells still express endogenous NRAS and HRAS, both of which are capable of activating the RAS sensor (as shown in Fig. 1g). Therefore, we believe that HCT-116<sup>k.o.</sup> cells are more comparable to HEK293 with KRAS<sup>WT</sup> than to the NF1 condition in Fig. 2, in which all endogenous RAS isoforms are inactivated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text. 

      Thank you for this helpful observation. The figure references were indeed incorrect due to a typo. The results discussed in the text refer to Figure 6f (not 6g), which is now Figure 7a in the revised version. To further highlight these findings, we have added a new Figure 7b that better illustrates how different MAPK response elements enabled us to identify, for each RAS-mutant cell line, a RAS-targeting circuit that showed stronger activation than in all RAS wild-type lines. We have also expanded the corresponding section in the main text to elaborate on these results and their significance.

      Reviewer #3 (Public review): 

      Summary: 

      Mutations that result in constitutive RAS activation are a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP, optimizing circuit components such as RAS-GTP sensors, dimerization domains, and linkers. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting aberrant RAS activity.

      Strengths: 

      Novel circuit design, thorough optimization and characterization of the circuit components, and solid data.

      Weaknesses: 

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims. 

      Thank you very much for the thoughtful and supportive comments. We fully agree with the reviewer’s suggestions for improving the translational potential of the RAS-targeting circuits.

      As a first step toward therapeutic relevance, we replaced the fluorescent reporter with HSV-TK, a clinically validated suicide gene, and demonstrated killing in RAS-mutant cancer cell lines. This is described above and in the new section of the manuscript (Figure 8).

      We also agree that testing in patient-derived cancer cells and especially healthy cells with wild-type RAS activity will be essential. However, testing in primary or patient-derived cells presents delivery challenges: transient transfection of our current four-plasmid system is unlikely to achieve sufficient expression. As discussed in our response to Reviewer #1, developing a more efficient delivery strategy, such as viral vector-based delivery, is a necessary next step.

      Once a delivery system is established, identifying relevant off-target tissues throughout the body with high physiological RAS signaling will be key to assessing selectivity. While comparative data on RAS activation across healthy tissues are scarce[12,13], recent atlases of transcription factor activity[14,15] provide insights to identify off-target cells with high activation of RAS-dependent transcription factors and may even approximate RAS activity across healthy tissue. Alternatively, our single-input sensors for RAS and MAPK pathway activity could be used in vivo to identify off-target cells based on endogenous activity.

      Once relevant target and off-target cells have been identified, patient-derived cancer and healthy cells can help select and adapt cancer-specific RAS-targeting circuits and nominate therapeutic candidates for further safety and efficacy assessment[6,8].

      Reviewer #1 (Recommendations for the authors): 

      For the most part, the data in this study are very convincing and very well presented. The cartoons make it easier to understand the complex experimental setups. 

      (1) Did the authors use wild-type Sos-1 or a constitutively active membrane-bound catalytic domain in their studies? If wild-type Sos-1 was used, how is it activated?

      Thank you for this feedback. We used the constitutively active catalytic domain of Sos-1 (amino acids 564–1049; PDB ID 2II0).

      (2) Figure 1f: In case of KRAS-G12D, it looks like the output expression does not really correlate with the RAS-GTP level. Can the authors give an explanation? 

      Thank you for this interesting question. We believe the observed discrepancy arises primarily from differences in the sensitivity and readout dynamics of the two assays. The RAS-GTP pulldown ELISA appears insufficiently sensitive to detect small changes in RAS-GTP levels at lower KRAS<sup>G12D</sup> plasmid doses (0.19, 0.56, or 1.67 ng). Only at 5 ng and 15 ng do we observe clear increases in RAS-GTP signal (25% and 700%, respectively). In contrast, the RAS sensor shows strong activation already in the 0.56–5 ng range but begins to saturate at higher doses (see Figure 1f and Figure 1e).

      Beyond the differing technical sensitivities of the ELISA (plate reader) and flow cytometry, an important conceptual distinction may further explain this behavior: the RAS sensor likely integrates RAS signaling over time. Once NarX binds RAS-GTP and dimerizes, it activates NarL, triggering mCerulean expression. If the rate of mCerulean production exceeds its degradation, signal accumulates throughout the assay duration. Thus, the flow cytometry readout reflects time-integrated signaling, allowing small differences in RAS-GTP to be amplified into measurable differences in output, especially at low input levels. This may explain why flow cytometry detects circuit activation at lower plasmid doses and more steeply than the pulldown assay, which provides a snapshot of RAS-GTP abundance at a single time point and saturates less readily at high input levels.

      Together, these factors likely explain the observed differences in signal dynamics: the RAS sensor exhibits steep activation followed by saturation at high plasmid doses (flow cytometry), while the ELISA shows limited sensitivity at low doses but a broader linear range at higher doses.
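      A toy model can illustrate the two ingredients invoked above. It assumes, hypothetically, that sensor activity follows a steep, saturating Hill function of RAS-GTP and that mCerulean accumulates with first-order synthesis and degradation (dP/dt = k_syn·a(R) − k_deg·P, P(0) = 0). All function names and parameter values are invented. In this simplified model, time integration raises the absolute reporter signal (making it easier to detect above background), whereas the Hill nonlinearity is what turns a 2-fold change in RAS-GTP into a larger output change at low input and flattens the response at high input.

```python
import math


def sensor_activity(ras_gtp, k_half=1.0, hill=2.0):
    """Hypothetical fraction of active sensor (Hill function of RAS-GTP)."""
    return ras_gtp**hill / (k_half**hill + ras_gtp**hill)


def reporter_level(ras_gtp, hours, k_syn=100.0, k_deg=0.05):
    """Reporter after `hours` of constant input, solving
    dP/dt = k_syn * a(R) - k_deg * P with P(0) = 0."""
    steady_state = k_syn * sensor_activity(ras_gtp) / k_deg
    return steady_state * (1.0 - math.exp(-k_deg * hours))


# Same 2-fold RAS-GTP step at a low and a high input level.
low_fold = reporter_level(0.2, 48) / reporter_level(0.1, 48)
high_fold = reporter_level(20.0, 48) / reporter_level(10.0, 48)
print(round(low_fold, 2), round(high_fold, 3))  # 3.88 1.007
```

      By contrast, a snapshot assay whose signal is proportional to RAS-GTP would report the same 2-fold step at both input levels, provided it is sensitive enough to detect it.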

      (3) Figure 2b: It appears that even in the case of KRAS-G12D and Sos-1, only a few cells are positive. Does this result depend on low cell density, low transfection efficiency, or a wide range of the expression level? As a control, nuclear staining could be shown. 

      Thank you for this question. In the experiment shown in Figure 2b, our goal was to assess the membrane localization of the RBDCRD-NarX-SYFP2 construct, which serves as a proxy for RAS-bound sensor. To enable accurate computational segmentation and separation of membrane signal from adjacent cells, we intentionally reseeded cells at low density in glass-bottom plates for confocal imaging.

      The observed variability in signal likely reflects a combination of transient transfection and heterogeneous expression levels. While the overall transfection efficiency was approximately 70%, expression varied between individual cells. To account for this, we analyzed the membrane-to-total signal ratio per cell, which internally normalizes the membrane signal to the total cellular expression of SYFP2 and controls for differences in transfection efficiency.

      In response to the reviewer’s suggestion, we have updated the figure to include nuclear staining to aid interpretation. We would like to emphasize, however, that the images are intended to illustrate subcellular localization per cell, not expression frequency or intensity across the population.
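      The per-cell normalization described above can be sketched as follows. The intensities and the function name are invented and assume background-subtracted membrane and whole-cell SYFP2 sums from segmentation; the point is only that a dim and a bright cell with the same subcellular distribution give the same ratio, so the readout does not depend on how much construct each cell received.

```python
from statistics import median


def membrane_fraction(membrane_intensity, total_intensity):
    """Per-cell ratio of membrane SYFP2 signal to total cellular SYFP2 signal."""
    if total_intensity <= 0:
        raise ValueError("total intensity must be positive")
    return membrane_intensity / total_intensity


# Invented (membrane, total) intensities for a dim, a bright and a very dim cell.
cells = [(300.0, 1000.0), (3000.0, 10000.0), (45.0, 150.0)]
ratios = [membrane_fraction(m, t) for m, t in cells]
print(ratios, median(ratios))
```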

      Minor points 

      (1) Figure 1b: "The third plasmid expresses NarL, .." should be changed to "The third plasmid expresses NarL-VP48, .." 

      Done

      (2) Figure 1c, right part: The orange arrow should be labeled NarX-H399Q (not N509A). 

      Done

      (3) Supplementary Table 6 and 7: [cells/wells] - should probably be [cells 10*3/well]. 

      Thank you for these points, we updated the manuscript accordingly

      Reviewer #2 (Recommendations for the authors): 

      Minor comments: 

      (1) N509A seems mislabeled in Figure 1b. 

      (2) It would help the readers if the authors could elaborate a bit on what is known about the RBD and CRD mutations used here. 

      Thank you for the input, we added a paragraph in the paper to expand on the effect of these commonly used mutations.

      (3) The KRASWT&Sos1 condition is not explained within the text for Figure 1f, which is the first figure with the KRASWT&Sos1 condition, but rather later on for Figure 2a. Adding a description of this condition to the discussion of Figure 1f would add clarity to this figure. 

      Thank you, we corrected this.

      (4) Citing AlphaFold2 structural predictions as having "revealed that longer linkers between the sensor's RBDCRD and NarX-derived domains could bring the NarX domains into closer proximity" is probably an overstatement. AlphaFold2 generally has low confidence in the placement of long flexible linkers, and the longer linkers in the illustration could facilitate NarX and NarL being even farther apart than they are in the original design. 

      Thank you for this input. We agree that AlphaFold2 predictions generally have low confidence in the placement of long, flexible linkers, and we did not intend to imply that the structural models were predictive of actual linker conformations. Rather, the models were used heuristically to generate the hypothesis that longer linkers might facilitate better positioning of the NarX domains for dimerization.

      As described in the Methods, we manually rotated the flexible linker regions to explore plausible conformations. These exploratory models showed that with a short (1x GGGGS) linker, it was more challenging to bring the NarX domains into close proximity, whereas longer linkers allowed greater positional flexibility. This modeling exercise provided a structural rationale for experimentally testing longer linkers. We have revised the manuscript text to clarify that the structural predictions were used to motivate linker design, not to validate or predict structural outcomes.

      (5) Figure 3b shows that the fold change (KRASG12D/KRASWT) is higher at shorter linker lengths and lower at longer linker lengths, and that the output expression of mCerulean is lower at shorter linker lengths and higher at longer linker lengths. Having a bar plot with the output expression mCerulean levels comparing KRASG12D and KRASWT next to each other would be a significantly more informative representation of this data. In particular, the readers might be interested in understanding the effect of linker length on off-target activation from the sensor, which is not clear from this figure. 

      Thank you for the suggestion. We adapted Figure 3b to better present this. 

      (6) While it is implied that the sentence "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)[33], and the synthetic RAS binder K55[34] showed a slightly higher or similar dynamic range." is comparing these RAS binding domains to RBDCRD, for clarity it should be noted what the point of reference is for this "slightly higher or similar dynamic range."

      (7) Claims are made throughout the text that require supporting data, and thus require a reference to a figure, but there are a few instances where the reference is several sentences after the discussion of data and findings begins. For example, the discussion of Figure 3c begins with the claim "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)[33], and the synthetic RAS binder K55[34] showed a slightly higher or similar dynamic range," but there is no reference to the data or figure being discussed until the end of the discussion of Figure 3c. This formatting is also present in Figure 3d and Figure 6f.

      Thank you for mentioning these imprecisions and inconsistencies, we addressed them in the manuscript. 

      (8) In Figures 5d and 5e, the formatting of underscores and dashes is occasionally inconsistent within the text. (ex. "PY2_NarX_FLT or PY2_NarL-FLT" on page 13.). 

      Thank you for this precise observation. The formatting differences were intentional and reflect distinct design principles. Specifically:

      An underscore (e.g., PY2_NarX_FLT) denotes that two separate proteins are expressed: here, PY2-driven RBDCRD-NarX and EF1α-driven NarL-F.L.T.

      A dash (e.g., PY2_NarL-F.L.T.) indicates a fusion protein, i.e., PY2-driven NarL-F.L.T. combined with EF1α-driven RBDCRD-NarX.

      This notation is used to distinguish expression sources and fusion constructs while avoiding redundancy with the base circuit (EF1α_NarX + EF1α_NarL-VP48). We hope the schematic diagrams included in each relevant figure help the reader interpret these combinations.

      (9) The text claims that "loss-of-function mutations in RBDCRD decreased activation. However, the dynamic range was only 3-fold" and attributes this claim to Figure 6a. For a claim about specific fold-change activation, one would expect a corresponding figure with quantitative measurements of this fluorescence to be referenced. 

      Thank you for this remark. We made a supplementary figure (Supplementary Fig. 11) to show the quantitative measurement of the 3-fold dynamic range between HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> when using the EF1α-expressed RAS sensor with NarL-VP48.

      (10) The claim of Figure 2d is that the effect of RAS-GTP levels on mCerulean output is amplified in comparison to Figures 2a, 2b, and 2c, representing expression, RAS binding, and dimerization respectively. While visually this might be true from the figure, the readers might be confused by the lack of significance between the control and the NF1 condition, alongside the variation between the triplicates. Could this experiment be repeated to gain clearer data and to support their claim more effectively?

      Thank you for this important observation. To address the concern regarding variability and statistical significance in Figure 2d, we repeated the experiment using 24-well plates to increase the number of cells analyzed per condition. This improved the consistency of the data and allowed us to reduce variability across replicates. As a result, we now observe a statistically significant difference between the control and the NF1 condition. The updated results are shown in the revised Figure 2.

      (11) The readers might be less familiar with the concept of "composability" than "modularity" and it would be good to explain it if the authors did intend to use the former. 

      Thank you for this comment. We changed it to modularity to avoid confusion. 

      References

      (1) Shahryari, A., Burtscher, I., Nazari, Z. & Lickert, H. Engineering Gene Therapy: Advances and Barriers. Advanced Therapeutics 4 (2021). https://doi.org/10.1002/adtp.202100040

      (2) McClements, M. E. & MacLaren, R. E. Adeno-Associated Virus (AAV) Dual Vector Strategies for Gene Therapy Encoding Large Transgenes. Yale J Biol Med 90 (2017).

      (3) Wagner, H. J., Weber, W. & Fussenegger, M. Synthetic Biology: Emerging Concepts to Design and Advance Adeno-Associated Viral Vectors for Gene Therapy. Advanced Science 8 (2021). https://doi.org/10.1002/advs.202004018

      (4) Doshi, J., Willis, K., Madurga, A., Stelzer, C. & Benenson, Y. Multiple Alternative Promoters and Alternative Splicing Enable Universal Transcription-Based Logic Computation in Mammalian Cells. Cell Rep 33, 108437 (2020).

      (5) Wu, Z., Yang, H. & Colosi, P. Effect of genome size on AAV vector packaging. Molecular Therapy 18, 80–86 (2010).

      (6) Dastor, M. et al. A Workflow for in Vivo Evaluation of Candidate Inputs and Outputs for Cell Classifier Gene Circuits. ACS Synth Biol 7, 474–489 (2018).

      (7) Preuß, E. et al. TK.007: A novel, codon-optimized HSVtk(A168H) mutant for suicide gene therapy. Hum Gene Ther 21, 929–941 (2010).

      (8) Angelici, B., Shen, L., Schreiber, J., Abraham, A. & Benenson, Y. An AAV gene therapy computes over multiple cellular inputs to enable precise targeting of multifocal hepatocellular carcinoma in mice. Sci Transl Med 13, (2021).

      (9) Mesnil, M. & Yamasaki, H. Bystander Effect in Herpes Simplex Virus-Thymidine Kinase/Ganciclovir Cancer Gene Therapy: Role of Gap-Junctional Intercellular Communication. Cancer Res 60 (2000). http://aacrjournals.org/cancerres/articlepdf/60/15/3989/2478218/ch150003989.pdf

      (10) Proud, C. G. Ras, PI3-kinase and mTOR signaling in cardiac hypertrophy. Cardiovasc Res 63, 403–413 (2004). https://doi.org/10.1016/j.cardiores.2004.02.003

      (11) Azman, M. S. et al. An ERK1/2-driven RNA-binding switch in nucleolin drives ribosome biogenesis and pancreatic tumorigenesis downstream of RAS oncogene. EMBO J 42, (2023).

      (12) von Lintig, F. C. et al. Ras activation in normal white blood cells and childhood acute lymphoblastic leukemia. Clin Cancer Res 6, 1804–10 (2000).

      (13) Guha, A., Feldkamp, M. M., Lau, N., Boss, G. & Pawson, A. Proliferation of human malignant astrocytomas is dependent on Ras activation. Oncogene 15, 2755–2765 (1997).

      (14) Pan, L. et al. HTCA: a database with an in-depth characterization of the single-cell human transcriptome. Nucleic Acids Res 51, D1019–D1028 (2023).

      (15) Pan, L. et al. Single Cell Atlas: a single-cell multi-omics human cell encyclopedia. Genome Biol 25, (2024).

    1. eLife Assessment

      This study constitutes a fundamental advance for the uveal melanoma research field that might be exploited to target this deadly cancer and, more generally, for targeting transcriptional dependency in cancers. This work substantially advances our understanding of pharmacological inhibition of SWI/SNF as a therapeutic approach for cancer. The study is well written and provides compelling evidence, including comprehensive datasets, compound screens, gene expression analysis, epigenetics, as well as animal studies.

    2. Reviewer #1 (Public review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well written and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, which makes UM the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively, the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors have addressed weaknesses in the revised version.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and have pronounced effects on uveal melanoma cell proliferation. They induce apoptosis and suppress tumor growth, with no toxicity in vivo. The report provides biological significance by demonstrating that the drugs alter chromatin accessibility at lineage-specific gene enhancer regions and decrease expression of lineage-specific genes, including SOX10 and SOX10 target genes.

      Strengths:

      The study provides compelling evidence for the therapeutic use of these compounds and does a thorough job of elucidating the mechanisms by which the drugs work. The study will likely have a high impact on the chromatin remodeling and cancer fields. The datasets will be highly useful to these communities.

      Weaknesses:

      The authors have addressed all my concerns.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1:

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in vivo data mentioned in the response to the reviewers' comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition in the MP-46 CDX model treated with our BAF ATPase inhibitor can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Reviewer 3:

      Supplementary Figure 2C

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

      We thank the reviewer for bringing this to our attention. We updated the figure legend accordingly to reflect the genotype of the mutations highlighted in the table.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well-written, and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors generate an optimized small-molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, which makes UM the focus of the paper. This inhibition is correlated with the loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue the growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10, without discernible toxicity in the mice. Collectively, the data suggest a novel treatment for uveal melanoma.

      Strengths:

      The study has many strengths, including the rigorous challenge of the on-target effect, the assays used, and the mechanistic data. The results are compelling, as are the effects of the inhibitor. The in vivo data are dose-dependent, and the doses are low enough to be meaningful and are associated with evidence of target engagement.

      Weaknesses:

      The authors introduce the field by stating that SMARCA4 inhibitors are more effective in SMARCA2-deficient cancers, and vice versa. Since the desirable outcome of cancer therapy would be synthetic lethality, it is not clear why a dual inhibitor is desirable. Wouldn't this be associated with more side effects? It is not known how the inhibitor developed here affects normal cells, in particular T cells, which are essential for any durable response to cancer therapies in patients. Another weakness is that the UM cell lines used do not molecularly resemble metastatic UM, which most frequently carries mutations in the BAP1 tumor suppressor gene. It is not clear whether the described SMARCA2/4 inhibitor is efficacious in BAP1-mutant UM cell lines in vitro or in BAP1-mutant patient-derived xenografts in vivo.

      We thank the reviewer for their insightful and constructive comments. As we demonstrate in Fig. 1d, uveal melanoma cells are selectively and deeply sensitive to BAF ATPase inhibition, which provides a therapeutic window. This is confirmed in Fig. 4a-c, where we demonstrate robust tumor growth inhibition at a dose that was well tolerated in the xenograft study. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and a manuscript describing the results of this clinical trial is currently in preparation.

      As the reviewer mentioned, BAP1 loss is a signature of metastatic uveal melanoma. MP38 is a BAP1-mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and that act through a different mode from previously developed SMARCA4/SMARCA2 inhibitors. The authors also demonstrate the anti-tumor effects of the compounds on uveal melanoma cell proliferation and tumor growth. The findings indicate that the drugs exert their effects by altering chromatin accessibility at binding sites for lineage-specific transcription factors within gene enhancer regions. In uveal melanoma, altered expression of the transcription factor SOX10 and of SOX10 target genes underlies the anti-proliferative effects of the compounds. This study is significant because the discovery of new SMARCA4/SMARCA2 inhibitory compounds that can abrogate uveal melanoma tumorigenicity has therapeutic value. In addition, the findings provide evidence for the therapeutic use of these compounds in other transcription factor-dependent cancers.

      Strengths:

      The strengths of this manuscript include biochemical evidence that the new compounds are selective for SMARCA4/SMARCA2 over other ATPases and that the mode of action is distinct from a previously developed compound, BRM014, which binds the RecA lobe of SMARCA2. There is also strong evidence that FHT1015 suppresses uveal melanoma proliferation by inducing apoptosis. The in vivo suppression of tumor growth without toxicity validates the potential therapeutic utility of one of the new drugs. The conclusion that FHT1015 primarily inhibits SMARCA4 activity and thereby suppresses chromatin accessibility at lineage-specific enhancers is substantiated by ATAC-seq and ChIP-seq studies.

      Weaknesses:

      The weaknesses include a lack of more precise information on which SMARCA4/SMARCA2 residues the drugs bind. Although the I1173M/I1143M mutations are evidence that the critical residues for binding reside outside the RecA lobe, this site is conserved in CHD4, which is not affected by the compounds. Hence, this site may be necessary but not sufficient for drug binding or specifying selectivity. A more precise evaluation of the region specifying the effect of the new compounds would strengthen the evidence that they work through a novel mode and that they are selective. Another concern is that the mechanisms by which FHT1015 promotes apoptosis rather than simply cell cycle arrest are not clear. Does SOX10 or another lineage-specific transcription factor underlie the apoptotic effects of the compounds?

      We thank the reviewer for the valuable comments.

      We believe that our dual ATPase inhibitor is selective, and additional insights into the binding specificity and selectivity of earlier-stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      The reviewer also poses a great question regarding the mechanism of apoptosis. The mechanism of apoptosis is extremely complex, but we observed a decrease in pro-survival BCL-2 protein expression in response to FHT-1015 in the experiment corresponding to Supplementary Fig. 5e. In the experiment described in Fig. 3k, we also monitored caspase 3/7 activity over time, and SOX10 overexpression rescued 92-1 cells from FHT-1015-induced apoptosis. This suggests that SOX10 is an important mediator of the response to BAF ATPase inhibition, including the apoptosis induced by FHT-1015.

      Additional Reviews:

      The referees would like to draw the authors' attention to the following issues that would best benefit from additional revision. 

      The clinical relevance of the study would be strengthened by the use of uveal melanoma cell lines with BAP1 mutations, which better represent metastatic uveal melanoma. The use of patient-derived xenografts would also be pertinent and would be a useful addition. Similarly, attention to the effects of the inhibitor on non-cancerous proliferative cells, such as blood, T, and other immune cells, would strengthen the manuscript. As the study reports the administration of one of the inhibitors to mice in the xenograft experiments, it would be important to assess any potential effects on blood cell counts and to better discuss the potential toxicity, or lack of toxicity, and how it was assessed.

      The authors should better explain how SOX10 overexpression can rescue viability in the presence of the inhibitor. Similarly, given the critical roles of BRG1, SOX10, and MITF in cutaneous melanoma, a specific discussion of the sensitivity of cutaneous melanoma cells to the inhibitor should be considered, and potential differences from uveal melanoma should be highlighted.

      Aside from these issues, the authors are urged to consider the other points mentioned below. 

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1d, as well as the text in the manuscript referring to this figure, would benefit from indicating the specific UM cell lines used. The same applies to the sentence in line 153.

      We thank the reviewer for bringing this to our attention. We have added the cell line names and updated the manuscript accordingly.

      For any of the studies conducted, is there any link with the genetics of UM? E.g. BAP1 wildtype/BAP1 mutant? 

      As addressed above in the public review section, MP38 is a BAP1-mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Row 191 - How were peaks classified as enhancer-occupied? 

      We used the annotatePeaks function of the HOMER package to annotate genomic locations, and H3K27ac ChIP-seq to annotate peaks as enhancer-occupied. We thank the reviewer for pointing this out and have updated the manuscript accordingly to include this information.

      Row 259, the two cell lines should be named, also in Figure 3i. 

      We have added the cell line names and updated the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      As a proof of concept, this study is truly excellent, and the authors should be commended. However, it is desirable that new knowledge in cancer be translated to the clinic. To this end, a few additions are needed to strengthen the study.

      I am rephrasing my statements from the public review to say that I would recommend testing the inhibitor in T cells (side effects) and BAP1 mutant cell lines (for clinical relevance). 

      As addressed in the public review section, MP38 is a BAP1-mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Regarding concerns about potential side effects on T cells, we observed an increase in both CD4 and CD8 T-cell populations in the peripheral blood and the spleen when naïve, non-tumor-bearing CD-1 mice were dosed with the SMARCA2/4 dual ATPase inhibitor FHD-286 once daily for 14 days. FHD-286 is a compound similar to FHT-1015, described in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/). In addition, FHD-286 has been tested in tumor-bearing syngeneic models. When B16F10 tumor-bearing C57BL/6 mice were dosed with FHD-286 for 10 days, we observed an increase in CD69+ activated CD8 T-cell infiltration into the tumor microenvironment (doi:10.1136/jitc-2022-SITC2022.0888).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Determine drug binding by crystal structure, or generate additional SMARCA4 or SMARCA2 mutations in the region near I1173/I1143 that are not conserved in CHD4 and test them in an ATPase assay for effects on drug inhibition. For example, Q1166 in SMARCA4 and Q1136 in SMARCA2 could be changed to alanine, as in CHD4. Would this abrogate drug inhibition?

      We believe that our dual ATPase inhibitor is selective, and additional insights into the binding specificity and selectivity of earlier-stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      (2) The finding that SOX10 can rescue the antiproliferative effects of FHT1015 suggests that SMARCA4 is primarily needed for SOX10 expression. However, the co-occupancy of SMARCA4 and SOX10 at enhancers suggests that they cooperate to promote chromatin accessibility. It is unclear how over-expression of SOX10 can promote chromatin accessibility in drug-inhibited cells since SOX10 does not have chromatin remodeling activity. ATAC-seq in cells over-expressing SOX10 and treated with the drug could identify SOX10-dependent targets that do not require SMARCA4 activity and clarify the mechanism. It would also be informative to determine if SOX10 over-expression abrogates the effects of FHT1015 on both cell cycle and apoptosis, helping to resolve whether it is a partial or complete rescue of proliferation. 

      We agree that running ATAC-seq in cells overexpressing SOX10 would clarify this mechanism. However, shifts in corporate strategy deprioritized any further experiments for this project. One potential mechanism by which SOX10 overexpression could partially rescue the BAF inhibition phenotype is that overexpressed SOX10 localizes to open chromatin regions (mostly promoters) across the genome. We know from our ATAC-seq data (Fig. 2) that BAF inhibition leads to loss of chromatin accessibility at SOX10 enhancer sites, while promoter regions are only partially affected. Therefore, we think that overexpression of SOX10 would allow upregulation of its target genes via binding to promoter regions. In this model, the enhancer-driven SOX10 target genes are likely to remain silenced.

      (3) Although the in vivo studies indicate that the drugs are well-tolerated, additional in vitro studies to determine the effects of the drug on the proliferation/survival of non-cancerous cells would further validate their therapeutic utility.

      Author Response: The reviewer raises a critical question. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and it was well tolerated at continuous daily doses of up to 7.5 mg QD and at intermittent doses of up to 17.5 mg QD. A manuscript describing the results of this clinical trial is currently in preparation.

    1. eLife Assessment

      This is a valuable evaluation of a previously published simulation model on the role of heterozygote advantage in shaping MHC diversity, showing that the conclusions from this model hold only within a narrow parameter range that might be unrealistic. The author presents an alternative model in which MHC homozygotes with duplicated MHC genes outperform heterozygotes with single genes, thereby challenging the explanation that heterozygote advantage leads to high allelic variation at a given MHC gene. The topic is highly relevant for eco-immunology and evolutionary genetics, but several major aspects of the author's claim need to be clarified to make the model interpretable. While the work has the potential to improve our understanding of how the extraordinary diversity at the MHC locus evolves, without these clarifications the conclusions remain incomplete.

    2. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the recently published conclusion by Mattias Siljestam and Claus Rueffler (in the following referred to as [SR] for brevity) that heterozygote advantage explains MHC diversity does not withstand even a very slight change in ecological parameters. Second, a modified model that allows an expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to readers if the conclusions are valid and non-trivial.

      Let me first comment on the second part of the manuscript, which describes the fitness advantage of 'gene family expansion'. I think this, by itself, is a totally predictable result. It appears obvious that with no or little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative. Yet, as I understood the narrative of the manuscript, the expansion of the gene family serves as a mere counter-example to the disputed finding of [SR], rather than as a systematic study of the eco-evolutionary consequences of this process.

      Now to the first part of the manuscript, which claims that the point made in [SR] is not robust and breaks down under a small change in the parameters. The addition or removal of one of the pathogens is reported to change "the maximum condition", a key ecological characteristic of the model, by an enormous factor of 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for computing this number numerically, or other details, and is presented as a standalone number in the text. The only piece of information given in the manuscript is that, unlike in [SR], the adjustable parameter c_{max} is kept constant when the number of pathogens is changed.

      In my opinion, the information provided in the manuscript does not allow one to conclude anything about the relevance and validity of its main claim. At the same time, the simulations done in [SR] are described in a fair amount of detail, which allows me to assume that the conclusions made in [SR] are fairly robust and, in particular, have been shown not to be too sensitive to changes in the main "suspect", c_{max}. Let me briefly justify my point.

      First, it follows from Eqs (4,5) in the main text and (A12-A13) in the Appendix that c_{max} and K do not independently affect the dynamics of the model; rather, it is their ratio K/c_{max} that matters. This can be seen by dividing the numerator and denominator of (5) by c_{max}. Figure 3 shows persistent branching for 4 values of K that span 4 decades. As it appears from the schemes in the top row of Figure 3, those simulations use the same positions and widths/virulences of pathogens. So the position of x* should be the same in all 4 cases, presumably at the center of the pathogens, (x*,x*) = (0,0). According to the definition of x* given in the Appendix after Eqs (A12-A13), this means that c_{max} remains the same in all 4 cases. So one can interpret the 4 scenarios shown in Figure 3 as corresponding not to various K, but to various c_{max} varied inversely to K. That is, the results would have been identical to those shown in Figure 3 if K were kept constant and c_{max} were multiplied by 0.1, 1, 10, and 100, i.e. scaled as 1/K. This suggests that the branching remains robust to changes in c_{max} spanning 4 decades as well.

      Naturally, most, if not all, of the dynamics will break down if one of the ecological characteristics changes by a factor of 10^43, as reported in the submitted manuscript. As I wrote above, there is no explanation behind this number, so I can only guess that it results from the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in x-space by a distance much greater than 1/\nu, the width of the pathogens' Gaussians. Once again, I am not totally sure that this was the case, but if it was, some basic notions of how models are set up were violated. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c. In [SR], it is clearly shown where the pathogens are.

      Another argument that makes me suspicious of the utility of the conclusions made in the manuscript, and that supports the validity of [SR], is the adaptive dynamics derivation of the branching conditions. It is confirmed by numerics with sufficient accuracy, and, in its simple form as an inequality between two widths, the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to conclusions that dispute the much better-substantiated claims made in [SR].

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the population genetic underpinnings of the extraordinary diversity of genes in the MHC, which is widespread among jawed vertebrates. This topic has been widely discussed and studied, and several hypotheses have been suggested to explain this diversity. One of them is based on the idea that heterozygote genotypes have an advantage over homozygotes. While this hypothesis lost support early on, a recent study claimed that there is good support for this idea. The current study highlights an important aspect that allows us to see the results presented in the earlier paper in a different light, strongly changing the conclusions of the earlier study, i.e., there is no support for a heterozygote advantage. This is a very important contribution to the field. Furthermore, this new study presents an alternative hypothesis to explain the maintenance of MHC diversity, based on the idea that gene duplications can create diversity without heterozygosity being important. This is an interesting idea, but not an entirely new one.

      Strengths:

      (1) A careful re-evaluation of a published model, questioning a major assumption made by a previous study.

      (2) A convincing reanalysis of a model that, in light of the reanalysis, loses all support.

      (3) A convincing suggestion for an alternative hypothesis.

      Weaknesses:

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity only becomes visible once a strong assumption made by Siljestam and Rueffler is removed. This assumption is questionable, and the manuscript by J. Cherry explains well why it should not be used. This may seem a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

    4. Reviewer #3 (Public review):

      This manuscript describes a careful and thorough evaluation of a previously published evolutionary simulation model. The model and this report address the question of whether heterozygote advantage (HA) by itself, as a selection mechanism, can explain a substantial level of allelic diversity, as is often seen in MHC immune genes. Despite decades of research on pathogen-mediated selection for MHC diversity, it remains an open question by which specific selection mechanisms this exceptional allelic diversity is maintained.

      The previously published paper posits, in contrast to various previous studies, that HA is, in fact, able to maintain a level of allelic diversity as seen in many populations, just by itself, given certain conditions. The current manuscript now challenges this conclusion by highlighting that the previous model results only hold under very narrow parameter ranges.

      Besides criticizing some of the conceptual points of the previous paper, the author carefully rebuilt the previously published model and replicated its results before evaluating the robustness of the model results to reasonable variation in different parameters. From this evaluation, it becomes clear that the previously reported results hinge strongly on a scaling or weighting factor that is adjusted for every parameter setting and essentially counteracts the changes induced by changing the parameters. The critical impact of this one parameter is not clearly stated in the previous paper, and it raises serious doubts about the generalizability of the model to explain MHC allelic variation across diverse vertebrate species.

      Given that the MHC genes are among the most widely studied genes in vertebrates, and that understanding their evolution will shed light on their association with various complex diseases, the insights from this report and the general discussion of how MHC diversity evolved are of interest to at least part of the community. The manuscript is very well written and makes it easy to follow the theoretical and methodological details of the model and the arguments. I have only a few minor comments, which I detail below. Furthermore, I would be very interested to read a response by the previous authors, especially on the relevance of the scaling/weighting factor that they introduced into their model, as it is possible that I have missed something about its meaning.